## Creating the External Table

```sql
create external table track_info(
    ip string,
    country string,
    province string,
    city string,
    url string,
    time string,
    page string
) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
location '/project/trackinfo/';
```

> The path after `location` refers to a path on HDFS.

* Create the data input directory:

```
hdfs dfs -mkdir -p /project/input/raw/
```

* Upload the data to HDFS:

```
hdfs dfs -put track.data /project/input/raw/
```

* Now use the ETLApp class from the earlier project to clean the data.
* First, turn the hard-coded paths in the class into run-time arguments:

```
Path outputPath = new Path(args[1]);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
```

* One more change: since the code uses an IP-resolution library, upload the IP database file first, then update the path to it in the code.

![](https://img.kancloud.cn/ad/72/ad724575e5b4d361d12a2b7f91521fb2_733x171.png)

* Then package the project. Note: don't use IDEA's built-in terminal; it will complain that `mvn` cannot be found.

```
wangyijiadeMacBook-Air:~ bizzbee$ cd /Users/bizzbee/IdeaProjects/hadooptrainv2
wangyijiadeMacBook-Air:hadooptrainv2 bizzbee$ mvn clean package -DskipTests
```

* Upload the jar to the server:

```
wangyijiadeMacBook-Air:target bizzbee$ scp hadoop-train-v2-1.0-SNAPSHOT.jar bizzbee@192.168.31.249:~/work/
```

* Run the job (the two arguments are the input and output paths):

```
[bizzbee@bizzbee ~]$ hadoop jar /home/bizzbee/work/hadoop-train-v2-1.0-SNAPSHOT.jar com.bizzbee.bigdata.hadoop.mr.project.mr.ETLApp hdfs://bizzbee:8020/project/input/raw/track.data hdfs://bizzbee:8020/project/output/result
```

* Confirm the job succeeded by browsing the output directory:

```
http://192.168.31.249:50070/explorer.html#/project/output/result
```

* Then load the ETL'd data into the track_info table:

```
hive> LOAD DATA INPATH 'hdfs://bizzbee:8020/project/output/result' OVERWRITE INTO TABLE track_info partition(day='2019-11-12');
```

* In Hive, use SQL to count the traffic per province:

```
select province, count(*) as cnt from track_info where day='2019-11-12' group by province;
```

![](https://img.kancloud.cn/30/de/30de37e088213f317acb5711dbcdafc9_182x308.png)

* Create a table for the per-province statistics:

```sql
create table track_info_province_stat(
    province string,
    cnt bigint
) partitioned by (day string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
```

* Insert the result of the earlier query into the province table:

```sql
insert overwrite table track_info_province_stat partition(day='2019-11-12')
select province, count(*) as cnt from track_info where day='2019-11-12' group by province;
```
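To see what the `group by province` aggregation is doing, here is a minimal local sketch in plain Java (not the project's ETLApp code): it counts records per province from tab-delimited lines laid out like `track_info` rows (ip, country, province, city, url, time, page). The sample records are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for: select province, count(*) from track_info group by province
public class ProvinceCount {

    public static Map<String, Long> countByProvince(List<String> lines) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            // Fields are tab-separated, matching FIELDS TERMINATED BY '\t'
            String[] fields = line.split("\t", -1);
            if (fields.length < 3) {
                continue; // skip malformed records
            }
            // fields[2] is the province column in the track_info schema
            counts.merge(fields[2], 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "1.1.1.1\tCN\tbeijing\tbeijing\t/index\t2019-11-12\thome",
                "2.2.2.2\tCN\tbeijing\tbeijing\t/cart\t2019-11-12\tcart",
                "3.3.3.3\tCN\tshanghai\tshanghai\t/index\t2019-11-12\thome");
        System.out.println(countByProvince(sample));
    }
}
```

Hive does the same tally, just distributed across the partitioned HDFS data instead of an in-memory list.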