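**Bucketed tables:** an optimization on top of partitioned tables that can further improve query efficiency and make sampling more efficient.

**Partitioned tables:** provide a convenient way to isolate data and optimize queries. However, not all data can be partitioned sensibly, particularly because a partitioning scheme with appropriately sized partitions has to be chosen (some partitions end up very large and others very small, which is what we usually call data skew).
<br/>

**Bucketing rule:** rows are assigned to buckets by taking the hash of the bucketing column value modulo the number of buckets: `$ bucketId = column.hashcode \bmod bucket.num $`.
<br/>

**Turning on Hive's bucketing switch:**
```sql
-- Set in the current session
-- Bucketing is always dynamic, so this can only be set in the current session
0: jdbc:hive2://hadoop101:10000> set hive.enforce.bucketing=true;
```
Once this switch is on, Hive assigns the number of Reduce tasks automatically from the number of buckets, so the number of reducers equals the number of buckets. (In Hive 2.0 and later the property has been removed and bucketing is always enforced.)

The number of reducers can also be set through `mapred.reduce.tasks`, but this approach is not recommended for Hive bucketing.
```sql
0: jdbc:hive2://hadoop101:10000> set mapred.reduce.tasks=4;
```
<br/>

**Creating a bucketed table:**
* The bucketing column must be a column already defined in the table;
* Bucketing gives more efficient query processing;
* Bucketing makes sampling more efficient;
* The number of buckets is best chosen as `$ 2^n $`;
* Data can be loaded into a bucketed table only with `insert`;

(1) Sample data `peo_bucket.txt`
```text
10,ACCOUNTING,1700
20,RESEARCH,1800
30,SALES,1900
40,OPERATIONS,1700
```
(2) Create a non-bucketed table and load the data
```sql
create table if not exists peo(
    id int,
    name string,
    age int
)
row format delimited fields terminated by ',';

load data local inpath "/hdatas/peo_bucket.txt" into table peo;

0: jdbc:hive2://hadoop101:10000> select * from peo;
+---------+-------------+----------+--+
| peo.id  |  peo.name   | peo.age  |
+---------+-------------+----------+--+
| 10      | ACCOUNTING  | 1700     |
| 20      | RESEARCH    | 1800     |
| 30      | SALES       | 1900     |
| 40      | OPERATIONS  | 1700     |
+---------+-------------+----------+--+
```
(3) Create a bucketed table and insert the data from the `peo` table
```sql
create table bucket3 (
    bid int,
    bname string,
    bage int
)
-- Creates 4 buckets
clustered by(bage) into 4 buckets
row format delimited fields terminated by ',';

insert into table bucket3 select id, name, age from peo;

0: jdbc:hive2://hadoop101:10000> select * from bucket3;
+--------------+----------------+---------------+--+
| bucket3.bid  | bucket3.bname  | bucket3.bage  |
+--------------+----------------+---------------+--+
| 40           | OPERATIONS     | 1700          |
| 30           | SALES          | 1900          |
| 20           | RESEARCH       | 1800          |
| 10           | ACCOUNTING     | 1700          |
+--------------+----------------+---------------+--+
```
The table is stored at the following HDFS path (the database currently in use is hivedb2):
```
hivedb2.db/bucket3/
```
![](https://img.kancloud.cn/ff/4d/ff4d87ae131ab62611a35f1efcb8cee6_1503x420.png)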
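
To make the bucketing rule above concrete, here is a minimal Python sketch that applies "hash modulo number of buckets" to the four sample rows. It is an illustration under stated assumptions, not Hive's actual implementation: it assumes that the hash of an integer column is the integer itself and that the sign bit is cleared before taking the modulus. The helper names `bucket_id` and `assign_buckets` are hypothetical and exist only for this example.

```python
# Minimal sketch of the bucketing rule: bucket_id = hash(column) % bucket_count.
# Assumption: for an INT column the hash is the integer value itself,
# and the hash is masked to be non-negative before taking the modulus.

INT_MAX = 0x7FFFFFFF  # mask that clears the sign bit of a 32-bit hash


def bucket_id(int_value: int, num_buckets: int) -> int:
    """Return the bucket index for an integer bucketing column (hypothetical helper)."""
    return (int_value & INT_MAX) % num_buckets


def assign_buckets(rows, key_index, num_buckets):
    """Group rows by bucket index; every bucket appears, even when it is empty."""
    buckets = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_id(row[key_index], num_buckets)].append(row)
    return buckets


# Sample rows from peo_bucket.txt: (id, name, age)
rows = [
    (10, "ACCOUNTING", 1700),
    (20, "RESEARCH", 1800),
    (30, "SALES", 1900),
    (40, "OPERATIONS", 1700),
]

if __name__ == "__main__":
    # Bucket on the third field, as in "clustered by(bage) into 4 buckets".
    for b, members in assign_buckets(rows, key_index=2, num_buckets=4).items():
        print(b, [r[0] for r in members])
```

Under these assumptions all four sample rows fall into bucket 0, because 1700, 1800, and 1900 are all multiples of 4, and the other three buckets stay empty. Bucketing on the first field instead would split the rows between buckets 0 and 2, which shows why the choice of bucketing column matters for an even distribution.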