Creating Bucketed Tables · Hadoop2.x

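**Bucketed tables:** a further optimization on top of partitioned tables. They can improve query efficiency and make sampling more efficient.

**Partitioned tables:** provide a convenient way to isolate data and optimize queries. However, not all data can be partitioned sensibly. In particular, it is hard to find a partitioning scheme that yields partitions of a reasonable size: some partitions end up very large and others very small, which is what is commonly called data skew.

**Bucketing rule:** Hive hashes the value of the bucketing column and takes the remainder modulo the number of buckets, `$ bucketId = column.hashCode \bmod bucket.num $`, to decide which bucket each row goes to.

**Enabling Hive's bucketing switch:**

```sql
-- Set in the current session.
-- Rows are bucketed only when they are written by an INSERT (dynamic bucketing),
-- so the flag must be on in the session that runs the insert.
0: jdbc:hive2://hadoop101:10000> set hive.enforce.bucketing=true;
```

With this switch on, Hive automatically sets the number of reduce tasks to match the number of buckets. The reducer count can also be set manually with `mapred.reduce.tasks`, but this approach is not recommended for bucketing in Hive.

```sql
0: jdbc:hive2://hadoop101:10000> set mapred.reduce.tasks=4;
```

**Creating a bucketed table:**

* The bucketing column must be one of the columns declared in the table definition;
* Bucketing improves query-processing efficiency;
* Bucketing makes sampling more efficient;
* The number of buckets is best chosen as a power of two, `$ 2^n $`;
* Data can only be loaded into a bucketed table with `insert` (e.g. `insert ... select`), not with `load data`.

(1) Sample data `peo_bucket.txt`

```
10,ACCOUNTING,1700
20,RESEARCH,1800
30,SALES,1900
40,OPERATIONS,1700
```

(2) Create a non-bucketed table and load the data into it

```sql
create table if not exists peo(
  id int,
  name string,
  age int
)
row format delimited fields terminated by ',';

load data local inpath "/hdatas/peo_bucket.txt" into table peo;

0: jdbc:hive2://hadoop101:10000> select * from peo;
+---------+-------------+----------+--+
| peo.id  | peo.name    | peo.age  |
+---------+-------------+----------+--+
| 10      | ACCOUNTING  | 1700     |
| 20      | RESEARCH    | 1800     |
| 30      | SALES       | 1900     |
| 40      | OPERATIONS  | 1700     |
+---------+-------------+----------+--+
```

(3) Create a bucketed table and insert the data from `peo`

```sql
create table bucket3 (
  bid int,
  bname string,
  bage int
)
-- 4 buckets, assigned by the hash of bage
clustered by(bage) into 4 buckets
row format delimited fields terminated by ',';

insert into table bucket3 select id, name, age from peo;

0: jdbc:hive2://hadoop101:10000> select * from bucket3;
+--------------+----------------+---------------+--+
| bucket3.bid  | bucket3.bname  | bucket3.bage  |
+--------------+----------------+---------------+--+
| 40           | OPERATIONS     | 1700          |
| 30           | SALES          | 1900          |
| 20           | RESEARCH       | 1800          |
| 10           | ACCOUNTING     | 1700          |
+--------------+----------------+---------------+--+
```

The table is stored in HDFS under the following path (the database currently in use is `hivedb2`):

```
hivedb2.db/bucket3/
```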
![](https://img.kancloud.cn/ff/4d/ff4d87ae131ab62611a35f1efcb8cee6_1503x420.png)
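To see which bucket each sample row should land in, the hash-and-modulo rule can be sketched in Java, the language Hive itself is written in. This is a minimal sketch under stated assumptions, not Hive's source: it assumes the legacy behavior in which an INT column hashes to its own value and the bucket number is `(hash & Integer.MAX_VALUE) % numBuckets` (the mask clears the sign bit so negative hashes still give a valid bucket). Newer Hive releases may use a different hash function (Hive 3 introduced a Murmur3-based bucketing hash), so actual bucket ids can differ. The class and method names `BucketIdSketch`, `hashInt`, and `bucketId` are hypothetical.

```java
// Hypothetical sketch of legacy Hive bucket assignment for INT columns.
// Assumptions (not taken from Hive's code): INT hash == value,
// bucket == (hash & Integer.MAX_VALUE) % numBuckets.
public class BucketIdSketch {

    // Assumed legacy hash for an INT column: the value itself.
    static int hashInt(int value) {
        return value;
    }

    // Clear the sign bit, then take the remainder modulo the bucket count.
    static int bucketId(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        int numBuckets = 4; // matches "into 4 buckets" in bucket3
        int[] ids = {10, 20, 30, 40};
        int[] ages = {1700, 1800, 1900, 1700};
        for (int i = 0; i < ids.length; i++) {
            int byAge = bucketId(hashInt(ages[i]), numBuckets);
            int byId = bucketId(hashInt(ids[i]), numBuckets);
            System.out.printf("id=%d age=%d -> bucket by bage: %d, bucket by bid: %d%n",
                    ids[i], ages[i], byAge, byId);
        }
    }
}
```

Under these assumptions, 1700, 1800, and 1900 are all multiples of 4, so every row of `bucket3` maps to bucket 0 and the other three buckets receive no rows, a small instance of the skew described above. Bucketing on `bid` instead would spread the rows over buckets 2 and 0.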