[TOC]

# 1. Introduction to RC & ORC

**RC:** RC (Record Columnar) was open-sourced by Facebook.

1. Stores collections of rows, and within each collection stores the row data in columnar format
2. Introduces a lightweight index, allowing irrelevant row blocks to be skipped
3. Splittable: allows row collections to be processed in parallel
4. Compressible

**ORC:** ORC (Optimized Row Columnar) is an optimized version of RC.

![](https://img.kancloud.cn/12/d0/12d0cbdd03ded7497b679d6602d38eaf_628x244.png)

For example, a file with an initial size of 585GB shrinks to 505GB when stored as RC, and to 131GB when stored as ORC, with no loss of data, of course.

**RCFile storage structure:**

1. Combines the advantages of row storage and columnar storage;
2. The design is similar to Parquet: the data is first split horizontally into row groups, then the data within each row group is stored by column;

:-: ![](https://img.kancloud.cn/df/2c/df2cdd26932b3187e5c2395d9de576b2_550x336.png)
RC design
![](https://img.kancloud.cn/d1/05/d1059322774e4d7fbcda7ce022cebaea_601x649.png)
RCFile storage format

**Stripe:**
> 1. Each ORC file is first split horizontally into multiple Stripes;
> 2. Each stripe is 250MB by default;
> 3. Each stripe consists of multiple row groups (Row Groups);

**IndexData:**
> 1. Stores the positions and the total row count of the data in the stripe;

**RowData:**
> 1. Stores the data in the form of streams;

**Stripe Footer:**
> 1. Contains statistics for the stripe: Max, Min, count, etc.;

**FileFooter:**
> 1. Statistics for the table;
> 2. The location of each Stripe;

**Postscript:**
> 1. The table's row count, compression parameters, compressed size, column information, etc.;

(A short Java sketch at the end of section 2 prints this stripe and footer metadata back from a real file.)

# 2. Reading and Writing ORC Files in Java

Add the following dependencies to *`pom.xml`*:

```xml
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-core</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-mapreduce</artifactId>
    <version>1.5.1</version>
</dependency>
<dependency>
    <groupId>org.apache.orc</groupId>
    <artifactId>orc-tools</artifactId>
    <version>1.5.1</version>
</dependency>
```

Java example:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.*;

import java.io.IOException;

public class ORCFileOps {

    private static Configuration conf = new Configuration();
    private static String ORCPATH = "/tmp/orcfile.orc";

    public static void main(String[] args) throws IOException {
        write();
        read();
    }

    public static void write() throws IOException {
        // Define the schema
        TypeDescription schema = TypeDescription.fromString("struct<x:int,y:int>");
        // Create the writer
        Writer writer = OrcFile.createWriter(new Path(ORCPATH),
                OrcFile.writerOptions(conf).setSchema(schema));

        // Write the file
        VectorizedRowBatch batch = schema.createRowBatch();
        LongColumnVector x = (LongColumnVector) batch.cols[0];
        LongColumnVector y = (LongColumnVector) batch.cols[1];

        // Simulate 10000 rows of data
        for (int r = 0; r < 10000; ++r) {
            int row = batch.size++;
            x.vector[row] = r;
            y.vector[row] = r * 3;
            // Each batch holds 1024 rows by default; when full, flush it and start a new one.
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }
        writer.close();
    }

    public static void read() throws IOException {
        // Create a Reader with OrcFile
        Reader reader = OrcFile.createReader(new Path(ORCPATH),
                OrcFile.readerOptions(conf));
        // Read the file
        RecordReader rows = reader.rows();
        // Get the schema information
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();

        // Output
        while (rows.nextBatch(batch)) {
            System.out.println("================ batch separator ================");
            System.out.println("Rows in this batch: " + batch.size);
            // Convert the ORC column vectors to Java primitive arrays
            ColumnVector[] cols = batch.cols;
            LongColumnVector vx = (LongColumnVector) cols[0];
            LongColumnVector vy = (LongColumnVector) cols[1];
            long[] x = vx.vector;
            long[] y = vy.vector;

            // Print x and y
            for (int i = 0; i < batch.size; i++) {
                System.out.println(x[i] + ":" + y[i]);
            }
        }
        rows.close();
    }
}
```
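To connect the structure described in section 1 with the API above, the sketch below reopens the file written by `write()` and prints the metadata that ORC keeps in the file footer and stripe list. This is a minimal sketch assuming the same `/tmp/orcfile.orc` path; the class name `OrcMetadataDump` is made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;
import org.apache.orc.StripeInformation;

import java.io.IOException;

public class OrcMetadataDump {

    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Assumes the file written by the write() example above
        Reader reader = OrcFile.createReader(new Path("/tmp/orcfile.orc"),
                OrcFile.readerOptions(conf));

        // Information recorded in the file footer / postscript
        System.out.println("schema      : " + reader.getSchema());
        System.out.println("rows        : " + reader.getNumberOfRows());
        System.out.println("compression : " + reader.getCompressionKind());

        // One entry per stripe, with its offset, data length and row count
        for (StripeInformation stripe : reader.getStripes()) {
            System.out.println("stripe offset=" + stripe.getOffset()
                    + " dataLength=" + stripe.getDataLength()
                    + " rows=" + stripe.getNumberOfRows());
        }
    }
}
```

The `orc-tools` artifact pulled in above also provides a command-line metadata dump (its `meta` command, run from the uber jar) if you only need a quick look rather than programmatic access.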
# 3. Using ORC in Hive

```sql
create external table user_orc_ext(
    name string,
    age int
)
stored as orc;

0: jdbc:hive2://hadoop101:10000> select * from user_orc_ext;
+--------------------+-------------------+--+
| user_orc_ext.name  | user_orc_ext.age  |
+--------------------+-------------------+--+
+--------------------+-------------------+--+

0: jdbc:hive2://hadoop101:10000> show create table user_orc_ext;
+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `user_orc_ext`(              |
|   `name` string,                                   |
|   `age` int)                                       |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'  |
| LOCATION                                           |
|   'hdfs://hadoop101:9000/home/hadoop/hive/warehouse/hivebook.db/user_orc_ext'  |
| TBLPROPERTIES (                                    |
|   'transient_lastDdlTime'='1609156760')            |
+----------------------------------------------------+--+
```
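Because a `STORED AS ORC` table expects ORC files (a plain `LOAD DATA` would only move text files into place without converting them), the usual way to populate one is to let Hive write the ORC files itself via an insert. The following is a minimal sketch; the staging table `user_txt` is a hypothetical plain-text table introduced here for illustration:

```sql
-- Hypothetical plain-text staging table with the same columns
create table user_txt(
    name string,
    age int
)
row format delimited fields terminated by ',';

-- Hive rewrites the selected rows as ORC files under the user_orc_ext location
insert into table user_orc_ext
select name, age from user_txt;

-- Small amounts of test data can also be inserted directly (Hive 0.14+)
insert into table user_orc_ext values ('tom', 20);
```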