* Parquet files: a popular columnar storage format; files are stored in binary form and contain both the data and its metadata (schema).
* Parquet is Spark's default storage format.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructField, StructType}

object ParquetSource {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getName)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    /***** Spark SQL: writing Parquet files *****/
    // 1. Define the schema
    val schema = StructType(Array(
      StructField("name", StringType),
      StructField("favorite_color", StringType),
      StructField("favorite_numbers", ArrayType(IntegerType))
    ))

    // 2. Create the DataFrame
    val rdd = sc.parallelize(List(
      ("Alyssa", null, Array(3, 9, 15, 20)),
      ("Ben", "red", null)
    ))
    val rowRDD: RDD[Row] = rdd.map(p => Row(p._1, p._2, p._3))
    val df1 = spark.createDataFrame(rowRDD, schema)

    // 3. Write the data as .parquet files.
    //    This fails if the target path already exists.
    //    partitionBy("name") writes each distinct name value to its own
    //    name=... subdirectory under E:\hadoop\output, each containing
    //    files such as
    //    part-00003-3c959916-6a4a-410a-bca9-e1bd56953107.c000.snappy.parquet
    df1.write.partitionBy("name").parquet("file:///E:\\hadoop\\output")

    /***** Spark SQL: reading Parquet files *****/
    // Partition discovery recovers the name column from the directory
    // names; it is appended after the columns stored in the data files.
    val df2 = spark.read.parquet("file:///E:\\hadoop\\output")
    df2.show()
    // +--------------+----------------+------+
    // |favorite_color|favorite_numbers|  name|
    // +--------------+----------------+------+
    // |          null|  [3, 9, 15, 20]|Alyssa|
    // |           red|            null|   Ben|
    // +--------------+----------------+------+
    df2.printSchema()
    // root
    //  |-- favorite_color: string (nullable = true)
    //  |-- favorite_numbers: array (nullable = true)
    //  |    |-- element: integer (containsNull = true)
    //  |-- name: string (nullable = true)
  }
}
```
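
Two follow-ups worth noting. First, because the write above uses `partitionBy("name")`, a single `name=...` subdirectory can be read on its own; reading a leaf directory directly skips partition discovery, so the `name` column is not reconstructed. Second, since Parquet is the default data source (the `spark.sql.sources.default` setting), the generic `save`/`load` API reads and writes Parquet without naming the format. A minimal sketch, reusing `spark` and `df1` from the example above; the `output_default` path is a hypothetical placeholder:

```scala
// Read one partition directly: the path already encodes name=Alyssa,
// so the result holds only Alyssa's row and omits the name column.
val alyssa = spark.read.parquet("file:///E:\\hadoop\\output\\name=Alyssa")
alyssa.show()

// Parquet is the default of spark.sql.sources.default, so save()/load()
// without an explicit format() produce and consume Parquet files.
df1.write.save("file:///E:\\hadoop\\output_default")
val df3 = spark.read.load("file:///E:\\hadoop\\output_default")
df3.printSchema()
```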