(1) Sample data: `people.txt`

```txt
Michael,29
Andy,30
Justin,19
```

(2) Sample code

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object RDDtoDataFrame {

  case class People(name: String, age: Int)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getName)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    /***** Method 1: split the RDD, map it to a case class, then convert to a DataFrame *****/
    val peopleRDD: RDD[String] = sc.textFile("file:///E:\\hadoop\\input\\people.txt")

    // Split each line and map it to the case class
    val peopleDF: DataFrame = peopleRDD
      .map(_.split(","))
      .map(x => People(x(0), x(1).toInt))
      .toDF()
    peopleDF.show()
    // +-------+---+
    // |   name|age|
    // +-------+---+
    // |Michael| 29|
    // |   Andy| 30|
    // | Justin| 19|
    // +-------+---+

    // Create a temporary view
    peopleDF.createOrReplaceTempView("people")
    spark.sql("select * from people where name='Andy'").show()
    // +----+---+
    // |name|age|
    // +----+---+
    // |Andy| 30|
    // +----+---+

    /***** Method 2: attach schema information to the RDD to get a DataFrame *****/
    // 1. Build the schema with StructType
    //    StructField(field name, field type, whether the value may be null); nullable defaults to true
    val schema = StructType(Array(
      StructField("name", StringType, true),
      StructField("age", IntegerType, true)
    ))

    // 2. Split each line into an Array, then convert it to an RDD[Row]
    val peopleRowRDD: RDD[Row] = peopleRDD
      .map(_.split(","))
      .map(x => Row(x(0), x(1).toInt))

    // 3. Combine the RDD[Row] with the schema to create a DataFrame
    val df: DataFrame = spark.createDataFrame(peopleRowRDD, schema)
    df.createOrReplaceTempView("people2")
    spark.sql("select * from people2").show()
    // +-------+---+
    // |   name|age|
    // +-------+---+
    // |Michael| 29|
    // |   Andy| 30|
    // | Justin| 19|
    // +-------+---+
  }
}
```
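Besides the two methods above, a common shorthand (a sketch, not from the original post: the object name `TupleToDataFrame` and the column names passed to `toDF` are my own choices) is to map each line to a plain tuple and supply column names directly to `toDF`, avoiding both the case class and the explicit `StructType`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object TupleToDataFrame {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[4]")
      .appName("TupleToDataFrame")
      .getOrCreate()
    import spark.implicits._

    // Map each line to a (String, Int) tuple, then name the columns inline
    val peopleDF: DataFrame = spark.sparkContext
      .textFile("file:///E:\\hadoop\\input\\people.txt")
      .map(_.split(","))
      .map(x => (x(0), x(1).toInt)) // RDD[(String, Int)]
      .toDF("name", "age")          // column names supplied directly

    peopleDF.show()
    spark.stop()
  }
}
```

This is convenient for quick jobs, but unlike Method 2 it gives no control over nullability or over column types beyond what the tuple's Scala types imply.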