In Scala, putting the `case` keyword in front of the `class` keyword turns the class into a case class. A case class differs from an ordinary class in three ways (a minimal demo follows at the end of this section):

* Instances can be created directly, without `new`.
* It implements the `Serializable` interface by default.
* It automatically overrides `toString()`, `equals()`, and `hashCode()`.

**1. Create a Dataset directly from a case class**

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.{Dataset, SparkSession}

object CreateDataSetByCaseClass {
  case class Point(label: String, x: Double, y: Double)
  case class Category(id: Long, name: String)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getName)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    // Create a Dataset from Point
    val points: Dataset[Point] = Seq(Point("bar", 2.6, 3.5), Point("foo", 4.0, 3.7)).toDS()
    // Create a Dataset from Category
    val categories: Dataset[Category] = Seq(Category(1, "bar"), Category(2, "foo")).toDS()
    // Join the two Datasets on label == name
    val joins = points.join(categories, points("label") === categories("name"))

    points.show()
    // +-----+---+---+
    // |label|  x|  y|
    // +-----+---+---+
    // |  bar|2.6|3.5|
    // |  foo|4.0|3.7|
    // +-----+---+---+

    categories.show()
    // +---+----+
    // | id|name|
    // +---+----+
    // |  1| bar|
    // |  2| foo|
    // +---+----+

    joins.show()
    // +-----+---+---+---+----+
    // |label|  x|  y| id|name|
    // +-----+---+---+---+----+
    // |  bar|2.6|3.5|  1| bar|
    // |  foo|4.0|3.7|  2| foo|
    // +-----+---+---+---+----+
  }
}
```

**2. A common pattern in practice: create an RDD first, then associate the RDD with a case class to build the Dataset.**

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

object CreateDataSetByCaseClass {
  case class Point(label: String, x: Double, y: Double)
  case class Category(id: Long, name: String)

  def main(args: Array[String]): Unit = {
    val spark: SparkSession = SparkSession.builder()
      .master("local[4]")
      .appName(this.getClass.getName)
      .getOrCreate()
    val sc: SparkContext = spark.sparkContext
    import spark.implicits._

    // Create the RDDs first
    val pointsRdd: RDD[(String, Double, Double)] =
      sc.parallelize(List(("bar", 2.6, 3.5), ("foo", 4.0, 3.7)))
    val categoriesRdd: RDD[(Int, String)] =
      sc.parallelize(List((1, "bar"), (2, "foo")))

    // Map each RDD onto its case class, then convert to a Dataset
    val pointsDs: Dataset[Point] = pointsRdd.map(x => Point(x._1, x._2, x._3)).toDS()
    val categoriesDs: Dataset[Category] = categoriesRdd.map(x => Category(x._1, x._2)).toDS()

    // Join the two Datasets on label == name
    val joinDs = pointsDs.join(categoriesDs, pointsDs("label") === categoriesDs("name"))

    pointsDs.show()
    categoriesDs.show()
    joinDs.show()
  }
}
```
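To make the three differences concrete, here is a minimal sketch in plain Scala (no Spark required). The `Point` case class mirrors the one used above; the `CaseClassDemo` object is a hypothetical name introduced just for this illustration:

```scala
// Demonstrates the three case class properties listed earlier.
case class Point(label: String, x: Double, y: Double)

object CaseClassDemo {
  def main(args: Array[String]): Unit = {
    // 1. No `new` needed: the compiler generates a companion apply() method.
    val p1 = Point("bar", 2.6, 3.5)

    // 2. Case classes extend Serializable by default.
    println(p1.isInstanceOf[Serializable]) // true

    // 3. toString, equals, and hashCode are auto-generated from the fields.
    println(p1)                                              // Point(bar,2.6,3.5)
    println(p1 == Point("bar", 2.6, 3.5))                    // true (structural equality)
    println(p1.hashCode == Point("bar", 2.6, 3.5).hashCode)  // true
  }
}
```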