Graph Builders · Spark Programming Guide (Simplified Chinese Edition)

# Graph Builders

GraphX provides several ways to build a graph from a collection of vertices and edges in an RDD or on disk. By default, none of the graph builders repartitions the graph's edges; edges are left in their default partitions (such as their original blocks in HDFS). [Graph.groupEdges](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Graph@groupEdges((ED,ED)?ED):Graph[VD,ED]) requires the graph to be repartitioned because it assumes that identical edges are colocated on the same partition, so you must call [Graph.partitionBy](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Graph@partitionBy(PartitionStrategy):Graph[VD,ED]) before calling `groupEdges`.

~~~
object GraphLoader {
  def edgeListFile(
      sc: SparkContext,
      path: String,
      canonicalOrientation: Boolean = false,
      minEdgePartitions: Int = 1)
    : Graph[Int, Int]
}
~~~

[GraphLoader.edgeListFile](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.GraphLoader$@edgeListFile(SparkContext,String,Boolean,Int):Graph[Int,Int]) provides a way to load a graph from a list of edges on disk. It parses an adjacency list of (source vertex ID, destination vertex ID) pairs in the following form, skipping comment lines that begin with `#`:

~~~
# This is a comment
2 1
4 1
1 2
~~~

It creates a graph from the specified edges, automatically creating any vertices mentioned by the edges. All vertex and edge attributes default to 1. The `canonicalOrientation` argument allows edges to be reoriented in the positive direction (`srcId < dstId`), which is required by the [connected components](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.lib.ConnectedComponents$) algorithm. The `minEdgePartitions` argument specifies the minimum number of edge partitions to generate; there may be more edge partitions than specified if, for example, the HDFS file has more blocks.

~~~
object Graph {
  def apply[VD, ED](
      vertices: RDD[(VertexId, VD)],
      edges: RDD[Edge[ED]],
      defaultVertexAttr: VD = null)
    : Graph[VD, ED]

  def fromEdges[VD, ED](
      edges: RDD[Edge[ED]],
      defaultValue: VD): Graph[VD, ED]

  def fromEdgeTuples[VD](
      rawEdges: RDD[(VertexId, VertexId)],
      defaultValue: VD,
      uniqueEdges: Option[PartitionStrategy] = None): Graph[VD, Int]
}
~~~

[Graph.apply](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Graph$@apply[VD,ED](RDD[(VertexId,VD)],RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]) allows creating a graph from RDDs of vertices and edges. Duplicate vertices are picked arbitrarily, and vertices found in the edge RDD but not in the vertex RDD are assigned the default attribute.
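
As a minimal sketch of the default-attribute behavior of `Graph.apply`, the following builds a tiny graph in which vertex 3 appears only in the edge RDD. The object name `GraphApplyExample`, the vertex names, the edge weights, and the `"unknown"` default are all illustrative assumptions, and a local Spark master is assumed.

~~~scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of GraphX.
object GraphApplyExample {
  // Builds a graph whose vertex RDD omits vertex 3,
  // even though an edge points to it.
  def build(sc: SparkContext): Graph[String, Int] = {
    val vertices: RDD[(VertexId, String)] =
      sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    val edges: RDD[Edge[Int]] =
      sc.parallelize(Seq(Edge(1L, 2L, 7), Edge(2L, 3L, 9)))
    // Vertex 3 is missing from `vertices`, so it is assigned "unknown".
    Graph(vertices, edges, "unknown")
  }

  def main(args: Array[String]): Unit = {
    // Assumes Spark is on the classpath and a local master is acceptable.
    val conf = new SparkConf().setAppName("GraphApplyExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val graph = build(sc)
      // Print every vertex with its attribute, ordered by vertex ID.
      graph.vertices.collect().sortBy(_._1).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
~~~

Passing an explicit default is usually preferable to relying on the `null` default, since a missing vertex then carries a recognizable placeholder value.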
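
[Graph.fromEdges](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Graph$@fromEdges[VD,ED](RDD[Edge[ED]],VD)(ClassTag[VD],ClassTag[ED]):Graph[VD,ED]) allows creating a graph from only an RDD of edges. It automatically creates any vertices mentioned by the edges and assigns them the default value.

[Graph.fromEdgeTuples](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.Graph$@fromEdgeTuples[VD](RDD[(VertexId,VertexId)],VD,Option[PartitionStrategy])(ClassTag[VD]):Graph[VD,Int]) allows creating a graph from only an RDD of edge tuples. Each edge is assigned the value 1, and any vertices mentioned by the edges are created automatically and assigned the default value. It also supports deduplicating the edges. To deduplicate, pass `Some` of a [PartitionStrategy](https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.graphx.PartitionStrategy) as the `uniqueEdges` argument (for example, `uniqueEdges = Some(PartitionStrategy.RandomVertexCut)`). A partition strategy is required so that identical edges are colocated on the same partition, where they can be deduplicated.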
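
The deduplication option can be illustrated with a small sketch. It builds a graph from three edge tuples, two of which are identical, first through `uniqueEdges` and then by the manual route described at the start of this section (`partitionBy` followed by `groupEdges`). The object name `FromEdgeTuplesExample`, the sample edges, the default vertex value `0`, and the `_ + _` merge function are assumptions made for illustration.

~~~scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of GraphX.
object FromEdgeTuplesExample {
  // Three raw edges; (1, 2) appears twice.
  private def rawEdges(sc: SparkContext): RDD[(VertexId, VertexId)] =
    sc.parallelize(Seq((1L, 2L), (1L, 2L), (2L, 3L)))

  // Builds the graph, optionally asking fromEdgeTuples to deduplicate.
  def build(sc: SparkContext, dedup: Boolean): Graph[Int, Int] = {
    val strategy: Option[PartitionStrategy] =
      if (dedup) Some(PartitionStrategy.RandomVertexCut) else None
    Graph.fromEdgeTuples(rawEdges(sc), defaultValue = 0, uniqueEdges = strategy)
  }

  // Manual route: repartition first so identical edges share a partition,
  // then merge them with groupEdges.
  def manualDedup(sc: SparkContext): Graph[Int, Int] =
    Graph.fromEdgeTuples(rawEdges(sc), defaultValue = 0)
      .partitionBy(PartitionStrategy.RandomVertexCut)
      .groupEdges(_ + _)

  def main(args: Array[String]): Unit = {
    // Assumes Spark is on the classpath and a local master is acceptable.
    val conf = new SparkConf().setAppName("FromEdgeTuplesExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      println(s"edges without dedup: ${build(sc, dedup = false).edges.count()}")
      println(s"edges with dedup:    ${build(sc, dedup = true).edges.count()}")
      println(s"edges, manual dedup: ${manualDedup(sc).edges.count()}")
    } finally {
      sc.stop()
    }
  }
}
~~~

Calling `groupEdges` without the preceding `partitionBy` may leave duplicates unmerged, because identical edges are only combined when they sit in the same partition.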