例子 · Spark 編程指南簡體中文版

# 例子假定我們想從一些文本文件中構建一個圖，限制這個圖包含重要的關系和用戶，并且在子圖上運行page-rank，最后返回與top用戶相關的屬性。可以通過如下方式實現。 ```scala // Connect to the Spark cluster val sc = new SparkContext("spark://master.amplab.org", "research") // Load my user data and parse into tuples of user id and attribute list val users = (sc.textFile("graphx/data/users.txt") .map(line => line.split(",")).map( parts => (parts.head.toLong, parts.tail) )) // Parse the edge data which is already in userId -> userId format val followerGraph = GraphLoader.edgeListFile(sc, "graphx/data/followers.txt") // Attach the user attributes val graph = followerGraph.outerJoinVertices(users) { case (uid, deg, Some(attrList)) => attrList // Some users may not have attributes so we set them as empty case (uid, deg, None) => Array.empty[String] } // Restrict the graph to users with usernames and names val subgraph = graph.subgraph(vpred = (vid, attr) => attr.size == 2) // Compute the PageRank val pagerankGraph = subgraph.pageRank(0.001) // Get the attributes of the top pagerank users val userInfoWithPageRank = subgraph.outerJoinVertices(pagerankGraph.vertices) { case (uid, attrList, Some(pr)) => (pr, attrList.toList) case (uid, attrList, None) => (0.0, attrList.toList) } println(userInfoWithPageRank.vertices.top(5)(Ordering.by(_._2._1)).mkString("\n")) ```