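Collaborative filtering is commonly used in recommender systems. These techniques aim to fill in the missing entries of a user-item association matrix.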
<br/>
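MLlib currently supports model-based collaborative filtering, in which users and items are described by a small set of latent factors that can be used to predict the missing entries. MLlib learns these latent factors with the alternating least squares (ALS) algorithm.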
<br/>
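As a rough illustration of how latent factors yield a prediction, the sketch below scores one user-item pair as the dot product of two factor vectors. The object name `LatentFactorSketch`, the function `predict`, and the hand-picked rank-2 vectors are assumptions made for this example; ALS would learn such vectors from the rating data rather than take them as given.

```scala
object LatentFactorSketch {
  // Predicted rating = dot product of the user's and the item's factor vectors.
  def predict(userFactors: Array[Double], itemFactors: Array[Double]): Double = {
    require(userFactors.length == itemFactors.length, "factor vectors must have the same rank")
    userFactors.zip(itemFactors).map { case (u, v) => u * v }.sum
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical rank-2 factors, chosen by hand for illustration only.
    val user = Array(1.0, 2.0)
    val item = Array(2.0, 1.5)
    println(predict(user, item)) // 1.0 * 2.0 + 2.0 * 1.5 = 5.0
  }
}
```

The rank (the length of the factor vectors) corresponds to the second argument of `ALS.train` in the listing further down.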
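**1. Collaborative filtering**
(1) Data: `$SPARK_HOME/data/mllib/als/test.data`. The first line of the sample below only labels the columns; the file itself contains just the comma-separated records.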
```txt
user id,item id,rating
1,1,5.0
1,2,1.0
1,3,5.0
1,4,1.0
2,1,5.0
```
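Each record is a `user,item,rating` triple. A pattern match on exactly three fields throws a `MatchError` on any line with a different number of fields, so a more defensive parser can be useful when the input is not guaranteed to be clean. The following is a minimal sketch in plain Scala; `RatingLineParser` and `ParsedRating` are hypothetical names standing in for MLlib's `Rating` so that the example runs without Spark.

```scala
import scala.util.Try

object RatingLineParser {
  // Hypothetical stand-in for MLlib's Rating, so the sketch runs without Spark.
  final case class ParsedRating(user: Int, product: Int, rating: Double)

  // Returns None for lines that do not consist of exactly three well-formed fields.
  def parse(line: String): Option[ParsedRating] =
    line.split(",").map(_.trim) match {
      case Array(user, product, rating) =>
        Try(ParsedRating(user.toInt, product.toInt, rating.toDouble)).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val lines = Seq("user id,item id,rating", "1,1,5.0", "1,2,1.0", "bad line")
    // Only the two numeric records survive; the label line and the bad line are dropped.
    lines.flatMap(parse).foreach(println)
  }
}
```

With Spark, the same function could presumably be applied through `flatMap` on the `RDD[String]` instead of `map`, so that malformed lines are skipped rather than failing the job.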
(2) Code
```scala
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object ALSAlgorithm {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]")
      .setAppName(this.getClass.getName)
    val sc: SparkContext = SparkContext.getOrCreate(conf)

    // Load the data and parse each "user,item,rating" line into a Rating
    val data: RDD[String] = sc.textFile("F:/mllib/test.data")
    val ratings: RDD[Rating] = data.map(_.split(",") match {
      case Array(user, item, rate) => Rating(user.toInt, item.toInt, rate.toDouble)
    })

    // Number of latent factors
    val rank = 1
    // Number of iterations
    val numIterations = 20
    // Train the model; the last argument (0.01) is the regularization parameter lambda
    val model: MatrixFactorizationModel = ALS.train(ratings, rank, numIterations, 0.01)

    // The model can be saved
    // model.save(sc, "F:/mllib/model/als")
    // and loaded again
    // val model: MatrixFactorizationModel = MatrixFactorizationModel.load(sc, "F:/mllib/model/als")

    // The (user, product) pairs to predict ratings for
    val usersProducts: RDD[(Int, Int)] = ratings.map { case Rating(user, product, _) => (user, product) }

    // Predict with the model
    val predictions: RDD[((Int, Int), Double)] = model.predict(usersProducts).map {
      case Rating(user, product, rate) => ((user, product), rate)
    }

    // Pair each actual rating with its prediction, keyed by (user, product)
    val ratesAndPreds: RDD[((Int, Int), (Double, Double))] = ratings.map {
      case Rating(user, product, rate) => ((user, product), rate)
    }.join(predictions)

    // Evaluate the model by the mean squared error (MSE) of the predicted ratings
    val MSE = ratesAndPreds.map {
      case (_, (r1, r2)) => math.pow(r1 - r2, 2)
    }.reduce(_ + _) / ratesAndPreds.count()
    println(MSE) // 4.000268960412628
  }
}
```
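The evaluation step in the listing above joins actual and predicted ratings on the `(user, product)` key and averages the squared differences. The plain-Scala sketch below reproduces that calculation on ordinary maps so the arithmetic can be checked by hand. `MseSketch` and `mse` are hypothetical names, and the ratings are made-up numbers, not output from ALS.

```scala
object MseSketch {
  // Mean squared error over the (user, product) keys present in both maps,
  // which mirrors the inner join used in the Spark listing.
  def mse(actual: Map[(Int, Int), Double], predicted: Map[(Int, Int), Double]): Double = {
    val squaredErrors = actual.toSeq.collect {
      case (key, r) if predicted.contains(key) => math.pow(r - predicted(key), 2)
    }
    require(squaredErrors.nonEmpty, "no overlapping (user, product) pairs")
    squaredErrors.sum / squaredErrors.size
  }

  def main(args: Array[String]): Unit = {
    // Made-up numbers for illustration; they are not output from ALS.
    val actual = Map((1, 1) -> 5.0, (1, 2) -> 1.0)
    val predicted = Map((1, 1) -> 4.0, (1, 2) -> 3.0)
    val error = mse(actual, predicted) // ((5 - 4)^2 + (1 - 3)^2) / 2 = 2.5
    println(s"MSE = $error, RMSE = ${math.sqrt(error)}")
  }
}
```

Because the Spark listing predicts on the same ratings it was trained on, the MSE it prints is a training error; a held-out split would be needed to estimate how the model performs on unseen ratings.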