SVD (Singular Value Decomposition) can be used to simplify data, remove noise, and improve the results of learning algorithms.

### I. SVD and Recommender Systems

The figure below is a matrix built from restaurant dishes and the opinions of food critics. A critic rates a dish with any integer from 1 to 5; a rating of 0 means the critic has not tried that dish.

![](https://box.kancloud.cn/2016-02-25_56ceab81a66e3.jpg)

Create a new file svdRec.py and add the following code:

~~~
def loadExData():
    return [[0, 0, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1],
            [1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]]
~~~

~~~
>>> from numpy import linalg
>>> import svdRec
>>> Data = svdRec.loadExData()
>>> Data
[[0, 0, 0, 2, 2], [0, 0, 0, 3, 3], [0, 0, 0, 1, 1], [1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [5, 5, 5, 0, 0], [1, 1, 1, 0, 0]]
>>> U, Sigma, VT = linalg.svd(Data)
>>> Sigma
array([  9.64365076e+00,   5.29150262e+00,   8.05799147e-16,
         2.43883353e-16,   2.07518106e-17])
~~~

Notice that the first two singular values are much larger than the rest, so the last three can be discarded: their influence is negligible.

The figure shows that the first three people like braised beef and pulled pork, dishes found only in American BBQ restaurants. The two large singular values can be mapped to two food categories, American BBQ and Japanese food, so those three people can be treated as one class of users and the remaining four as another class; once users are grouped this way, making recommendations is straightforward.

Now change loadExData() in svdRec.py to:

~~~
def loadExData():
    return [[1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [1, 1, 1, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1]]
~~~

The SVD decomposition:

~~~
>>> reload(svdRec)
<module 'svdRec' from 'svdRec.py'>
>>> Data = svdRec.loadExData()
>>> Data
[[1, 1, 1, 0, 0], [2, 2, 2, 0, 0], [1, 1, 1, 0, 0], [5, 5, 5, 0, 0], [1, 1, 0, 2, 2], [0, 0, 0, 3, 3], [0, 0, 0, 1, 1]]
>>> U, Sigma, VT = linalg.svd(Data)
>>> Sigma
array([  9.72140007e+00,   5.29397912e+00,   6.84226362e-01,
         1.67441533e-15,   3.39639411e-16])
~~~

This time the first three singular values are much larger than the remaining two, so the last two can be dropped.

![](https://box.kancloud.cn/2016-02-25_56ceab81be07b.jpg)
![](https://box.kancloud.cn/2016-02-25_56ceab81e7cf7.jpg)
![](https://box.kancloud.cn/2016-02-25_56ceab8205c63.jpg)

The original data can then be approximated by the following result:

![](https://box.kancloud.cn/2016-02-25_56ceab821891e.jpg)

### II. A Recommendation Engine Based on Collaborative Filtering

Collaborative filtering makes recommendations by comparing a user's data with the data of other users.

1. Computing similarity

![](https://box.kancloud.cn/2016-02-25_56ceab82322cd.jpg)

~~~
from numpy import *
from numpy import linalg as la

def ecludSim(inA, inB):
    # similarity from Euclidean distance, mapped into (0, 1]
    return 1.0 / (1.0 + la.norm(inA - inB))

def pearsSim(inA, inB):
    # Pearson correlation, rescaled from [-1, 1] to [0, 1]
    if len(inA) < 3:
        return 1.0
    return 0.5 + 0.5 * corrcoef(inA, inB, rowvar=0)[0][1]

def cosSim(inA, inB):
    # cosine of the angle between the vectors, rescaled from [-1, 1] to [0, 1]
    num = float(inA.T * inB)
    denom = la.norm(inA) * la.norm(inB)
    return 0.5 + 0.5 * (num / denom)
~~~

2. Item-based versus user-based similarity

When the number of users is very large, item-based similarity is the better choice, since there are far fewer item pairs than user pairs to compare.

3. Example: a restaurant-dish recommendation engine based on item similarity

![](https://box.kancloud.cn/2016-02-25_56ceab825f61a.jpg)

~~~
from numpy import *
from numpy import linalg as la

def loadExData():
    return [[1, 1, 1, 0, 0],
            [2, 2, 2, 0, 0],
            [1, 1, 1, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 0, 2, 2],
            [0, 0, 0, 3, 3],
            [0, 0, 0, 1, 1]]

def loadExData2():
    return [[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
            [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
            [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
            [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
            [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
            [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
            [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
            [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
            [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
            [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
            [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]]

def ecludSim(inA, inB):
    return 1.0 / (1.0 + la.norm(inA - inB))

def pearsSim(inA, inB):
    if len(inA) < 3:
        return 1.0
    return 0.5 + 0.5 * corrcoef(inA, inB, rowvar=0)[0][1]

def cosSim(inA, inB):
    num = float(inA.T * inB)
    denom = la.norm(inA) * la.norm(inB)
    return 0.5 + 0.5 * (num / denom)

# Estimate the score that user would give item, under the given similarity measure.
# dataMat: data matrix; user: user index; simMeas: similarity function; item: item index
def standEst(dataMat, user, simMeas, item):
    n = shape(dataMat)[1]            # number of columns, i.e. items
    simTotal = 0.0; ratSimTotal = 0.0
    for j in range(n):
        userRating = dataMat[user, j]
        if userRating == 0:
            continue                 # a 0 means the user has not rated item j; skip it
        # rows where both column item and column j have been rated
        overLap = nonzero(logical_and(dataMat[:, item].A > 0,
                                      dataMat[:, j].A > 0))[0]
        if len(overLap) == 0:
            similarity = 0
        else:
            similarity = simMeas(dataMat[overLap, item], dataMat[overLap, j])
        # print('the %d and %d similarity is: %f' % (item, j, similarity))
        simTotal += similarity
        ratSimTotal += similarity * userRating
    if simTotal == 0:
        return 0
    return ratSimTotal / simTotal

def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=standEst):
    # find the items the user has not rated yet
    unratedItems = nonzero(dataMat[user, :].A == 0)[1]
    if len(unratedItems) == 0:
        return 'you rated everything'
    itemScores = []
    for item in unratedItems:
        # call the estimation method to predict a score for each unrated item
        estimatedScore = estMethod(dataMat, user, simMeas, item)
        itemScores.append((item, estimatedScore))
    # sort by estimated score, descending, and return the top N
    return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]

def svdEst(dataMat, user, simMeas, item):
    n = shape(dataMat)[1]
    simTotal = 0.0; ratSimTotal = 0.0
    U, Sigma, VT = la.svd(dataMat)
    Sig4 = mat(eye(4) * Sigma[:4])   # arrange Sig4 into a diagonal matrix
    xformedItems = dataMat.T * U[:, :4] * Sig4.I  # project items into the 4-d latent space
    for j in range(n):
        userRating = dataMat[user, j]
        if userRating == 0 or j == item:
            continue
        similarity = simMeas(xformedItems[item, :].T, xformedItems[j, :].T)
        print('the %d and %d similarity is: %f' % (item, j, similarity))
        simTotal += similarity
        ratSimTotal += similarity * userRating
    if simTotal == 0:
        return 0
    return ratSimTotal / simTotal
~~~

In standEst(), dataMat[:, item].A extracts column item (dataMat is a matrix, so .A converts it to an array), logical_and() marks the rows in which both column item and column j are greater than 0, and nonzero() returns the indices of those rows, i.e. the users who have rated both items.

Perform the SVD decomposition:

~~~
>>> from numpy import mat
>>> from numpy import linalg as la
>>> U, Sigma, VT = la.svd(mat(svdRec.loadExData2()))
>>> Sigma
array([  1.38487021e+01,   1.15944583e+01,   1.10219767e+01,
         5.31737732e+00,   4.55477815e+00,   2.69935136e+00,
         1.53799905e+00,   6.46087828e-01,   4.45444850e-01,
         9.86019201e-02,   9.96558169e-17])
~~~

How do we choose r? A common quantitative rule is to keep as many singular values as are needed to retain 90% of the energy. As in PCA, each singular value is the square root of an eigenvalue of Data·Dataᵀ, so the total energy is the sum of the squared singular values:

~~~
>>> Sig2 = Sigma**2
>>> sum(Sig2)
541.99999999999932
~~~

Ninety percent of the total energy is about 487.8, and the first three squared singular values already exceed that:

~~~
>>> sum(Sig2[:3])
500.50028912757909
~~~

So three singular values capture more than 90% of the energy; svdEst() keeps four (Sig4), which is also safely above the threshold.

The key point of the SVD step in svdEst() is that it reduces the dimensionality of the user space from n down to 4. The crucial line, dataMat.T * U[:, :4] * Sig4.I, uses the singular values to rescale and project the m×n dataMat into an n×4 matrix whose rows are the items expressed in the four latent item/user-class dimensions.

~~~
>>> myMat = mat(svdRec.loadExData2())
>>> myMat
matrix([[0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 5],
        [0, 0, 0, 3, 0, 4, 0, 0, 0, 0, 3],
        [0, 0, 0, 0, 4, 0, 0, 1, 0, 4, 0],
        [3, 3, 4, 0, 0, 0, 0, 2, 2, 0, 0],
        [5, 4, 5, 0, 0, 0, 0, 5, 5, 0, 0],
        [0, 0, 0, 0, 5, 0, 1, 0, 0, 5, 0],
        [4, 3, 4, 0, 0, 0, 0, 5, 5, 0, 1],
        [0, 0, 0, 4, 0, 4, 0, 0, 0, 0, 4],
        [0, 0, 0, 2, 0, 2, 5, 0, 0, 1, 2],
        [0, 0, 0, 0, 5, 0, 0, 0, 0, 4, 0],
        [1, 0, 0, 0, 0, 0, 0, 1, 2, 0, 0]])
>>> svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst)
the 0 and 3 similarity is: 0.490950
the 0 and 5 similarity is: 0.484274
the 0 and 10 similarity is: 0.512755
the 1 and 3 similarity is: 0.491294
the 1 and 5 similarity is: 0.481516
the 1 and 10 similarity is: 0.509709
the 2 and 3 similarity is: 0.491573
the 2 and 5 similarity is: 0.482346
the 2 and 10 similarity is: 0.510584
the 4 and 3 similarity is: 0.450495
the 4 and 5 similarity is: 0.506795
the 4 and 10 similarity is: 0.512896
the 6 and 3 similarity is: 0.743699
the 6 and 5 similarity is: 0.468366
the 6 and 10 similarity is: 0.439465
the 7 and 3 similarity is: 0.482175
the 7 and 5 similarity is: 0.494716
the 7 and 10 similarity is: 0.524970
the 8 and 3 similarity is: 0.491307
the 8 and 5 similarity is: 0.491228
the 8 and 10 similarity is: 0.520290
the 9 and 3 similarity is: 0.522379
the 9 and 5 similarity is: 0.496130
the 9 and 10 similarity is: 0.493617
[(4, 3.3447149384692283), (7, 3.3294020724526967), (9, 3.328100876390069)]
~~~
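The 90%-energy rule above can be packaged as a small helper. The sketch below is a minimal illustration in plain NumPy; `choose_rank` is a hypothetical name, not part of svdRec.py, and it simply finds the smallest r whose leading squared singular values reach the requested energy fraction:

~~~python
import numpy as np

def choose_rank(data, energy=0.9):
    # Hypothetical helper: smallest r such that the first r singular
    # values hold at least `energy` of the total energy (the sum of
    # squared singular values), as computed with Sig2 above.
    sigma = np.linalg.svd(np.asarray(data, dtype=float), compute_uv=False)
    sig2 = sigma ** 2
    cumulative = np.cumsum(sig2) / sig2.sum()
    return int(np.argmax(cumulative >= energy)) + 1

# A matrix with singular values 10, 3, 1: the first one alone carries
# 100 / 110 ~ 91% of the energy, so one singular value suffices.
print(choose_rank(np.diag([10.0, 3.0, 1.0])))  # -> 1
~~~

With a fixed rank like the 4 used in svdEst(), the same cumulative-energy computation can instead be used to check, after the fact, how much energy the chosen rank retains.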