1.1. Generalized Linear Models · scikit-learn Chinese documentation

# 1.1. Generalized Linear Models

Proofreaders: [@專業吹牛逼的小明](https://github.com/apachecn/scikit-learn-doc-zh) [@Gladiator](https://github.com/apachecn/scikit-learn-doc-zh)

Translators: [@瓜牛](https://github.com/apachecn/scikit-learn-doc-zh) [@年紀大了反應慢了](https://github.com/apachecn/scikit-learn-doc-zh) [@Hazekiah](https://github.com/apachecn/scikit-learn-doc-zh) [@BWM-蜜蜂](https://github.com/apachecn/scikit-learn-doc-zh)

The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the input variables. In mathematical notation, if ![\hat{y}](https://box.kancloud.cn/277d247a09c0ccb4240fe50a4806934e_9x17.jpg) is the predicted value:

![\hat{y}(w, x) = w_0 + w_1 x_1 + ... + w_p x_p](https://box.kancloud.cn/b94f365d399219835108a2c25659d53c_254x20.jpg)

Across the module, we designate the vector ![w = (w_1,..., w_p)](https://box.kancloud.cn/9fd7f1fba485530176053a416289f429_121x20.jpg) as `coef_` and ![w_0](https://box.kancloud.cn/6233c879490b6ce96c680688ae6618c0_19x11.jpg) as `intercept_`.

To perform classification with generalized linear models, see [Logistic regression](#logistic-regression).

## 1.1.1. Ordinary Least Squares

[`LinearRegression`](generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression "sklearn.linear_model.LinearRegression") fits a linear model with coefficients ![w = (w_1, ..., w_p)](https://box.kancloud.cn/9fd7f1fba485530176053a416289f429_121x20.jpg) to minimize the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation. Mathematically it solves a problem of the form:

![\underset{w}{min\,} {|| X w - y||_2}^2](https://box.kancloud.cn/9c67fff4fa39efc6503a72ec376ed5e7_131x29.jpg)

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_ols_0011.png](https://box.kancloud.cn/befce407d9e45b7dbcedd39546005038_566x424.jpg)](../auto_examples/linear_model/plot_ols.html)

[`LinearRegression`](generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression "sklearn.linear_model.LinearRegression") takes arrays X, y in its `fit` method and stores the coefficients ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) of the linear model in its `coef_` member:

```
>>> from sklearn import linear_model
>>> reg = linear_model.LinearRegression()
>>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)
>>> reg.coef_
array([ 0.5, 0.5])
```

However, coefficient estimates for Ordinary Least Squares rely on the independence of the model terms. When terms are correlated and the columns of the design matrix ![X](https://box.kancloud.cn/b422495cc9601a331610a3a428f9133b_16x12.jpg) are approximately linearly dependent, the design matrix becomes close to singular. As a result, the least-squares estimate becomes highly sensitive to random errors in the observed response, producing a large variance. This situation of multicollinearity can arise, for example, when data are collected without an experimental design.

Examples:

- [Linear Regression Example](../auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py)

### 1.1.1.1. Ordinary Least Squares Complexity

This method computes the least squares solution using a singular value decomposition of X. If X is a matrix of size (n, p), this method has a cost of ![O(n p^2)](https://box.kancloud.cn/3af4a7fc8e1694d40bbe5de783fb575b_54x19.jpg), assuming that ![n \geq p](https://box.kancloud.cn/787f68219bef53baf879b79eca466d4a_44x16.jpg).

## 1.1.2. Ridge Regression

[`Ridge`](generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge "sklearn.linear_model.Ridge") regression addresses some of the problems of [Ordinary Least Squares](#ordinary-least-squares) by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

![\underset{w}{min\,} {{|| X w - y||_2}^2 + \alpha {||w||_2}^2}](https://box.kancloud.cn/042147b369f9f6818f0a2e05256bdfda_211x29.jpg)

Here, ![\alpha \geq 0](https://box.kancloud.cn/c6b9e9ba9051269011c9af151a5d6dee_45x15.jpg) is a complexity parameter that controls the amount of shrinkage: the larger the value of ![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg), the greater the amount of shrinkage, and thus the more robust the coefficients become to collinearity.

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_ridge_path_0011.png](https://box.kancloud.cn/5799ccf8aba203b3efc25400a28c8a3c_566x424.jpg)](../auto_examples/linear_model/plot_ridge_path.html)

As with other linear models, [`Ridge`](generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge "sklearn.linear_model.Ridge") takes arrays X, y in its `fit` method and stores the coefficients ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) of the linear model in its `coef_` member:

```
>>> from sklearn import linear_model
>>> reg = linear_model.Ridge(alpha=.5)
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
Ridge(alpha=0.5, copy_X=True, fit_intercept=True, max_iter=None,
      normalize=False, random_state=None, solver='auto', tol=0.001)
>>> reg.coef_
array([ 0.34545455,
        0.34545455])
>>> reg.intercept_
0.13636...
```

Examples:

- `sphx_glr_auto_examples_linear_model_plot_ridge_path.py` (Plot Ridge coefficients as a function of the regularization)
- `sphx_glr_auto_examples_text_document_classification_20newsgroups.py` (Classification of text documents using sparse features)

### 1.1.2.1. Ridge Complexity

This method has the same order of complexity as [Ordinary Least Squares](#ordinary-least-squares).

### 1.1.2.2. Setting the regularization parameter: generalized Cross-Validation

[`RidgeCV`](generated/sklearn.linear_model.RidgeCV.html#sklearn.linear_model.RidgeCV "sklearn.linear_model.RidgeCV") implements ridge regression with built-in cross-validation of the alpha parameter. The object works in the same way as GridSearchCV, except that it defaults to Generalized Cross-Validation (GCV), an efficient form of leave-one-out cross-validation (LOO-CV):

```
>>> from sklearn import linear_model
>>> reg = linear_model.RidgeCV(alphas=[0.1, 1.0, 10.0])
>>> reg.fit([[0, 0], [0, 0], [1, 1]], [0, .1, 1])
RidgeCV(alphas=[0.1, 1.0, 10.0], cv=None, fit_intercept=True, scoring=None,
    normalize=False)
>>> reg.alpha_
0.1
```

References:

- “Notes on Regularized Least Squares”, Rifkin & Lippert ([technical report](http://cbcl.mit.edu/projects/cbcl/publications/ps/MIT-CSAIL-TR-2007-025.pdf), [course slides](http://www.mit.edu/~9.520/spring07/Classes/rlsslides.pdf)).

## 1.1.3. Lasso

The [`Lasso`](generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso "sklearn.linear_model.Lasso") is a linear model that estimates sparse coefficients. It is useful in some contexts because it tends to prefer solutions with fewer non-zero coefficients, effectively reducing the number of variables on which the solution depends. For this reason, the Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero weights (see [Compressive sensing: tomography reconstruction with L1 prior (Lasso)](../auto_examples/applications/plot_tomography_l1_reconstruction.html#sphx-glr-auto-examples-applications-plot-tomography-l1-reconstruction-py)).
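To make the sparsity claim concrete, here is a minimal sketch that fits `Lasso` and `LinearRegression` on the same data and counts the non-zero coefficients. The synthetic data, the random seed, and the value `alpha=0.1` are assumptions chosen for illustration; they do not come from this guide.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (an assumption for illustration): 100 samples, 20 features,
# of which only the first 3 actually influence the target.
rng = np.random.RandomState(0)
n_samples, n_features = 100, 20
X = rng.randn(n_samples, n_features)
w_true = np.zeros(n_features)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.01 * rng.randn(n_samples)

# Ordinary least squares assigns a (small) weight to every feature,
# while the L1 penalty drives many weights to exactly zero.
ols = LinearRegression().fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print("non-zero OLS coefficients:  ", np.count_nonzero(ols.coef_))
print("non-zero Lasso coefficients:", np.count_nonzero(lasso.coef_))
print("indices kept by Lasso:      ", np.flatnonzero(lasso.coef_))
```

With a larger `alpha` the Lasso keeps fewer features, and with `alpha` close to zero it approaches the least-squares fit, so the value has to be tuned for real data.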
Mathematically, it consists of a linear model trained with an ![\ell_1](https://box.kancloud.cn/0fc8b02b257a34b4beb292b164e4bb5f_14x17.jpg) prior as regularizer. The objective function to minimize is:

![\underset{w}{min\,} { \frac{1}{2n_{samples}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}](https://box.kancloud.cn/922147496b8ea1964469861ad0932a2f_268x43.jpg)

The lasso estimate thus minimizes the least-squares loss with the penalty ![\alpha ||w||_1](https://box.kancloud.cn/30ba077554bd4beb9e386eddc8382aca_51x19.jpg) added, where ![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg) is a constant and ![||w||_1](https://box.kancloud.cn/a49be8ec9eb304e241e2854419d7c976_38x19.jpg) is the ![\ell_1](https://box.kancloud.cn/0fc8b02b257a34b4beb292b164e4bb5f_14x17.jpg)-norm of the parameter vector.

The implementation in the class [`Lasso`](generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso "sklearn.linear_model.Lasso") uses coordinate descent as the algorithm to fit the coefficients. See [Least Angle Regression](#least-angle-regression) for another implementation:

```
>>> from sklearn import linear_model
>>> reg = linear_model.Lasso(alpha=0.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
>>> reg.predict([[1, 1]])
array([ 0.8])
```

Also useful for lower-level tasks is the function `lasso_path`, which computes the coefficients along the full path of possible values.

Examples:

- [Lasso and Elastic Net for Sparse Signals](../auto_examples/linear_model/plot_lasso_and_elasticnet.html#sphx-glr-auto-examples-linear-model-plot-lasso-and-elasticnet-py)
- [Compressive sensing: tomography reconstruction with L1 prior (Lasso)](../auto_examples/applications/plot_tomography_l1_reconstruction.html#sphx-glr-auto-examples-applications-plot-tomography-l1-reconstruction-py)

Note **Feature selection with Lasso** As the Lasso regression yields sparse models, it can be used to perform feature selection, as detailed in [L1-based feature selection](feature_selection.html#l1-feature-selection).

### 1.1.3.1. Setting regularization parameter

The `alpha` parameter controls the degree of sparsity of the estimated coefficients.

#### 1.1.3.1.1.
Using cross-validation

scikit-learn exposes objects that set the Lasso `alpha` parameter by cross-validation: [`LassoCV`](generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV "sklearn.linear_model.LassoCV") and [`LassoLarsCV`](generated/sklearn.linear_model.LassoLarsCV.html#sklearn.linear_model.LassoLarsCV "sklearn.linear_model.LassoLarsCV"). [`LassoLarsCV`](generated/sklearn.linear_model.LassoLarsCV.html#sklearn.linear_model.LassoLarsCV "sklearn.linear_model.LassoLarsCV") is based on the [Least Angle Regression](#least-angle-regression) algorithm explained below.

For high-dimensional datasets with many collinear regressors, [`LassoCV`](generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV "sklearn.linear_model.LassoCV") is most often preferable. However, [`LassoLarsCV`](generated/sklearn.linear_model.LassoLarsCV.html#sklearn.linear_model.LassoLarsCV "sklearn.linear_model.LassoLarsCV") has the advantage of exploring more relevant values of the `alpha` parameter, and if the number of samples is very small compared to the number of features, [`LassoLarsCV`](generated/sklearn.linear_model.LassoLarsCV.html#sklearn.linear_model.LassoLarsCV "sklearn.linear_model.LassoLarsCV") is often faster than [`LassoCV`](generated/sklearn.linear_model.LassoCV.html#sklearn.linear_model.LassoCV "sklearn.linear_model.LassoCV").

**[![lasso_cv_1](https://box.kancloud.cn/cfbe4932ab0a9af24155a6ebd93ab4c1_566x424.jpg)](../auto_examples/linear_model/plot_lasso_model_selection.html) [![lasso_cv_2](https://box.kancloud.cn/1c0ee0008edad5427cd927982e978a5a_566x424.jpg)](../auto_examples/linear_model/plot_lasso_model_selection.html)**

#### 1.1.3.1.2.
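Information-criteria based model selection

Alternatively, the estimator [`LassoLarsIC`](generated/sklearn.linear_model.LassoLarsIC.html#sklearn.linear_model.LassoLarsIC "sklearn.linear_model.LassoLarsIC") proposes to use the Akaike information criterion (AIC) and the Bayes information criterion (BIC). It is a computationally cheaper way to find the optimal value of alpha, because the regularization path is computed only once instead of k + 1 times as with k-fold cross-validation. However, such criteria need a proper estimate of the degrees of freedom of the solution, are derived for large samples (asymptotic results), and assume the model is correct, i.e. that the data are actually generated by this model. They also tend to break down when the problem is badly conditioned (more features than samples).

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_lasso_model_selection_0011.png](https://box.kancloud.cn/07832f3b061961ca0059df31c401c715_566x424.jpg)](../auto_examples/linear_model/plot_lasso_model_selection.html)

Examples:

- `sphx_glr_auto_examples_linear_model_plot_lasso_model_selection.py` (Lasso model selection: Cross-Validation / AIC / BIC)

#### 1.1.3.1.3. Comparison with the regularization parameter of SVM

The equivalence between `alpha` and the regularization parameter `C` of SVM is given by `alpha = 1 / C` or `alpha = 1 / (n_samples * C)`, depending on the estimator and the exact objective function optimized by the model.

## 1.1.4. Multi-task Lasso

The [`MultiTaskLasso`](generated/sklearn.linear_model.MultiTaskLasso.html#sklearn.linear_model.MultiTaskLasso "sklearn.linear_model.MultiTaskLasso") is a linear model that estimates sparse coefficients for multiple regression problems jointly: `y` is a 2D array of shape `(n_samples, n_tasks)`. The constraint is that the selected features are the same for all the regression problems, also called tasks.

The following figure compares the location of the non-zeros in W obtained with a simple Lasso or a MultiTaskLasso. The Lasso estimates yield scattered non-zeros, while the non-zeros of the MultiTaskLasso are full columns.

**[![multi_task_lasso_1](https://box.kancloud.cn/8d43f8d684a1cd3f4559f339c1bf99f3_566x354.jpg)](../auto_examples/linear_model/plot_multi_task_lasso_support.html) [![multi_task_lasso_2](https://box.kancloud.cn/113632f3e8a98cf96e45275e8a53283a_566x424.jpg)](../auto_examples/linear_model/plot_multi_task_lasso_support.html)**

**Fitting a time-series model, imposing that any active feature be active at all times.**

Examples:

- [Joint feature selection with multi-task Lasso](../auto_examples/linear_model/plot_multi_task_lasso_support.html#sphx-glr-auto-examples-linear-model-plot-multi-task-lasso-support-py)

Mathematically, it consists of a linear model trained with a mixed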
![\ell_1](https://box.kancloud.cn/0fc8b02b257a34b4beb292b164e4bb5f_14x17.jpg)![\ell_2](https://box.kancloud.cn/fdebc3bef7f3c2dfd95c1e6922ae5a76_14x16.jpg) prior as regularizer. The objective function to minimize is:

![\underset{w}{min\,} { \frac{1}{2n_{samples}} ||X W - Y||_{Fro} ^ 2 + \alpha ||W||_{21}}](https://box.kancloud.cn/6bcc73b1c3ddfb2e5808180b648f2255_307x43.jpg)

where ![Fro](https://box.kancloud.cn/5abcb528f8e28ea508ac559d7c240bfe_32x12.jpg) indicates the Frobenius norm:

![||A||_{Fro} = \sqrt{\sum_{ij} a_{ij}^2}](https://box.kancloud.cn/a82b14482d40f13d008f6f609c364c20_147x56.jpg)

and ![\ell_1](https://box.kancloud.cn/0fc8b02b257a34b4beb292b164e4bb5f_14x17.jpg)![\ell_2](https://box.kancloud.cn/fdebc3bef7f3c2dfd95c1e6922ae5a76_14x16.jpg) reads:

![||A||_{2 1} = \sum_i \sqrt{\sum_j a_{ij}^2}](https://box.kancloud.cn/172560f5cb98b3caf44189501b6eb2fe_167x56.jpg)

The implementation in the class [`MultiTaskLasso`](generated/sklearn.linear_model.MultiTaskLasso.html#sklearn.linear_model.MultiTaskLasso "sklearn.linear_model.MultiTaskLasso") uses coordinate descent as the algorithm to fit the coefficients.

## 1.1.5. Elastic Net

`ElasticNet` is a linear regression model trained with both L1 and L2 priors as regularizer. This combination allows for learning a sparse model where few of the weights are non-zero, like [`Lasso`](generated/sklearn.linear_model.Lasso.html#sklearn.linear_model.Lasso "sklearn.linear_model.Lasso"), while still maintaining the regularization properties of [`Ridge`](generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge "sklearn.linear_model.Ridge"). The convex combination of L1 and L2 is controlled with the `l1_ratio` parameter.

Elastic-net is useful when there are multiple features that are correlated with one another. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.

A practical advantage of trading off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge's stability under rotation.

The objective function to minimize is in this case

![\underset{w}{min\,} { \frac{1}{2n_{samples}} ||X w - y||_2 ^ 2 + \alpha \rho ||w||_1 + \frac{\alpha(1-\rho)}{2} ||w||_2 ^ 2}](https://box.kancloud.cn/996ca389ae93c4640566fa969f69eeaa_409x45.jpg)

The class [`ElasticNetCV`](generated/sklearn.linear_model.ElasticNetCV.html#sklearn.linear_model.ElasticNetCV "sklearn.linear_model.ElasticNetCV") can be used to set, by cross-validation, the parameters `alpha` (![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg)) and `l1_ratio`
(![\rho](https://box.kancloud.cn/1420e98a48886e621b4ae0010ab7680b_9x12.jpg)).

Examples:

- [Lasso and Elastic Net for Sparse Signals](../auto_examples/linear_model/plot_lasso_and_elasticnet.html#sphx-glr-auto-examples-linear-model-plot-lasso-and-elasticnet-py)
- [Lasso and Elastic Net](../auto_examples/linear_model/plot_lasso_coordinate_descent_path.html#sphx-glr-auto-examples-linear-model-plot-lasso-coordinate-descent-path-py)

## 1.1.6. Multi-task Elastic Net

The [`MultiTaskElasticNet`](generated/sklearn.linear_model.MultiTaskElasticNet.html#sklearn.linear_model.MultiTaskElasticNet "sklearn.linear_model.MultiTaskElasticNet") is an elastic-net model that estimates sparse coefficients for multiple regression problems jointly: `Y` is a 2D array of shape `(n_samples, n_tasks)`. The constraint is that the selected features are the same for all the regression problems, also called tasks.

Mathematically, it consists of a linear model trained with a mixed ![\ell_1](https://box.kancloud.cn/0fc8b02b257a34b4beb292b164e4bb5f_14x17.jpg)![\ell_2](https://box.kancloud.cn/fdebc3bef7f3c2dfd95c1e6922ae5a76_14x16.jpg) prior and an ![\ell_2](https://box.kancloud.cn/fdebc3bef7f3c2dfd95c1e6922ae5a76_14x16.jpg) prior as regularizer. The objective function to minimize is:

![\underset{W}{min\,} { \frac{1}{2n_{samples}} ||X W - Y||_{Fro}^2 + \alpha \rho ||W||_{2 1} + \frac{\alpha(1-\rho)}{2} ||W||_{Fro}^2}](https://box.kancloud.cn/f88ac8d812ec3ef804dce6d5829bd5d4_472x45.jpg)

The implementation in the class [`MultiTaskElasticNet`](generated/sklearn.linear_model.MultiTaskElasticNet.html#sklearn.linear_model.MultiTaskElasticNet "sklearn.linear_model.MultiTaskElasticNet") uses coordinate descent as the algorithm to fit the coefficients.

The class [`MultiTaskElasticNetCV`](generated/sklearn.linear_model.MultiTaskElasticNetCV.html#sklearn.linear_model.MultiTaskElasticNetCV "sklearn.linear_model.MultiTaskElasticNetCV") can be used to set the parameters `alpha` (![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg)) and `l1_ratio` (![\rho](https://box.kancloud.cn/1420e98a48886e621b4ae0010ab7680b_9x12.jpg)) by cross-validation.

## 1.1.7.
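Least Angle Regression

Least-angle regression (LARS) is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani. LARS is similar to forward stepwise regression. At each step, it finds the predictor most correlated with the response. When several predictors have equal correlation, instead of continuing along the same predictor, it proceeds in a direction equiangular between them.

The advantages of LARS are:

- It is numerically efficient in contexts where p >> n (i.e., when the number of dimensions is significantly greater than the number of points).
- It is computationally just as fast as forward selection and has the same order of complexity as ordinary least squares.
- It produces a full piecewise linear solution path, which is useful in cross-validation or similar attempts to tune the model.
- If two variables are almost equally correlated with the response, then their coefficients should increase at approximately the same rate. The algorithm thus behaves as intuition would expect, and is also more stable.
- It is easily modified to produce solutions for other estimators, like the Lasso.

The disadvantages of the LARS method include:

- Because LARS is based upon an iterative refitting of the residuals, it appears to be especially sensitive to the effects of noise. This problem is discussed in detail by Weisberg in the discussion section of the Efron et al. (2004) Annals of Statistics article.

The LARS model can be used through the estimator [`Lars`](generated/sklearn.linear_model.Lars.html#sklearn.linear_model.Lars "sklearn.linear_model.Lars"), or its low-level implementation [`lars_path`](generated/sklearn.linear_model.lars_path.html#sklearn.linear_model.lars_path "sklearn.linear_model.lars_path").

## 1.1.8. LARS Lasso

[`LassoLars`](generated/sklearn.linear_model.LassoLars.html#sklearn.linear_model.LassoLars "sklearn.linear_model.LassoLars") is a lasso model implemented using the LARS algorithm. Unlike the implementation based on coordinate descent, it yields the exact solution, which is piecewise linear as a function of the norm of its coefficients.

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_lasso_lars_0011.png](https://box.kancloud.cn/6d03ea8640ca368dc7b1e2bce3242efa_566x424.jpg)](../auto_examples/linear_model/plot_lasso_lars.html)

```
>>> from sklearn import linear_model
>>> reg = linear_model.LassoLars(alpha=.1)
>>> reg.fit([[0, 0], [1, 1]], [0, 1])
LassoLars(alpha=0.1, copy_X=True, eps=..., fit_intercept=True,
     fit_path=True, max_iter=500, normalize=True, positive=False,
     precompute='auto', verbose=False)
>>> reg.coef_
array([ 0.717157..., 0. ])
```

Examples:

- [Lasso path using LARS](../auto_examples/linear_model/plot_lasso_lars.html#sphx-glr-auto-examples-linear-model-plot-lasso-lars-py)

The Lars algorithm provides the full path of the coefficients along the regularization parameter almost for free, so a common operation is to retrieve the path with the function [`lars_path`](generated/sklearn.linear_model.lars_path.html#sklearn.linear_model.lars_path "sklearn.linear_model.lars_path").

### 1.1.8.1.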
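Mathematical formulation

The algorithm is similar to forward stepwise regression, but instead of including variables at each step, the estimated parameters are increased in a direction equiangular to each one's correlation with the residual.

Instead of giving a vector result, the LARS solution consists of a curve denoting the solution for each value of the L1 norm of the parameter vector. The full coefficient path is stored in the array `coef_path_`, which has size (n\_features, max\_features+1). The first column is always zero.

References:

- Original Algorithm is detailed in the paper [Least Angle Regression](http://www-stat.stanford.edu/~hastie/Papers/LARS/LeastAngle_2002.pdf) by Hastie et al.

## 1.1.9. Orthogonal Matching Pursuit (OMP)

`OrthogonalMatchingPursuit` and `orthogonal_mp` implement the OMP algorithm for approximating the fit of a linear model with a constraint imposed on the number of non-zero coefficients (i.e. the L0 pseudo-norm).

Being a forward feature selection method like [Least Angle Regression](#least-angle-regression), orthogonal matching pursuit can approximate the optimum solution vector with a fixed number of non-zero elements:

![\text{arg\,min\,} ||y - X\gamma||_2^2 \text{ subject to } \ ||\gamma||_0 \leq n_{nonzero\_coefs}](https://box.kancloud.cn/29a7fdf220ef04daa60fd4a3ba851487_398x22.jpg)

Alternatively, orthogonal matching pursuit can target a specific error instead of a specific number of non-zero coefficients. This can be expressed as:

![\text{arg\,min\,} ||\gamma||_0 \text{ subject to } ||y-X\gamma||_2^2 \ \leq \text{tol}](https://box.kancloud.cn/f4265c6bb7ba3dda4704fb679c917875_326x21.jpg)

OMP is based on a greedy algorithm that includes at each step the atom most highly correlated with the current residual. It is similar to the simpler matching pursuit (MP) method, but better in that at each iteration the residual is recomputed using an orthogonal projection onto the space of the previously chosen dictionary elements.

Examples:

- [Orthogonal Matching Pursuit](../auto_examples/linear_model/plot_omp.html#sphx-glr-auto-examples-linear-model-plot-omp-py)

References:

- <http://www.cs.technion.ac.il/~ronrubin/Publications/KSVD-OMP-v2.pdf>
- [Matching pursuits with time-frequency dictionaries](http://blanche.polytechnique.fr/~mallat/papiers/MallatPursuit93.pdf), S. G. Mallat, Z. Zhang,

## 1.1.10.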
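Bayesian Regression

Bayesian regression techniques can be used to include regularization parameters in the estimation procedure: the regularization parameter is not set in a hard sense but tuned to the data at hand.

This can be done by introducing [uninformative priors](https://en.wikipedia.org/wiki/Non-informative_prior#Uninformative_priors) over the hyperparameters of the model. The ![\ell_{2}](https://box.kancloud.cn/fdebc3bef7f3c2dfd95c1e6922ae5a76_14x16.jpg) regularization used in ridge regression is equivalent to finding a maximum a posteriori estimate under a Gaussian prior over the parameters ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) with variance ![\lambda^{-1}](https://box.kancloud.cn/b8c7c8a07a761b25f93a664f0ebacb40_26x15.jpg) (that is, precision lambda). Instead of setting lambda manually, it is possible to treat it as a random variable to be estimated from the data.

To obtain a fully probabilistic model, the output ![y](https://box.kancloud.cn/0255a09d3dccb9843dcf063bbeec303f_9x12.jpg) is assumed to be Gaussian distributed around ![X w](https://box.kancloud.cn/d563adeffa3e446908923bf9f35e6892_29x12.jpg):

![p(y|X,w,\alpha) = \mathcal{N}(y|X w,\alpha)](https://box.kancloud.cn/57131ddba5e66d257c0a0f0a2c4baa65_213x20.jpg)

Alpha is again treated as a random variable that is to be estimated from the data.

The advantages of Bayesian regression are:

- It adapts to the data at hand.
- It can be used to include regularization parameters in the estimation procedure.

The disadvantages of Bayesian regression include:

- Inference of the model can be time consuming.

References:

- A good introduction to Bayesian methods is given in C. Bishop: Pattern Recognition and Machine learning
- The original algorithm is detailed in the book `Bayesian learning for neural networks` by Radford M. Neal

### 1.1.10.1.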
Bayesian Ridge Regression

`BayesianRidge` estimates a probabilistic model of the regression problem as described above. The prior for the parameter ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) is given by a spherical Gaussian:

![p(w|\lambda) = \mathcal{N}(w|0,\lambda^{-1}\bold{I_{p}})](https://box.kancloud.cn/9850699e7c567059d046f6c8e072ba1b_187x22.jpg)

The priors over ![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg) and ![\lambda](https://box.kancloud.cn/df246b6e42ab5c40b46b45e348319a89_10x12.jpg) are chosen to be [gamma distributions](https://en.wikipedia.org/wiki/Gamma_distribution), the conjugate prior for the precision of the Gaussian.

The resulting model is called *Bayesian Ridge Regression*, and is similar to the classical [`Ridge`](generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge "sklearn.linear_model.Ridge"). The parameters ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg), ![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg) and ![\lambda](https://box.kancloud.cn/df246b6e42ab5c40b46b45e348319a89_10x12.jpg) are estimated jointly during the fit of the model. The remaining hyperparameters are the parameters of the gamma priors over ![\alpha](https://box.kancloud.cn/4e17a26ba4b90c226c2bc40e5a1a833a_11x8.jpg) and ![\lambda](https://box.kancloud.cn/df246b6e42ab5c40b46b45e348319a89_10x12.jpg). These are usually chosen to be *non-informative*. The parameters are estimated by maximizing the *marginal log likelihood*.

By default ![\alpha_1 = \alpha_2 = \lambda_1 = \lambda_2 = 10^{-6}](https://box.kancloud.cn/a5de7405725ebd5a3ab29b6b5361b501_203x19.jpg).

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_bayesian_ridge_0011.png](https://box.kancloud.cn/ff12e7c02758a6417269583d391b1d15_566x472.jpg)](../auto_examples/linear_model/plot_bayesian_ridge.html)

Bayesian Ridge Regression is used for regression:

```
>>> from sklearn import linear_model
>>> X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
>>> Y = [0., 1., 2., 3.]
>>> reg = linear_model.BayesianRidge()
>>> reg.fit(X, Y)
BayesianRidge(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
       fit_intercept=True, lambda_1=1e-06, lambda_2=1e-06, n_iter=300,
       normalize=False, tol=0.001, verbose=False)
```

After being fitted, the model can then be used to predict new values:

```
>>> reg.predict([[1, 0.]])
array([ 0.50000013])
```

The weights ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) of the model can be accessed:

```
>>> reg.coef_
array([ 0.49999993, 0.49999993])
```

Due to the Bayesian framework, the weights found are slightly different from the ones found by [Ordinary Least Squares](#ordinary-least-squares). However, Bayesian Ridge Regression is more robust to ill-posed problems.

Examples:

- [Bayesian Ridge Regression](../auto_examples/linear_model/plot_bayesian_ridge.html#sphx-glr-auto-examples-linear-model-plot-bayesian-ridge-py)

References:

- More details can be found in the article [Bayesian Interpolation](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.27.9072&rep=rep1&type=pdf) by MacKay, David J. C.

### 1.1.10.2. Automatic Relevance Determination - ARD

`ARDRegression` is very similar to Bayesian Ridge Regression, but can lead to sparser weights ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) [\[1\]](#id38) [\[2\]](#id39). `ARDRegression` poses a different prior over ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) by dropping the assumption that the Gaussian is spherical.

Instead, the distribution over ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) is assumed to be an axis-parallel, elliptical Gaussian distribution.

This means each weight ![w_{i}](https://box.kancloud.cn/3ea92178b943698eb51f4a61771ecf7e_17x11.jpg) is drawn from a Gaussian distribution centered on zero and with a precision ![\lambda_{i}](https://box.kancloud.cn/96d3f8b7b151a2b3c84a3ff2f9e1f0f5_14x15.jpg):

![p(w|\lambda) = \mathcal{N}(w|0,A^{-1})](https://box.kancloud.cn/dc6991552e9cde1aeb2e3df5a7f7b5a3_173x21.jpg)

with ![diag \; (A) = \lambda = \{\lambda_{1},...,\lambda_{p}\}](https://box.kancloud.cn/88e0bef06525a481645dfee1c5a5c4fe_209x20.jpg).
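The difference between the spherical prior and the axis-parallel prior can be seen in the fitted attributes. The sketch below is only an illustration: the synthetic data and the seed are assumptions, and both estimators are used with their default settings.

```python
import numpy as np
from sklearn.linear_model import ARDRegression, BayesianRidge

# Synthetic data (an assumption for illustration): only the first
# 3 of 20 features are relevant to the target.
rng = np.random.RandomState(0)
n_samples, n_features = 100, 20
X = rng.randn(n_samples, n_features)
w_true = np.zeros(n_features)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.randn(n_samples)

ridge = BayesianRidge().fit(X, y)
ard = ARDRegression().fit(X, y)

# BayesianRidge estimates a single precision shared by all weights,
# while ARDRegression estimates one precision per weight.
print("BayesianRidge lambda_:", ridge.lambda_)
print("ARDRegression lambda_ shape:", ard.lambda_.shape)

# Compare how much weight each model leaves on the irrelevant features.
print("max |coef| on irrelevant features (BayesianRidge):",
      np.abs(ridge.coef_[3:]).max())
print("max |coef| on irrelevant features (ARDRegression):",
      np.abs(ard.coef_[3:]).max())
```

A large estimated precision for a weight corresponds to a prior tightly concentrated at zero, which is how ARD can switch irrelevant features off.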
In contrast to Bayesian Ridge Regression, each coordinate ![w_{i}](https://box.kancloud.cn/3ea92178b943698eb51f4a61771ecf7e_17x11.jpg) has its own precision ![\lambda_i](https://box.kancloud.cn/96d3f8b7b151a2b3c84a3ff2f9e1f0f5_14x15.jpg). The prior over all ![\lambda_i](https://box.kancloud.cn/96d3f8b7b151a2b3c84a3ff2f9e1f0f5_14x15.jpg) is chosen to be the same gamma distribution, given by the hyperparameters ![\lambda_1](https://box.kancloud.cn/0e169270bb04c2dbc2fc243a30d0641e_16x16.jpg) and ![\lambda_2](https://box.kancloud.cn/10794e278447b8582bc93646d72bf7ee_16x15.jpg) (the `lambda_1` and `lambda_2` constructor arguments).

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_ard_0011.png](https://box.kancloud.cn/c44eb2ea0d431e73b08e8ea0f904e019_566x472.jpg)](../auto_examples/linear_model/plot_ard.html)

ARD is also known in the literature as *Sparse Bayesian Learning* and the *Relevance Vector Machine* [\[3\]](#id40) [\[4\]](#id41).

Examples:

- [Automatic Relevance Determination Regression (ARD)](../auto_examples/linear_model/plot_ard.html#sphx-glr-auto-examples-linear-model-plot-ard-py)

References:

[\[1\]](#id34) Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 7.2.1

[\[2\]](#id35) David Wipf and Srikantan Nagarajan: [A new view of automatic relevance determination](http://papers.nips.cc/paper/3372-a-new-view-of-automatic-relevance-determination.pdf)

[\[3\]](#id36) Michael E. Tipping: [Sparse Bayesian Learning and the Relevance Vector Machine](http://www.jmlr.org/papers/volume1/tipping01a/tipping01a.pdf)

[\[4\]](#id37) Tristan Fletcher: [Relevance Vector Machines explained](http://www.tristanfletcher.co.uk/RVM%20Explained.pdf)

## 1.1.11.
Logistic regression

Logistic regression, despite its name, is a linear model for classification rather than regression. It is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a [logistic function](https://en.wikipedia.org/wiki/Logistic_function).

The implementation of logistic regression in scikit-learn can be accessed from the class [`LogisticRegression`](generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression "sklearn.linear_model.LogisticRegression"). This implementation can fit binary, one-vs-rest, or multinomial logistic regression with optional L1 or L2 regularization.

As an optimization problem, binary class L2-penalized logistic regression minimizes the following cost function:

![\underset{w, c}{min\,} \frac{1}{2}w^T w + C \sum_{i=1}^n \log(\exp(- y_i (X_i^T w + c)) + 1) .](https://box.kancloud.cn/5a4ac7ae760b2453d14352c9c09a4aa0_378x51.jpg)

Similarly, L1-regularized logistic regression solves the following optimization problem:

![\underset{w, c}{min\,} \|w\|_1 + C \sum_{i=1}^n \log(\exp(- y_i (X_i^T w + c)) + 1) .](https://box.kancloud.cn/3a882786aaa3ec3d5666ab18d3f4ca22_368x51.jpg)

The solvers implemented in the class [`LogisticRegression`](generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression "sklearn.linear_model.LogisticRegression") are “liblinear”, “newton-cg”, “lbfgs”, “sag” and “saga”.

The solver “liblinear” uses a coordinate descent (CD) algorithm and relies on the C++ [LIBLINEAR library](http://www.csie.ntu.edu.tw/~cjlin/liblinear/), which is shipped with scikit-learn. However, the CD algorithm implemented in liblinear cannot learn a true multinomial (multiclass) model; instead, the optimization problem is decomposed in a “one-vs-rest” fashion, so a separate binary classifier is trained for each class. This happens under the hood, so [`LogisticRegression`](generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression "sklearn.linear_model.LogisticRegression") instances using this solver behave as multiclass classifiers. For L1 penalization, [`sklearn.svm.l1_min_c`](generated/sklearn.svm.l1_min_c.html#sklearn.svm.l1_min_c "sklearn.svm.l1_min_c") computes the lower bound for C that avoids a “null” model (all feature weights equal to zero).

The “lbfgs”, “sag” and “newton-cg” solvers only support L2 penalization and are found to converge faster for some high-dimensional data. Setting `multi_class` to “multinomial” with these solvers learns a true multinomial logistic regression model [\[5\]](#id47), which means that its probability estimates should be better calibrated than with the default “one-vs-rest” setting.

The “sag” solver uses Stochastic Average Gradient descent
[\[6\]](#id48). It is faster than other solvers for large datasets, when both the number of samples and the number of features are large.

The “saga” solver [\[7\]](#id49) is a variant of “sag” that also supports the non-smooth `penalty="l1"` option. This is therefore the solver of choice for sparse multinomial logistic regression.

In a nutshell, one may choose the solver with the following rules:

| Case | Solver |
| --- | --- |
| L1 penalty | “liblinear” or “saga” |
| Multinomial loss | “lbfgs”, “sag”, “saga” or “newton-cg” |
| Very large dataset (n\_samples) | “sag” or “saga” |

The “saga” solver is often the best choice. The “liblinear” solver is used by default for historical reasons.

For large datasets, you may also consider using [`SGDClassifier`](generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier "sklearn.linear_model.SGDClassifier") with ‘log’ loss.

Examples:

- [L1 Penalty and Sparsity in Logistic Regression](../auto_examples/linear_model/plot_logistic_l1_l2_sparsity.html#sphx-glr-auto-examples-linear-model-plot-logistic-l1-l2-sparsity-py)
- [Path with L1- Logistic Regression](../auto_examples/linear_model/plot_logistic_path.html#sphx-glr-auto-examples-linear-model-plot-logistic-path-py)
- [Plot multinomial and One-vs-Rest Logistic Regression](../auto_examples/linear_model/plot_logistic_multinomial.html#sphx-glr-auto-examples-linear-model-plot-logistic-multinomial-py)
- [Multiclass sparse logisitic regression on newgroups20](../auto_examples/linear_model/plot_sparse_logistic_regression_20newsgroups.html#sphx-glr-auto-examples-linear-model-plot-sparse-logistic-regression-20newsgroups-py)
- [MNIST classfification using multinomial logistic + L1](../auto_examples/linear_model/plot_sparse_logistic_regression_mnist.html#sphx-glr-auto-examples-linear-model-plot-sparse-logistic-regression-mnist-py)

Differences from liblinear:

When `fit_intercept=False` and either the fitted `coef_` or the data to be predicted are zeros, the scores obtained from [`LogisticRegression`](generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression "sklearn.linear_model.LogisticRegression") with `solver=liblinear` or from `LinearSVC` may differ from those obtained by using the external liblinear library directly. This is because, for samples whose `decision_function` is zero, [`LogisticRegression`](generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression "sklearn.linear_model.LogisticRegression") and
`LinearSVC` predict the negative class, while liblinear predicts the positive class. Note that a model with `fit_intercept=False` and many samples with `decision_function` equal to zero is likely to be an underfit, poorly performing model. You are advised to set `fit_intercept=True` and increase `intercept_scaling`.

Note **Feature selection with sparse logistic regression**

A logistic regression with L1 penalty yields sparse models, and can thus be used to perform feature selection, as detailed in [L1-based feature selection](feature_selection.html#l1-feature-selection).

[`LogisticRegressionCV`](generated/sklearn.linear_model.LogisticRegressionCV.html#sklearn.linear_model.LogisticRegressionCV "sklearn.linear_model.LogisticRegressionCV") implements logistic regression with built-in cross-validation to find the optimal C parameter. The “newton-cg”, “sag”, “saga” and “lbfgs” solvers are found to be faster for high-dimensional dense data, due to warm-starting. For the multiclass case, if `multi_class` is set to “ovr”, an optimal C is obtained for each class; if `multi_class` is set to “multinomial”, a single optimal C is obtained by minimizing the cross-entropy loss.

References:

[\[5\]](#id44) Christopher M. Bishop: Pattern Recognition and Machine Learning, Chapter 4.3.4

[\[6\]](#id45) Mark Schmidt, Nicolas Le Roux, and Francis Bach: [Minimizing Finite Sums with the Stochastic Average Gradient.](https://hal.inria.fr/hal-00860051/document)

[\[7\]](#id46) Aaron Defazio, Francis Bach, Simon Lacoste-Julien: [SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives.](https://arxiv.org/abs/1407.0202)

## 1.1.12. Stochastic Gradient Descent - SGD

Stochastic gradient descent is a simple yet very efficient approach to fit linear models. It is particularly useful when the number of samples (and the number of features) is very large. The `partial_fit` method allows online or out-of-core learning.

The classes [`SGDClassifier`](generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier "sklearn.linear_model.SGDClassifier") and [`SGDRegressor`](generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor "sklearn.linear_model.SGDRegressor") fit linear models for classification and regression respectively, using different (convex) loss functions and different penalties. For example, with `loss="log"`, [`SGDClassifier`](generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier "sklearn.linear_model.SGDClassifier") fits a logistic regression model, while with `loss="hinge"` it fits a linear support vector machine (SVM).

References:

- [Stochastic Gradient Descent](sgd.html#sgd)

## 1.1.13.
Perceptron

The [`Perceptron`](generated/sklearn.linear_model.Perceptron.html#sklearn.linear_model.Perceptron "sklearn.linear_model.Perceptron") is a simple algorithm suitable for large scale learning. By default:

- It does not require a learning rate.
- It is not regularized (penalized).
- It updates its model only on mistakes.

The last characteristic implies that the Perceptron is slightly faster to train than SGD with the hinge loss and that the resulting models are sparser.

## 1.1.14. Passive Aggressive Algorithms

The passive-aggressive algorithms are a family of algorithms for large-scale learning. They are similar to the Perceptron in that they do not require a learning rate. However, contrary to the Perceptron, they include a regularization parameter `C`.

For classification, [`PassiveAggressiveClassifier`](generated/sklearn.linear_model.PassiveAggressiveClassifier.html#sklearn.linear_model.PassiveAggressiveClassifier "sklearn.linear_model.PassiveAggressiveClassifier") can be used with `loss='hinge'` (PA-I) or `loss='squared_hinge'` (PA-II). For regression, [`PassiveAggressiveRegressor`](generated/sklearn.linear_model.PassiveAggressiveRegressor.html#sklearn.linear_model.PassiveAggressiveRegressor "sklearn.linear_model.PassiveAggressiveRegressor") can be used with `loss='epsilon_insensitive'` (PA-I) or `loss='squared_epsilon_insensitive'` (PA-II).

References:

- [“Online Passive-Aggressive Algorithms”](http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf) K. Crammer, O. Dekel, J. Keshat, S. Shalev-Shwartz, Y. Singer - JMLR 7 (2006)

## 1.1.15. Robustness regression: outliers and modeling errors

Robust regression is concerned with fitting a regression model in the presence of corrupt data: either outliers, or errors in the model.

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_theilsen_0011.png](https://box.kancloud.cn/c7c49abe8118cc1dd1764d718c93ed64_566x424.jpg)](../auto_examples/linear_model/plot_theilsen.html)

### 1.1.15.1. Different scenario and useful concepts

There are different things to keep in mind when dealing with data corrupted by outliers:

- **Outliers in X or in y**?

| Outliers in the y direction | Outliers in the X direction |
| --- | --- |
| [![y_outliers](https://box.kancloud.cn/e94db3a3e65eee675d0c947d054fc244_500x400.jpg)](../auto_examples/linear_model/plot_robust_fit.html) | [![X_outliers](https://box.kancloud.cn/70f9f8cd29468133dd9704aac9d77f2e_500x400.jpg)](../auto_examples/linear_model/plot_robust_fit.html) |

- **Fraction of outliers versus
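amplitude of error** The number of outlying points matters, but so does how far out they are.

An important notion of robust fitting is that of the breakdown point: the fraction of data that can be outlying before the fit starts missing the inlying data.

Note that, in general, robust fitting in a high-dimensional setting (large n\_features) is very hard. The robust models here will probably not work in these settings.

**Trade-offs: which estimator?**

Scikit-learn provides 3 robust regression estimators: [RANSAC](#ransac-regression), [Theil Sen](#theil-sen-regression) and [HuberRegressor](#huber-regression).

- [HuberRegressor](#huber-regression) should be faster than [RANSAC](#ransac-regression) and [Theil Sen](#theil-sen-regression) unless the number of samples is very large, i.e. `n_samples` >> `n_features`. This is because [RANSAC](#ransac-regression) and [Theil Sen](#theil-sen-regression) fit on smaller subsets of the data. However, with the default parameters, both [Theil Sen](#theil-sen-regression) and [RANSAC](#ransac-regression) are unlikely to be as robust as [HuberRegressor](#huber-regression).
- [RANSAC](#ransac-regression) is faster than [Theil Sen](#theil-sen-regression) and scales much better with the number of samples.
- [RANSAC](#ransac-regression) deals better with large outliers in the y direction (the most common situation).
- [Theil Sen](#theil-sen-regression) copes better with medium-size outliers in the X direction, but this property disappears in high-dimensional settings.

When in doubt, use [RANSAC](#ransac-regression).

### 1.1.15.2. RANSAC: RANdom SAmple Consensus

RANSAC (RANdom SAmple Consensus) fits a model from random subsets of inliers from the complete data set.

RANSAC is a non-deterministic algorithm that produces a reasonable result only with a certain probability, which depends on the number of iterations (see the `max_trials` parameter). It is typically used for linear and non-linear regression problems and is especially popular in the field of photogrammetric computer vision.

The algorithm splits the complete input sample data into a set of inliers, which may be subject to noise, and outliers, which are caused for example by erroneous measurements or invalid hypotheses about the data. The resulting model is then estimated only from the determined inliers.

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_ransac_0011.png](https://box.kancloud.cn/2122591bfa44a833d719aa8ac9a28d2e_566x424.jpg)](../auto_examples/linear_model/plot_ransac.html)

#### 1.1.15.2.1. Details of the algorithm

Each iteration performs the following steps:

1. Select `min_samples` random samples from the original data and check whether the set of data is valid (see `is_data_valid`).
2. Fit a model to the random subset (`base_estimator.fit`) and check whether the estimated model is valid (see `is_model_valid`).
3. Classify all data as inliers or outliers by calculating the residuals to the estimated model (`base_estimator.predict(X) - y`). All data samples with absolute residuals smaller than `residual_threshold` are considered inliers.
4.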
若局內點樣本數最大，保存當前模型為最佳模型。以免當前模型離群點數量恰好相等（而出現未定義情況），規定僅當數值大于當前最值時認為是最佳模型。上述步驟或者迭代到最大次數（ `max_trials` ），或者某些終止條件滿足時停下（見 `stop_n_inliers` 和 `stop_score` )。最終模型由之前確定的最佳模型的局內點樣本（一致性集合，consensus set）預測。函數 `is_data_valid` 和 `is_model_valid` 可以識別出隨機樣本子集中的退化組合（degenerate combinations）并予以丟棄（reject）。即便不需要考慮退化情況，也會使用 `is_data_valid` ，因為在擬合模型之前調用它能得到更高的計算性能。示例： - [Robust linear model estimation using RANSAC](../auto_examples/linear_model/plot_ransac.html#sphx-glr-auto-examples-linear-model-plot-ransac-py) - [Robust linear estimator fitting](../auto_examples/linear_model/plot_robust_fit.html#sphx-glr-auto-examples-linear-model-plot-robust-fit-py) 參考文獻： - <https://en.wikipedia.org/wiki/RANSAC> - [“Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”](http://www.cs.columbia.edu/~belhumeur/courses/compPhoto/ransac.pdf)Martin A. Fischler and Robert C. Bolles - SRI International (1981) - [“Performance Evaluation of RANSAC Family”](http://www.bmva.org/bmvc/2009/Papers/Paper355/Paper355.pdf)Sunglok Choi, Taemin Kim and Wonpil Yu - BMVC (2009) ### 1.1.15.3. Theil-Sen 預估器: 廣義中值估計 [`TheilSenRegressor`](generated/sklearn.linear_model.TheilSenRegressor.html#sklearn.linear_model.TheilSenRegressor "sklearn.linear_model.TheilSenRegressor") 估計器：使用中位數在多個維度推廣，因此對多維離散值是有幫助，但問題是，隨著維數的增加，估計器的準確性在迅速下降。準確性的丟失，導致在高維上的估計值比不上普通的最小二乘法。示例: - [Theil-Sen Regression](../auto_examples/linear_model/plot_theilsen.html#sphx-glr-auto-examples-linear-model-plot-theilsen-py) - [Robust linear estimator fitting](../auto_examples/linear_model/plot_robust_fit.html#sphx-glr-auto-examples-linear-model-plot-robust-fit-py) 參考文獻: - [https://en.wikipedia.org/wiki/Theil%E2%80%93Sen\_estimator](https://en.wikipedia.org/wiki/Theil%E2%80%93Sen_estimator) #### 1.1.15.3.1. 
算法理論細節 [`TheilSenRegressor`](generated/sklearn.linear_model.TheilSenRegressor.html#sklearn.linear_model.TheilSenRegressor "sklearn.linear_model.TheilSenRegressor") 媲美 [Ordinary Least Squares (OLS)](#ordinary-least-squares) （普通最小二乘法（OLS））漸近效率和無偏估計。在對比 OLS, Theil-Sen 是一種非參數方法，這意味著它沒有對底層數據的分布假設。由于 Theil-Sen 是基于中位數的估計，它是更適合的對損壞的數據。在單變量的設置，Theil-Sen 在一個簡單的線性回歸，這意味著它可以容忍任意損壞的數據高達 29.3% 的情況下，約 29.3% 的一個崩潰點。 [![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_theilsen_0011.png](https://box.kancloud.cn/c7c49abe8118cc1dd1764d718c93ed64_566x424.jpg)](../auto_examples/linear_model/plot_theilsen.html) 在 scikit-learn 中 [`TheilSenRegressor`](generated/sklearn.linear_model.TheilSenRegressor.html#sklearn.linear_model.TheilSenRegressor "sklearn.linear_model.TheilSenRegressor") 實施如下的學習推廣到多元線性回歸模型 [\[8\]](#f1) 利用空間中這是一個概括的中位數多維度 [\[9\]](#f2) 。在時間復雜度和空間復雜度，根據 Theil-Sen 量表 ![\binom{n_{samples}}{n_{subsamples}}](https://box.kancloud.cn/66f46e5e971cce5351baab8ad950d52c_98x45.jpg) 這使得它不適用于大量樣本和特征的問題。因此，可以選擇一個亞群的大小來限制時間和空間復雜度，只考慮所有可能組合的隨機子集。示例: - [Theil-Sen Regression](../auto_examples/linear_model/plot_theilsen.html#sphx-glr-auto-examples-linear-model-plot-theilsen-py) 參考文獻: [\[8\]](#id54)Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang: [Theil-Sen Estimators in a Multiple Linear Regression Model.](http://home.olemiss.edu/~xdang/papers/MTSE.pdf)[\[9\]](#id55)1. K?rkk?inen and S. ?yr?m?: [On Computation of Spatial Median for Robust Data Mining.](http://users.jyu.fi/~samiayr/pdf/ayramo_eurogen05.pdf) ### 1.1.15.4. 
Huber 回歸 [`HuberRegressor`](generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor "sklearn.linear_model.HuberRegressor") 不同，因為它適用于 [`Ridge`](generated/sklearn.linear_model.Ridge.html#sklearn.linear_model.Ridge "sklearn.linear_model.Ridge") 損耗的樣品被分類為離群值。如果這個樣品的絕對誤差小于某一閾值，樣品就分為一層。它不同于 [`TheilSenRegressor`](generated/sklearn.linear_model.TheilSenRegressor.html#sklearn.linear_model.TheilSenRegressor "sklearn.linear_model.TheilSenRegressor") 和 [`RANSACRegressor`](generated/sklearn.linear_model.RANSACRegressor.html#sklearn.linear_model.RANSACRegressor "sklearn.linear_model.RANSACRegressor") 因為它無法忽略對離群值的影響，但對它們的權重較小。 [![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_huber_vs_ridge_001.png](https://box.kancloud.cn/3d3bb68a1054ab28b13c24aff9d9d2d7_566x424.jpg)](../auto_examples/linear_model/plot_huber_vs_ridge.html) 這個 [`HuberRegressor`](generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor "sklearn.linear_model.HuberRegressor") 最小化損失函數是由 ![\underset{w, \sigma}{min\,} {\sum_{i=1}^n\left(\sigma + H_m\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \alpha {||w||_2}^2}](https://box.kancloud.cn/16e701569ccacd7b9444ad86e7626a84_349x51.jpg) 其中 ![H_m(z) = \begin{cases} z^2, & \text {if } |z| < \epsilon, \\ 2\epsilon|z| - \epsilon^2, & \text{otherwise} \end{cases}](https://box.kancloud.cn/34d9ed5206997e110efb259afb301706_257x55.jpg) 建議設置參數 `epsilon` 為 1.35 以實現 95% 統計效率。 ### 1.1.15.5. 注意 [`HuberRegressor`](generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor "sklearn.linear_model.HuberRegressor") 與將損失設置為 huber 的 [`SGDRegressor`](generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor "sklearn.linear_model.SGDRegressor") 在以下方面的使用方式上是不同的。 - [`HuberRegressor`](generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor "sklearn.linear_model.HuberRegressor") 是標度不變性的. 
一旦設置了 `epsilon` , 通過不同的值向上或向下縮放 `X` 和 `y` ，就會跟以前一樣對異常值產生同樣的鍵壯性。相比 [`SGDRegressor`](generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor "sklearn.linear_model.SGDRegressor") 其中 `epsilon` 在 `X` 和 `y` 是縮放的時候必須再次設置。 - [`HuberRegressor`](generated/sklearn.linear_model.HuberRegressor.html#sklearn.linear_model.HuberRegressor "sklearn.linear_model.HuberRegressor") 應該更有效地使用在小樣本數據，同時 [`SGDRegressor`](generated/sklearn.linear_model.SGDRegressor.html#sklearn.linear_model.SGDRegressor "sklearn.linear_model.SGDRegressor") 需要在訓練數據的次數來產生相同的鍵壯性。示例: - [HuberRegressor vs Ridge on dataset with strong outliers](../auto_examples/linear_model/plot_huber_vs_ridge.html#sphx-glr-auto-examples-linear-model-plot-huber-vs-ridge-py) 參考文獻: - Peter J. Huber, Elvezio M. Ronchetti: Robust Statistics, Concomitant scale estimates, pg 172 另外，這個估計是不同于 R 實現的 Robust Regression (<http://www.ats.ucla.edu/stat/r/dae/rreg.htm>) 因為 R 不實現加權最小二乘實現每個樣本上給出多少剩余的基礎重量大于某一閾值。 ## 1.1.16. 多項式回歸：用基函數展開線性模型機器學習中一種常見的模式，是使用線性模型訓練數據的非線性函數。這種方法保持了一般快速的線性方法的性能，同時允許它們適應更廣泛的數據范圍。例如，可以通過構造系數的 **polynomial features** 來擴展一個簡單的線性回歸。在標準線性回歸的情況下，你可能有一個類似于二維數據的模型: ![\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2](https://box.kancloud.cn/3d3082c90bcd744b4af973bede46eb76_217x18.jpg) 如果我們想把拋物面擬合成數據而不是平面，我們可以結合二階多項式的特征，使模型看起來像這樣: ![\hat{y}(w, x) = w_0 + w_1 x_1 + w_2 x_2 + w_3 x_1 x_2 + w_4 x_1^2 + w_5 x_2^2](https://box.kancloud.cn/46fbcb188ceadd398e256b2b3522d36c_412x22.jpg) （這有時候是令人驚訝的）觀察，這還是 *still a linear model* : 看到這個，想象創造一個新的變量 ![z = [x_1, x_2, x_1 x_2, x_1^2, x_2^2]](https://box.kancloud.cn/69236e855ef60243971fb87a359d9c77_178x22.jpg) 有了這些數據的重新標記的數據，我們的問題就可以寫了。 ![\hat{y}(w, x) = w_0 + w_1 z_1 + w_2 z_2 + w_3 z_3 + w_4 z_4 + w_5 z_5](https://box.kancloud.cn/2ec34f853b96dc4b7c855a225847b059_385x18.jpg) 我們看到，導致 *polynomial regression* 是線性模型中的同一類，我們認為以上（即模型是線性），可以用同樣的方法解決。通過考慮在用這些基函數建立的高維空間中的線性擬合，該模型具有靈活性，可以適應更廣泛的數據范圍。這里是一個例子，應用這個想法，一維數據，使用不同程度的多項式特征: 
[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_polynomial_interpolation_0011.png](https://box.kancloud.cn/10adcc446008d37f749ea7ef110b6196_566x424.jpg)](../auto_examples/linear_model/plot_polynomial_interpolation.html)

This figure is created using the [`PolynomialFeatures`](generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures "sklearn.preprocessing.PolynomialFeatures") preprocessor. This preprocessor transforms an input data matrix into a new data matrix of a given degree. It can be used as follows:

```
>>> from sklearn.preprocessing import PolynomialFeatures
>>> import numpy as np
>>> X = np.arange(6).reshape(3, 2)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5]])
>>> poly = PolynomialFeatures(degree=2)
>>> poly.fit_transform(X)
array([[  1.,   0.,   1.,   0.,   0.,   1.],
       [  1.,   2.,   3.,   4.,   6.,   9.],
       [  1.,   4.,   5.,  16.,  20.,  25.]])
```

The features of `X` have been transformed from ![[x_1, x_2]](https://box.kancloud.cn/ffa0f7d57ac28833ed3809493d7d894f_49x18.jpg) to ![[1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]](https://box.kancloud.cn/bfb681246d9c4dfe138fde50b024fdeb_160x21.jpg), and can now be used within any linear model.

This sort of preprocessing can be streamlined with the [Pipeline](pipeline.html#pipeline) tools. A single object representing a simple polynomial regression can be created and used as follows:

```
>>> from sklearn.preprocessing import PolynomialFeatures
>>> from sklearn.linear_model import LinearRegression
>>> from sklearn.pipeline import Pipeline
>>> import numpy as np
>>> model = Pipeline([('poly', PolynomialFeatures(degree=3)),
...                   ('linear', LinearRegression(fit_intercept=False))])
>>> # fit to an order-3 polynomial data
>>> x = np.arange(5)
>>> y = 3 - 2 * x + x ** 2 - x ** 3
>>> model = model.fit(x[:, np.newaxis], y)
>>> model.named_steps['linear'].coef_
array([ 3., -2.,  1., -1.])
```

The linear model trained on polynomial features is able to exactly recover the input polynomial coefficients.

In some cases it is not necessary to include higher powers of any single feature, but only the so-called *interaction features* that multiply together at most ![d](https://box.kancloud.cn/c7d8c62c9dba7f2dfd95aa73d579b8ae_10x13.jpg) distinct features. These can be obtained from [`PolynomialFeatures`](generated/sklearn.preprocessing.PolynomialFeatures.html#sklearn.preprocessing.PolynomialFeatures "sklearn.preprocessing.PolynomialFeatures") with the setting `interaction_only=True`.

For example, when dealing with boolean features, ![x_i^n = x_i](https://box.kancloud.cn/8d187440ed57fc8a278486521c7efa4e_57x17.jpg) for all ![n](https://box.kancloud.cn/ee463e4b2bbbc723c7017b00e6d51b41_11x8.jpg), so higher powers are useless; but ![x_i x_j](https://box.kancloud.cn/4aea6014e18580e1538a271ea433b8ff_32x14.jpg) represents the conjunction of two booleans. This way, we can solve the XOR problem with a linear classifier:

```
>>> from sklearn.linear_model import Perceptron
>>> from sklearn.preprocessing import PolynomialFeatures
>>> import numpy as np
>>> X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
>>> y = X[:, 0] ^ X[:, 1]
>>> y
array([0, 1, 1, 0])
>>> X = PolynomialFeatures(interaction_only=True).fit_transform(X).astype(int)
>>> X
array([[1, 0, 0, 0],
       [1, 0, 1, 0],
       [1, 1, 0, 0],
       [1, 1, 1, 1]])
>>> clf = Perceptron(fit_intercept=False, max_iter=10, tol=None,
...                  shuffle=False).fit(X, y)
```

And the classifier “predictions” are perfect:

```
>>> clf.predict(X)
array([0, 1, 1, 0])
>>> clf.score(X, y)
1.0
```
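The claim made earlier in this section, that a second-order polynomial model is still linear once the inputs are re-labeled as ![z = [x_1, x_2, x_1 x_2, x_1^2, x_2^2]](https://box.kancloud.cn/69236e855ef60243971fb87a359d9c77_178x22.jpg), can be checked by hand. The following is only a minimal sketch using plain NumPy least squares rather than a scikit-learn estimator. The sample size, the random seed and the coefficient vector `w_true` are arbitrary values assumed for illustration and do not come from the text above.

```python
import numpy as np

rng = np.random.RandomState(0)

# Two input features, sampled at random (illustrative data only).
x1 = rng.uniform(-1.0, 1.0, size=50)
x2 = rng.uniform(-1.0, 1.0, size=50)

# Hypothetical paraboloid coefficients w_0 .. w_5, chosen arbitrarily.
w_true = np.array([1.0, 2.0, -3.0, 0.5, 4.0, -1.5])

# Design matrix [1, z_1, ..., z_5] with z = [x1, x2, x1*x2, x1^2, x2^2],
# in the same order as the model equation in the text.
Z = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

# Noise-free targets generated by the paraboloid.
y = Z @ w_true

# Ordinary least squares on the expanded features: the problem is linear in w.
w_hat, *_ = np.linalg.lstsq(Z, y, rcond=None)

print(np.round(w_hat, 6))
```

Because the targets are noise-free and the expanded design matrix has full column rank for generic inputs, the least-squares solution should match `w_true` up to floating-point error. Note that the column order here follows the model equation and differs from the order produced by `PolynomialFeatures(degree=2)`, which places the squared term of the first feature before the interaction term.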