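# Tune Learning Rate for Gradient Boosting with XGBoost in Python

> Original: [https://machinelearningmastery.com/tune-learning-rate-for-gradient-boosting-with-xgboost-in-python/](https://machinelearningmastery.com/tune-learning-rate-for-gradient-boosting-with-xgboost-in-python/)

A problem with gradient boosted decision trees is that they are quick to learn and overfit the training data. One effective way to slow down learning in a gradient boosting model is to use a learning rate, also called shrinkage (or eta in the XGBoost documentation).

In this post you will discover the effect of the learning rate in gradient boosting and how to tune it on your machine learning problem using the XGBoost library in Python.

After reading this post you will know:

* The effect the learning rate has on a gradient boosting model.
* How to tune the learning rate on your machine learning problem.
* How to tune the trade-off between the number of boosted trees and the learning rate on your problem.

Let's get started.

* **Update Jan/2017**: Updated to reflect changes in scikit-learn API version 0.18.1.

![Tune Learning Rate for Gradient Boosting with XGBoost in Python](https://img.kancloud.cn/84/4a/844a16ac1f17b949fc1bfc7bf0881a1e_640x425.jpg)

Tune Learning Rate for Gradient Boosting with XGBoost in Python. Photo by [Robert Hertel](https://www.flickr.com/photos/roberthertel/14890278255/), some rights reserved.

## Slow Learning in Gradient Boosting with a Learning Rate

Gradient boosting involves creating and adding trees to the model sequentially. New trees are created to correct the residual errors in the predictions from the existing sequence of trees. The effect is that the model can quickly fit, then overfit, the training dataset.

A technique to slow down learning in a gradient boosting model is to apply a weighting factor to the corrections made by new trees when they are added to the model. This weighting is called the shrinkage factor or the learning rate, depending on the literature or the tool.

Naive gradient boosting is the same as gradient boosting with shrinkage where the shrinkage factor is set to 1.0. Setting values less than 1.0 makes fewer corrections for each tree added to the model. This in turn means that more trees must be added to the model.

It is common to use small values in the range of 0.1 to 0.3, as well as values less than 0.1.

Let's investigate the effect of the learning rate on a standard machine learning dataset.

## Problem Description: Otto Dataset

In this tutorial we will use the [Otto Group Product Classification Challenge](https://www.kaggle.com/c/otto-group-product-classification-challenge) dataset. This dataset is available for free from Kaggle (you will need to sign up to Kaggle to be able to download it). You can download the training dataset **train.csv.zip** from the [Data page](https://www.kaggle.com/c/otto-group-product-classification-challenge/data) and place the unzipped **train.csv** file into your working directory.

The dataset describes 93 obfuscated details of more than 61,000 products grouped into 9 product categories (e.g. fashion, electronics, etc.). The input attributes are counts of different events of some kind. The goal is to make predictions for new products as an array of probabilities, one for each of the 9 categories, and models are evaluated using multiclass logarithmic loss (also called cross entropy).

The competition was completed in May 2015, and the dataset is a good challenge for XGBoost because of the nontrivial number of examples, the difficulty of the problem, and the fact that little data preparation is required (other than encoding the string class variables as integers).

## Tune Learning Rate in XGBoost

When creating gradient boosting models with XGBoost using the scikit-learn wrapper, the **learning_rate** parameter can be set to control the weighting of new trees added to the model.

We can use the grid search capability in scikit-learn to evaluate the effect on logarithmic loss of training a gradient boosting model with different learning rate values. We will hold the number of trees constant at the default of 100 and evaluate a suite of standard values for the learning rate on the Otto dataset.

```py
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]
```

There are 6 variations of the learning rate to be tested and each variation will be evaluated using 10-fold cross validation, meaning that a total of 6×10 or 60 XGBoost models must be trained and evaluated. The log loss for each learning rate will be printed, as well as the value that resulted in the best performance.

```py
# XGBoost on Otto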
# dataset, Tune learning_rate
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot
# load data
data = read_csv('train.csv')
dataset = data.values
# split data into X and y
# note: column 0 of train.csv is the row id, so this slice keeps it alongside the 93 features
X = dataset[:,0:94]
y = dataset[:,94]
# encode string class values as integers
label_encoded_y = LabelEncoder().fit_transform(y)
# grid search
model = XGBClassifier()
learning_rate = [0.0001, 0.001, 0.01, 0.1, 0.2, 0.3]
param_grid = dict(learning_rate=learning_rate)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, label_encoded_y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
# plot
pyplot.errorbar(learning_rate, means, yerr=stds)
pyplot.title("XGBoost learning_rate vs Log Loss")
pyplot.xlabel('learning_rate')
pyplot.ylabel('Log Loss')
pyplot.savefig('learning_rate.png')
```

Running this example prints the best result as well as the log loss for each of the evaluated learning rates.

```py
Best: -0.001156 using {'learning_rate': 0.2}
-2.155497 (0.000081) with: {'learning_rate': 0.0001}
-1.841069 (0.000716) with: {'learning_rate': 0.001}
-0.597299 (0.000822) with: {'learning_rate': 0.01}
-0.001239 (0.001730) with: {'learning_rate': 0.1}
-0.001156 (0.001684) with: {'learning_rate': 0.2}
-0.001158 (0.001666) with: {'learning_rate': 0.3}
```

Interestingly, we can see that the best learning rate was 0.2. This is a high learning rate, and it suggests that the default of 100 trees may be too low and needs to be increased.

We can also plot the effect of the learning rate on the (inverted) log loss scores, although the log10-like spread of the chosen learning_rate values means that most of them are squashed into the left-hand side of the plot, near zero.

![Tune Learning Rate in
XGBoost](https://img.kancloud.cn/07/ad/07ad042b9730d834b3053df09137ba38_800x600.jpg)

Tune Learning Rate in XGBoost

Next, we will look at varying the number of trees while varying the learning rate.

## Tune Learning Rate and Number of Trees in XGBoost

Smaller learning rates generally require more trees to be added to the model. We can explore this relationship by evaluating a grid of parameter pairs. The number of decision trees will be varied from 100 to 500 and the learning rate will be varied on a log10 scale from 0.0001 to 0.1.

```py
n_estimators = [100, 200, 300, 400, 500]
learning_rate = [0.0001, 0.001, 0.01, 0.1]
```

There are 5 variations of **n_estimators** and 4 variations of **learning_rate**. Each combination will be evaluated using 10-fold cross validation, so a total of 4x5x10 or 200 XGBoost models must be trained and evaluated.

The expectation is that, for a given learning rate, performance will improve and then plateau as the number of trees is increased. The full code listing is provided below.

```py
# XGBoost on Otto dataset, Tune learning_rate and n_estimators
from pandas import read_csv
from xgboost import XGBClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import LabelEncoder
import matplotlib
matplotlib.use('Agg')
from matplotlib import pyplot
import numpy
# load data
data = read_csv('train.csv')
dataset = data.values
# split data into X and y
X = dataset[:,0:94]
y = dataset[:,94]
# encode string class values as integers
label_encoded_y = LabelEncoder().fit_transform(y)
# grid search
model = XGBClassifier()
n_estimators = [100, 200, 300, 400, 500]
learning_rate = [0.0001, 0.001, 0.01, 0.1]
param_grid = dict(learning_rate=learning_rate, n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)
grid_search = GridSearchCV(model, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result = grid_search.fit(X, label_encoded_y)
# summarize results
print("Best: %f using %s" % (grid_result.best_score_, grid_result.best_params_))
means = grid_result.cv_results_['mean_test_score']
stds = grid_result.cv_results_['std_test_score']
params = grid_result.cv_results_['params']
for mean, stdev, param in zip(means, stds, params):
    print("%f (%f) with: %r" % (mean, stdev, param))
# plot results: one row of scores per learning rate, one column per tree count
scores = numpy.array(means).reshape(len(learning_rate), len(n_estimators))
for i, value in enumerate(learning_rate):
    pyplot.plot(n_estimators, scores[i],
        label='learning_rate: ' + str(value))
pyplot.legend()
pyplot.xlabel('n_estimators')
pyplot.ylabel('Log Loss')
pyplot.savefig('n_estimators_vs_learning_rate.png')
```

Running the example prints the best combination as well as the log loss for each evaluated pair.

```py
Best: -0.001152 using {'n_estimators': 300, 'learning_rate': 0.1}
-2.155497 (0.000081) with: {'n_estimators': 100, 'learning_rate': 0.0001}
-2.115540 (0.000159) with: {'n_estimators': 200, 'learning_rate': 0.0001}
-2.077211 (0.000233) with: {'n_estimators': 300, 'learning_rate': 0.0001}
-2.040386 (0.000304) with: {'n_estimators': 400, 'learning_rate': 0.0001}
-2.004955 (0.000373) with: {'n_estimators': 500, 'learning_rate': 0.0001}
-1.841069 (0.000716) with: {'n_estimators': 100, 'learning_rate': 0.001}
-1.572384 (0.000692) with: {'n_estimators': 200, 'learning_rate': 0.001}
-1.364543 (0.000699) with: {'n_estimators': 300, 'learning_rate': 0.001}
-1.196490 (0.000713) with: {'n_estimators': 400, 'learning_rate': 0.001}
-1.056687 (0.000728) with: {'n_estimators': 500, 'learning_rate': 0.001}
-0.597299 (0.000822) with: {'n_estimators': 100, 'learning_rate': 0.01}
-0.214311 (0.000929) with: {'n_estimators': 200, 'learning_rate': 0.01}
-0.080729 (0.000982) with: {'n_estimators': 300, 'learning_rate': 0.01}
-0.030533 (0.000949) with: {'n_estimators': 400, 'learning_rate': 0.01}
-0.011769 (0.001071) with: {'n_estimators': 500, 'learning_rate': 0.01}
-0.001239 (0.001730) with: {'n_estimators': 100, 'learning_rate': 0.1}
-0.001153 (0.001702) with: {'n_estimators': 200, 'learning_rate': 0.1}
-0.001152 (0.001704) with: {'n_estimators': 300, 'learning_rate': 0.1}
-0.001153 (0.001708) with: {'n_estimators': 400, 'learning_rate': 0.1}
-0.001153 (0.001708) with: {'n_estimators': 500, 'learning_rate': 0.1}
```

We can see that the best result observed was a learning rate of 0.1 with 300 trees.

It is hard to pick out trends from the raw data and the small negative log loss results. Below is a plot with one series per learning rate, showing log loss performance as the number of trees is varied.

![Tuning Learning Rate and Number of Trees in XGBoost](https://img.kancloud.cn/94/2e/942e1b8709c8dad4f155794a9d1d14c5_800x600.jpg)

Tuning XGBoost:
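the learning rate and number of trees

We can see that the expected general trend holds, where performance (inverted log loss) improves as the number of trees is increased.

Performance is generally poor for the smaller learning rates, suggesting that a much larger number of trees may be required. We may need to increase the number of trees into the thousands, which may be quite computationally expensive.

The results for **learning_rate = 0.1** are obscured by the large y-axis scale of the graph. We can extract the performance measures for just **learning_rate = 0.1** and plot them directly.

```py
# Plot performance for learning_rate=0.1
from matplotlib import pyplot
n_estimators = [100, 200, 300, 400, 500]
loss = [-0.001239, -0.001153, -0.001152, -0.001153, -0.001153]
pyplot.plot(n_estimators, loss)
pyplot.xlabel('n_estimators')
pyplot.ylabel('Log Loss')
pyplot.title('XGBoost learning_rate=0.1 n_estimators vs Log Loss')
pyplot.show()
```

Running this code shows performance increasing with the number of trees, followed by a plateau across 400 and 500 trees.

![Plot of Learning Rate=0.1 and varying the Number of Trees in XGBoost](https://img.kancloud.cn/31/29/3129ca0240dffbe6a1bd407000a436dd_800x600.jpg)

Plot of learning_rate = 0.1 while varying the number of trees in XGBoost

## Summary

In this post you discovered the effect of weighting the addition of new trees to a gradient boosting model, called shrinkage or the learning rate.

Specifically, you learned:

* That adding a learning rate is intended to slow down the adaptation of the model to the training data.
* How to evaluate a range of learning rate values on your machine learning problem.
* How to evaluate the relationship between varying the number of trees and the learning rate on your problem.

Do you have any questions about shrinkage in gradient boosting or about this post? Ask your questions in the comments and I will do my best to answer.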
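
As a supplement to the shrinkage idea described in the first section, here is a minimal sketch of how a learning rate scales each new tree's correction. It is not XGBoost's algorithm: it assumes squared-error loss, one-split "stump" trees, and a toy 1-D dataset, and the helper names `fit_stump` and `boost` are hypothetical, introduced only for illustration.

```python
# Illustrative only: gradient boosting for squared error with one-split
# "stump" trees on a toy 1-D dataset. This is NOT how XGBoost is implemented;
# fit_stump and boost are hypothetical helper names.
import math


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimising squared error."""
    best = None
    # candidate thresholds: every distinct x except the smallest,
    # so both sides of the split are always non-empty
    for t in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < t]
        right = [r for x, r in zip(xs, residuals) if x >= t]
        lv = sum(left) / len(left)
        rv = sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    return best[1:]


def boost(xs, ys, n_trees, learning_rate):
    """Return the training mean squared error after each boosting round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    history = []
    for _ in range(n_trees):
        # each new stump is fitted to the residuals of the current ensemble
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        # shrinkage: only a fraction of the stump's correction is added
        preds = [p + learning_rate * (lv if x < t else rv)
                 for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return history


# toy data: 40 points of a sine curve
xs = [i / 10 for i in range(40)]
ys = [math.sin(x) for x in xs]

for rate in (1.0, 0.3, 0.1):
    history = boost(xs, ys, n_trees=50, learning_rate=rate)
    print("learning_rate=%.1f  MSE after 5 trees: %.4f  after 50 trees: %.4f"
          % (rate, history[4], history[-1]))
```

With a rate of 1.0 every stump's correction is applied in full, which corresponds to the naive boosting case described in the post. Smaller rates apply a fraction of each correction, so the training error falls more slowly per tree and more trees are needed to reach a comparable fit.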