# How to Develop a Reusable Framework to Spot-Check Algorithms in Python

> Original: [https://machinelearningmastery.com/spot-check-machine-learning-algorithms-in-python/](https://machinelearningmastery.com/spot-check-machine-learning-algorithms-in-python/)

[Spot-checking algorithms](https://machinelearningmastery.com/spot-check-classification-machine-learning-algorithms-python-scikit-learn/) is a technique in applied machine learning intended to quickly and objectively provide a first set of results on a new predictive modeling problem.

Unlike grid searching and other kinds of algorithm tuning that seek the best algorithm or the best configuration for an algorithm, spot-checking is intended to evaluate a diverse set of algorithms rapidly and provide a rough first-cut result. This first-cut result can be used to judge whether the problem, or the problem representation, is indeed predictable, and if so, the types of algorithms that may be worth investigating further.

Spot-checking is an approach that helps overcome the "[hard problem](https://machinelearningmastery.com/applied-machine-learning-is-hard/)" of applied machine learning, and it encourages you to think clearly about the [higher-order search problem](https://machinelearningmastery.com/applied-machine-learning-as-a-search-problem/) being performed in any machine learning project.

In this tutorial, you will discover the usefulness of spot-checking algorithms on a new predictive modeling problem and how to develop a standard framework for spot-checking algorithms in Python for classification and regression problems.

After completing this tutorial, you will know:

* Spot-checking provides a way to quickly discover the types of algorithms that perform well on your predictive modeling problem.
* How to develop a generic framework for loading data, defining models, evaluating models, and summarizing results.
* How to apply the framework to classification and regression problems.

Let's get started.

![How to Develop a Reusable Framework for Spot-Check Algorithms in Python](https://img.kancloud.cn/e7/b9/e7b9ddc66cda52c57ec2a583e8dbc55d_640x360.jpg)

How to Develop a Reusable Framework for Spot-Check Algorithms in Python
Photo by [Jeff Turner](https://www.flickr.com/photos/respres/16216077206/), some rights reserved.

## Tutorial Overview

This tutorial is divided into five parts; they are:

1. Spot-Check Algorithms
2. Spot-Checking Framework in Python
3. Spot-Checking for Classification
4. Spot-Checking for Regression
5. Framework Extensions

## 1. Spot-Check Algorithms

We cannot know beforehand which algorithms will perform well on a given predictive modeling problem.

This is the [hard part of applied machine learning](https://machinelearningmastery.com/applied-machine-learning-is-hard/) that can only be resolved through systematic experimentation.

[Spot-checking](https://machinelearningmastery.com/why-you-should-be-spot-checking-algorithms-on-your-machine-learning-problems/) is an approach to this problem.

It involves rapidly testing a large suite of diverse machine learning algorithms on a problem in order to quickly discover which algorithms might work and where to focus attention.

* **It is fast**; it bypasses the days or weeks of preparation and analysis spent on algorithms that may never lead to a result.
* **It is objective**, allowing you to discover what is likely to work well for the problem, rather than going with whatever you used last time.
* **It gets results**; you will actually fit models, make predictions, and learn whether your problem can be predicted and what baseline skill might look like.

Spot-checking may require that you work with a small sample of your dataset in order to turn around results quickly.
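As a minimal sketch of how this might be done, the hypothetical `subsample_dataset()` helper below (not part of the tutorial's framework, and the 500-row default is an arbitrary illustrative value) draws a random subset of rows, assuming `X` and `y` are NumPy arrays:

```
# a sketch of subsampling for faster spot checks; subsample_dataset() and the
# 500-row default are illustrative assumptions, not part of the framework
from numpy.random import choice

def subsample_dataset(X, y, n_rows=500):
    # choose row indices at random, without replacement
    ix = choice(len(X), size=min(n_rows, len(X)), replace=False)
    return X[ix], y[ix]
```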
Finally, the results from spot-checking are a starting point. A starting point, nothing more. They suggest where to focus attention on the problem, not what the best algorithm is. The process is designed to shake you out of typical thinking and analysis and instead focus on results.

You can learn more about spot-checking in the post:

* [Why you should be Spot-Checking Algorithms on your Machine Learning Problems](https://machinelearningmastery.com/why-you-should-be-spot-checking-algorithms-on-your-machine-learning-problems/)

Now that we know what spot-checking is, let's look at how to systematically perform spot-checking in Python.

## 2. Spot-Checking Framework in Python

In this section, we will build a framework for a script that can be used to spot-check machine learning algorithms on a classification or regression problem.

There are four parts of the framework that we need to develop; they are:

* Load Dataset
* Define Models
* Evaluate Models
* Summarize Results

Let's take a look at each in turn.

### Load Dataset

The first step of the framework is loading the data.

The function must be implemented for a given problem and specialized to that problem. It will likely involve loading data from one or more CSV files.

We will call this function `load_dataset()`; it takes no arguments and returns the inputs (`X`) and outputs (`y`) for the prediction problem.

```
# load the dataset, returns X and y elements
def load_dataset():
    X, y = None, None
    return X, y
```
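For a real problem, the body of this function might read a CSV with pandas. The sketch below assumes a hypothetical file `data.csv` whose final column holds the target variable; both the filename and the column layout are illustrative assumptions:

```
# a sketch of a problem-specific loader, assuming a hypothetical 'data.csv'
# whose final column holds the target variable
from pandas import read_csv

def load_dataset():
    data = read_csv('data.csv', header=None)
    values = data.values
    # split into input (X) and output (y) elements
    X, y = values[:, :-1], values[:, -1]
    return X, y
```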
### Define Models

The next step is to define the models to evaluate on the predictive modeling problem.

The models defined will be specific to the type of predictive modeling problem, e.g. classification or regression.

The defined models should be diverse, including a mixture of:

* Linear models.
* Nonlinear models.
* Ensemble models.

Each model should be given a good chance of performing well on the problem. This may mean providing a few variations of a model with different common or well-known configurations that perform well on average.

We will call this function `define_models()`. It returns a dictionary of model names mapped to scikit-learn model objects. The names should be short, like '`svm`', and may include configuration details, e.g. '`knn-7`'.

The function also takes a dictionary as an optional argument; if none is provided, a new dictionary is created and populated. If a dictionary is provided, models are added to it.

This adds flexibility if you wish to use multiple functions to define models, or if you wish to add a large number of models of a specific type with different configurations.

```
# create a dict of standard models to evaluate {name:object}
def define_models(models=dict()):
    # ...
    return models
```

The idea is not to grid search model parameters; that can come later.

Instead, each model should be given an opportunity to perform well (i.e. not optimally). This may mean trying many combinations of parameters in some cases, e.g. in the case of gradient boosting.
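To illustrate the optional dictionary argument mentioned above, the sketch below chains two hypothetical definition functions (`define_linear_models()` and `define_tree_models()` are examples of my own, not from the tutorial) so that both families of models end up in a single dictionary:

```
# sketch: chaining hypothetical definition functions via the optional dict argument
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def define_linear_models(models=dict()):
    models['logistic'] = LogisticRegression()
    return models

def define_tree_models(models=dict()):
    models['cart'] = DecisionTreeClassifier()
    return models

# models from the first function are passed into, and extended by, the second
models = define_tree_models(define_linear_models())
```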
### Evaluate Models

The next step is evaluating the defined models on the loaded dataset.

The scikit-learn library provides the ability to pipeline models during evaluation. This allows the data to be transformed prior to being used to fit a model, and it is done in a correct way such that the transforms are prepared on the training data and applied to the test data.

We can define a function that prepares a given model prior to evaluation, to allow specific transforms to be used during the spot-checking process. They will be applied in a blanket way to all models. This can be useful for performing operations such as standardization, normalization, and feature selection.

We will define a function named `make_pipeline()` that takes a defined model and returns a pipeline. Below is an example of preparing a pipeline that will first standardize the input data, then normalize it prior to fitting the model.

```
# create a feature preparation pipeline for a model
def make_pipeline(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline
```

This function can be expanded to add other transforms, or simplified to return the provided model without transforms.

Now we need to evaluate a prepared model.

We will use the standard of evaluating models with k-fold cross-validation. The evaluation of each defined model will produce a list of results. This is because 10 different versions of the model will be fit and evaluated, resulting in a list of k scores.

We will define a function named `evaluate_model()` that takes the data, a defined model, a number of folds, and a performance metric used to evaluate the results. It returns the list of scores.

The function calls `make_pipeline()` to prepare any data transforms required by the defined model, then calls the [cross_val_score()](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html) scikit-learn function. Importantly, the `n_jobs` argument is set to -1 to allow the model evaluations to run in parallel, harnessing as many cores as are available on your hardware.

```
# evaluate a single model
def evaluate_model(X, y, model, folds, metric):
    # create the pipeline
    pipeline = make_pipeline(model)
    # evaluate model
    scores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
    return scores
```

The evaluation of a model may fail with an exception. I have seen this, especially in the case of some models from the statsmodels library.

Evaluating a model may also generate a lot of warning messages. I have seen this, especially in the case of XGBoost models.

We do not care about exceptions or warnings when spot-checking. We only want to know what works and what works well. Therefore, we can trap exceptions and ignore all warnings when evaluating each model.

A function named `robust_evaluate_model()` implements this behavior. `evaluate_model()` is called in a way that traps exceptions and ignores warnings. If an exception occurs and no result is possible for a given model, a `None` result is returned.

```
# evaluate a model and try to trap errors and hide warnings
def robust_evaluate_model(X, y, model, folds, metric):
    scores = None
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            scores = evaluate_model(X, y, model, folds, metric)
    except:
        scores = None
    return scores
```

Finally, we can define the top-level function that evaluates the list of defined models.

We will define a function named `evaluate_models()` that takes the dictionary of models as an argument and returns a dictionary of model names mapped to lists of results.

The number of folds in the cross-validation process can be specified by an optional argument that defaults to 10. The metric calculated on the predictions of a model can also be specified by an optional argument, and defaults to classification accuracy.

For a full list of supported metrics, see:

* [The scoring parameter: defining model evaluation rules, scikit-learn](http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter).

Any `None` results are skipped and not added to the dictionary of results.

Importantly, we provide some verbose output, summarizing the mean and standard deviation of each model after it is evaluated. This is helpful if the spot-checking process on your dataset takes minutes to hours.

```
# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate the model
        scores = robust_evaluate_model(X, y, model, folds, metric)
        # show process
        if scores is not None:
            # store a result
            results[name] = scores
            mean_score, std_score = mean(scores), std(scores)
            print('>%s: %.3f (+/-%.3f)' % (name, mean_score, std_score))
        else:
            print('>%s: error' % name)
    return results
```

Note that if for some reason you want to see the warnings and errors, you can update `evaluate_models()` to call the `evaluate_model()` function directly, bypassing the robust error handling. I find this useful when testing new methods or method configurations that fail silently.
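As a minimal sketch of that debugging variant (assuming the framework's `evaluate_model()`, `mean()`, and `std()` are already in scope; the name `evaluate_models_debug()` is my own, hypothetical):

```
# sketch: a debugging variant that bypasses the robust error handling so that
# exceptions propagate and warnings are printed; evaluate_models_debug() is hypothetical
def evaluate_models_debug(X, y, models, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # call evaluate_model() directly: no try/except, no warning suppression
        scores = evaluate_model(X, y, model, folds, metric)
        results[name] = scores
        print('>%s: %.3f (+/-%.3f)' % (name, mean(scores), std(scores)))
    return results
```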
### Summarize Results

Finally, we can evaluate the results.

Really, we only want to know which algorithms performed well.

Two useful ways to summarize the results are:

1. Line summaries of the mean and standard deviation of the top 10 performing algorithms.
2. Box-and-whisker plots of the top 10 performing algorithms.

The line summaries are quick and precise, although they assume a well-behaved Gaussian distribution of the performances, which may not be reasonable.

The box-and-whisker plots assume no distribution and provide a visual way to directly compare the distributions of the models' scores in terms of median performance and spread.

We will define a function named `summarize_results()` that takes the dictionary of results, prints a summary of the results, and creates a boxplot image that is saved to file. The function takes an argument specifying whether the evaluation score is maximizing, which defaults to `True`. The number of results to summarize can also be provided as an optional argument that defaults to 10.

The function first orders the scores before printing the summary and creating the box-and-whisker plot.

```
# print and plot the top n results
def summarize_results(results, maximize=True, top_n=10):
    # check for no results
    if len(results) == 0:
        print('no results')
        return
    # determine how many results to summarize
    n = min(top_n, len(results))
    # create a list of (name, mean(scores)) tuples
    mean_scores = [(k,mean(v)) for k,v in results.items()]
    # sort tuples by mean score
    mean_scores = sorted(mean_scores, key=lambda x: x[1])
    # reverse for descending order (e.g. for accuracy)
    if maximize:
        mean_scores = list(reversed(mean_scores))
    # retrieve the top n for summarization
    names = [x[0] for x in mean_scores[:n]]
    scores = [results[x[0]] for x in mean_scores[:n]]
    # print the top n
    print()
    for i in range(n):
        name = names[i]
        mean_score, std_score = mean(results[name]), std(results[name])
        print('Rank=%d, Name=%s, Score=%.3f (+/- %.3f)' % (i+1, name, mean_score, std_score))
    # boxplot for the top n
    pyplot.boxplot(scores, labels=names)
    _, labels = pyplot.xticks()
    pyplot.setp(labels, rotation=90)
    pyplot.savefig('spotcheck.png')
```

Now that we have a framework specifically designed for spot-checking algorithms in Python, let's look at how we can apply it to a classification problem.
## 3. Spot-Checking for Classification

We will generate a binary classification problem using the [make_classification() function](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html).

The function will generate 1,000 samples with 20 variables, some of them redundant, and two classes.

```
# load the dataset, returns X and y elements
def load_dataset():
    return make_classification(n_samples=1000, n_classes=2, random_state=1)
```

As a classification problem, we will try a suite of classification algorithms, specifically:

### Linear Algorithms

* Logistic Regression
* Ridge Regression
* Stochastic Gradient Descent Classifier
* Passive Aggressive Classifier

I tried LDA and QDA, but they sadly crashed somewhere down in the C code.

### Nonlinear Algorithms

* k-Nearest Neighbors
* Classification and Regression Trees
* Extra Tree
* Support Vector Machine
* Naive Bayes

### Ensemble Algorithms

* AdaBoost
* Bagged Decision Trees
* Random Forest
* Extra Trees
* Gradient Boosting Machine

Further, I added multiple configurations for a few of the algorithms, such as Ridge, kNN, and SVM, in order to give them a good chance on the problem.

The full `define_models()` function is listed below.

```
# create a dict of standard models to evaluate {name:object}
def define_models(models=dict()):
    # linear models
    models['logistic'] = LogisticRegression()
    alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['ridge-'+str(a)] = RidgeClassifier(alpha=a)
    models['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)
    models['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)
    models['cart'] = DecisionTreeClassifier()
    models['extra'] = ExtraTreeClassifier()
    models['svml'] = SVC(kernel='linear')
    models['svmp'] = SVC(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVC(C=c)
    models['bayes'] = GaussianNB()
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostClassifier(n_estimators=n_trees)
    models['bag'] = BaggingClassifier(n_estimators=n_trees)
    models['rf'] = RandomForestClassifier(n_estimators=n_trees)
    models['et'] = ExtraTreesClassifier(n_estimators=n_trees)
    models['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models
```

That's it; we are now ready to spot-check algorithms on the problem.

The complete example is listed below.

```
# binary classification spot check script
import warnings
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier

# load the dataset, returns X and y elements
def load_dataset():
    return make_classification(n_samples=1000, n_classes=2, random_state=1)

# create a dict of standard models to evaluate {name:object}
def define_models(models=dict()):
    # linear models
    models['logistic'] = LogisticRegression()
    alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['ridge-'+str(a)] = RidgeClassifier(alpha=a)
    models['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)
    models['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)
    models['cart'] = DecisionTreeClassifier()
    models['extra'] = ExtraTreeClassifier()
    models['svml'] = SVC(kernel='linear')
    models['svmp'] = SVC(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVC(C=c)
    models['bayes'] = GaussianNB()
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostClassifier(n_estimators=n_trees)
    models['bag'] = BaggingClassifier(n_estimators=n_trees)
    models['rf'] = RandomForestClassifier(n_estimators=n_trees)
    models['et'] = ExtraTreesClassifier(n_estimators=n_trees)
    models['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models

# create a feature preparation pipeline for a model
def make_pipeline(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# evaluate a single model
def evaluate_model(X, y, model, folds, metric):
    # create the pipeline
    pipeline = make_pipeline(model)
    # evaluate model
    scores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
    return scores

# evaluate a model and try to trap errors and hide warnings
def robust_evaluate_model(X, y, model, folds, metric):
    scores = None
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            scores = evaluate_model(X, y, model, folds, metric)
    except:
        scores = None
    return scores

# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate the model
        scores = robust_evaluate_model(X, y, model, folds, metric)
        # show process
        if scores is not None:
            # store a result
            results[name] = scores
            mean_score, std_score = mean(scores), std(scores)
            print('>%s: %.3f (+/-%.3f)' % (name, mean_score, std_score))
        else:
            print('>%s: error' % name)
    return results

# print and plot the top n results
def summarize_results(results, maximize=True, top_n=10):
    # check for no results
    if len(results) == 0:
        print('no results')
        return
    # determine how many results to summarize
    n = min(top_n, len(results))
    # create a list of (name, mean(scores)) tuples
    mean_scores = [(k,mean(v)) for k,v in results.items()]
    # sort tuples by mean score
    mean_scores = sorted(mean_scores, key=lambda x: x[1])
    # reverse for descending order (e.g. for accuracy)
    if maximize:
        mean_scores = list(reversed(mean_scores))
    # retrieve the top n for summarization
    names = [x[0] for x in mean_scores[:n]]
    scores = [results[x[0]] for x in mean_scores[:n]]
    # print the top n
    print()
    for i in range(n):
        name = names[i]
        mean_score, std_score = mean(results[name]), std(results[name])
        print('Rank=%d, Name=%s, Score=%.3f (+/- %.3f)' % (i+1, name, mean_score, std_score))
    # boxplot for the top n
    pyplot.boxplot(scores, labels=names)
    _, labels = pyplot.xticks()
    pyplot.setp(labels, rotation=90)
    pyplot.savefig('spotcheck.png')

# load dataset
X, y = load_dataset()
# get model list
models = define_models()
# evaluate models
results = evaluate_models(X, y, models)
# summarize results
summarize_results(results)
```

Running the example prints one line per evaluated model, ending with a summary of the top 10 performing algorithms on the problem.

We can see that ensembles of decision trees performed best on this problem. This suggests a few things:

* Ensembles of decision trees might be a good place to focus attention.
* Gradient boosting will likely do well if tuned further.
* A "good" performance on the problem is about 86% accuracy.
* The relatively high performance of ridge regression suggests the need for feature selection.

```
...
>bag: 0.862 (+/-0.034)
>rf: 0.865 (+/-0.033)
>et: 0.858 (+/-0.035)
>gbm: 0.867 (+/-0.044)

Rank=1, Name=gbm, Score=0.867 (+/- 0.044)
Rank=2, Name=rf, Score=0.865 (+/- 0.033)
Rank=3, Name=bag, Score=0.862 (+/- 0.034)
Rank=4, Name=et, Score=0.858 (+/- 0.035)
Rank=5, Name=ada, Score=0.850 (+/- 0.035)
Rank=6, Name=ridge-0.9, Score=0.848 (+/- 0.038)
Rank=7, Name=ridge-0.8, Score=0.848 (+/- 0.038)
Rank=8, Name=ridge-0.7, Score=0.848 (+/- 0.038)
Rank=9, Name=ridge-0.6, Score=0.848 (+/- 0.038)
Rank=10, Name=ridge-0.5, Score=0.848 (+/- 0.038)
```

A box-and-whisker plot is also created to summarize the results of the top 10 well-performing algorithms.

The plot shows the elevated performance of the methods comprised of ensembles of decision trees. The plot reinforces the notion that focusing further attention on these methods would be a good idea.

![Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem](https://img.kancloud.cn/19/9c/199cd5ae6a0cd2fd195edd7acb0bb088_640x480.jpg)

Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem

If this were a real classification problem, I would follow up with further spot checks, such as:

* Spot check with a variety of different feature selection methods (a sketch follows this list).
* Spot check without data scaling methods.
* Spot check with a coarse grid of gradient boosting configurations in sklearn or XGBoost.
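As a minimal sketch of the first of these follow-ups, a `make_pipeline()` variant could swap the scaling steps for univariate feature selection; `SelectKBest` with `k=10` is an illustrative choice of mine, not from the tutorial:

```
# a sketch of a make_pipeline() variant that adds univariate feature selection;
# SelectKBest with k=10 is an illustrative choice, not from the tutorial
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif
from sklearn.pipeline import Pipeline

def make_pipeline_fs(model):
    steps = list()
    # keep only the 10 features most related to the target
    steps.append(('select', SelectKBest(score_func=f_classif, k=10)))
    # the model
    steps.append(('model', model))
    return Pipeline(steps=steps)
```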
Next, we will see how we can apply the framework to a regression problem.

## 4. Spot-Checking for Regression

We can explore the same framework for regression predictive modeling problems with only very minor changes.

We can use the [make_regression() function](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_regression.html#sklearn.datasets.make_regression) to generate a contrived regression problem with 1,000 examples and 50 features, some of them redundant.

The defined `load_dataset()` function is listed below.

```
# load the dataset, returns X and y elements
def load_dataset():
    return make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=1)
```

We can then specify a `get_models()` function that defines a suite of regression methods.

Scikit-learn offers a wide range of linear regression methods, which is excellent. Not all of them may be required on your problem. I would recommend linear regression and elastic net as a minimum, the latter with a good suite of alpha and lambda parameters.

Nevertheless, we will test the full suite of methods on this problem, including:

### Linear Algorithms

* Linear Regression
* Lasso Regression
* Ridge Regression
* Elastic Net Regression
* Huber Regression
* LARS Regression
* Lasso LARS Regression
* Passive Aggressive Regression
* RANSAC Regressor
* Stochastic Gradient Descent Regression
* Theil-Sen Regression

### Nonlinear Algorithms

* k-Nearest Neighbors
* Classification and Regression Trees
* Extra Tree
* Support Vector Regression

### Ensemble Algorithms

* AdaBoost
* Bagged Decision Trees
* Random Forest
* Extra Trees
* Gradient Boosting Machine

The full `get_models()` function is listed below.

```
# create a dict of standard models to evaluate {name:object}
def get_models(models=dict()):
    # linear models
    models['lr'] = LinearRegression()
    alpha = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['lasso-'+str(a)] = Lasso(alpha=a)
    for a in alpha:
        models['ridge-'+str(a)] = Ridge(alpha=a)
    for a1 in alpha:
        for a2 in alpha:
            name = 'en-' + str(a1) + '-' + str(a2)
            models[name] = ElasticNet(a1, a2)
    models['huber'] = HuberRegressor()
    models['lars'] = Lars()
    models['llars'] = LassoLars()
    models['pa'] = PassiveAggressiveRegressor(max_iter=1000, tol=1e-3)
    models['ranscac'] = RANSACRegressor()
    models['sgd'] = SGDRegressor(max_iter=1000, tol=1e-3)
    models['theil'] = TheilSenRegressor()
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsRegressor(n_neighbors=k)
    models['cart'] = DecisionTreeRegressor()
    models['extra'] = ExtraTreeRegressor()
    models['svml'] = SVR(kernel='linear')
    models['svmp'] = SVR(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVR(C=c)
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostRegressor(n_estimators=n_trees)
    models['bag'] = BaggingRegressor(n_estimators=n_trees)
    models['rf'] = RandomForestRegressor(n_estimators=n_trees)
    models['et'] = ExtraTreesRegressor(n_estimators=n_trees)
    models['gbm'] = GradientBoostingRegressor(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models
```

By default, the framework uses classification accuracy as the method for evaluating model predictions.

This does not make sense for regression, so we can change it to something more meaningful, such as mean squared error. We can do this by passing the `metric='neg_mean_squared_error'` argument when calling the `evaluate_models()` function.

```
# evaluate models
results = evaluate_models(X, y, models, metric='neg_mean_squared_error')
```

Note that by default scikit-learn inverts error scores so that metrics are maximizing instead of minimizing. This is why the mean squared error is negative and has a negative sign when summarized. Because the scores are inverted, we can continue to assume that we are maximizing scores in the `summarize_results()` function, and we do not need to specify `maximize=False` as we might otherwise expect when using an error metric.
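If you prefer to read the errors in their original units, the inverted scores can be converted back; the hypothetical `to_rmse()` helper below (not part of the framework) negates each score to recover MSE and then takes the square root:

```
# sketch: converting inverted neg_mean_squared_error scores back to RMSE for
# easier reading; to_rmse() is a hypothetical helper, not part of the framework
from numpy import sqrt

def to_rmse(neg_mse_scores):
    # negate to recover MSE, then take the square root
    return [sqrt(-s) for s in neg_mse_scores]
```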
The complete code example is listed below.

```
# regression spot check script
import warnings
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Lasso
from sklearn.linear_model import Ridge
from sklearn.linear_model import ElasticNet
from sklearn.linear_model import HuberRegressor
from sklearn.linear_model import Lars
from sklearn.linear_model import LassoLars
from sklearn.linear_model import PassiveAggressiveRegressor
from sklearn.linear_model import RANSACRegressor
from sklearn.linear_model import SGDRegressor
from sklearn.linear_model import TheilSenRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import ExtraTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import AdaBoostRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import ExtraTreesRegressor
from sklearn.ensemble import GradientBoostingRegressor

# load the dataset, returns X and y elements
def load_dataset():
    return make_regression(n_samples=1000, n_features=50, noise=0.1, random_state=1)

# create a dict of standard models to evaluate {name:object}
def get_models(models=dict()):
    # linear models
    models['lr'] = LinearRegression()
    alpha = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['lasso-'+str(a)] = Lasso(alpha=a)
    for a in alpha:
        models['ridge-'+str(a)] = Ridge(alpha=a)
    for a1 in alpha:
        for a2 in alpha:
            name = 'en-' + str(a1) + '-' + str(a2)
            models[name] = ElasticNet(a1, a2)
    models['huber'] = HuberRegressor()
    models['lars'] = Lars()
    models['llars'] = LassoLars()
    models['pa'] = PassiveAggressiveRegressor(max_iter=1000, tol=1e-3)
    models['ranscac'] = RANSACRegressor()
    models['sgd'] = SGDRegressor(max_iter=1000, tol=1e-3)
    models['theil'] = TheilSenRegressor()
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsRegressor(n_neighbors=k)
    models['cart'] = DecisionTreeRegressor()
    models['extra'] = ExtraTreeRegressor()
    models['svml'] = SVR(kernel='linear')
    models['svmp'] = SVR(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVR(C=c)
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostRegressor(n_estimators=n_trees)
    models['bag'] = BaggingRegressor(n_estimators=n_trees)
    models['rf'] = RandomForestRegressor(n_estimators=n_trees)
    models['et'] = ExtraTreesRegressor(n_estimators=n_trees)
    models['gbm'] = GradientBoostingRegressor(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models

# create a feature preparation pipeline for a model
def make_pipeline(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# evaluate a single model
def evaluate_model(X, y, model, folds, metric):
    # create the pipeline
    pipeline = make_pipeline(model)
    # evaluate model
    scores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
    return scores

# evaluate a model and try to trap errors and hide warnings
def robust_evaluate_model(X, y, model, folds, metric):
    scores = None
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            scores = evaluate_model(X, y, model, folds, metric)
    except:
        scores = None
    return scores

# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate the model
        scores = robust_evaluate_model(X, y, model, folds, metric)
        # show process
        if scores is not None:
            # store a result
            results[name] = scores
            mean_score, std_score = mean(scores), std(scores)
            print('>%s: %.3f (+/-%.3f)' % (name, mean_score, std_score))
        else:
            print('>%s: error' % name)
    return results

# print and plot the top n results
def summarize_results(results, maximize=True, top_n=10):
    # check for no results
    if len(results) == 0:
        print('no results')
        return
    # determine how many results to summarize
    n = min(top_n, len(results))
    # create a list of (name, mean(scores)) tuples
    mean_scores = [(k,mean(v)) for k,v in results.items()]
    # sort tuples by mean score
    mean_scores = sorted(mean_scores, key=lambda x: x[1])
    # reverse for descending order (e.g. for accuracy)
    if maximize:
        mean_scores = list(reversed(mean_scores))
    # retrieve the top n for summarization
    names = [x[0] for x in mean_scores[:n]]
    scores = [results[x[0]] for x in mean_scores[:n]]
    # print the top n
    print()
    for i in range(n):
        name = names[i]
        mean_score, std_score = mean(results[name]), std(results[name])
        print('Rank=%d, Name=%s, Score=%.3f (+/- %.3f)' % (i+1, name, mean_score, std_score))
    # boxplot for the top n
    pyplot.boxplot(scores, labels=names)
    _, labels = pyplot.xticks()
    pyplot.setp(labels, rotation=90)
    pyplot.savefig('spotcheck.png')

# load dataset
X, y = load_dataset()
# get model list
models = get_models()
# evaluate models
results = evaluate_models(X, y, models, metric='neg_mean_squared_error')
# summarize results
summarize_results(results)
```
Running the example summarizes the performance of each model evaluated, then prints the performance of the top 10 well-performing algorithms.

We can see that many of the linear algorithms perhaps found the same optimal solution on this problem. Notably, the methods that performed well used regularization as a type of feature selection, allowing them to zoom in on the optimal solution.

This suggests the importance of feature selection when modeling this problem, and that linear methods would be the area to focus on, at least for now.

Reviewing the printed scores of the evaluated models also shows how much worse the nonlinear and ensemble algorithms performed on this problem.

```
...
>bag: -6118.084 (+/-1558.433)
>rf: -6127.169 (+/-1594.392)
>et: -5017.062 (+/-1037.673)
>gbm: -2347.807 (+/-500.364)

Rank=1, Name=lars, Score=-0.011 (+/- 0.001)
Rank=2, Name=ranscac, Score=-0.011 (+/- 0.001)
Rank=3, Name=lr, Score=-0.011 (+/- 0.001)
Rank=4, Name=ridge-0.0, Score=-0.011 (+/- 0.001)
Rank=5, Name=en-0.0-0.1, Score=-0.011 (+/- 0.001)
Rank=6, Name=en-0.0-0.8, Score=-0.011 (+/- 0.001)
Rank=7, Name=en-0.0-0.2, Score=-0.011 (+/- 0.001)
Rank=8, Name=en-0.0-0.7, Score=-0.011 (+/- 0.001)
Rank=9, Name=en-0.0-0.0, Score=-0.011 (+/- 0.001)
Rank=10, Name=en-0.0-0.3, Score=-0.011 (+/- 0.001)
```

In this case a box-and-whisker plot is created, although it does not really add value to the analysis of the results.

![Boxplot of top 10 Spot-Checking Algorithms on a Regression Problem](https://img.kancloud.cn/95/3c/953c6e0785be5e5095173c7801b32eee_640x480.jpg)

Boxplot of top 10 Spot-Checking Algorithms on a Regression Problem

## 5. Framework Extensions

In this section, we explore some handy extensions to the spot-checking framework.

### Coarse Grid of Gradient Boosting

I find myself using XGBoost and gradient boosting a lot for straightforward classification and regression problems.

As such, I like to use a coarse grid of the method's standard configuration parameters when spot-checking.

Below is a function to do this that can be used directly in the spot-checking framework.

```
# define gradient boosting models
def define_gbm_models(models=dict(), use_xgb=True):
    # define config ranges
    rates = [0.001, 0.01, 0.1]
    trees = [50, 100]
    ss = [0.5, 0.7, 1.0]
    depth = [3, 7, 9]
    # add configurations
    for l in rates:
        for e in trees:
            for s in ss:
                for d in depth:
                    cfg = [l, e, s, d]
                    if use_xgb:
                        name = 'xgb-' + str(cfg)
                        models[name] = XGBClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)
                    else:
                        name = 'gbm-' + str(cfg)
                        models[name] = GradientBoostingClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)
    print('Defined %d models' % len(models))
    return models
```

By default, the function uses XGBoost models, but it can use the sklearn gradient boosting model if the `use_xgb` argument of the function is set to `False`.

Again, we are not trying to optimally tune GBM on the problem, only to quickly find an area of the configuration space that may be worth investigating further.

This function can be used directly on classification and regression problems with only a minor change of "`XGBClassifier`" to "`XGBRegressor`" and "`GradientBoostingClassifier`" to "`GradientBoostingRegressor`". For example:

```
# define gradient boosting models
def get_gbm_models(models=dict(), use_xgb=True):
    # define config ranges
    rates = [0.001, 0.01, 0.1]
    trees = [50, 100]
    ss = [0.5, 0.7, 1.0]
    depth = [3, 7, 9]
    # add configurations
    for l in rates:
        for e in trees:
            for s in ss:
                for d in depth:
                    cfg = [l, e, s, d]
                    if use_xgb:
                        name = 'xgb-' + str(cfg)
                        models[name] = XGBRegressor(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)
                    else:
                        name = 'gbm-' + str(cfg)
                        models[name] = GradientBoostingRegressor(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)
    print('Defined %d models' % len(models))
    return models
```

To make this concrete, below is the binary classification example updated to also define XGBoost models.
```
# binary classification spot check script
import warnings
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier

# load the dataset, returns X and y elements
def load_dataset():
    return make_classification(n_samples=1000, n_classes=2, random_state=1)

# create a dict of standard models to evaluate {name:object}
def define_models(models=dict()):
    # linear models
    models['logistic'] = LogisticRegression()
    alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['ridge-'+str(a)] = RidgeClassifier(alpha=a)
    models['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)
    models['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)
    models['cart'] = DecisionTreeClassifier()
    models['extra'] = ExtraTreeClassifier()
    models['svml'] = SVC(kernel='linear')
    models['svmp'] = SVC(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVC(C=c)
    models['bayes'] = GaussianNB()
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostClassifier(n_estimators=n_trees)
    models['bag'] = BaggingClassifier(n_estimators=n_trees)
    models['rf'] = RandomForestClassifier(n_estimators=n_trees)
    models['et'] = ExtraTreesClassifier(n_estimators=n_trees)
    models['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models

# define gradient boosting models
def define_gbm_models(models=dict(), use_xgb=True):
    # define config ranges
    rates = [0.001, 0.01, 0.1]
    trees = [50, 100]
    ss = [0.5, 0.7, 1.0]
    depth = [3, 7, 9]
    # add configurations
    for l in rates:
        for e in trees:
            for s in ss:
                for d in depth:
                    cfg = [l, e, s, d]
                    if use_xgb:
                        name = 'xgb-' + str(cfg)
                        models[name] = XGBClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)
                    else:
                        name = 'gbm-' + str(cfg)
                        models[name] = GradientBoostingClassifier(learning_rate=l, n_estimators=e, subsample=s, max_depth=d)
    print('Defined %d models' % len(models))
    return models

# create a feature preparation pipeline for a model
def make_pipeline(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# evaluate a single model
def evaluate_model(X, y, model, folds, metric):
    # create the pipeline
    pipeline = make_pipeline(model)
    # evaluate model
    scores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
    return scores

# evaluate a model and try to trap errors and hide warnings
def robust_evaluate_model(X, y, model, folds, metric):
    scores = None
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            scores = evaluate_model(X, y, model, folds, metric)
    except:
        scores = None
    return scores

# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate the model
        scores = robust_evaluate_model(X, y, model, folds, metric)
        # show process
        if scores is not None:
            # store a result
            results[name] = scores
            mean_score, std_score = mean(scores), std(scores)
            print('>%s: %.3f (+/-%.3f)' % (name, mean_score, std_score))
        else:
            print('>%s: error' % name)
    return results

# print and plot the top n results
def summarize_results(results, maximize=True, top_n=10):
    # check for no results
    if len(results) == 0:
        print('no results')
        return
    # determine how many results to summarize
    n = min(top_n, len(results))
    # create a list of (name, mean(scores)) tuples
    mean_scores = [(k,mean(v)) for k,v in results.items()]
    # sort tuples by mean score
    mean_scores = sorted(mean_scores, key=lambda x: x[1])
    # reverse for descending order (e.g. for accuracy)
    if maximize:
        mean_scores = list(reversed(mean_scores))
    # retrieve the top n for summarization
    names = [x[0] for x in mean_scores[:n]]
    scores = [results[x[0]] for x in mean_scores[:n]]
    # print the top n
    print()
    for i in range(n):
        name = names[i]
        mean_score, std_score = mean(results[name]), std(results[name])
        print('Rank=%d, Name=%s, Score=%.3f (+/- %.3f)' % (i+1, name, mean_score, std_score))
    # boxplot for the top n
    pyplot.boxplot(scores, labels=names)
    _, labels = pyplot.xticks()
    pyplot.setp(labels, rotation=90)
    pyplot.savefig('spotcheck.png')

# load dataset
X, y = load_dataset()
# get model list
models = define_models()
# add gbm models
models = define_gbm_models(models)
# evaluate models
results = evaluate_models(X, y, models)
# summarize results
summarize_results(results)
```

Running the example shows that some XGBoost models do indeed perform well on the problem.
```
...
>xgb-[0.1, 100, 1.0, 3]: 0.864 (+/-0.044)
>xgb-[0.1, 100, 1.0, 7]: 0.865 (+/-0.036)
>xgb-[0.1, 100, 1.0, 9]: 0.867 (+/-0.039)

Rank=1, Name=xgb-[0.1, 50, 1.0, 3], Score=0.872 (+/- 0.039)
Rank=2, Name=et, Score=0.869 (+/- 0.033)
Rank=3, Name=xgb-[0.1, 50, 1.0, 9], Score=0.868 (+/- 0.038)
Rank=4, Name=xgb-[0.1, 100, 1.0, 9], Score=0.867 (+/- 0.039)
Rank=5, Name=xgb-[0.01, 50, 1.0, 3], Score=0.867 (+/- 0.035)
Rank=6, Name=xgb-[0.1, 50, 1.0, 7], Score=0.867 (+/- 0.037)
Rank=7, Name=xgb-[0.001, 100, 0.7, 9], Score=0.866 (+/- 0.040)
Rank=8, Name=xgb-[0.01, 100, 1.0, 3], Score=0.866 (+/- 0.037)
Rank=9, Name=xgb-[0.001, 100, 0.7, 3], Score=0.866 (+/- 0.034)
Rank=10, Name=xgb-[0.01, 50, 0.7, 3], Score=0.866 (+/- 0.034)
```

![Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem with XGBoost](https://img.kancloud.cn/0d/13/0d13741b7040ed01285e417a2ec1bc2f_640x480.jpg)

Boxplot of top 10 Spot-Checking Algorithms on a Classification Problem with XGBoost

### Repeated Evaluations

The above results also highlight the noisy nature of the evaluations, e.g. the results of extra trees in this run differ from the run above (0.858 vs 0.869).

We are using k-fold cross-validation to produce a population of scores, but the population is small and the calculated mean will be noisy.

This is fine as long as we take the spot-check results as a starting point, and not as definitive results of an algorithm on the problem. That is hard to do; it takes discipline in the practitioner.

Alternately, you may want to adapt the framework so that the model evaluation scheme better matches the model evaluation scheme you intend to use on your specific problem.

For example, when evaluating stochastic algorithms like bagged or boosted decision trees, it may be a good idea to run each experiment multiple times on the same train/test splits (called repeats) in order to account for the stochastic nature of the learning algorithm.

We can update the `evaluate_model()` function to repeat the evaluation of a given model n times, with a different split of the data each time, and return all scores. For example, three repeats of 10-fold cross-validation will result in 30 scores from which to calculate the mean performance of a model.

```
# evaluate a single model
def evaluate_model(X, y, model, folds, repeats, metric):
    # create the pipeline
    pipeline = make_pipeline(model)
    # evaluate model
    scores = list()
    # repeat model evaluation n times
    for _ in range(repeats):
        # perform run
        scores_r = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
        # add scores to list
        scores += scores_r.tolist()
    return scores
```

Alternately, you may prefer to calculate a mean score from each k-fold cross-validation run and then calculate a grand mean across all runs (a sketch of that variant follows the link below), as described in:

* [How to Evaluate the Skill of Deep Learning Models](https://machinelearningmastery.com/evaluate-skill-deep-learning-models/)
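A minimal sketch of the per-run alternative follows, assuming the framework's `make_pipeline()` and the `cross_val_score` import are in scope; the name `evaluate_model_run_means()` is hypothetical:

```
# sketch of the per-run alternative: summarize each cross-validation run by its
# mean, then average those run means; assumes make_pipeline() and
# cross_val_score are in scope from the framework
from numpy import mean

def evaluate_model_run_means(X, y, model, folds, repeats, metric):
    pipeline = make_pipeline(model)
    run_means = list()
    for _ in range(repeats):
        scores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
        # one summary score per run
        run_means.append(mean(scores))
    return run_means
```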
We can then update the `robust_evaluate_model()` function to pass the repeats argument down, and the `evaluate_models()` function to define a default, such as 3.

The complete example of the binary classification case with three repeats of model evaluation is listed below.

```
# binary classification spot check script
import warnings
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier

# load the dataset, returns X and y elements
def load_dataset():
    return make_classification(n_samples=1000, n_classes=2, random_state=1)

# create a dict of standard models to evaluate {name:object}
def define_models(models=dict()):
    # linear models
    models['logistic'] = LogisticRegression()
    alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['ridge-'+str(a)] = RidgeClassifier(alpha=a)
    models['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)
    models['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)
    models['cart'] = DecisionTreeClassifier()
    models['extra'] = ExtraTreeClassifier()
    models['svml'] = SVC(kernel='linear')
    models['svmp'] = SVC(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVC(C=c)
    models['bayes'] = GaussianNB()
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostClassifier(n_estimators=n_trees)
    models['bag'] = BaggingClassifier(n_estimators=n_trees)
    models['rf'] = RandomForestClassifier(n_estimators=n_trees)
    models['et'] = ExtraTreesClassifier(n_estimators=n_trees)
    models['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models

# create a feature preparation pipeline for a model
def make_pipeline(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# evaluate a single model
def evaluate_model(X, y, model, folds, repeats, metric):
    # create the pipeline
    pipeline = make_pipeline(model)
    # evaluate model
    scores = list()
    # repeat model evaluation n times
    for _ in range(repeats):
        # perform run
        scores_r = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
        # add scores to list
        scores += scores_r.tolist()
    return scores

# evaluate a model and try to trap errors and hide warnings
def robust_evaluate_model(X, y, model, folds, repeats, metric):
    scores = None
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            scores = evaluate_model(X, y, model, folds, repeats, metric)
    except:
        scores = None
    return scores

# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, folds=10, repeats=3, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate the model
        scores = robust_evaluate_model(X, y, model, folds, repeats, metric)
        # show process
        if scores is not None:
            # store a result
            results[name] = scores
            mean_score, std_score = mean(scores), std(scores)
            print('>%s: %.3f (+/-%.3f)' % (name, mean_score, std_score))
        else:
            print('>%s: error' % name)
    return results

# print and plot the top n results
def summarize_results(results, maximize=True, top_n=10):
    # check for no results
    if len(results) == 0:
        print('no results')
        return
    # determine how many results to summarize
    n = min(top_n, len(results))
    # create a list of (name, mean(scores)) tuples
    mean_scores = [(k,mean(v)) for k,v in results.items()]
    # sort tuples by mean score
    mean_scores = sorted(mean_scores, key=lambda x: x[1])
    # reverse for descending order (e.g. for accuracy)
    if maximize:
        mean_scores = list(reversed(mean_scores))
    # retrieve the top n for summarization
    names = [x[0] for x in mean_scores[:n]]
    scores = [results[x[0]] for x in mean_scores[:n]]
    # print the top n
    print()
    for i in range(n):
        name = names[i]
        mean_score, std_score = mean(results[name]), std(results[name])
        print('Rank=%d, Name=%s, Score=%.3f (+/- %.3f)' % (i+1, name, mean_score, std_score))
    # boxplot for the top n
    pyplot.boxplot(scores, labels=names)
    _, labels = pyplot.xticks()
    pyplot.setp(labels, rotation=90)
    pyplot.savefig('spotcheck.png')

# load dataset
X, y = load_dataset()
# get model list
models = define_models()
# evaluate models
results = evaluate_models(X, y, models)
# summarize results
summarize_results(results)
```

Running the example gives a more robust estimate of the scores.
```
...
>bag: 0.861 (+/-0.037)
>rf: 0.859 (+/-0.036)
>et: 0.869 (+/-0.035)
>gbm: 0.867 (+/-0.044)

Rank=1, Name=et, Score=0.869 (+/- 0.035)
Rank=2, Name=gbm, Score=0.867 (+/- 0.044)
Rank=3, Name=bag, Score=0.861 (+/- 0.037)
Rank=4, Name=rf, Score=0.859 (+/- 0.036)
Rank=5, Name=ada, Score=0.850 (+/- 0.035)
Rank=6, Name=ridge-0.9, Score=0.848 (+/- 0.038)
Rank=7, Name=ridge-0.8, Score=0.848 (+/- 0.038)
Rank=8, Name=ridge-0.7, Score=0.848 (+/- 0.038)
Rank=9, Name=ridge-0.6, Score=0.848 (+/- 0.038)
Rank=10, Name=ridge-0.5, Score=0.848 (+/- 0.038)
```

There will still be some variance in the reported means, but less than with a single run of k-fold cross-validation.

The number of repeats may be increased to further reduce this variance, at the cost of longer run times, and perhaps against the intent of spot-checking.

### Varied Input Representations

I am a big fan of avoiding assumptions and recommended representations for data prior to fitting models.

Instead, I like to also spot check multiple representations and transforms of the input data, which I refer to as views. I explain this more in the post:

* [How to Get the Most From Your Machine Learning Data](https://machinelearningmastery.com/how-to-get-the-most-from-your-machine-learning-data/)

We can update the framework to spot check multiple different representations for each model.

One way to do this is to update the `evaluate_models()` function so that we can provide a list of `make_pipeline()` functions that can be used for each defined model.

```
# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, pipe_funcs, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate model under each preparation function
        for i in range(len(pipe_funcs)):
            # evaluate the model
            scores = robust_evaluate_model(X, y, model, folds, metric, pipe_funcs[i])
            # update name
            run_name = str(i) + name
            # show process
            if scores is not None:
                # store a result
                results[run_name] = scores
                mean_score, std_score = mean(scores), std(scores)
                print('>%s: %.3f (+/-%.3f)' % (run_name, mean_score, std_score))
            else:
                print('>%s: error' % run_name)
    return results
```

The chosen pipeline function can then be passed down to the `robust_evaluate_model()` function and on to the `evaluate_model()` function, where it is used.

We can then define a bunch of different pipeline functions; for example:

```
# no transforms pipeline
def pipeline_none(model):
    return model

# standardize transform pipeline
def pipeline_standardize(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# normalize transform pipeline
def pipeline_normalize(model):
    steps = list()
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# standardize and normalize pipeline
def pipeline_std_norm(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline
```

Then create a list of the function names that can be provided to the `evaluate_models()` function.

```
# define transform pipelines
pipelines = [pipeline_none, pipeline_standardize, pipeline_normalize, pipeline_std_norm]
```
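Other views can be added in the same style. For example, a projection of the inputs could serve as another view; the sketch below uses PCA, where the 10-component setting is an illustrative assumption of mine, not from the tutorial. It could then be appended to the `pipelines` list before calling `evaluate_models()`.

```
# sketch: one more 'view' of the inputs, a PCA projection; the 10-component
# setting is an illustrative assumption
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

def pipeline_pca(model):
    steps = list()
    # project inputs onto the first 10 principal components
    steps.append(('pca', PCA(n_components=10)))
    # the model
    steps.append(('model', model))
    return Pipeline(steps=steps)
```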
The complete example of the classification case updated to spot check pipeline transforms is listed below.

```
# binary classification spot check script
import warnings
from numpy import mean
from numpy import std
from matplotlib import pyplot
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import RidgeClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.tree import ExtraTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.ensemble import GradientBoostingClassifier

# load the dataset, returns X and y elements
def load_dataset():
    return make_classification(n_samples=1000, n_classes=2, random_state=1)

# create a dict of standard models to evaluate {name:object}
def define_models(models=dict()):
    # linear models
    models['logistic'] = LogisticRegression()
    alpha = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for a in alpha:
        models['ridge-'+str(a)] = RidgeClassifier(alpha=a)
    models['sgd'] = SGDClassifier(max_iter=1000, tol=1e-3)
    models['pa'] = PassiveAggressiveClassifier(max_iter=1000, tol=1e-3)
    # non-linear models
    n_neighbors = range(1, 21)
    for k in n_neighbors:
        models['knn-'+str(k)] = KNeighborsClassifier(n_neighbors=k)
    models['cart'] = DecisionTreeClassifier()
    models['extra'] = ExtraTreeClassifier()
    models['svml'] = SVC(kernel='linear')
    models['svmp'] = SVC(kernel='poly')
    c_values = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    for c in c_values:
        models['svmr'+str(c)] = SVC(C=c)
    models['bayes'] = GaussianNB()
    # ensemble models
    n_trees = 100
    models['ada'] = AdaBoostClassifier(n_estimators=n_trees)
    models['bag'] = BaggingClassifier(n_estimators=n_trees)
    models['rf'] = RandomForestClassifier(n_estimators=n_trees)
    models['et'] = ExtraTreesClassifier(n_estimators=n_trees)
    models['gbm'] = GradientBoostingClassifier(n_estimators=n_trees)
    print('Defined %d models' % len(models))
    return models

# no transforms pipeline
def pipeline_none(model):
    return model

# standardize transform pipeline
def pipeline_standardize(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# normalize transform pipeline
def pipeline_normalize(model):
    steps = list()
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# standardize and normalize pipeline
def pipeline_std_norm(model):
    steps = list()
    # standardization
    steps.append(('standardize', StandardScaler()))
    # normalization
    steps.append(('normalize', MinMaxScaler()))
    # the model
    steps.append(('model', model))
    # create pipeline
    pipeline = Pipeline(steps=steps)
    return pipeline

# evaluate a single model
def evaluate_model(X, y, model, folds, metric, pipe_func):
    # create the pipeline
    pipeline = pipe_func(model)
    # evaluate model
    scores = cross_val_score(pipeline, X, y, scoring=metric, cv=folds, n_jobs=-1)
    return scores

# evaluate a model and try to trap errors and hide warnings
def robust_evaluate_model(X, y, model, folds, metric, pipe_func):
    scores = None
    try:
        with warnings.catch_warnings():
            warnings.filterwarnings("ignore")
            scores = evaluate_model(X, y, model, folds, metric, pipe_func)
    except:
        scores = None
    return scores

# evaluate a dict of models {name:object}, returns {name:score}
def evaluate_models(X, y, models, pipe_funcs, folds=10, metric='accuracy'):
    results = dict()
    for name, model in models.items():
        # evaluate model under each preparation function
        for i in range(len(pipe_funcs)):
            # evaluate the model
            scores = robust_evaluate_model(X, y, model, folds, metric, pipe_funcs[i])
            # update name
            run_name = str(i) + name
            # show process
            if scores is not None:
                # store a result
                results[run_name] = scores
                mean_score, std_score = mean(scores), std(scores)
                print('>%s: %.3f (+/-%.3f)' % (run_name, mean_score, std_score))
            else:
                print('>%s: error' % run_name)
    return results
# print and plot the top n results
def summarize_results(results, maximize=True, top_n=10):
    # check for no results
    if len(results) == 0:
        print('no results')
        return
    # determine how many results to summarize
    n = min(top_n, len(results))
    # create a list of (name, mean(scores)) tuples
    mean_scores = [(k,mean(v)) for k,v in results.items()]
    # sort tuples by mean score
    mean_scores = sorted(mean_scores, key=lambda x: x[1])
    # reverse for descending order (e.g. for accuracy)
    if maximize:
        mean_scores = list(reversed(mean_scores))
    # retrieve the top n for summarization
    names = [x[0] for x in mean_scores[:n]]
    scores = [results[x[0]] for x in mean_scores[:n]]
    # print the top n
    print()
    for i in range(n):
        name = names[i]
        mean_score, std_score = mean(results[name]), std(results[name])
        print('Rank=%d, Name=%s, Score=%.3f (+/- %.3f)' % (i+1, name, mean_score, std_score))
    # boxplot for the top n
    pyplot.boxplot(scores, labels=names)
    _, labels = pyplot.xticks()
    pyplot.setp(labels, rotation=90)
    pyplot.savefig('spotcheck.png')

# load dataset
X, y = load_dataset()
# get model list
models = define_models()
# define transform pipelines
pipelines = [pipeline_none, pipeline_standardize, pipeline_normalize, pipeline_std_norm]
# evaluate models
results = evaluate_models(X, y, models, pipelines)
# summarize results
summarize_results(results)
```

Running the example shows that we differentiate the results for each pipeline by adding the pipeline number to the beginning of the algorithm name, e.g. '`0rf`' means RF with the first pipeline, which applies no transforms.

The ensembles-of-trees algorithms perform well on this problem, and these algorithms are invariant to data scaling. This means their results on each pipeline will be similar (or identical), and in turn they crowd other algorithms out of the top-10 list.

```
...
>0gbm: 0.865 (+/-0.044)
>1gbm: 0.865 (+/-0.044)
>2gbm: 0.865 (+/-0.044)
>3gbm: 0.865 (+/-0.044)

Rank=1, Name=3rf, Score=0.870 (+/- 0.034)
Rank=2, Name=2rf, Score=0.870 (+/- 0.034)
Rank=3, Name=1rf, Score=0.870 (+/- 0.034)
Rank=4, Name=0rf, Score=0.870 (+/- 0.034)
Rank=5, Name=3bag, Score=0.866 (+/- 0.039)
Rank=6, Name=2bag, Score=0.866 (+/- 0.039)
Rank=7, Name=1bag, Score=0.866 (+/- 0.039)
Rank=8, Name=0bag, Score=0.866 (+/- 0.039)
Rank=9, Name=3gbm, Score=0.865 (+/- 0.044)
Rank=10, Name=2gbm, Score=0.865 (+/- 0.044)
```

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

* [Why you should be Spot-Checking Algorithms on your Machine Learning Problems](https://machinelearningmastery.com/why-you-should-be-spot-checking-algorithms-on-your-machine-learning-problems/)
* [Spot-Check Classification Machine Learning Algorithms in Python with scikit-learn](https://machinelearningmastery.com/spot-check-classification-machine-learning-algorithms-python-scikit-learn/)
* [Spot-Check Regression Machine Learning Algorithms in Python with scikit-learn](https://machinelearningmastery.com/spot-check-regression-machine-learning-algorithms-python-scikit-learn/)
* [How to Evaluate the Skill of Deep Learning Models](https://machinelearningmastery.com/evaluate-skill-deep-learning-models/)
* [Why Applied Machine Learning Is Hard](https://machinelearningmastery.com/applied-machine-learning-is-hard/)
* [A Gentle Introduction to Applied Machine Learning as a Search Problem](https://machinelearningmastery.com/applied-machine-learning-as-a-search-problem/)
* [How to Get the Most From Your Machine Learning Data](https://machinelearningmastery.com/how-to-get-the-most-from-your-machine-learning-data/)

## Summary

In this tutorial, you discovered the usefulness of spot-checking algorithms on a new predictive modeling problem and how to develop a standard framework for spot-checking algorithms in Python for classification and regression problems.

Specifically, you learned:

* Spot-checking provides a way to quickly discover the types of algorithms that perform well on your predictive modeling problem.
* How to develop a generic framework for loading data, defining models, evaluating models, and summarizing results.
* How to apply the framework to classification and regression problems.

Did you use this framework, or do you have further suggestions for improving it? Let me know in the comments.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.