
# 3.3. Model evaluation: quantifying the quality of predictions

Proofreaders: [@颶風](https://github.com/apachecn/scikit-learn-doc-zh) [@小瑤](https://github.com/apachecn/scikit-learn-doc-zh) [@FAME](https://github.com/apachecn/scikit-learn-doc-zh) [@v](https://github.com/apachecn/scikit-learn-doc-zh)

Translators: [@小瑤](https://github.com/apachecn/scikit-learn-doc-zh) [@片刻](https://github.com/apachecn/scikit-learn-doc-zh) [@那伊抹微笑](https://github.com/apachecn/scikit-learn-doc-zh)

There are 3 different APIs for evaluating the quality of a model's predictions:

- **Estimator score method**: Estimators have a `score` method that provides a default evaluation criterion for the problem they are designed to solve. It is not discussed on this page, but in the documentation of each estimator.
- **Scoring parameter**: Model-evaluation tools using [cross-validation](cross_validation.html#cross-validation) (such as [`model_selection.cross_val_score`](generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score) and [`model_selection.GridSearchCV`](generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV)) rely on an internal *scoring* strategy. This is discussed in the section [The `scoring` parameter: defining model evaluation rules](#scoring-parameter).
- **Metric functions**: The `metrics` module implements functions that assess prediction error for specific purposes. These metrics are detailed in the sections on [Classification metrics](#classification-metrics), [Multilabel ranking metrics](#multilabel-ranking-metrics), [Regression metrics](#regression-metrics) and [Clustering metrics](#clustering-metrics).

Finally, [Dummy estimators](#dummy-estimators) are useful for getting a baseline value of those metrics for random predictions.

See also

For "pairwise" metrics, which are computed between *samples* rather than between estimators or predictions, see the [Pairwise metrics, Affinities and Kernels](metrics.html#metrics) section.

## 3.3.1. The `scoring` parameter: defining model evaluation rules

Model selection and evaluation tools, such as [`model_selection.GridSearchCV`](generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV) and [`model_selection.cross_val_score`](generated/sklearn.model_selection.cross_val_score.html#sklearn.model_selection.cross_val_score), take a `scoring` parameter that controls which metric they apply to the estimators being evaluated.

### 3.3.1.1. Common cases: predefined values

For the most common use cases, you can designate a scorer object with the `scoring` parameter; the table below shows all possible values. All scorer objects follow the convention that **higher return values are better than lower return values**. Thus metrics that measure the distance between the model and the data, like [`metrics.mean_squared_error`](generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error), are available as neg\_mean\_squared\_error, which returns the negated value of the metric.

| Scoring | Function | Comment |
| --- | --- | --- |
| **Classification** | | |
| ‘accuracy’ | [`metrics.accuracy_score`](generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score) | |
| ‘average\_precision’ | [`metrics.average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score) | |
| ‘f1’ | [`metrics.f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) | for binary targets |
| ‘f1\_micro’ | [`metrics.f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) | micro-averaged |
| ‘f1\_macro’ | [`metrics.f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) | macro-averaged |
| ‘f1\_weighted’ | [`metrics.f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) | weighted average |
| ‘f1\_samples’ | [`metrics.f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) | by multilabel sample |
| ‘neg\_log\_loss’ | [`metrics.log_loss`](generated/sklearn.metrics.log_loss.html#sklearn.metrics.log_loss) | requires `predict_proba` support |
| ‘precision’ etc. | [`metrics.precision_score`](generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score) | suffixes apply as with ‘f1’ |
| ‘recall’ etc. | [`metrics.recall_score`](generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score) | suffixes apply as with ‘f1’ |
| ‘roc\_auc’ | [`metrics.roc_auc_score`](generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score) | |
| **Clustering** | | |
| ‘adjusted\_mutual\_info\_score’ | [`metrics.adjusted_mutual_info_score`](generated/sklearn.metrics.adjusted_mutual_info_score.html#sklearn.metrics.adjusted_mutual_info_score) | |
| ‘adjusted\_rand\_score’ | [`metrics.adjusted_rand_score`](generated/sklearn.metrics.adjusted_rand_score.html#sklearn.metrics.adjusted_rand_score) | |
| ‘completeness\_score’ | [`metrics.completeness_score`](generated/sklearn.metrics.completeness_score.html#sklearn.metrics.completeness_score) | |
| ‘fowlkes\_mallows\_score’ | [`metrics.fowlkes_mallows_score`](generated/sklearn.metrics.fowlkes_mallows_score.html#sklearn.metrics.fowlkes_mallows_score) | |
| ‘homogeneity\_score’ | [`metrics.homogeneity_score`](generated/sklearn.metrics.homogeneity_score.html#sklearn.metrics.homogeneity_score) | |
| ‘mutual\_info\_score’ | [`metrics.mutual_info_score`](generated/sklearn.metrics.mutual_info_score.html#sklearn.metrics.mutual_info_score) | |
| ‘normalized\_mutual\_info\_score’ | [`metrics.normalized_mutual_info_score`](generated/sklearn.metrics.normalized_mutual_info_score.html#sklearn.metrics.normalized_mutual_info_score) | |
| ‘v\_measure\_score’ | [`metrics.v_measure_score`](generated/sklearn.metrics.v_measure_score.html#sklearn.metrics.v_measure_score) | |
| **Regression** | | |
| ‘explained\_variance’ | [`metrics.explained_variance_score`](generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score) | |
| ‘neg\_mean\_absolute\_error’ | [`metrics.mean_absolute_error`](generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error) | |
| ‘neg\_mean\_squared\_error’ | [`metrics.mean_squared_error`](generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error) | |
| ‘neg\_mean\_squared\_log\_error’ | [`metrics.mean_squared_log_error`](generated/sklearn.metrics.mean_squared_log_error.html#sklearn.metrics.mean_squared_log_error) | |
| ‘neg\_median\_absolute\_error’ | [`metrics.median_absolute_error`](generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error) | |
| ‘r2’ | [`metrics.r2_score`](generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score) | |

Usage examples:

```
>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import cross_val_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data, iris.target
>>> clf = svm.SVC(probability=True, random_state=0)
>>> cross_val_score(clf, X, y, scoring='neg_log_loss')
array([-0.07..., -0.16..., -0.06...])
>>> model = svm.SVC()
>>> cross_val_score(model, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
```

Note

The values listed by the ValueError exception correspond to the functions measuring prediction accuracy described in the following sections. The scorer objects for those functions are stored in the dictionary `sklearn.metrics.SCORERS`.

### 3.3.1.2. Defining your scoring strategy from metric functions

The module [`sklearn.metrics`](classes.html#module-sklearn.metrics) also exposes a set of simple functions that measure a prediction error given ground truth and prediction:

- Functions ending with `_score` return a value to maximize; the higher the better.
- Functions ending with `_error` or `_loss` return a value to minimize; the lower the better. When converting one into a scorer object using [`make_scorer`](generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer), set the `greater_is_better` parameter to False (True by default; see the parameter description below).

Metrics available for various machine learning tasks are detailed in the sections below.

Many metrics are not given names to be used as `scoring` values, sometimes because they require additional parameters, such as [`fbeta_score`](generated/sklearn.metrics.fbeta_score.html#sklearn.metrics.fbeta_score). In such cases, you need to generate an appropriate scoring object. The simplest way to generate a callable object for scoring is to use [`make_scorer`](generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer). That function converts metrics into callables that can be used for model evaluation.

One typical use case is to wrap an existing metric function from the library with non-default values for its parameters, such as the `beta` parameter of the [`fbeta_score`](generated/sklearn.metrics.fbeta_score.html#sklearn.metrics.fbeta_score) function:

```
>>> from sklearn.metrics import fbeta_score, make_scorer
>>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
```

The second use case is to build a completely custom scorer object from a simple python function using [`make_scorer`](generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer), which can take several parameters:

- the python function you want to use (`my_custom_loss_func` in the example below)
- whether the python function returns a score (`greater_is_better=True`, the default) or a loss (`greater_is_better=False`). If it returns a loss, the scorer object negates the output of the python function, conforming to the cross validation convention that scorers return higher values for better models.
- for classification metrics only: whether the python function you provided requires continuous decision certainties (`needs_threshold=True`). The default value is False.
- any additional parameters, such as `beta` or `labels` in [`f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score).

Here is an example of building custom scorers, and of using the `greater_is_better` parameter:

```
>>> import numpy as np
>>> def my_custom_loss_func(ground_truth, predictions):
...     diff = np.abs(ground_truth - predictions).max()
...     return np.log(1 + diff)
...
>>> # `loss` will negate the return value of my_custom_loss_func,
>>> # which will be np.log(2), 0.693, given the values for ground_truth
>>> # and predictions defined below.
>>> loss = make_scorer(my_custom_loss_func, greater_is_better=False)
>>> score = make_scorer(my_custom_loss_func, greater_is_better=True)
>>> ground_truth = [[1], [1]]
>>> predictions = [0, 1]
>>> from sklearn.dummy import DummyClassifier
>>> clf = DummyClassifier(strategy='most_frequent', random_state=0)
>>> clf = clf.fit(ground_truth, predictions)
>>> loss(clf, ground_truth, predictions)
-0.69...
>>> score(clf, ground_truth, predictions)
0.69...
```

### 3.3.1.3. Implementing your own scoring object

You can generate even more flexible model scorers by constructing your own scoring object from scratch, without using the [`make_scorer`](generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer) factory. For a callable to be a scorer, it needs to meet the protocol specified by the following two rules:

- It can be called with parameters `(estimator, X, y)`, where `estimator` is the model to be evaluated, `X` is the validation data, and `y` is the ground truth target for `X` (in the supervised case) or `None` (in the unsupervised case).
- It returns a floating point number that quantifies the prediction quality of `estimator` on `X`, with reference to `y`. Again, by convention higher numbers are better, so if your scorer returns a loss, that value should be negated.

### 3.3.1.4. Using multiple metric evaluation

Scikit-learn also permits evaluation of multiple metrics in `GridSearchCV`, `RandomizedSearchCV` and `cross_validate`. There are two ways to specify multiple scoring metrics for the `scoring` parameter:

- As an iterable of string metrics:

```
>>> scoring = ['accuracy', 'precision']
```

- As a `dict` mapping the scorer name to the scoring function:

```
>>> from sklearn.metrics import accuracy_score
>>> from sklearn.metrics import make_scorer
>>> scoring = {'accuracy': make_scorer(accuracy_score),
...            'prec': 'precision'}
```

Note that the dict values can be either scorer functions or one of the predefined metric strings.

Currently only scorer functions that return a single score can be passed inside the dict. Scorer functions that return multiple values are not permitted and require a wrapper that returns a single metric:

```
>>> from sklearn import datasets
>>> from sklearn.svm import LinearSVC
>>> from sklearn.model_selection import cross_validate
>>> from sklearn.metrics import confusion_matrix
>>> # A sample toy binary classification dataset
>>> X, y = datasets.make_classification(n_classes=2, random_state=0)
>>> clf = LinearSVC(random_state=0)
>>> # For binary targets, confusion_matrix is laid out as [[tn, fp], [fn, tp]]
>>> def tn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 0]
>>> def fp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[0, 1]
>>> def fn(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 0]
>>> def tp(y_true, y_pred): return confusion_matrix(y_true, y_pred)[1, 1]
>>> scoring = {'tp': make_scorer(tp), 'tn': make_scorer(tn),
...            'fp': make_scorer(fp), 'fn': make_scorer(fn)}
>>> cv_results = cross_validate(clf, X, y, scoring=scoring)
>>> # Getting the test set true negative scores (entry [0, 0])
>>> print(cv_results['test_tn'])
[12 13 15]
>>> # Getting the test set false positive scores (entry [0, 1])
>>> print(cv_results['test_fp'])
[5 4 1]
```

## 3.3.2. Classification metrics

The [`sklearn.metrics`](classes.html#module-sklearn.metrics) module implements several loss, score, and utility functions to measure classification performance. Some metrics might require probability estimates of the positive class, confidence values, or binary decision values. Most implementations allow each sample to provide a weighted contribution to the overall score, through the `sample_weight` parameter.

Some of these are restricted to the binary classification case:

| Function | Description |
| --- | --- |
| [`precision_recall_curve`](generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve)(y\_true, probas\_pred) | Compute precision-recall pairs for different probability thresholds |
| [`roc_curve`](generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve)(y\_true, y\_score\[, pos\_label, …\]) | Compute Receiver operating characteristic (ROC) |

Others also work in the multiclass case:

| Function | Description |
| --- | --- |
| [`cohen_kappa_score`](generated/sklearn.metrics.cohen_kappa_score.html#sklearn.metrics.cohen_kappa_score)(y1, y2\[, labels, weights, …\]) | Cohen’s kappa: a statistic that measures inter-annotator agreement. |
| [`confusion_matrix`](generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix)(y\_true, y\_pred\[, labels, …\]) | Compute confusion matrix to evaluate the accuracy of a classification |
| [`hinge_loss`](generated/sklearn.metrics.hinge_loss.html#sklearn.metrics.hinge_loss)(y\_true, pred\_decision\[, labels, …\]) | Average hinge loss (non-regularized) |
| [`matthews_corrcoef`](generated/sklearn.metrics.matthews_corrcoef.html#sklearn.metrics.matthews_corrcoef)(y\_true, y\_pred\[, …\]) | Compute the Matthews correlation coefficient (MCC) |

Some also work in the multilabel case:

| Function | Description |
| --- | --- |
| [`accuracy_score`](generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score)(y\_true, y\_pred\[, normalize, …\]) | Accuracy classification score. |
| [`classification_report`](generated/sklearn.metrics.classification_report.html#sklearn.metrics.classification_report)(y\_true, y\_pred\[, …\]) | Build a text report showing the main classification metrics |
| [`f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score)(y\_true, y\_pred\[, labels, …\]) | Compute the F1 score, also known as balanced F-score or F-measure |
| [`fbeta_score`](generated/sklearn.metrics.fbeta_score.html#sklearn.metrics.fbeta_score)(y\_true, y\_pred, beta\[, labels, …\]) | Compute the F-beta score |
| [`hamming_loss`](generated/sklearn.metrics.hamming_loss.html#sklearn.metrics.hamming_loss)(y\_true, y\_pred\[, labels, …\]) | Compute the average Hamming loss. |
| [`jaccard_similarity_score`](generated/sklearn.metrics.jaccard_similarity_score.html#sklearn.metrics.jaccard_similarity_score)(y\_true, y\_pred\[, …\]) | Jaccard similarity coefficient score |
| [`log_loss`](generated/sklearn.metrics.log_loss.html#sklearn.metrics.log_loss)(y\_true, y\_pred\[, eps, normalize, …\]) | Log loss, aka logistic loss or cross-entropy loss. |
| [`precision_recall_fscore_support`](generated/sklearn.metrics.precision_recall_fscore_support.html#sklearn.metrics.precision_recall_fscore_support)(y\_true, y\_pred) | Compute precision, recall, F-measure and support for each class |
| [`precision_score`](generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score)(y\_true, y\_pred\[, labels, …\]) | Compute the precision |
| [`recall_score`](generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score)(y\_true, y\_pred\[, labels, …\]) | Compute the recall |
| [`zero_one_loss`](generated/sklearn.metrics.zero_one_loss.html#sklearn.metrics.zero_one_loss)(y\_true, y\_pred\[, normalize, …\]) | Zero-one classification loss. |

Some are typically used for ranking:

| Function | Description |
| --- | --- |
| [`dcg_score`](generated/sklearn.metrics.dcg_score.html#sklearn.metrics.dcg_score)(y\_true, y\_score\[, k\]) | Discounted cumulative gain (DCG) at rank K. |
| [`ndcg_score`](generated/sklearn.metrics.ndcg_score.html#sklearn.metrics.ndcg_score)(y\_true, y\_score\[, k\]) | Normalized discounted cumulative gain (NDCG) at rank K. |

And some work with binary and multilabel (but not multiclass) problems:

| Function | Description |
| --- | --- |
| [`average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score)(y\_true, y\_score\[, …\]) | Compute average precision (AP) from prediction scores |
| [`roc_auc_score`](generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score)(y\_true, y\_score\[, average, …\]) | Compute Area Under the Curve (AUC) from prediction scores |

In the following sub-sections, we describe each of these functions, preceded by some notes on the common API and metric definitions.

### 3.3.2.1. From binary to multiclass and multilabel

Some metrics are essentially defined for binary classification tasks (e.g. [`f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score), [`roc_auc_score`](generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score)). In these cases, only the positive label is evaluated by default, assuming that the positive class is labelled `1` (though this can be configured through the `pos_label` parameter).

When a binary metric is extended to multiclass or multilabel problems, the data is treated as a collection of binary problems, one for each class. There are then a number of ways to average the binary metric calculations across the set of classes, each of which may be useful in some scenario. Where available, you should select among these using the `average` parameter.

- `"macro"` simply calculates the mean of the binary metrics, giving equal weight to each class. In problems where infrequent classes are nonetheless important, macro-averaging may be a means of highlighting their performance. On the other hand, the assumption that all classes are equally important is often untrue, so macro-averaging will over-emphasize the typically low performance on an infrequent class.
- `"weighted"` accounts for class imbalance by computing the average of the binary metrics in which each class's score is weighted by its presence in the true data sample.
- `"micro"` gives each sample-class pair an equal contribution to the overall metric (except as a result of `sample_weight`). Rather than summing the metric per class, this sums the dividends and divisors that make up the per-class metrics to calculate an overall quotient. Micro-averaging may be preferred in multilabel settings, including multiclass classification where a majority class is to be ignored.
- `"samples"` applies only to multilabel problems. It does not calculate a per-class measure; instead it calculates the metric over the true and predicted classes for each sample in the evaluation data, and returns their (`sample_weight`-weighted) average.
- Selecting `average=None` returns an array with the score for each class.

While multiclass data is provided to the metric, like binary targets, as an array of class labels, multilabel data is specified as an indicator matrix, in which cell `[i, j]` has value 1 if sample `i` has label `j` and value 0 otherwise.

### 3.3.2.2. Accuracy score

The [`accuracy_score`](generated/sklearn.metrics.accuracy_score.html#sklearn.metrics.accuracy_score) function computes the [accuracy](https://en.wikipedia.org/wiki/Accuracy_and_precision), either the fraction (default) or the count (normalize=False) of correct predictions.

In multilabel classification, the function returns the subset accuracy. If the entire set of predicted labels for a sample strictly matches the true set of labels, then the subset accuracy is 1.0; otherwise it is 0.0.

If $\hat{y}_i$ is the predicted value of the $i$-th sample and $y_i$ is the corresponding true value, then the fraction of correct predictions over $n_\text{samples}$ is defined as

$$\texttt{accuracy}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples}-1} 1(\hat{y}_i = y_i)$$

where $1(x)$ is the [indicator function](https://en.wikipedia.org/wiki/Indicator_function).

```
>>> import numpy as np
>>> from sklearn.metrics import accuracy_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> accuracy_score(y_true, y_pred)
0.5
>>> accuracy_score(y_true, y_pred, normalize=False)
2
```

In the multilabel case with binary label indicators:

```
>>> accuracy_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
0.5
```

Example:

- See [Test with permutations the significance of a classification score](../auto_examples/feature_selection/plot_permutation_test_for_classification.html#sphx-glr-auto-examples-feature-selection-plot-permutation-test-for-classification-py) for an example of accuracy score usage using permutations of the dataset.

### 3.3.2.3. Cohen’s kappa

The function [`cohen_kappa_score`](generated/sklearn.metrics.cohen_kappa_score.html#sklearn.metrics.cohen_kappa_score) computes the [Cohen’s kappa](https://en.wikipedia.org/wiki/Cohen%27s_kappa) statistic. This measure is intended to compare labelings by different human annotators, not a classifier versus a ground truth.

The kappa score (see docstring) is a number between -1 and 1. Scores above .8 are generally considered good agreement; zero or lower means no agreement (practically random labels).

Kappa scores can be computed for binary or multiclass problems, but not for multilabel problems (except by manually computing a per-label score) and not for more than two annotators.

```
>>> from sklearn.metrics import cohen_kappa_score
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> cohen_kappa_score(y_true, y_pred)
0.4285714285714286
```

### 3.3.2.4. Confusion matrix

The [`confusion_matrix`](generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix) function evaluates classification accuracy by computing the [confusion matrix](https://en.wikipedia.org/wiki/Confusion_matrix).

By definition, entry $i, j$ in a confusion matrix is the number of observations actually in group $i$, but predicted to be in group $j$. Here is an example:

```
>>> from sklearn.metrics import confusion_matrix
>>> y_true = [2, 0, 2, 2, 0, 1]
>>> y_pred = [0, 0, 2, 2, 0, 2]
>>> confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
       [0, 0, 1],
       [1, 0, 2]])
```

Here is a visual representation of such a confusion matrix (this figure comes from the [Confusion matrix](../auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py) example):

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_confusion_matrix_0011.png](https://box.kancloud.cn/2aeb2d020c8bce222c39a58e3747112f_566x424.jpg)](../auto_examples/model_selection/plot_confusion_matrix.html)

For binary problems, we can get counts of true negatives, false positives, false negatives and true positives as follows:

```
>>> y_true = [0, 0, 0, 1, 1, 1, 1, 1]
>>> y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
>>> tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
>>> tn, fp, fn, tp
(2, 1, 2, 3)
```

Example:

- See [Confusion matrix](../auto_examples/model_selection/plot_confusion_matrix.html#sphx-glr-auto-examples-model-selection-plot-confusion-matrix-py) for an example of using a confusion matrix to evaluate classifier output quality.
- See [Recognizing hand-written digits](../auto_examples/classification/plot_digits_classification.html#sphx-glr-auto-examples-classification-plot-digits-classification-py) for an example of using a confusion matrix to classify hand-written digits.
- See [Classification of text documents using sparse features](../auto_examples/text/document_classification_20newsgroups.html#sphx-glr-auto-examples-text-document-classification-20newsgroups-py) for an example of using a confusion matrix to classify text documents.

### 3.3.2.5. Classification report

The [`classification_report`](generated/sklearn.metrics.classification_report.html#sklearn.metrics.classification_report) function builds a text report showing the main classification metrics. Here is a small example with custom `target_names` and inferred labels:

```
>>> from sklearn.metrics import classification_report
>>> y_true = [0, 1, 2, 2, 0]
>>> y_pred = [0, 0, 2, 1, 0]
>>> target_names = ['class 0', 'class 1', 'class 2']
>>> print(classification_report(y_true, y_pred, target_names=target_names))
             precision    recall  f1-score   support

    class 0       0.67      1.00      0.80         2
    class 1       0.00      0.00      0.00         1
    class 2       1.00      0.50      0.67         2

avg / total       0.67      0.60      0.59         5
```

Example:

- See [Recognizing hand-written digits](../auto_examples/classification/plot_digits_classification.html#sphx-glr-auto-examples-classification-plot-digits-classification-py) for an example of classification report usage for hand-written digits.
- See [Classification of text documents using sparse features](../auto_examples/text/document_classification_20newsgroups.html#sphx-glr-auto-examples-text-document-classification-20newsgroups-py) for an example of classification report usage for text documents.
- See [Parameter estimation using grid search with cross-validation](../auto_examples/model_selection/plot_grid_search_digits.html#sphx-glr-auto-examples-model-selection-plot-grid-search-digits-py) for an example of classification report usage for grid search with nested cross-validation.

### 3.3.2.6. Hamming loss

The [`hamming_loss`](generated/sklearn.metrics.hamming_loss.html#sklearn.metrics.hamming_loss) function computes the average Hamming loss or [Hamming distance](https://en.wikipedia.org/wiki/Hamming_distance) between two sets of samples.

If $\hat{y}_j$ is the predicted value for the $j$-th label of a given sample, $y_j$ is the corresponding true value, and $n_\text{labels}$ is the number of classes or labels, then the Hamming loss $L_{Hamming}$ between two samples is defined as:

$$L_{Hamming}(y, \hat{y}) = \frac{1}{n_\text{labels}} \sum_{j=0}^{n_\text{labels} - 1} 1(\hat{y}_j \not= y_j)$$

where $1(x)$ is the [indicator function](https://en.wikipedia.org/wiki/Indicator_function).
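
The Hamming loss definition can be sketched directly in NumPy. This is only an illustration under one assumption: the per-sample loss is averaged over all samples, so the result is the fraction of (sample, label) positions where the prediction differs from the truth. The helper name `hamming_loss_by_hand` is hypothetical and not part of scikit-learn.

```python
import numpy as np


def hamming_loss_by_hand(y_true, y_pred):
    """Fraction of positions where y_pred differs from y_true.

    Hypothetical helper for illustration; it accepts either a 1-D array of
    class labels or a 2-D binary label indicator matrix.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # The indicator 1(y_hat != y) is a boolean array; its mean is the loss.
    return float(np.mean(y_true != y_pred))


# Multiclass case: one of four labels is wrong.
multiclass_loss = hamming_loss_by_hand([2, 2, 3, 4], [1, 2, 3, 4])

# Multilabel case: three of the four indicator cells are wrong.
multilabel_loss = hamming_loss_by_hand(np.array([[0, 1], [1, 1]]),
                                       np.zeros((2, 2)))

print(multiclass_loss, multilabel_loss)
```

On these inputs the sketch gives 0.25 and 0.75, which is what `hamming_loss` would be expected to return for the same arrays.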
```
>>> from sklearn.metrics import hamming_loss
>>> y_pred = [1, 2, 3, 4]
>>> y_true = [2, 2, 3, 4]
>>> hamming_loss(y_true, y_pred)
0.25
```

In the multilabel case with binary label indicators:

```
>>> import numpy as np
>>> hamming_loss(np.array([[0, 1], [1, 1]]), np.zeros((2, 2)))
0.75
```

Note

In multiclass classification, the Hamming loss corresponds to the Hamming distance between `y_true` and `y_pred`, which is similar to the [Zero one loss](#zero-one-loss) function. However, while the zero-one loss penalizes prediction sets that do not strictly match the true sets, the Hamming loss penalizes individual labels. Thus the Hamming loss, which is bounded above by the zero-one loss, is always between zero and one, inclusive; and predicting a proper subset or superset of the true labels gives a Hamming loss strictly between zero and one.

### 3.3.2.7. Jaccard similarity coefficient score

The [`jaccard_similarity_score`](generated/sklearn.metrics.jaccard_similarity_score.html#sklearn.metrics.jaccard_similarity_score) function computes the average (default) or sum of [Jaccard similarity coefficients](https://en.wikipedia.org/wiki/Jaccard_index), also called the Jaccard index, between pairs of label sets.

The Jaccard similarity coefficient of the $i$-th sample, with a ground truth label set $y_i$ and predicted label set $\hat{y}_i$, is defined as

$$J(y_i, \hat{y}_i) = \frac{|y_i \cap \hat{y}_i|}{|y_i \cup \hat{y}_i|}.$$

In binary and multiclass classification, the Jaccard similarity coefficient score is equal to the classification accuracy.

```
>>> import numpy as np
>>> from sklearn.metrics import jaccard_similarity_score
>>> y_pred = [0, 2, 1, 3]
>>> y_true = [0, 1, 2, 3]
>>> jaccard_similarity_score(y_true, y_pred)
0.5
>>> jaccard_similarity_score(y_true, y_pred, normalize=False)
2
```

In the multilabel case with binary label indicators:

```
>>> jaccard_similarity_score(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
0.75
```

### 3.3.2.8. Precision, recall and F-measures

Intuitively, [precision](https://en.wikipedia.org/wiki/Precision_and_recall#Precision) is the ability of the classifier not to label as positive a sample that is negative, and [recall](https://en.wikipedia.org/wiki/Precision_and_recall#Recall) is the ability of the classifier to find all the positive samples.

The [F-measure](https://en.wikipedia.org/wiki/F1_score) ($F_\beta$ and $F_1$ measures) can be interpreted as a weighted harmonic mean of the precision and recall. An $F_\beta$ measure reaches its best value at 1 and its worst score at 0. With $\beta = 1$, $F_\beta$ and $F_1$ are equivalent, and recall and precision are equally important.

The [`precision_recall_curve`](generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve) function computes a precision-recall curve from the ground truth label and a score given by the classifier by varying a decision threshold.

The [`average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score) function computes the average precision (AP) from prediction scores. This score corresponds to the area under the precision-recall curve. The value is between 0 and 1, and higher is better. With random predictions, the AP is the fraction of positive samples.

Several functions allow you to analyze the precision, recall and F-measures score:

| Function | Description |
| --- | --- |
| [`average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score)(y\_true, y\_score\[, …\]) | Compute average precision (AP) from prediction scores |
| [`f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score)(y\_true, y\_pred\[, labels, …\]) | Compute the F1 score, also known as balanced F-score or F-measure |
| [`fbeta_score`](generated/sklearn.metrics.fbeta_score.html#sklearn.metrics.fbeta_score)(y\_true, y\_pred, beta\[, labels, …\]) | Compute the F-beta score |
| [`precision_recall_curve`](generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve)(y\_true, probas\_pred) | Compute precision-recall pairs for different probability thresholds |
| [`precision_recall_fscore_support`](generated/sklearn.metrics.precision_recall_fscore_support.html#sklearn.metrics.precision_recall_fscore_support)(y\_true, y\_pred) | Compute precision, recall, F-measure and support for each class |
| [`precision_score`](generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score)(y\_true, y\_pred\[, labels, …\]) | Compute the precision |
| [`recall_score`](generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score)(y\_true, y\_pred\[, labels, …\]) | Compute the recall |

Note that the [`precision_recall_curve`](generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve) function is restricted to the binary case. The [`average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score) function works only in binary classification and multilabel indicator format.

Examples:

- See [Classification of text documents using sparse features](../auto_examples/text/document_classification_20newsgroups.html#sphx-glr-auto-examples-text-document-classification-20newsgroups-py) for an example of [`f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score) usage to classify text documents.
- See [Parameter estimation using grid search with cross-validation](../auto_examples/model_selection/plot_grid_search_digits.html#sphx-glr-auto-examples-model-selection-plot-grid-search-digits-py) for an example of [`precision_score`](generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score) and [`recall_score`](generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score) usage to estimate parameters using grid search with nested cross-validation.
- See [Precision-Recall](../auto_examples/model_selection/plot_precision_recall.html#sphx-glr-auto-examples-model-selection-plot-precision-recall-py) for an example of [`precision_recall_curve`](generated/sklearn.metrics.precision_recall_curve.html#sklearn.metrics.precision_recall_curve) usage to evaluate classifier output quality.

#### 3.3.2.8.1. Binary classification

In a binary classification task, the terms "positive" and "negative" refer to the classifier’s prediction, and the terms "true" and "false" refer to whether that prediction corresponds to the external judgment (sometimes known as the "observation"). Given these definitions, we can formulate the following table:

| | Actual class (observation) | |
| --- | --- | --- |
| Predicted class (expectation) | tp (true positive) Correct result | fp (false positive) Unexpected result |
| | fn (false negative) Missing result | tn (true negative) Correct absence of result |

In this context, we can define the notions of precision, recall and F-measure:

$$\text{precision} = \frac{tp}{tp + fp},$$

$$\text{recall} = \frac{tp}{tp + fn},$$

$$F_\beta = (1 + \beta^2) \frac{\text{precision} \times \text{recall}}{\beta^2 \text{precision} + \text{recall}}.$$

Here are some small examples in binary classification:

```
>>> from sklearn import metrics
>>> y_pred = [0, 1, 0, 0]
>>> y_true = [0, 1, 0, 1]
>>> metrics.precision_score(y_true, y_pred)
1.0
>>> metrics.recall_score(y_true, y_pred)
0.5
>>> metrics.f1_score(y_true, y_pred)
0.66...
>>> metrics.fbeta_score(y_true, y_pred, beta=0.5)
0.83...
>>> metrics.fbeta_score(y_true, y_pred, beta=1)
0.66...
>>> metrics.fbeta_score(y_true, y_pred, beta=2)
0.55...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5)
(array([ 0.66..., 1. ]), array([ 1. , 0.5]), array([ 0.71..., 0.83...]), array([2, 2]...))

>>> import numpy as np
>>> from sklearn.metrics import precision_recall_curve
>>> from sklearn.metrics import average_precision_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> precision, recall, threshold = precision_recall_curve(y_true, y_scores)
>>> precision
array([ 0.66..., 0.5 , 1. , 1. ])
>>> recall
array([ 1. , 0.5, 0.5, 0. ])
>>> threshold
array([ 0.35, 0.4 , 0.8 ])
>>> average_precision_score(y_true, y_scores)
0.83...
```

#### 3.3.2.8.2. Multiclass and multilabel classification

In multiclass and multilabel classification tasks, the notions of precision, recall, and F-measures can be applied to each label independently. There are a few ways to combine results across labels, specified by the `average` argument to the [`average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score) (multilabel only), [`f1_score`](generated/sklearn.metrics.f1_score.html#sklearn.metrics.f1_score), [`fbeta_score`](generated/sklearn.metrics.fbeta_score.html#sklearn.metrics.fbeta_score), [`precision_recall_fscore_support`](generated/sklearn.metrics.precision_recall_fscore_support.html#sklearn.metrics.precision_recall_fscore_support), [`precision_score`](generated/sklearn.metrics.precision_score.html#sklearn.metrics.precision_score) and [`recall_score`](generated/sklearn.metrics.recall_score.html#sklearn.metrics.recall_score) functions, as described [above](#average).
for details. Note that "micro"-averaging in a multiclass setting with all labels included will produce equal precision, recall and ![F](https://box.kancloud.cn/3108b1c6b81a7a313256a1d3d96339cb_14x12.jpg), while "weighted" averaging may produce an F-score that is not between precision and recall.

To make this more explicit, consider the following notation:

- ![y](https://box.kancloud.cn/0255a09d3dccb9843dcf063bbeec303f_9x12.jpg) the set of *predicted* ![(sample, label)](https://box.kancloud.cn/9774608efebee2864b745e6385a89457_113x18.jpg) pairs
- ![\hat{y}](https://box.kancloud.cn/277d247a09c0ccb4240fe50a4806934e_9x17.jpg) the set of *true* ![(sample, label)](https://box.kancloud.cn/9774608efebee2864b745e6385a89457_113x18.jpg) pairs
- ![L](https://box.kancloud.cn/932e52dfeb15d15287c07f0b899113b1_12x12.jpg) the set of labels
- ![S](https://box.kancloud.cn/3654ba253cea374c1cf48d1877e4bf6c_12x12.jpg) the set of samples
- ![y_s](https://box.kancloud.cn/dd31a23b6c2dbeefc1f85acd7e3a8d50_15x12.jpg) the subset of ![y](https://box.kancloud.cn/0255a09d3dccb9843dcf063bbeec303f_9x12.jpg) with sample ![s](https://box.kancloud.cn/2f503365de225730371850cf4efa70b8_8x8.jpg), i.e. ![y_s := \left\{(s', l) \in y | s' = s\right\}](https://box.kancloud.cn/934d26a5b6b4d4a1b66208825f7e09d3_184x19.jpg)
- ![y_l](https://box.kancloud.cn/0096fa3ea351944ac5ecbbc83435d4a3_13x12.jpg) the subset of ![y](https://box.kancloud.cn/0255a09d3dccb9843dcf063bbeec303f_9x12.jpg) with label ![l](https://box.kancloud.cn/aee0e789bbfa3644393125081b3c7fe3_5x13.jpg)
- similarly, ![\hat{y}_s](https://box.kancloud.cn/60703a24106632dea3e2367d6ac26998_15x17.jpg) and ![\hat{y}_l](https://box.kancloud.cn/dc92a9e5cfa3bc302fce1d30f9cb21db_13x17.jpg) are subsets of ![\hat{y}](https://box.kancloud.cn/277d247a09c0ccb4240fe50a4806934e_9x17.jpg)
- ![P(A, B) := \frac{\left| A \cap B \right|}{\left|A\right|}](https://box.kancloud.cn/b171fae44f547082158d7064c6b1ed86_131x27.jpg)
- ![R(A, B) := \frac{\left| A \cap B \right|}{\left|B\right|}](https://box.kancloud.cn/3c415239cb39c5c44ed525902b44d7f3_131x27.jpg) (Conventions vary on handling ![B = \emptyset](https://box.kancloud.cn/25c189db7d8e2e08c207908cfca8b154_48x17.jpg); this implementation uses ![R(A, B):=0](https://box.kancloud.cn/04d7cdbf733a3efb5f1b97606339878e_102x18.jpg), and similar for ![P](https://box.kancloud.cn/08277e04611b27b30b29f99ba0830d27_14x12.jpg).)
- ![F_\beta(A, B) := \left(1 + \beta^2\right) \frac{P(A, B) \times R(A, B)}{\beta^2 P(A, B) + R(A, B)}](https://box.kancloud.cn/58682d26e2c0f321451be5c51a71b083_279x29.jpg)

Then the metrics are defined as:

| `average` | Precision | Recall | F\_beta |
| --- | --- | --- | --- |
| `"micro"` | ![P(y, \hat{y})](https://box.kancloud.cn/31bdafc622943bf7bbf046a585f234a1_54x18.jpg) | ![R(y, \hat{y})](https://box.kancloud.cn/77291189d1c126bde670c7573836df8a_54x18.jpg) | ![F_\beta(y, \hat{y})](https://box.kancloud.cn/3e1bb97dd8c354b59ccd006fbbc0ffbe_60x20.jpg) |
| `"samples"` | ![\frac{1}{\left\|S\right\|} \sum_{s \in S} P(y_s, \hat{y}_s)](https://box.kancloud.cn/ffc47952b16d9ce0fc5a2a1072fc2cc1_133x26.jpg) | ![\frac{1}{\left\|S\right\|} \sum_{s \in S} R(y_s, \hat{y}_s)](https://box.kancloud.cn/d950c91e223d0880b9c067aa2c613191_133x26.jpg) | ![\frac{1}{\left\|S\right\|} \sum_{s \in S} F_\beta(y_s, \hat{y}_s)](https://box.kancloud.cn/6bc2f50f2a4574345196a579e2e75679_139x26.jpg) |
| `"macro"` | ![\frac{1}{\left\|L\right\|} \sum_{l \in L} P(y_l, \hat{y}_l)](https://box.kancloud.cn/9dbcb3265dcb4917f65feaa4d657ee11_127x26.jpg) | ![\frac{1}{\left\|L\right\|} \sum_{l \in L} R(y_l, \hat{y}_l)](https://box.kancloud.cn/12cf8b4c87020129b9c559a18090aeaa_127x26.jpg) | ![\frac{1}{\left\|L\right\|} \sum_{l \in L} F_\beta(y_l, \hat{y}_l)](https://box.kancloud.cn/4275f37b87c67faea17b7717f1e212be_134x26.jpg) |
| `"weighted"` | ![\frac{1}{\sum_{l \in L} \left\|\hat{y}_l\right\|} \sum_{l \in L} \left\|\hat{y}_l\right\| P(y_l, \hat{y}_l)](https://box.kancloud.cn/15564b3dbe432feb2db5382696528958_189x28.jpg) | ![\frac{1}{\sum_{l \in L} \left\|\hat{y}_l\right\|} \sum_{l \in L} \left\|\hat{y}_l\right\| R(y_l, \hat{y}_l)](https://box.kancloud.cn/01627e11e94d2a84370fa4059fe7ad50_189x28.jpg) | ![\frac{1}{\sum_{l \in L} \left\|\hat{y}_l\right\|} \sum_{l \in L} \left\|\hat{y}_l\right\| F_\beta(y_l, \hat{y}_l)](https://box.kancloud.cn/de9bc098c0220fc46108d35a282ad6a7_196x28.jpg) |
| `None` | ![\langle P(y_l, \hat{y}_l) \| l \in L \rangle](https://box.kancloud.cn/794db845db45cb15c4b585eb2a04cf41_119x19.jpg) | ![\langle R(y_l, \hat{y}_l) \| l \in L \rangle](https://box.kancloud.cn/abe1b0edf7b287e3822bc59dfeead69c_119x19.jpg) | ![\langle F_\beta(y_l, \hat{y}_l) \| l \in L \rangle](https://box.kancloud.cn/bfc9917fb837e2d72b3407d605eb6280_126x20.jpg) |

```
>>> from sklearn import metrics
>>> y_true = [0, 1, 2, 0, 1, 2]
>>> y_pred = [0, 2, 1, 0, 0, 1]
>>> metrics.precision_score(y_true, y_pred, average='macro')
0.22...
>>> metrics.recall_score(y_true, y_pred, average='micro')
0.33...
>>> metrics.f1_score(y_true, y_pred, average='weighted')
0.26...
>>> metrics.fbeta_score(y_true, y_pred, average='macro', beta=0.5)
0.23...
>>> metrics.precision_recall_fscore_support(y_true, y_pred, beta=0.5, average=None)
(array([ 0.66..., 0. , 0. ]), array([ 1., 0., 0.]), array([ 0.71..., 0. , 0. ]), array([2, 2, 2]...))
```

For multiclass classification with a "negative class", it is possible to exclude some labels:

```
>>> metrics.recall_score(y_true, y_pred, labels=[1, 2], average='micro')
... # excluding 0, no labels were correctly recalled
0.0
```

Similarly, labels not present in the data sample may be accounted for in macro-averaging.

```
>>> metrics.precision_score(y_true, y_pred, labels=[0, 1, 2, 3], average='macro')
0.166...
```

### 3.3.2.9. Hinge loss

The [`hinge_loss`](generated/sklearn.metrics.hinge_loss.html#sklearn.metrics.hinge_loss "sklearn.metrics.hinge_loss") function computes the average distance between the model and the data using [hinge loss](https://en.wikipedia.org/wiki/Hinge_loss), a one-sided metric that considers only prediction errors. (Hinge loss is used in maximal margin classifiers such as support vector machines.)

If the labels are encoded with +1 and -1, ![y](https://box.kancloud.cn/0255a09d3dccb9843dcf063bbeec303f_9x12.jpg) is the true value, and ![w](https://box.kancloud.cn/0635104a899a1b8951f0b8da2816a950_13x8.jpg) is the predicted decision as output by `decision_function`, then the hinge loss is defined as:

![L_\text{Hinge}(y, w) = \max\left\{1 - wy, 0\right\} = \left|1 - wy\right|_+](https://box.kancloud.cn/7945964990e3ffe30480d4126a51c97b_338x21.jpg)

If there are more than two labels, [`hinge_loss`](generated/sklearn.metrics.hinge_loss.html#sklearn.metrics.hinge_loss "sklearn.metrics.hinge_loss") uses a multiclass variant due to Crammer & Singer. [Here](http://jmlr.csail.mit.edu/papers/volume2/crammer01a/crammer01a.pdf) is the paper describing it.

If ![y_w](https://box.kancloud.cn/f59a5c945b996a714d788df68ff437c9_18x12.jpg) is the predicted decision for the true label and ![y_t](https://box.kancloud.cn/9323f4f7e5c7cc18a7f74854ba96f757_14x12.jpg) is the maximum of the predicted decisions for all other labels, where predicted decisions are output by the decision function, then the multiclass hinge loss is defined by:

![L_\text{Hinge}(y_w, y_t) = \max\left\{1 + y_t - y_w, 0\right\}](https://box.kancloud.cn/ad08bde480d9c85f362ae931509e9f47_282x20.jpg)

Here is a small example demonstrating the use of the [`hinge_loss`](generated/sklearn.metrics.hinge_loss.html#sklearn.metrics.hinge_loss "sklearn.metrics.hinge_loss") function with an svm classifier in a binary class problem:

```
>>> from sklearn import svm
>>> from sklearn.metrics import hinge_loss
>>> X = [[0], [1]]
>>> y = [-1, 1]
>>> est = svm.LinearSVC(random_state=0)
>>> est.fit(X, y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=0, tol=0.0001,
     verbose=0)
>>> pred_decision = est.decision_function([[-2], [3], [0.5]])
>>> pred_decision
array([-2.18..., 2.36..., 0.09...])
>>> hinge_loss([-1, 1, 1], pred_decision)
0.3...
```

Here is an example demonstrating the use of the [`hinge_loss`](generated/sklearn.metrics.hinge_loss.html#sklearn.metrics.hinge_loss "sklearn.metrics.hinge_loss") function with an svm classifier in a multiclass problem:

```
>>> import numpy as np
>>> X = np.array([[0], [1], [2], [3]])
>>> Y = np.array([0, 1, 2, 3])
>>> labels = np.array([0, 1, 2, 3])
>>> est = svm.LinearSVC()
>>> est.fit(X, Y)
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='squared_hinge', max_iter=1000,
     multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
     verbose=0)
>>> pred_decision = est.decision_function([[-1], [2], [3]])
>>> y_true = [0, 2, 3]
>>> hinge_loss(y_true, pred_decision, labels=labels)
0.56...
```

### 3.3.2.10. Log loss

Log loss, also called logistic regression loss or cross-entropy loss, is defined on probability estimates. It is commonly used in (multinomial) logistic regression and neural networks, as well as in some variants of expectation-maximization, and can be used to evaluate the probability outputs (`predict_proba`) of a classifier instead of its discrete predictions.

For binary classification with a true label ![y \in \{0,1\}](https://box.kancloud.cn/75f499527495bb78ae7059f7b06dace1_75x19.jpg) and a probability estimate ![p = \operatorname{Pr}(y = 1)](https://box.kancloud.cn/06010300e140579db2ce53cdb9684b17_109x18.jpg), the log loss per sample is the negative log-likelihood of the classifier given the true label:

![L_{\log}(y, p) = -\log \operatorname{Pr}(y|p) = -(y \log (p) + (1 - y) \log (1 - p))](https://box.kancloud.cn/839a8713ade921d22bc68371c07ec19f_460x20.jpg)

This extends to the multiclass case as follows. Let the true labels for a set of samples be encoded as a 1-of-K binary indicator matrix ![Y](https://box.kancloud.cn/f016166d5c910a983468b95f8fb3a11e_14x12.jpg), i.e., if sample ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg) has label ![k](https://box.kancloud.cn/300675e73ace6bf4c352cfbb633f0199_9x13.jpg) taken from a set of ![K](https://box.kancloud.cn/cdbf8b35090576059f2b14a851c73b86_16x12.jpg) labels, then ![y_{i,k} = 
1](https://box.kancloud.cn/a09df6dff8352bafdef46a2689937570_57x18.jpg). Let ![P](https://box.kancloud.cn/08277e04611b27b30b29f99ba0830d27_14x12.jpg) be a matrix of probability estimates, with ![p_{i,k} = \operatorname{Pr}(t_{i,k} = 1)](https://box.kancloud.cn/58f2753deaabdbe91357330520f2273f_137x20.jpg). Then the log loss of the whole set is

![L_{\log}(Y, P) = -\log \operatorname{Pr}(Y|P) = - \frac{1}{N} \sum_{i=0}^{N-1} \sum_{k=0}^{K-1} y_{i,k} \log p_{i,k}](https://box.kancloud.cn/2448c2f352a06ded30953a56b537885c_412x55.jpg)

To see how this generalizes the binary log loss given above, note that in the binary case, ![p_{i,0} = 1 - p_{i,1}](https://box.kancloud.cn/3e09dbd17aed7c847240eec39536209b_104x18.jpg) and ![y_{i,0} = 1 - y_{i,1}](https://box.kancloud.cn/21012ac5279bc983e8559254ba107797_102x18.jpg), so expanding the inner sum over ![y_{i,k} \in \{0,1\}](https://box.kancloud.cn/ffbfd10600b986e967aa461d09c77741_90x20.jpg) gives the binary log loss.

The [`log_loss`](generated/sklearn.metrics.log_loss.html#sklearn.metrics.log_loss "sklearn.metrics.log_loss") function computes log loss given a list of ground-truth labels and a probability matrix, as returned by an estimator's `predict_proba` method.

```
>>> from sklearn.metrics import log_loss
>>> y_true = [0, 0, 1, 1]
>>> y_pred = [[.9, .1], [.8, .2], [.3, .7], [.01, .99]]
>>> log_loss(y_true, y_pred)
0.1738...
```

The first `[.9, .1]` in `y_pred` denotes a 90% probability that the first sample has label 0. The log loss is non-negative.

### 3.3.2.11. Matthews correlation coefficient

The [`matthews_corrcoef`](generated/sklearn.metrics.matthews_corrcoef.html#sklearn.metrics.matthews_corrcoef "sklearn.metrics.matthews_corrcoef") function computes the [Matthew’s correlation coefficient (MCC)](https://en.wikipedia.org/wiki/Matthews_correlation_coefficient) for binary classes. Quoting Wikipedia:

> “The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary (two-class) classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes. The MCC is in essence a correlation coefficient value between -1 and +1. A coefficient of +1 represents a perfect prediction, 0 an average random prediction and -1 an inverse prediction. The statistic is also known as the phi coefficient.”

In the binary (two-class) case, where ![tp](https://box.kancloud.cn/3bc2f4b08dfd9e12a871651663b77f05_15x16.jpg), ![tn](https://box.kancloud.cn/1a61c0b8663984aa130af62f0355004b_17x12.jpg), ![fp](https://box.kancloud.cn/3a735bc9d1d0d0a559cc8b668f8a0a8d_20x16.jpg) and ![fn](https://box.kancloud.cn/c58d49e179940b9451fad80ba9889407_22x16.jpg) are respectively the number of true positives, true negatives, false positives and false negatives, the MCC is defined as

![MCC = \frac{tp \times tn - fp \times fn}{\sqrt{(tp + fp)(tp + fn)(tn + fp)(tn + fn)}}.](https://box.kancloud.cn/8e130557bd3736e69b6491d0a8918690_389x46.jpg)

In the multiclass case, the Matthews correlation coefficient can be [defined](http://rk.kvl.dk/introduction/index.html) in terms of a [`confusion_matrix`](generated/sklearn.metrics.confusion_matrix.html#sklearn.metrics.confusion_matrix "sklearn.metrics.confusion_matrix") ![C](https://box.kancloud.cn/95378f1036b3ba9a15a5f33f8521b6f2_14x12.jpg) for ![K](https://box.kancloud.cn/cdbf8b35090576059f2b14a851c73b86_16x12.jpg) classes. To simplify the definition, consider the following intermediate variables:

- ![t_k=\sum_{i}^{K} C_{ik}](https://box.kancloud.cn/412b32fa1a93162b2cd88c462229851b_97x24.jpg) the number of times class ![k](https://box.kancloud.cn/300675e73ace6bf4c352cfbb633f0199_9x13.jpg) truly occurred,
- ![p_k=\sum_{i}^{K} C_{ki}](https://box.kancloud.cn/43a1b032ee1bc365425add998e6adef1_101x24.jpg) the number of times class ![k](https://box.kancloud.cn/300675e73ace6bf4c352cfbb633f0199_9x13.jpg) was predicted,
- ![c=\sum_{k}^{K} C_{kk}](https://box.kancloud.cn/8bdbb2f7a55b60deaaac83f7ec2f5b44_94x24.jpg) the total number of samples correctly predicted,
- ![s=\sum_{i}^{K} \sum_{j}^{K} C_{ij}](https://box.kancloud.cn/5c840b3b50f7688d5fd3d5bcd10341b6_125x27.jpg) the total number of samples.

Then the multiclass MCC is defined as:

![MCC = \frac{ c \times s - \sum_{k}^{K} p_k \times t_k }{\sqrt{ (s^2 - \sum_{k}^{K} p_k^2) \times (s^2 - \sum_{k}^{K} t_k^2) }}](https://box.kancloud.cn/45f9f42d0f6da4e0516c11514d394a0f_319x63.jpg)

When there are more than two labels, the value of the MCC will no longer range between -1 and +1. Instead, the minimum value will be somewhere between -1 and 0, depending on the number and distribution of ground truth labels. The maximum value is always +1.

Here is a small example illustrating the usage of the [`matthews_corrcoef`](generated/sklearn.metrics.matthews_corrcoef.html#sklearn.metrics.matthews_corrcoef "sklearn.metrics.matthews_corrcoef") function:

```
>>> from sklearn.metrics import matthews_corrcoef
>>> y_true = [+1, +1, +1, -1]
>>> y_pred = [+1, -1, +1, +1]
>>> matthews_corrcoef(y_true, y_pred)
-0.33...
```

### 3.3.2.12. Receiver operating characteristic (ROC)

The function [`roc_curve`](generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve "sklearn.metrics.roc_curve") computes the [receiver operating characteristic curve, or ROC curve](https://en.wikipedia.org/wiki/Receiver_operating_characteristic).
Quoting Wikipedia:

> “A receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot which illustrates the performance of a binary classifier system as its discrimination threshold is varied. It is created by plotting the fraction of true positives out of the positives (TPR = true positive rate) vs. the fraction of false positives out of the negatives (FPR = false positive rate), at various threshold settings. TPR is also known as sensitivity, and FPR is one minus the specificity or true negative rate.”

This function requires the true binary values and the target scores, which can be probability estimates of the positive class, confidence values, or binary decisions. Here is a small example of how to use the [`roc_curve`](generated/sklearn.metrics.roc_curve.html#sklearn.metrics.roc_curve "sklearn.metrics.roc_curve") function:

```
>>> import numpy as np
>>> from sklearn.metrics import roc_curve
>>> y = np.array([1, 1, 2, 2])
>>> scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> fpr, tpr, thresholds = roc_curve(y, scores, pos_label=2)
>>> fpr
array([ 0. , 0.5, 0.5, 1. ])
>>> tpr
array([ 0.5, 0.5, 1. , 1. ])
>>> thresholds
array([ 0.8 , 0.4 , 0.35, 0.1 ])
```

This figure shows an example of such an ROC curve:

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_roc_0011.png](https://box.kancloud.cn/5c2bf5f5aef5777e6f29a985946a7461_566x424.jpg)](../auto_examples/model_selection/plot_roc.html)

The [`roc_auc_score`](generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score "sklearn.metrics.roc_auc_score") function computes the area under the receiver operating characteristic (ROC) curve, which is also denoted by AUC or AUROC. By computing the area under the ROC curve, the curve information is summarized in one number. For more information, see the [Wikipedia article on AUC](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).

```
>>> import numpy as np
>>> from sklearn.metrics import roc_auc_score
>>> y_true = np.array([0, 0, 1, 1])
>>> y_scores = np.array([0.1, 0.4, 0.35, 0.8])
>>> roc_auc_score(y_true, y_scores)
0.75
```

In multi-label classification, the [`roc_auc_score`](generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score "sklearn.metrics.roc_auc_score") function is extended by averaging over the labels as [above](#average).
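To make the label averaging concrete, here is a minimal plain-Python sketch. It is not scikit-learn's implementation: the helpers `binary_auc` and `macro_auc` are hypothetical names, the AUC is estimated by the pairwise (Mann-Whitney) counting rule, and only the unweighted "macro" average is assumed.

```python
def binary_auc(y_true, y_score):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # A positive scored above a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(Y_true, Y_score):
    """Unweighted mean of the per-label AUCs (hypothetical helper)."""
    n_labels = len(Y_true[0])
    per_label = [
        binary_auc([row[j] for row in Y_true], [row[j] for row in Y_score])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels, per_label


# Rows are samples, columns are labels (multilabel indicator format).
Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_score = [[0.9, 0.2], [0.3, 0.8], [0.6, 0.3], [0.4, 0.5]]
macro, per_label = macro_auc(Y_true, Y_score)
print(per_label, macro)  # [1.0, 0.75] 0.875
```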
Compared to metrics such as the subset accuracy, the Hamming loss, or the F1 score, ROC does not require optimizing a threshold for each label. The [`roc_auc_score`](generated/sklearn.metrics.roc_auc_score.html#sklearn.metrics.roc_auc_score "sklearn.metrics.roc_auc_score") function can also be used in multi-class classification, if the predicted outputs have been binarized.

[![http://sklearn.apachecn.org/cn/0.19.0/_images/sphx_glr_plot_roc_0021.png](https://box.kancloud.cn/559d19a1c834cd7073eeafc432137492_566x424.jpg)](../auto_examples/model_selection/plot_roc.html)

Examples:

- See [Receiver Operating Characteristic (ROC)](../auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py) for an example of using ROC to evaluate the quality of the output of a classifier.
- See [Receiver Operating Characteristic (ROC) with cross validation](../auto_examples/model_selection/plot_roc_crossval.html#sphx-glr-auto-examples-model-selection-plot-roc-crossval-py) for an example of using ROC to evaluate classifier output quality, using cross-validation.
- See [Species distribution modeling](../auto_examples/applications/plot_species_distribution_modeling.html#sphx-glr-auto-examples-applications-plot-species-distribution-modeling-py) for an example of using ROC to model species distribution.

### 3.3.2.13. Zero one loss

The [`zero_one_loss`](generated/sklearn.metrics.zero_one_loss.html#sklearn.metrics.zero_one_loss "sklearn.metrics.zero_one_loss") function computes the sum or the average of the 0-1 classification loss (![L_{0-1}](https://box.kancloud.cn/57bcb4509034930f7cda2cb6e33c49ef_34x16.jpg)) over ![n_{\text{samples}}](https://box.kancloud.cn/70c64e65bf204369a0c85d76a39ecf38_55x14.jpg). By default, the function normalizes over the samples. To get the sum of the ![L_{0-1}](https://box.kancloud.cn/57bcb4509034930f7cda2cb6e33c49ef_34x16.jpg), set `normalize` to `False`.

In multilabel classification, [`zero_one_loss`](generated/sklearn.metrics.zero_one_loss.html#sklearn.metrics.zero_one_loss "sklearn.metrics.zero_one_loss") counts a sample as an error (loss 1) unless its predicted label set strictly matches the true label set, in which case the loss is 0. By default, the function returns the percentage of imperfectly predicted subsets. To get the count of such subsets instead, set `normalize` to `False`.

If ![\hat{y}_i](https://box.kancloud.cn/a71dac1a33f465fa5197c33da8720585_13x17.jpg) is the predicted value of the ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg)-th sample and ![y_i](https://box.kancloud.cn/e79627211612bc56c4f7d926a93fbe8d_13x12.jpg) is the corresponding true value, then the 0-1 loss ![L_{0-1}](https://box.kancloud.cn/57bcb4509034930f7cda2cb6e33c49ef_34x16.jpg) is defined as:

![L_{0-1}(y_i, \hat{y}_i) = 1(\hat{y}_i \not= y_i)](https://box.kancloud.cn/acf23a7229cdc980ad48988870da5430_183x18.jpg)

where ![1(x)](https://box.kancloud.cn/89c489280f4c7e8dd0c3c421b0d4358b_31x18.jpg) is the [indicator function](https://en.wikipedia.org/wiki/Indicator_function).
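The definition above can be sketched in a few lines of plain Python. This is an illustration of the formula only, not scikit-learn's implementation; `zero_one_loss_sketch` is a hypothetical helper, and in the multilabel case each sample is assumed to be passed as a list holding one row of indicators, so a row counts as an error if any label differs.

```python
def zero_one_loss_sketch(y_true, y_pred, normalize=True):
    """Hypothetical helper illustrating the 0-1 loss formula."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # 1(y_hat_i != y_i): a sample is an error if anything differs.
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    # normalize=True gives the fraction of errors, False the raw count.
    return errors / len(y_true) if normalize else errors


single = zero_one_loss_sketch([2, 2, 3, 4], [1, 2, 3, 4])
count = zero_one_loss_sketch([2, 2, 3, 4], [1, 2, 3, 4], normalize=False)
multilabel = zero_one_loss_sketch([[0, 1], [1, 1]], [[1, 1], [1, 1]])
print(single, count, multilabel)  # 0.25 1 0.5
```

The same inputs are used with the library function in the example that follows.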
```
>>> from sklearn.metrics import zero_one_loss
>>> y_pred = [1, 2, 3, 4]
>>> y_true = [2, 2, 3, 4]
>>> zero_one_loss(y_true, y_pred)
0.25
>>> zero_one_loss(y_true, y_pred, normalize=False)
1
```

In the multilabel case with binary label indicators, where the first label set \[0,1\] has an error:

```
>>> import numpy as np
>>> zero_one_loss(np.array([[0, 1], [1, 1]]), np.ones((2, 2)))
0.5
>>> zero_one_loss(np.array([[0, 1], [1, 1]]), np.ones((2, 2)), normalize=False)
1
```

Examples:

- See [Recursive feature elimination with cross-validation](../auto_examples/feature_selection/plot_rfe_with_cross_validation.html#sphx-glr-auto-examples-feature-selection-plot-rfe-with-cross-validation-py) for an example of zero one loss usage to perform recursive feature elimination with cross-validation.

### 3.3.2.14. Brier score loss

The [`brier_score_loss`](generated/sklearn.metrics.brier_score_loss.html#sklearn.metrics.brier_score_loss "sklearn.metrics.brier_score_loss") function computes the [Brier score](https://en.wikipedia.org/wiki/Brier_score) for binary classes. Quoting Wikipedia:

> “The Brier score is a proper score function that measures the accuracy of probabilistic predictions. It is applicable to tasks in which predictions must assign probabilities to a set of mutually exclusive discrete outcomes.”

This function returns the mean squared difference between the actual outcome and the predicted probability of the possible outcome. The actual outcome has to be 1 or 0 (true or false), while the predicted probability of the actual outcome can be a value between 0 and 1.

The Brier score loss is also between 0 and 1, and the lower the score (the smaller the mean squared difference), the more accurate the prediction is. It can be thought of as a measure of the "calibration" of a set of probabilistic predictions.

![BS = \frac{1}{N} \sum_{t=1}^{N}(f_t - o_t)^2](https://box.kancloud.cn/6156e9795596f7573d196022ae976075_171x54.jpg)

where ![N](https://box.kancloud.cn/08e4021d29ea7df2884794031c0a46ab_16x12.jpg) is the total number of predictions and ![f_t](https://box.kancloud.cn/7f898c2f14804992e3c9a9f1c0325b5d_14x16.jpg) is the predicted probability of the actual outcome ![o_t](https://box.kancloud.cn/f26397db34f90ba646578829418a63bd_14x11.jpg).

Here is a small example of usage of this function:

```
>>> import numpy as np
>>> from sklearn.metrics import brier_score_loss
>>> y_true = np.array([0, 1, 1, 0])
>>> y_true_categorical = np.array(["spam", "ham", "ham", "spam"])
>>> y_prob = np.array([0.1, 0.9, 0.8, 0.4])
>>> y_pred = np.array([0, 1, 1, 0])
>>> brier_score_loss(y_true, y_prob)
0.055
>>> brier_score_loss(y_true, 1-y_prob, pos_label=0)
0.055
>>> brier_score_loss(y_true_categorical, y_prob, pos_label="ham")
0.055
>>> brier_score_loss(y_true, y_prob > 0.5)
0.0
```

Examples:

- See [Probability calibration of classifiers](../auto_examples/calibration/plot_calibration.html#sphx-glr-auto-examples-calibration-plot-calibration-py) for an example of Brier score loss usage to perform probability calibration of classifiers.

References:

- G. Brier, [Verification of forecasts expressed in terms of probability](http://docs.lib.noaa.gov/rescue/mwr/078/mwr-078-01-0001.pdf), Monthly Weather Review 78.1 (1950)

## 3.3.3. Multilabel ranking metrics

In multilabel learning, each sample can have any number of ground truth labels associated with it. The goal is to give high scores and better rank to the ground truth labels.

### 3.3.3.1. Coverage error

The [`coverage_error`](generated/sklearn.metrics.coverage_error.html#sklearn.metrics.coverage_error "sklearn.metrics.coverage_error") function computes the average number of labels that have to be included in the final prediction such that all true labels are predicted. This is useful if you want to know how many top-scored labels you have to predict on average without missing any true one. The best value of this metric is thus the average number of true labels.

Note: Our implementation's score is 1 greater than the one given in Tsoumakas et al., 2010. This extends it to handle the degenerate case in which an instance has 0 true labels.

Formally, given a binary indicator matrix of the ground truth labels ![y \in \left\{0, 1\right\}^{n_\text{samples} \times n_\text{labels}}](https://box.kancloud.cn/99496fa61a296fc96d962538f62d613b_168x20.jpg) and the score associated with each label ![\hat{f} \in \mathbb{R}^{n_\text{samples} \times n_\text{labels}}](https://box.kancloud.cn/830203b1fefe4cf7ee211f050f60a73c_138x21.jpg), the coverage is defined as

![coverage(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} \max_{j:y_{ij} = 1} \text{rank}_{ij}](https://box.kancloud.cn/0fb6115e9f25fcfbd4bab2844e3c8dc6_355x55.jpg)

with ![\text{rank}_{ij} = \left|\left\{k: \hat{f}_{ik} \geq \hat{f}_{ij} \right\}\right|](https://box.kancloud.cn/a9131e4c2e30430d83cde05ac878ac37_195x35.jpg). Given the rank definition, ties in `y_scores` are broken by giving the maximal rank that would have been assigned to all tied values.

Here is a small example of usage of this function:

```
>>> import numpy as np
>>> from sklearn.metrics import coverage_error
>>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
>>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
>>> coverage_error(y_true, y_score)
2.5
```

### 3.3.3.2. Label ranking average precision

The [`label_ranking_average_precision_score`](generated/sklearn.metrics.label_ranking_average_precision_score.html#sklearn.metrics.label_ranking_average_precision_score "sklearn.metrics.label_ranking_average_precision_score") function implements label ranking average precision (LRAP). This metric is linked to the [`average_precision_score`](generated/sklearn.metrics.average_precision_score.html#sklearn.metrics.average_precision_score "sklearn.metrics.average_precision_score") function, but is based on the notion of label ranking instead of precision and recall.

Label ranking average precision (LRAP) is the average, over each ground truth label assigned to each sample, of the ratio of true vs. total labels with lower score. This metric will yield better scores if you are able to give better rank to the labels associated with each sample. The obtained score is always strictly greater than 0, and the best value is 1. If there is exactly one relevant label per sample, label ranking average precision is equivalent to the [mean reciprocal rank](https://en.wikipedia.org/wiki/Mean_reciprocal_rank).

Formally, given a binary indicator matrix of the ground truth labels ![y \in \mathcal{R}^{n_\text{samples} \times n_\text{labels}}](https://box.kancloud.cn/f7d7f6dafae1465920170b71d2e8ed7e_139x17.jpg) and the score associated with each label ![\hat{f} \in \mathcal{R}^{n_\text{samples} \times n_\text{labels}}](https://box.kancloud.cn/2b3bb2acf5548a8c7844591812f9a5de_141x21.jpg), the average precision is defined as

![LRAP(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} \frac{1}{|y_i|} \sum_{j:y_{ij} = 1} \frac{|\mathcal{L}_{ij}|}{\text{rank}_{ij}}](https://box.kancloud.cn/549da3b83be31beef9fe61381af749ce_373x59.jpg)

with ![\mathcal{L}_{ij} = \left\{k: y_{ik} = 1, \hat{f}_{ik} \geq \hat{f}_{ij} \right\}](https://box.kancloud.cn/a16c29007cc06ecc6337c03518dcd0bd_223x34.jpg), ![\text{rank}_{ij} = \left|\left\{k: \hat{f}_{ik} \geq \hat{f}_{ij} \right\}\right|](https://box.kancloud.cn/a9131e4c2e30430d83cde05ac878ac37_195x35.jpg), and where ![|\cdot|](https://box.kancloud.cn/d4590e3f2b051ecb5ee082686a3277fb_19x19.jpg) is the l0 norm or the cardinality of the set.

Here is a small example of usage of this function:

```
>>> import numpy as np
>>> from sklearn.metrics import label_ranking_average_precision_score
>>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
>>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
>>> label_ranking_average_precision_score(y_true, y_score)
0.416...
```

### 3.3.3.3. Ranking loss

The [`label_ranking_loss`](generated/sklearn.metrics.label_ranking_loss.html#sklearn.metrics.label_ranking_loss "sklearn.metrics.label_ranking_loss") function computes the ranking loss, which averages over the samples the number of label pairs that are incorrectly ordered, i.e. true labels have a lower score than false labels, weighted by the inverse number of false and true labels. The lowest achievable ranking loss is zero.

Formally, given a binary indicator matrix of the ground truth labels ![y \in \left\{0, 1\right\}^{n_\text{samples} \times n_\text{labels}}](https://box.kancloud.cn/99496fa61a296fc96d962538f62d613b_168x20.jpg) and the score associated with each label ![\hat{f} \in \mathbb{R}^{n_\text{samples} \times n_\text{labels}}](https://box.kancloud.cn/830203b1fefe4cf7ee211f050f60a73c_138x21.jpg), the ranking loss is defined as

![\text{ranking\_loss}(y, \hat{f}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} \frac{1}{|y_i|(n_\text{labels} - |y_i|)} \left|\left\{(k, l): \hat{f}_{ik} < \hat{f}_{il}, y_{ik} = 1, y_{il} = 0\right\}\right|](https://box.kancloud.cn/0df989f2a7bdea33f129725711a67e84_566x44.jpg)

where ![|\cdot|](https://box.kancloud.cn/d4590e3f2b051ecb5ee082686a3277fb_19x19.jpg) is the ![\ell_0](https://box.kancloud.cn/8f9f212fe4aff91bfceccf86970f3036_14x16.jpg) norm or the cardinality of the set.

Here is a small example of usage of this function:

```
>>> import numpy as np
>>> from sklearn.metrics import label_ranking_loss
>>> y_true = np.array([[1, 0, 0], [0, 0, 1]])
>>> y_score = np.array([[0.75, 0.5, 1], [1, 0.2, 0.1]])
>>> label_ranking_loss(y_true, y_score)
0.75...
>>> # With the following prediction, we have perfect and minimal loss
>>> y_score = np.array([[1.0, 0.1, 0.2], [0.1, 0.2, 0.9]])
>>> label_ranking_loss(y_true, y_score)
0.0
```

References:

- Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). Mining multi-label data. In Data mining and knowledge discovery handbook (pp. 667-685). Springer US.

## 3.3.4. Regression metrics

The [`sklearn.metrics`](classes.html#module-sklearn.metrics "sklearn.metrics") module implements several loss, score, and utility functions to measure regression performance.
Some of those have been enhanced to handle the multioutput case: [`mean_squared_error`](generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error "sklearn.metrics.mean_squared_error"), [`mean_absolute_error`](generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error "sklearn.metrics.mean_absolute_error"), [`explained_variance_score`](generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score "sklearn.metrics.explained_variance_score") and [`r2_score`](generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score "sklearn.metrics.r2_score").

These functions have a `multioutput` keyword argument which specifies the way the scores or losses for each individual target should be averaged. The default is `'uniform_average'`, which specifies a uniformly weighted mean over outputs. If an `ndarray` of shape `(n_outputs,)` is passed, then its entries are interpreted as weights and the corresponding weighted average is returned. If `multioutput` is `'raw_values'`, then all unaltered individual scores or losses are returned in an array of shape `(n_outputs,)`.

The [`r2_score`](generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score "sklearn.metrics.r2_score") and [`explained_variance_score`](generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score "sklearn.metrics.explained_variance_score") functions accept an additional value `'variance_weighted'` for the `multioutput` parameter. This option weights each individual score by the variance of the corresponding target variable. This setting quantifies the globally captured unscaled variance. If the target variables are of different scale, then this score puts more importance on explaining the higher variance variables well. `multioutput='variance_weighted'` is the default value for [`r2_score`](generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score "sklearn.metrics.r2_score") for backward compatibility. This will be changed to `uniform_average` in the future.

### 3.3.4.1. Explained variance score

The [`explained_variance_score`](generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score "sklearn.metrics.explained_variance_score") function computes the [explained variance regression score](https://en.wikipedia.org/wiki/Explained_variation).
If ![\hat{y}](https://box.kancloud.cn/277d247a09c0ccb4240fe50a4806934e_9x17.jpg) is the estimated target output, ![y](https://box.kancloud.cn/0255a09d3dccb9843dcf063bbeec303f_9x12.jpg) the corresponding (correct) target output, and ![Var](https://box.kancloud.cn/40c5a0b6c189fe9d7bb20b2bf4f5c652_32x12.jpg) is the [variance](https://en.wikipedia.org/wiki/Variance), the square of the standard deviation, then the explained variance is estimated as follows:

![\texttt{explained\_{}variance}(y, \hat{y}) = 1 - \frac{Var\{ y - \hat{y}\}}{Var\{y\}}](https://box.kancloud.cn/2bc78541e33cc29bff5b0a3f1f141401_356x44.jpg)

The best possible score is 1.0; lower values are worse.

Here is a small example of usage of the [`explained_variance_score`](generated/sklearn.metrics.explained_variance_score.html#sklearn.metrics.explained_variance_score "sklearn.metrics.explained_variance_score") function:

```
>>> from sklearn.metrics import explained_variance_score
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> explained_variance_score(y_true, y_pred)
0.957...
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> explained_variance_score(y_true, y_pred, multioutput='raw_values')
array([ 0.967..., 1. ])
>>> explained_variance_score(y_true, y_pred, multioutput=[0.3, 0.7])
0.990...
```

### 3.3.4.2. Mean absolute error

The [`mean_absolute_error`](generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error "sklearn.metrics.mean_absolute_error") function computes the [mean absolute error](https://en.wikipedia.org/wiki/Mean_absolute_error), a risk metric corresponding to the expected value of the absolute error loss or ![l1](https://box.kancloud.cn/3f2bf3b1e65ec3cf41a273465a2cb464_14x14.jpg)-norm loss.
If ![\hat{y}_i](https://box.kancloud.cn/a71dac1a33f465fa5197c33da8720585_13x17.jpg) is the predicted value of the ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg)-th sample, and ![y_i](https://box.kancloud.cn/e79627211612bc56c4f7d926a93fbe8d_13x12.jpg) is the corresponding true value, then the mean absolute error (MAE) estimated over ![n_{\text{samples}}](https://box.kancloud.cn/70c64e65bf204369a0c85d76a39ecf38_55x14.jpg) is defined as

![\text{MAE}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \left| y_i - \hat{y}_i \right|.](https://box.kancloud.cn/237ebedaff706b133a7cf063cc26a923_302x55.jpg)

Here is a small example of usage of the [`mean_absolute_error`](generated/sklearn.metrics.mean_absolute_error.html#sklearn.metrics.mean_absolute_error "sklearn.metrics.mean_absolute_error") function:

```
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_absolute_error(y_true, y_pred)
0.5
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> mean_absolute_error(y_true, y_pred)
0.75
>>> mean_absolute_error(y_true, y_pred, multioutput='raw_values')
array([ 0.5, 1. ])
>>> mean_absolute_error(y_true, y_pred, multioutput=[0.3, 0.7])
0.849...
```

### 3.3.4.3. Mean squared error

The [`mean_squared_error`](generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error "sklearn.metrics.mean_squared_error") function computes the [mean squared error](https://en.wikipedia.org/wiki/Mean_squared_error), a risk metric corresponding to the expected value of the squared (quadratic) error or loss.
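Because the error is squared before averaging, a few large errors weigh more than many small ones. The sketch below, built on made-up toy predictions, compares two prediction vectors that have the same mean absolute error but different mean squared errors, and also checks the hand-computed mean of squared differences against the library function.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.zeros(4)
pred_spread = np.array([1.0, 1.0, 1.0, 1.0])  # four errors of size 1
pred_single = np.array([0.0, 0.0, 0.0, 4.0])  # one error of size 4

# Both prediction vectors have the same mean absolute error ...
mae_spread = mean_absolute_error(y_true, pred_spread)
mae_single = mean_absolute_error(y_true, pred_single)

# ... but the single large error dominates the mean squared error.
mse_spread = mean_squared_error(y_true, pred_spread)
mse_single = mean_squared_error(y_true, pred_single)

# Hand-computed MSE: the mean of the squared differences.
manual_mse = np.mean((y_true - pred_single) ** 2)

print(mae_spread, mae_single, mse_spread, mse_single)
```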
If ![\hat{y}_i](https://box.kancloud.cn/a71dac1a33f465fa5197c33da8720585_13x17.jpg) is the predicted value of the ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg)-th sample, and ![y_i](https://box.kancloud.cn/e79627211612bc56c4f7d926a93fbe8d_13x12.jpg) is the corresponding true value, then the mean squared error (MSE) estimated over ![n_{\text{samples}}](https://box.kancloud.cn/70c64e65bf204369a0c85d76a39ecf38_55x14.jpg) is defined as

![\text{MSE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (y_i - \hat{y}_i)^2.](https://box.kancloud.cn/15e8ddbde0468048a7e3cb5ff72178b5_304x55.jpg)

Here is a small example of usage of the [`mean_squared_error`](generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error "sklearn.metrics.mean_squared_error") function:

```
>>> from sklearn.metrics import mean_squared_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> mean_squared_error(y_true, y_pred)
0.375
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> mean_squared_error(y_true, y_pred)
0.7083...
```

Examples:

- See [Gradient Boosting regression](../auto_examples/ensemble/plot_gradient_boosting_regression.html#sphx-glr-auto-examples-ensemble-plot-gradient-boosting-regression-py) for an example of mean squared error usage to evaluate gradient boosting regression.

### 3.3.4.4. Mean squared logarithmic error

The [`mean_squared_log_error`](generated/sklearn.metrics.mean_squared_log_error.html#sklearn.metrics.mean_squared_log_error "sklearn.metrics.mean_squared_log_error") function computes a risk metric corresponding to the expected value of the squared logarithmic (quadratic) error or loss.
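One way to read this metric is as the mean squared error computed on `log(1 + value)` instead of on the raw values. The sketch below checks that reading with `np.log1p` and also compares an under-prediction with an over-prediction of the same absolute size; the numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_squared_log_error

y_true = np.array([3, 5, 2.5, 7])
y_pred = np.array([2.5, 5, 4, 8])

# MSLE through the library, and as MSE on log(1 + value).
msle = mean_squared_log_error(y_true, y_pred)
mse_on_logs = mean_squared_error(np.log1p(y_true), np.log1p(y_pred))

# Same absolute error of 50 around a true value of 100:
# the under-prediction receives the larger penalty.
under = mean_squared_log_error([100.0], [50.0])
over = mean_squared_log_error([100.0], [150.0])

print(msle, mse_on_logs, under, over)
```

The asymmetry comes from the logarithm: a prediction that is too small changes `log(1 + value)` more than a prediction that is too large by the same amount.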
If ![\hat{y}_i](https://box.kancloud.cn/a71dac1a33f465fa5197c33da8720585_13x17.jpg) is the predicted value of the ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg)-th sample, and ![y_i](https://box.kancloud.cn/e79627211612bc56c4f7d926a93fbe8d_13x12.jpg) is the corresponding true value, then the mean squared logarithmic error (MSLE) estimated over ![n_{\text{samples}}](https://box.kancloud.cn/70c64e65bf204369a0c85d76a39ecf38_55x14.jpg) is defined as

![\text{MSLE}(y, \hat{y}) = \frac{1}{n_\text{samples}} \sum_{i=0}^{n_\text{samples} - 1} (\log_e (1 + y_i) - \log_e (1 + \hat{y}_i) )^2.](https://box.kancloud.cn/dcbceb4d5febafaf34d96cea44ddb566_465x55.jpg)

where ![\log_e (x)](https://box.kancloud.cn/2f7277025fecc7b33c608e12a780e647_53x19.jpg) denotes the natural logarithm of ![x](https://box.kancloud.cn/8b09310f09e864e3a4f6d92b559afe29_10x8.jpg). This metric is best suited to targets with exponential growth, such as population counts or the average sales of a commodity over a span of years. Note that this metric penalizes an under-predicted estimate more heavily than an over-predicted estimate.

Here is a small example of usage of the [`mean_squared_log_error`](generated/sklearn.metrics.mean_squared_log_error.html#sklearn.metrics.mean_squared_log_error "sklearn.metrics.mean_squared_log_error") function:

```
>>> from sklearn.metrics import mean_squared_log_error
>>> y_true = [3, 5, 2.5, 7]
>>> y_pred = [2.5, 5, 4, 8]
>>> mean_squared_log_error(y_true, y_pred)
0.039...
>>> y_true = [[0.5, 1], [1, 2], [7, 6]]
>>> y_pred = [[0.5, 2], [1, 2.5], [8, 8]]
>>> mean_squared_log_error(y_true, y_pred)
0.044...
```

### 3.3.4.5. Median absolute error

The [`median_absolute_error`](generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error "sklearn.metrics.median_absolute_error") function is particularly interesting because it is robust to outliers. The loss is calculated by taking the median of all absolute differences between the target and the prediction.
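The robustness to outliers can be seen in a small experiment. In the sketch below (toy data, chosen only for illustration), one prediction is replaced by a gross outlier: the mean absolute error changes a lot, while the median absolute error stays the same.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, median_absolute_error

y_true = np.array([3, -0.5, 2, 7, 4])
y_pred = np.array([2.5, 0.0, 2, 8, 4.5])

# Same predictions, except that the last one is a gross outlier.
y_pred_outlier = y_pred.copy()
y_pred_outlier[-1] = 104.5

medae_clean = median_absolute_error(y_true, y_pred)
medae_outlier = median_absolute_error(y_true, y_pred_outlier)

mae_clean = mean_absolute_error(y_true, y_pred)
mae_outlier = mean_absolute_error(y_true, y_pred_outlier)

# Hand-computed version: the median of the absolute differences.
manual_medae = np.median(np.abs(y_true - y_pred_outlier))

print(medae_clean, medae_outlier, mae_clean, mae_outlier)
```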
If ![\hat{y}_i](https://box.kancloud.cn/a71dac1a33f465fa5197c33da8720585_13x17.jpg) is the predicted value of the ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg)-th sample, and ![y_i](https://box.kancloud.cn/e79627211612bc56c4f7d926a93fbe8d_13x12.jpg) is the corresponding true value, then the median absolute error (MedAE) estimated over ![n_{\text{samples}}](https://box.kancloud.cn/70c64e65bf204369a0c85d76a39ecf38_55x14.jpg) is defined as

![\text{MedAE}(y, \hat{y}) = \text{median}(\mid y_1 - \hat{y}_1 \mid, \ldots, \mid y_n - \hat{y}_n \mid).](https://box.kancloud.cn/cd4fd08eb6c0c1dc2ec0e8a1b527b1c6_391x19.jpg)

The [`median_absolute_error`](generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error "sklearn.metrics.median_absolute_error") function does not support multioutput.

Here is a small example of usage of the [`median_absolute_error`](generated/sklearn.metrics.median_absolute_error.html#sklearn.metrics.median_absolute_error "sklearn.metrics.median_absolute_error") function:

```
>>> from sklearn.metrics import median_absolute_error
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> median_absolute_error(y_true, y_pred)
0.5
```

### 3.3.4.6. R2 score, the coefficient of determination

The [`r2_score`](generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score "sklearn.metrics.r2_score") function computes R2, the [coefficient of determination](https://en.wikipedia.org/wiki/Coefficient_of_determination). It provides a measure of how well future samples are likely to be predicted by the model. The best possible score is 1.0, and the score can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get an R^2 score of 0.0.
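The statements about the baseline value 0.0 and about negative scores can be checked on a toy example. The sketch below computes R2 by hand as one minus the ratio of the residual sum of squares to the total sum of squares, then scores a constant prediction equal to the mean of the targets and a deliberately poor prediction; all arrays are illustrative.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 8])

# Hand-computed R2: 1 - SS_res / SS_tot.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
manual_r2 = 1.0 - ss_res / ss_tot

# A constant prediction equal to the mean of the targets scores 0.0.
constant_pred = np.full_like(y_true, y_true.mean())
baseline_r2 = r2_score(y_true, constant_pred)

# A prediction that is worse than that constant gets a negative score.
bad_pred = np.array([7, 2, -0.5, 3])
bad_r2 = r2_score(y_true, bad_pred)

print(manual_r2, baseline_r2, bad_r2)
```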
If ![\hat{y}_i](https://box.kancloud.cn/a71dac1a33f465fa5197c33da8720585_13x17.jpg) is the predicted value of the ![i](https://box.kancloud.cn/9f004454a20e19932b2a071a380ff8ff_6x13.jpg)-th sample, and ![y_i](https://box.kancloud.cn/e79627211612bc56c4f7d926a93fbe8d_13x12.jpg) is the corresponding true value, then the R2 score estimated over ![n_{\text{samples}}](https://box.kancloud.cn/70c64e65bf204369a0c85d76a39ecf38_55x14.jpg) is defined as

![R^2(y, \hat{y}) = 1 - \frac{\sum_{i=0}^{n_{\text{samples}} - 1} (y_i - \hat{y}_i)^2}{\sum_{i=0}^{n_\text{samples} - 1} (y_i - \bar{y})^2}](https://box.kancloud.cn/c1c0b11f63ad138415e3c7661d24fd26_273x53.jpg)

where ![\bar{y} = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}} - 1} y_i](https://box.kancloud.cn/34472c0a4a8d151c01b20f772feadd89_186x31.jpg).

Here is a small example of usage of the [`r2_score`](generated/sklearn.metrics.r2_score.html#sklearn.metrics.r2_score "sklearn.metrics.r2_score") function:

```
>>> from sklearn.metrics import r2_score
>>> y_true = [3, -0.5, 2, 7]
>>> y_pred = [2.5, 0.0, 2, 8]
>>> r2_score(y_true, y_pred)
0.948...
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> r2_score(y_true, y_pred, multioutput='variance_weighted')
0.938...
>>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
>>> y_pred = [[0, 2], [-1, 2], [8, -5]]
>>> r2_score(y_true, y_pred, multioutput='uniform_average')
0.936...
>>> r2_score(y_true, y_pred, multioutput='raw_values')
array([ 0.965..., 0.908...])
>>> r2_score(y_true, y_pred, multioutput=[0.3, 0.7])
0.925...
```

Examples:

- See [Lasso and Elastic Net for Sparse Signals](../auto_examples/linear_model/plot_lasso_and_elasticnet.html#sphx-glr-auto-examples-linear-model-plot-lasso-and-elasticnet-py) for an example of R2 score usage to evaluate Lasso and Elastic Net on sparse signals.

## 3.3.5. Clustering metrics

The [`sklearn.metrics`](classes.html#module-sklearn.metrics "sklearn.metrics") module implements several loss, score and utility functions. For more information, see the [Clustering performance evaluation](clustering.html#clustering-evaluation) section for clustering, and [Biclustering evaluation](biclustering.html#biclustering-evaluation) for biclustering.

## 3.3.6. Dummy estimators

When doing supervised learning, a simple sanity check consists of comparing one's estimator against simple rules of thumb. [`DummyClassifier`](generated/sklearn.dummy.DummyClassifier.html#sklearn.dummy.DummyClassifier "sklearn.dummy.DummyClassifier") implements several such simple strategies for classification:

- `stratified` generates random predictions by respecting the class distribution of the training set.
- `most_frequent` always predicts the most frequent label in the training set.
- `prior` always predicts the class that maximizes the class prior (like `most_frequent`), and `predict_proba` returns the class prior.
- `uniform` generates predictions uniformly at random.
- `constant` always predicts a constant label that is provided by the user. A major motivation of this method is F1-scoring, when the positive class is in the minority.

Note that with all these strategies, the `predict` method completely ignores the input data!

To illustrate [`DummyClassifier`](generated/sklearn.dummy.DummyClassifier.html#sklearn.dummy.DummyClassifier "sklearn.dummy.DummyClassifier"), first let's create an imbalanced dataset:

```
>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> y[y != 1] = -1
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
```

Next, let's compare the accuracy of `SVC` and `most_frequent`:

```
>>> from sklearn.dummy import DummyClassifier
>>> from sklearn.svm import SVC
>>> clf = SVC(kernel='linear', C=1).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.63...
>>> clf = DummyClassifier(strategy='most_frequent',random_state=0)
>>> clf.fit(X_train, y_train)
DummyClassifier(constant=None, random_state=0, strategy='most_frequent')
>>> clf.score(X_test, y_test)
0.57...
```

We see that `SVC` doesn't do much better than a dummy classifier. Now, let's change the kernel:

```
>>> clf = SVC(kernel='rbf', C=1).fit(X_train, y_train)
>>> clf.score(X_test, y_test)
0.97...
```

We see that the accuracy was boosted to almost 100%. A cross-validation strategy is recommended for a better estimate of the accuracy, if it is not too CPU costly. For more information see the [Cross-validation: evaluating estimator performance](cross_validation.html#cross-validation) section. Moreover, if you want to optimize over the parameter space, it is highly recommended to use an appropriate methodology; see the [Tuning the hyper-parameters of an estimator](grid_search.html#grid-search) section for details.
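As a rough sketch of the cross-validated comparison suggested here, the snippet below scores a `most_frequent` dummy classifier and an RBF `SVC` with `cross_val_score` on the same relabelled iris data. The choice of 5 folds is an assumption made for illustration, and the exact `SVC` scores may vary between scikit-learn versions.

```python
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Same imbalanced relabelling as in the text: class 1 versus the rest.
X, y = load_iris(return_X_y=True)
y = y.copy()
y[y != 1] = -1

# 5-fold cross-validated accuracy of the baseline and of the RBF SVC.
dummy_scores = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=5
)
svc_scores = cross_val_score(SVC(kernel="rbf", C=1), X, y, cv=5)

print("dummy: %.3f" % dummy_scores.mean())
print("svc:   %.3f" % svc_scores.mean())
```

Reporting the baseline next to the model makes it easier to see how much of the accuracy is explained by the class imbalance alone.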
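More generally, when the accuracy of a classifier is too close to random, it probably means that something went wrong: features are not helpful, a hyperparameter is not correctly tuned, the classifier is suffering from class imbalance, etc…

[`DummyRegressor`](generated/sklearn.dummy.DummyRegressor.html#sklearn.dummy.DummyRegressor "sklearn.dummy.DummyRegressor") also implements four simple rules of thumb for regression:

- `mean` always predicts the mean of the training targets.
- `median` always predicts the median of the training targets.
- `quantile` always predicts a user-provided quantile of the training targets.
- `constant` always predicts a constant value that is provided by the user.

In all these strategies, the `predict` method completely ignores the input data.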
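The four regression strategies can be tried on a small made-up dataset. In the sketch below, the training targets, the quantile `0.9` and the constant `4.0` are arbitrary illustrative choices; the point is that every strategy returns the same prediction whatever the input is.

```python
import numpy as np
from sklearn.dummy import DummyRegressor

# Toy training data; the features are ignored by every strategy.
X_train = np.arange(10).reshape(-1, 1)
y_train = np.array([1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0])

mean_reg = DummyRegressor(strategy="mean").fit(X_train, y_train)
median_reg = DummyRegressor(strategy="median").fit(X_train, y_train)
quantile_reg = DummyRegressor(strategy="quantile", quantile=0.9).fit(X_train, y_train)
constant_reg = DummyRegressor(strategy="constant", constant=4.0).fit(X_train, y_train)

# Two very different inputs receive identical predictions.
X_new = np.array([[100], [-3]])
mean_pred = mean_reg.predict(X_new)
median_pred = median_reg.predict(X_new)
quantile_pred = quantile_reg.predict(X_new)
constant_pred = constant_reg.predict(X_new)

# R2 of the mean strategy on its own training data is the 0.0 baseline.
baseline_r2 = mean_reg.score(X_train, y_train)

print(mean_pred, median_pred, quantile_pred, constant_pred, baseline_r2)
```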