
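# How to Grid Search Triple Exponential Smoothing for Time Series Forecasting in Python

> Original: [https://machinelearningmastery.com/how-to-grid-search-triple-exponential-smoothing-for-time-series-forecasting-in-python/](https://machinelearningmastery.com/how-to-grid-search-triple-exponential-smoothing-for-time-series-forecasting-in-python/)

Exponential smoothing is a time series forecasting method for univariate data that can be extended to support data with a systematic trend or seasonal component.

It is common practice to use an optimization process to find the model hyperparameters that give the exponential smoothing model the best performance on a given time series dataset. This practice applies only to the coefficients the model uses to describe the exponential structure of the level, trend, and seasonality.

It is also possible to automatically optimize other hyperparameters of an exponential smoothing model, such as whether to model the trend and seasonal components and, if so, whether to model them using an additive or multiplicative method.

In this tutorial, you will discover how to develop a framework for grid searching all of the exponential smoothing model hyperparameters for univariate time series forecasting.

After completing this tutorial, you will know:

* How to develop a framework for grid searching ETS models from scratch using walk-forward validation.
* How to grid search ETS model hyperparameters for daily time series data for female births.
* How to grid search ETS model hyperparameters for monthly time series data for shampoo sales, car sales, and temperature.

Let's get started.

* **Oct 8, 2018**: Updated the fitting of ETS models to use a NumPy array to fix issues with multiplicative trend/seasonality (thanks Amit Amola).

![How to Grid Search Triple Exponential Smoothing for Time Series Forecasting in Python](https://img.kancloud.cn/4b/3e/4b3e74e4f8b8fe5601e098f5114d539e_640x429.jpg)

How to Grid Search Triple Exponential Smoothing for Time Series Forecasting in Python. Photo by [john mcsporran](https://www.flickr.com/photos/127130111@N06/16375806988/), some rights reserved.

## Tutorial Overview

This tutorial is divided into six parts; they are:

1. Exponential Smoothing for Time Series Forecasting
2. Develop a Grid Search Framework
3. Case Study 1: No Trend or Seasonality
4. Case Study 2: Trend
5. Case Study 3: Seasonality
6. Case Study 4: Trend and Seasonality

## Exponential Smoothing for Time Series Forecasting

Exponential smoothing is a time series forecasting method for univariate data.

Time series methods like the Box-Jenkins ARIMA family of methods develop a model where the prediction is a weighted linear sum of recent past observations or lags.

Exponential smoothing forecasting methods are similar in that a prediction is a weighted sum of past observations, but the model explicitly uses an exponentially decreasing weight for past observations. Specifically, past observations are weighted with a geometrically decreasing ratio.

> Forecasts produced using exponential smoothing methods are weighted averages of past observations, with the weights decaying exponentially as the observations get older. In other words, the more recent the observation the higher the associated weight.

- Page 171, [Forecasting: principles and practice](https://amzn.to/2xlJsfV), 2013.

Exponential smoothing methods may be considered as peers of, and an alternative to, the popular Box-Jenkins ARIMA class of methods for time series forecasting.

Collectively, the methods are sometimes referred to as ETS models, referring to the explicit modeling of _Error_, _Trend_, and _Seasonality_.

There are three types of exponential smoothing; they are:

* **Single Exponential Smoothing**, or SES, for univariate data without trend or seasonality.
* **Double Exponential Smoothing** for univariate data with support for trends.
* **Triple Exponential Smoothing**, or Holt-Winters Exponential Smoothing, with support for both trends and seasonality.

A triple exponential smoothing model subsumes single and double exponential smoothing through the configuration of the nature of the trend (additive, multiplicative, or none) and the nature of the seasonality (additive, multiplicative, or none), as well as any dampening of the trend.

## Develop a Grid Search Framework

In this section, we will develop a framework for grid searching exponential smoothing model hyperparameters for a given univariate time series forecasting problem.

We will use the implementation of [Holt-Winters Exponential Smoothing](http://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html) provided by the statsmodels library.

This model has hyperparameters that control the nature of the exponential smoothing performed for the series, trend, and seasonality, specifically:

* **smoothing_level** (_alpha_): the smoothing coefficient for the level.
* **smoothing_slope** (_beta_): the smoothing coefficient for the trend.
* **smoothing_seasonal** (_gamma_): the smoothing coefficient for the seasonal component.
* **damping_slope** (_phi_): the coefficient for the damped trend.

All four of these hyperparameters can be specified when defining the model. If they are not specified, the library will automatically tune the model and find the optimal values for these hyperparameters (e.g. _optimized=True_).

There are other hyperparameters that the model will not automatically tune and that you may want to specify; they are:

* **trend**: the type of trend component, either "_add_" for additive or "_mul_" for multiplicative. Modeling the trend can be disabled by setting it to None.
* **damped**: whether or not the trend component should be damped, either True or False.
* **seasonal**: the type of seasonal component, either "_add_" for additive or "_mul_" for multiplicative. Modeling the seasonal component can be disabled by setting it to None.
* **seasonal_periods**: the number of time steps in a seasonal period, e.g. 12 for 12 months in a yearly seasonal structure.
* **use_boxcox**: whether or not to perform a power transform of the series (True/False), or the lambda to use for the transform.

If you know enough about your problem to specify one or more of these parameters, then you should specify them. If not, you can try grid searching these parameters.

We can start off by defining a function that will fit a model with a given configuration and make a one-step forecast.

The _exp_smoothing_forecast()_ function below implements this behavior.

The function takes an array or list of contiguous prior observations and a list of configuration parameters used to configure the model.

The configuration parameters, in order, are: the trend type, the dampening type, the seasonality type, the seasonal period, whether or not to use a Box-Cox transform, and whether or not to remove the bias when fitting the model.

```py
# one-step Holt Winter's Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
    t,d,s,p,b,r = config
    # define model
    history = array(history)
    model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
    # fit model
    model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
    # make one step forecast
    yhat = model_fit.predict(len(history), len(history))
    return yhat[0]
```

Next, we need to build up some functions for fitting and evaluating a model repeatedly via [walk-forward validation](https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/), including splitting a dataset into train and test sets and evaluating one-step forecasts.

We can split a list or NumPy array of data using a slice given a specified size of the split, e.g. the number of time steps to use from the data in the test set.

The _train_test_split()_ function below implements this for a provided dataset and a specified number of time steps to use in the test set.

```py
# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]
```

After forecasts have been made for each step in the test dataset, they need to be compared to the test set in order to calculate an error score.

There are many popular error scores for time series forecasting. In this case, we will use root mean squared error (RMSE), but you can change this to your preferred measure, e.g. MAPE, MAE, etc.

The _measure_rmse()_ function below will calculate the RMSE given a list of actual (the test set) and predicted values.

```py
# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))
```

We can now implement the walk-forward validation scheme. This is a standard approach to evaluating a time series forecasting model that respects the temporal ordering of observations.

First, a provided univariate time series dataset is split into train and test sets using the _train_test_split()_ function. Then the observations in the test set are enumerated. For each, we fit a model on all of the history and make a one-step forecast. The true observation for the time step is then added to the history and the process is repeated. The _exp_smoothing_forecast()_ function is called in order to fit a model and make a prediction. Finally, an error score is calculated by comparing all one-step forecasts to the actual test set by calling the _measure_rmse()_ function.

The _walk_forward_validation()_ function below implements this, taking a univariate time series, the number of time steps to use in the test set, and a list of model configuration parameters.

```py
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # fit model and make forecast for history
        yhat = exp_smoothing_forecast(history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    return error
```

If you are interested in making multi-step predictions, you can change the call to _predict()_ in the _exp_smoothing_forecast()_ function and also change the calculation of error in the _measure_rmse()_ function.

We can call _walk_forward_validation()_ repeatedly with different lists of model configurations.

One possible issue is that some combinations of model configurations are not valid for the model and will throw an exception, e.g. specifying some but not all aspects of the seasonal structure in the data.

Further, some models may also raise warnings on some data, e.g. from the linear algebra libraries called by the statsmodels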
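library.

We can trap exceptions and ignore warnings during the grid search by wrapping all calls to _walk_forward_validation()_ with a try-except and a block that ignores warnings. We can also add debugging support to disable these protections in case we want to see what is really going on. Finally, if an error does occur, we can return a _None_ result; otherwise, we can print some information about the skill of each model evaluated. This is helpful when a large number of models are evaluated.

The _score_model()_ function below implements this and returns a tuple of (key and result), where the key is a string version of the tested model configuration.

```py
# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
    result = None
    # convert config to a key
    key = str(cfg)
    # show all warnings and fail on exception if debugging
    if debug:
        result = walk_forward_validation(data, n_test, cfg)
    else:
        # one failure during model validation suggests an unstable config
        try:
            # never show warnings when grid searching, too noisy
            with catch_warnings():
                filterwarnings("ignore")
                result = walk_forward_validation(data, n_test, cfg)
        except:
            error = None
    # check for an interesting result
    if result is not None:
        print(' > Model[%s] %.3f' % (key, result))
    return (key, result)
```

Next, we need a loop to test a list of different model configurations.

This is the main function that drives the grid search process and will call the _score_model()_ function for each model configuration.

We can dramatically speed up the grid search process by evaluating model configurations in parallel. One way to do that is to use the [Joblib library](https://pythonhosted.org/joblib/).

We can define a _Parallel_ object with the number of cores to use and set it to the number of CPU cores detected in your hardware.

```py
executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
```

We can then create a list of tasks to execute in parallel, which will be one call to the _score_model()_ function for each model configuration we have.

```py
tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
```

Finally, we can use the _Parallel_ object to execute the list of tasks in parallel.

```py
scores = executor(tasks)
```

That's it.

We can also provide a non-parallel version of evaluating all model configurations in case we want to debug something.

```py
scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
```

The result of evaluating a list of configurations will be a list of tuples, each with a name that summarizes a specific model configuration and the error of the model evaluated with that configuration, as either the RMSE or None if there was an error.

We can filter out all scores with a None.

```py
scores = [r for r in scores if r[1] is not None]
```

We can then sort all tuples in the list by the score in ascending order (best first), then return this list of scores for review.

The _grid_search()_ function below implements this behavior given a univariate time series dataset, a list of model configurations (a list of lists), and the number of time steps to use in the test set. An optional parallel argument allows the evaluation of models across all cores to be turned on or off, and is on by default.

```py
# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
    scores = None
    if parallel:
        # execute configs in parallel
        executor = Parallel(n_jobs=cpu_count(),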
            backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
    else:
        scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
    # remove empty results
    scores = [r for r in scores if r[1] is not None]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores
```

We're nearly done.

The only thing left to do is to define a list of model configurations to try for a dataset.

We can define this generically. The only parameter we may want to specify is the periodicity of the seasonal component in the series, if one exists. By default, we will assume no seasonal component.

The _exp_smoothing_configs()_ function below will create a list of model configurations to evaluate.

An optional list of seasonal periods can be specified, and you could even change the function to specify other elements that you may know about your time series.

In theory, there are 72 possible model configurations to evaluate (3 trend types x 2 damping options x 3 seasonal types x 2 Box-Cox options x 2 bias options, for a single seasonal period), but in practice many will not be valid and will result in an error that we will trap and ignore.

```py
# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
    models = list()
    # define config lists
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = seasonal
    b_params = [True, False]
    r_params = [True, False]
    # create config instances
    for t in t_params:
        for d in d_params:
            for s in s_params:
                for p in p_params:
                    for b in b_params:
                        for r in r_params:
                            cfg = [t,d,s,p,b,r]
                            models.append(cfg)
    return models
```

We now have a framework for grid searching triple exponential smoothing model hyperparameters via one-step walk-forward validation.

It is generic and will work for any in-memory univariate time series provided as a list or NumPy array.

We can make sure all the pieces work together by testing it on a contrived 10-step dataset.

The complete example is listed below.

```py
# grid search holt winter's exponential smoothing
from math import sqrt
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_squared_error
from numpy import array

# one-step Holt Winter's Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
    t,d,s,p,b,r = config
    # define model
    history = array(history)
    model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
    # fit model
    model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
    # make one step forecast
    yhat = model_fit.predict(len(history), len(history))
    return yhat[0]

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # fit model and make forecast for history
        yhat = exp_smoothing_forecast(history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    return error

# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
    result = None
    # convert config to a key
    key = str(cfg)
    # show all warnings and fail on exception if debugging
    if debug:
        result = walk_forward_validation(data, n_test, cfg)
    else:
        # one failure during model validation suggests an unstable config
        try:
            # never show warnings when grid searching, too noisy
            with catch_warnings():
                filterwarnings("ignore")
                result = walk_forward_validation(data, n_test, cfg)
        except:
            error = None
    # check for an interesting result
    if result is not None:
        print(' > Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
    scores = None
    if parallel:
        # execute configs in parallel
        executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
    else:
        scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
    # remove empty results
    scores = [r for r in scores if r[1] is not None]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
    models = list()
    # define config lists
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = seasonal
    b_params = [True, False]
    r_params = [True, False]
    # create config instances
    for t in t_params:
        for d in d_params:
            for s in s_params:
                for p in p_params:
                    for b in b_params:
                        for r in r_params:
                            cfg = [t,d,s,p,b,r]
                            models.append(cfg)
    return models

if __name__ == '__main__':
    # define dataset
    data = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
    print(data)
    # data split
    n_test = 4
    # model configs
    cfg_list = exp_smoothing_configs()
    # grid search
    scores = grid_search(data, cfg_list, n_test)
    print('done')
    # list top 3 configs
    for cfg, error in scores[:3]:
        print(cfg, error)
```

Running the example first prints the contrived time series dataset.

Next, the model configurations and their errors are reported as they are evaluated.

Finally, the configurations and the error for the top three configurations are reported.

```py
[10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
 > Model[[None, False, None, None, True, True]] 1.380
 > Model[[None, False, None, None, True, False]] 10.000
 > Model[[None, False, None, None, False, True]] 2.563
 > Model[[None, False, None, None, False, False]] 10.000
done
[None, False, None, None, True, True] 1.379824445857423
[None, False, None, None, False, True] 2.5628662672606612
[None, False, None, None, False, False] 10.0
```

We do not report the model parameters optimized by the model itself. It is assumed that you can achieve the same result again by specifying the broader hyperparameters and allowing the library to find the same internal parameters.

You can access these internal parameters by refitting a standalone model with the same configuration and printing the contents of the '_params_' attribute on the model fit; for example:

```py
print(model_fit.params)
```

Now that we have a robust framework for grid searching ETS model hyperparameters, let's test it out on a suite of standard univariate time series datasets.

The datasets were chosen for demonstration purposes; I am not suggesting that an ETS model is the best approach for each dataset, and perhaps a SARIMA or something else would be more appropriate in some cases.

## Case Study 1: No Trend or Seasonality

The 'daily female births' dataset summarizes the daily total female births in California, USA in 1959.

The dataset has no obvious trend or seasonal component.

![Line Plot of the Daily Female Births Dataset](https://img.kancloud.cn/82/c2/82c2332333012a46b0561998c9b6224b_1440x780.jpg)

Line Plot of the Daily Female Births Dataset

You can learn more about the dataset from [DataMarket](https://datamarket.com/data/set/235k/daily-total-female-births-in-california-1959#!ds=235k&display=line).

Download the dataset directly from here:
* [daily-total-female-births.csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv)

Save the file with the filename '_daily-total-female-births.csv_' in your current working directory.

We can load this dataset as a Pandas Series using the function _read_csv()_.

```py
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
```

The dataset has one year, or 365 observations. We will use the first 200 for training and the remaining 165 as the test set.

The complete example grid searching the daily female births univariate time series forecasting problem is listed below.

```py
# grid search ets models for daily female births
from math import sqrt
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_squared_error
from pandas import read_csv
from numpy import array

# one-step Holt Winter's Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
    t,d,s,p,b,r = config
    # define model
    history = array(history)
    model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
    # fit model
    model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
    # make one step forecast
    yhat = model_fit.predict(len(history), len(history))
    return yhat[0]

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # fit model and make forecast for history
        yhat = exp_smoothing_forecast(history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    return error

# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
    result = None
    # convert config to a key
    key = str(cfg)
    # show all warnings and fail on exception if debugging
    if debug:
        result = walk_forward_validation(data, n_test, cfg)
    else:
        # one failure during model validation suggests an unstable config
        try:
            # never show warnings when grid searching, too noisy
            with catch_warnings():
                filterwarnings("ignore")
                result = walk_forward_validation(data, n_test, cfg)
        except:
            error = None
    # check for an interesting result
    if result is not None:
        print(' > Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
    scores = None
    if parallel:
        # execute configs in parallel
        executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
    else:
        scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
    # remove empty results
    scores = [r for r in scores if r[1] is not None]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
    models = list()
    # define config lists
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = seasonal
    b_params = [True, False]
    r_params = [True, False]
    # create config instances
    for t in t_params:
        for d in d_params:
            for s in s_params:
                for p in p_params:
                    for b in b_params:
                        for r in r_params:
                            cfg = [t,d,s,p,b,r]
                            models.append(cfg)
    return models

if __name__ == '__main__':
    # load dataset
    series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 165
    # model configs
    cfg_list = exp_smoothing_configs()
    # grid search
    scores = grid_search(data[:,0], cfg_list, n_test)
    print('done')
    # list top 3 configs
    for cfg, error in scores[:3]:
        print(cfg, error)
```

Running the example may take a few minutes, as fitting each ETS model can take about a minute on modern hardware.

Model configurations and the RMSE are printed as the models are evaluated. The top three model configurations and their error are reported at the end of the run.

We can see that the best result was an RMSE of about 6.96 births with the following configuration:

* **Trend**: Multiplicative
* **Damped**: False
* **Seasonal**: None
* **Seasonal Periods**: None
* **Box-Cox Transform**: True
* **Remove Bias**: True

What is surprising is that a model that assumed a multiplicative trend performed better than one that didn't.

We would not know that this is the case unless we threw out assumptions and grid searched models.

```py
 > Model[['add', False, None, None, True, True]] 7.081
 > Model[['add', False, None, None, True, False]] 7.113
 > Model[['add', False, None, None, False, True]] 7.112
 > Model[['add', False, None, None, False, False]] 7.115
 > Model[['add', True, None, None, True, True]] 7.118
 > Model[['add', True, None, None, True, False]] 7.170
 > Model[['add', True, None, None, False, True]] 7.113
 > Model[['add', True, None, None, False, False]] 7.126
 > Model[['mul', True, None, None, True, True]] 7.118
 > Model[['mul', True, None, None, True, False]] 7.170
 > Model[['mul', True, None, None, False, True]] 7.113
 > Model[['mul', True, None, None, False, False]] 7.126
 > Model[['mul', False, None, None, True, True]] 6.961
 > Model[['mul', False, None, None, True, False]] 6.985
 > Model[[None, False, None, None, True, True]] 7.169
 > Model[[None, False, None, None, True, False]] 7.212
 > Model[[None, False, None, None, False, True]] 7.117
 > Model[[None, False, None, None, False, False]] 7.126
done
['mul', False, None, None, True, True] 6.960703917145126
['mul', False, None, None, True, False] 6.984513598720297
['add', False, None, None, True, True] 7.081359856193836
```

## Case Study 2: Trend

The 'shampoo' dataset summarizes the monthly sales of shampoo over a three-year period.

The dataset contains an obvious trend but no obvious seasonal component.

![Line Plot of the Monthly Shampoo Sales Dataset](https://img.kancloud.cn/ae/a5/aea5992c9bbc15a4ef6046500013d962_1438x776.jpg)

Line Plot of the Monthly Shampoo Sales Dataset

You can learn more about the dataset from [DataMarket](https://datamarket.com/data/set/22r0/sales-of-shampoo-over-a-three-year-period#!ds=22r0&display=line).

Download the dataset directly from here:

* [shampoo.csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/shampoo.csv)

Save the file with the filename 'shampoo.csv' in your current working directory.

We can load this dataset as a
Pandas Series using the function _read_csv()_.

```py
from datetime import datetime
from pandas import read_csv

# parse dates
def custom_parser(x):
    return datetime.strptime('195'+x, '%Y-%m')

# load dataset
series = read_csv('shampoo.csv', header=0, index_col=0, date_parser=custom_parser)
```

The dataset has three years, or 36 observations. We will use the first 24 for training and the remaining 12 as the test set.

The complete example grid searching the shampoo sales univariate time series forecasting problem is listed below.

```py
# grid search ets models for monthly shampoo sales
from math import sqrt
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_squared_error
from pandas import read_csv
from numpy import array

# one-step Holt Winter's Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
    t,d,s,p,b,r = config
    # define model
    history = array(history)
    model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
    # fit model
    model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
    # make one step forecast
    yhat = model_fit.predict(len(history), len(history))
    return yhat[0]

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # fit model and make forecast for history
        yhat = exp_smoothing_forecast(history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    return error

# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
    result = None
    # convert config to a key
    key = str(cfg)
    # show all warnings and fail on exception if debugging
    if debug:
        result = walk_forward_validation(data, n_test, cfg)
    else:
        # one failure during model validation suggests an unstable config
        try:
            # never show warnings when grid searching, too noisy
            with catch_warnings():
                filterwarnings("ignore")
                result = walk_forward_validation(data, n_test, cfg)
        except:
            error = None
    # check for an interesting result
    if result is not None:
        print(' > Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
    scores = None
    if parallel:
        # execute configs in parallel
        executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
    else:
        scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
    # remove empty results
    scores = [r for r in scores if r[1] is not None]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
    models = list()
    # define config lists
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = seasonal
    b_params = [True, False]
    r_params = [True, False]
    # create config instances
    for t in t_params:
        for d in d_params:
            for s in s_params:
                for p in p_params:
                    for b in b_params:
                        for r in r_params:
                            cfg = [t,d,s,p,b,r]
                            models.append(cfg)
    return models

if __name__ == '__main__':
    # load dataset
    series = read_csv('shampoo.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # model configs
    cfg_list = exp_smoothing_configs()
    # grid search
    scores = grid_search(data[:,0], cfg_list, n_test)
    print('done')
    # list top 3 configs
    for cfg, error in scores[:3]:
        print(cfg, error)
```

Running the example is fast given that there are only a small number of observations.

Model configurations and the
RMSE are printed as the models are evaluated. The top three model configurations and their error are reported at the end of the run.

We can see that the best result was an RMSE of about 83.74 sales with the following configuration:

* **Trend**: Multiplicative
* **Damped**: False
* **Seasonal**: None
* **Seasonal Periods**: None
* **Box-Cox Transform**: False
* **Remove Bias**: False

```py
 > Model[['add', False, None, None, False, True]] 106.431
 > Model[['add', False, None, None, False, False]] 104.874
 > Model[['add', True, None, None, False, False]] 103.069
 > Model[['add', True, None, None, False, True]] 97.918
 > Model[['mul', True, None, None, False, True]] 95.337
 > Model[['mul', True, None, None, False, False]] 102.152
 > Model[['mul', False, None, None, False, True]] 86.406
 > Model[['mul', False, None, None, False, False]] 83.747
 > Model[[None, False, None, None, False, True]] 99.416
 > Model[[None, False, None, None, False, False]] 108.031
done
['mul', False, None, None, False, False] 83.74666940175238
['mul', False, None, None, False, True] 86.40648953786152
['mul', True, None, None, False, True] 95.33737598817238
```

## Case Study 3: Seasonality

The 'monthly mean temperatures' dataset summarizes the monthly average air temperatures at Nottingham Castle, England from 1920 to 1939 in degrees Fahrenheit.

The dataset has an obvious seasonal component and no obvious trend.

![Line Plot of the Monthly Mean Temperatures Dataset](https://img.kancloud.cn/24/3c/243cfe0fd0e8ab5923b76dcc30ca7a95_1454x766.jpg)

Line Plot of the Monthly Mean Temperatures Dataset

You can learn more about the dataset from [DataMarket](https://datamarket.com/data/set/22li/mean-monthly-air-temperature-deg-f-nottingham-castle-1920-1939#!ds=22li&display=line).

Download the dataset directly from here:

* [monthly-mean-temp.csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-mean-temp.csv)

Save the file with the filename 'monthly-mean-temp.csv' in your current working directory.

We can load this dataset as a Pandas Series using the function _read_csv()_.

```py
series = read_csv('monthly-mean-temp.csv', header=0, index_col=0)
```

The dataset has 20 years, or 240 observations. We will trim the dataset to the last five years of data (60 observations) in order to speed up the model evaluation process, and use the last year, or 12 observations, as the test set.

```py
# trim dataset to 5 years
data = data[-(5*12):]
```

The period of the seasonal component is about one year, or 12 observations. We will use this as the seasonal period in the call to the _exp_smoothing_configs()_ function when preparing the model configurations.

```py
# model configs
cfg_list = exp_smoothing_configs(seasonal=[0, 12])
```

The complete example grid searching the monthly mean temperature time series forecasting problem is listed below.

```py
# grid search ets hyperparameters for monthly mean temp dataset
from math import sqrt
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_squared_error
from pandas import read_csv
from numpy import array

# one-step Holt Winter's Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
    t,d,s,p,b,r = config
    # define model
    history = array(history)
    model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
    # fit model
    model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
    # make one step forecast
    yhat = model_fit.predict(len(history), len(history))
    return yhat[0]

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # fit model and make forecast for history
        yhat = exp_smoothing_forecast(history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    return error

# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
    result = None
    # convert config to a key
    key = str(cfg)
    # show all warnings and fail on exception if debugging
    if debug:
        result = walk_forward_validation(data, n_test, cfg)
    else:
        # one failure during model validation suggests an unstable config
        try:
            # never show warnings when grid searching, too noisy
            with catch_warnings():
                filterwarnings("ignore")
                result = walk_forward_validation(data, n_test, cfg)
        except:
            error = None
    # check for an interesting result
    if result is not None:
        print(' > Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
    scores = None
    if parallel:
        # execute configs in parallel
        executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
    else:
        scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
    # remove empty results
    scores = [r for r in scores if r[1] is not None]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
    models = list()
    # define config lists
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = seasonal
    b_params = [True, False]
    r_params = [True, False]
    # create config instances
    for t in t_params:
        for d in d_params:
            for s in s_params:
                for p in p_params:
                    for b in b_params:
                        for r in r_params:
                            cfg = [t,d,s,p,b,r]
                            models.append(cfg)
    return models

if __name__ == '__main__':
    # load dataset
    series = read_csv('monthly-mean-temp.csv', header=0, index_col=0)
    data = series.values
    # trim dataset to 5 years
    data = data[-(5*12):]
    # data split
    n_test = 12
    # model configs
    cfg_list = exp_smoothing_configs(seasonal=[0,12])
    # grid search
    scores = grid_search(data[:,0], cfg_list, n_test)
    print('done')
    # list top 3 configs
    for cfg, error in scores[:3]:
        print(cfg, error)
```

Running the example is relatively slow given the large amount of data.

Model configurations and the RMSE are printed as the models are evaluated. The top three model configurations and their error are reported at the end of the run.

We can see that the best result was an RMSE of about 1.50 degrees with the following configuration:

* **Trend**: None
* **Damped**: False
* **Seasonal**: Additive
* **Seasonal Periods**: 12
* **Box-Cox Transform**: False
* **Remove Bias**: False

```py
 > Model[['add', True, 'mul', 12, True, False]] 1.659
 > Model[['add', True, 'mul', 12, True, True]] 1.663
 > Model[['add', True, 'mul', 12, False, True]] 1.603
 > Model[['add', True, 'mul', 12, False, False]] 1.609
 > Model[['mul', False, None, 0, True, True]] 4.920
 > Model[['mul', False, None, 0, True, False]] 4.881
 > Model[['mul', False, None, 0, False, True]] 4.838
 > Model[['mul', False, None, 0, False, False]] 4.813
 > Model[['add', True, 'add', 12, False, True]] 1.568
 > Model[['mul', False, None, 12, True, True]] 4.920
 > Model[['add', True, 'add', 12, False, False]] 1.555
 > Model[['add', True, 'add', 12, True, False]] 1.638
 > Model[['add', True, 'add', 12, True, True]] 1.646
 > Model[['mul', False, None, 12, True, False]] 4.881
 > Model[['mul', False, None, 12, False, True]] 4.838
 > Model[['mul', False, None, 12, False, False]] 4.813
 > Model[['add', True, None, 0, True, True]] 4.654
 > Model[[None, False, 'add', 12, True, True]] 1.508
 > Model[['add', True, None, 0, True, False]] 4.597
 > Model[['add', True, None, 0, False, True]] 4.800
 > Model[[None, False, 'add', 12, True, False]] 1.507
 > Model[['add', True, None, 0, False, False]] 4.760
 > Model[[None, False, 'add', 12, False, True]] 1.502
 > Model[['add', True, None, 12, True, True]] 4.654
 > Model[[None, False, 'add', 12, False, False]] 1.502
 > Model[['add', True, None, 12, True, False]] 4.597
 > Model[[None, False, 'mul', 12, True, True]] 1.507
 > Model[['add', True, None, 12, False, True]] 4.800
 > Model[[None, False, 'mul', 12, True, False]] 1.507
 > Model[['add', True, None, 12, False, False]] 4.760
 > Model[[None, False, 'mul', 12, False, True]] 1.502
 > Model[['add', False, 'add', 12, True, True]] 1.859
 > Model[[None, False, 'mul', 12, False, False]] 1.502
 > Model[[None, False, None, 0, True, True]] 5.188
 > Model[[None, False, None, 0, True, False]] 5.143
 > Model[[None, False, None, 0, False, True]] 5.187
 > Model[[None, False, None, 0, False, False]] 5.143
 > Model[[None, False, None, 12, True, True]] 5.188
 > Model[[None, False, None, 12, True, False]] 5.143
 > Model[[None, False, None, 12, False, True]] 5.187
 > Model[[None, False, None, 12, False, False]] 5.143
 > Model[['add', False, 'add', 12, True, False]] 1.825
 > Model[['add', False, 'add', 12, False, True]] 1.706
 > Model[['add', False, 'add', 12, False, False]] 1.710
 > Model[['add', False, 'mul', 12, True, True]] 1.882
 > Model[['add', False, 'mul', 12, True, False]] 1.739
 > Model[['add', False, 'mul', 12, False, True]] 1.580
 > Model[['add', False, 'mul', 12, False, False]] 1.581
 > Model[['add', False, None, 0, True, True]] 4.980
 > Model[['add', False, None, 0, True, False]] 4.900
 > Model[['add', False, None, 0, False, True]] 5.203
 > Model[['add', False, None, 0, False, False]] 5.151
 > Model[['add', False, None, 12, True, True]] 4.980
 > Model[['add', False, None, 12, True, False]] 4.900
 > Model[['add', False, None, 12, False, True]] 5.203
 > Model[['add', False, None, 12, False, False]] 5.151
 > Model[['mul', True, 'add', 12, True, True]] 19.353
 > Model[['mul', True, 'add', 12, True, False]] 9.807
 > Model[['mul', True, 'add', 12, False, True]] 11.696
 > Model[['mul', True, 'add', 12, False, False]] 2.847
 > Model[['mul', True, None, 0, True, True]] 4.607
 > Model[['mul', True, None, 0, True, False]] 4.570
 > Model[['mul', True, None, 0, False, True]] 4.630
 > Model[['mul', True, None, 0, False, False]] 4.596
 > Model[['mul', True, None, 12, True, True]] 4.607
 > Model[['mul', True, None, 12, True, False]] 4.570
 > Model[['mul', True, None, 12, False, True]] 4.630
 > Model[['mul', True, None, 12, False, False]] 4.593
 > Model[['mul', False, 'add', 12, True, True]] 4.230
 > Model[['mul', False, 'add', 12, True, False]] 4.157
 > Model[['mul', False, 'add', 12, False, True]] 1.538
 > Model[['mul', False, 'add', 12, False, False]] 1.520
done
[None, False, 'add', 12, False, False] 1.5015527325330889
[None, False, 'add', 12, False, True] 1.5015531225114707
[None, False, 'mul', 12, False, False] 1.501561363221282
```

## Case Study 4: Trend and Seasonality

The 'monthly car sales' dataset summarizes the monthly car sales in Quebec, Canada between 1960 and 1968.

The dataset has an obvious trend and seasonal component.

![Line Plot of the Monthly Car Sales
Dataset](https://img.kancloud.cn/04/5f/045f949f08b91dfff5ec9152a3aaca14_1462x768.jpg)

Line Plot of the Monthly Car Sales Dataset

You can learn more about the dataset from [DataMarket](https://datamarket.com/data/set/22n4/monthly-car-sales-in-quebec-1960-1968#!ds=22n4&display=line).

Download the dataset directly from here:

* [monthly-car-sales.csv](https://raw.githubusercontent.com/jbrownlee/Datasets/master/monthly-car-sales.csv)

Save the file with the filename "monthly-car-sales.csv" in your current working directory. We can load this dataset as a Pandas Series using the function _read_csv()_.

```py
series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
```

The dataset has nine years, or 108 observations. We will use the last year, or 12 observations, as the test set. The period of the seasonal component could be six months or 12 months. We will try both as the seasonal period in the call to the _exp_smoothing_configs()_ function when preparing the model configurations.

```py
# model configs
cfg_list = exp_smoothing_configs(seasonal=[0,6,12])
```

The complete example of grid searching the monthly car sales time series forecasting problem is listed below.

```py
# grid search ets models for monthly car sales
from math import sqrt
from multiprocessing import cpu_count
from joblib import Parallel
from joblib import delayed
from warnings import catch_warnings
from warnings import filterwarnings
from statsmodels.tsa.holtwinters import ExponentialSmoothing
from sklearn.metrics import mean_squared_error
from pandas import read_csv
from numpy import array

# one-step Holt Winter's Exponential Smoothing forecast
def exp_smoothing_forecast(history, config):
    t,d,s,p,b,r = config
    # define model
    history = array(history)
    model = ExponentialSmoothing(history, trend=t, damped=d, seasonal=s, seasonal_periods=p)
    # fit model
    model_fit = model.fit(optimized=True, use_boxcox=b, remove_bias=r)
    # make one step forecast
    yhat = model_fit.predict(len(history), len(history))
    return yhat[0]

# root mean squared error or rmse
def measure_rmse(actual, predicted):
    return sqrt(mean_squared_error(actual, predicted))

# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test], data[-n_test:]

# walk-forward validation for univariate data
def walk_forward_validation(data, n_test, cfg):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # fit model and make forecast for history
        yhat = exp_smoothing_forecast(history, cfg)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
    # estimate prediction error
    error = measure_rmse(test, predictions)
    return error

# score a model, return None on failure
def score_model(data, n_test, cfg, debug=False):
    result = None
    # convert config to a key
    key = str(cfg)
    # show all warnings and fail on exception if debugging
    if debug:
        result = walk_forward_validation(data, n_test, cfg)
    else:
        # one failure during model validation suggests an unstable config
        try:
            # never show warnings when grid searching, too noisy
            with catch_warnings():
                filterwarnings("ignore")
                result = walk_forward_validation(data, n_test, cfg)
        except Exception:
            # leave the result empty so the config is filtered out later
            result = None
    # check for an interesting result
    if result is not None:
        print(' > Model[%s] %.3f' % (key, result))
    return (key, result)

# grid search configs
def grid_search(data, cfg_list, n_test, parallel=True):
    scores = None
    if parallel:
        # execute configs in parallel
        executor = Parallel(n_jobs=cpu_count(), backend='multiprocessing')
        tasks = (delayed(score_model)(data, n_test, cfg) for cfg in cfg_list)
        scores = executor(tasks)
    else:
        scores = [score_model(data, n_test, cfg) for cfg in cfg_list]
    # remove empty results
    scores = [r for r in scores if r[1] is not None]
    # sort configs by error, asc
    scores.sort(key=lambda tup: tup[1])
    return scores

# create a set of exponential smoothing configs to try
def exp_smoothing_configs(seasonal=[None]):
    models = list()
    # define config lists
    t_params = ['add', 'mul', None]
    d_params = [True, False]
    s_params = ['add', 'mul', None]
    p_params = seasonal
    b_params = [True, False]
    r_params = [True, False]
    # create config instances
    for t in t_params:
        for d in d_params:
            for s in s_params:
                for p in p_params:
                    for b in b_params:
                        for r in r_params:
                            cfg = [t,d,s,p,b,r]
                            models.append(cfg)
    return models

if __name__ == '__main__':
    # load dataset
    series = read_csv('monthly-car-sales.csv', header=0, index_col=0)
    data = series.values
    # data split
    n_test = 12
    # model configs
    cfg_list = exp_smoothing_configs(seasonal=[0,6,12])
    # grid search
    scores = grid_search(data[:,0], cfg_list, n_test)
    print('done')
    # list top 3 configs
    for cfg, error in scores[:3]:
        print(cfg, error)
```

Running the example is slow given the large amount of data. Model configurations and the RMSE are printed as the models are evaluated. The top three model configurations and their errors are reported at the end of the run.

We can see that the best result was an RMSE of about 1,672 sales with the following configuration:

* **Trend**: Additive
* **Damped**: False
* **Seasonal**: Additive
* **Seasonal Periods**: 12
* **Box-Cox Transform**: False
* **Remove Bias**: True

This is a little surprising, as I would have guessed that a six-month seasonal model would be the preferred approach.

```py
 > Model[['add', True, 'add', 6, False, True]] 3240.433
 > Model[['add', True, 'add', 6, False, False]] 3226.384
 > Model[['add', True, 'add', 6, True, False]] 2836.535
 > Model[['add', True, 'add', 6, True, True]] 2784.852
 > Model[['add', True, 'add', 12, False, False]] 1696.173
 > Model[['add', True, 'add', 12, False, True]] 1721.746
 > Model[[None, False, 'add', 6, True, True]] 3204.874
 > Model[['add', True, 'add', 12, True, False]] 2064.937
 > Model[['add', True, 'add', 12, True, True]] 2098.844
 > Model[[None, False, 'add', 6, True, False]] 3190.972
 > Model[[None, False, 'add', 6, False, True]] 3147.623
 > Model[[None, False, 'add', 6, False, False]] 3126.527
 > Model[[None, False, 'add', 12, True, True]] 1834.910
 > Model[[None, False, 'add', 12, True, False]] 1872.081
 > Model[[None, False, 'add', 12, False, True]] 1736.264
 > Model[[None, False, 'add', 12, False, False]] 1807.325
 > Model[[None, False, 'mul', 6, True, True]] 2993.566
 > Model[[None, False, 'mul', 6, True, False]] 2979.123
 > Model[[None, False, 'mul', 6, False, True]] 3025.876
 > Model[[None, False, 'mul', 6, False, False]] 3009.999
 > Model[['add', True, 'mul', 6, True, True]] 2956.728
 > Model[[None, False, 'mul', 12, True, True]] 1972.547
 > Model[[None, False, 'mul', 12, True, False]] 1989.234
 > Model[[None, False, 'mul', 12, False, True]] 1925.010
 > Model[[None, False, 'mul', 12, False, False]] 1941.217
 > Model[[None, False, None, 0, True, True]] 3801.741
 > Model[[None, False, None, 0, True, False]] 3783.966
 > Model[[None, False, None, 0, False, True]] 3801.560
 > Model[[None, False, None, 0, False, False]] 3783.966
 > Model[[None, False, None, 6, True, True]] 3801.741
 > Model[[None, False, None, 6, True, False]] 3783.966
 > Model[[None, False, None, 6, False, True]] 3801.560
 > Model[[None, False, None, 6, False, False]] 3783.966
 > Model[[None, False, None, 12, True, True]] 3801.741
 > Model[[None, False, None, 12, True, False]] 3783.966
 > Model[[None, False, None, 12, False, True]] 3801.560
 > Model[[None, False, None, 12, False, False]] 3783.966
 > Model[['add', True, 'mul', 6, True, False]] 2932.827
 > Model[['mul', True, 'mul', 12, True, True]] 1953.405
 > Model[['add', True, 'mul', 6, False, True]] 2997.259
 > Model[['mul', True, 'mul', 12, True, False]] 1960.242
 > Model[['add', True, 'mul', 6, False, False]] 2979.248
 > Model[['mul', True, 'mul', 12, False, True]] 1907.792
 > Model[['add', True, 'mul', 12, True, True]] 1972.550
 > Model[['add', True, 'mul', 12, True, False]] 1989.236
 > Model[['mul', True, None, 0, True, True]] 3951.024
 > Model[['mul', True, None, 0, True, False]] 3930.394
 > Model[['mul', True, None, 0, False, True]] 3947.281
 > Model[['mul', True, None, 0, False, False]] 3926.082
 > Model[['mul', True, None, 6, True, True]] 3951.026
 > Model[['mul', True, None, 6, True, False]] 3930.389
 > Model[['mul', True, None, 6, False, True]] 3946.654
 > Model[['mul', True, None, 6, False, False]] 3926.026
 > Model[['mul', True, None, 12, True, True]] 3951.027
 > Model[['mul', True, None, 12, True, False]] 3930.368
 > Model[['mul', True, None, 12, False, True]] 3942.037
 > Model[['mul', True, None, 12, False, False]] 3920.756
 > Model[['add', True, 'mul', 12, False, True]] 1750.480
 > Model[['mul', False, 'add', 6, True, False]] 5043.557
 > Model[['mul', False, 'add', 6, False, True]] 7425.711
 > Model[['mul', False, 'add', 6, False, False]] 7448.455
 > Model[['mul', False, 'add', 12, True, True]] 2160.794
 > Model[['mul', False, 'add', 12, True, False]] 2346.478
 > Model[['mul', False, 'add', 12, False, True]] 16303.868
 > Model[['mul', False, 'add', 12, False, False]] 10268.636
 > Model[['mul', False, 'mul', 12, True, True]] 3012.036
 > Model[['mul', False, 'mul', 12, True, False]] 3005.824
 > Model[['add', True, 'mul', 12, False, False]] 1774.636
 > Model[['mul', False, 'mul', 12, False, True]] 14676.476
 > Model[['add', True, None, 0, True, True]] 3935.674
 > Model[['mul', False, 'mul', 12, False, False]] 13988.754
 > Model[['mul', False, None, 0, True, True]] 3804.906
 > Model[['mul', False, None, 0, True, False]] 3805.342
 > Model[['mul', False, None, 0, False, True]] 3778.444
 > Model[['mul', False, None, 0, False, False]] 3798.003
 > Model[['mul', False, None, 6, True, True]] 3804.906
 > Model[['mul', False, None, 6, True, False]] 3805.342
 > Model[['mul', False, None, 6, False, True]] 3778.456
 > Model[['mul', False, None, 6, False, False]] 3798.007
 > Model[['add', True, None, 0, True, False]] 3915.499
 > Model[['mul', False, None, 12, True, True]] 3804.906
 > Model[['mul', False, None, 12, True, False]] 3805.342
 > Model[['mul', False, None, 12, False, True]] 3778.457
 > Model[['mul', False, None, 12, False, False]] 3797.989
 > Model[['add', True, None, 0, False, True]] 3924.442
 > Model[['add', True, None, 0, False, False]] 3905.627
 > Model[['add', True, None, 6, True, True]] 3935.658
 > Model[['add', True, None, 6, True, False]] 3913.420
 > Model[['add', True, None, 6, False, True]] 3924.287
 > Model[['add', True, None, 6, False, False]] 3913.618
 > Model[['add', True, None, 12, True, True]] 3935.673
 > Model[['add', True, None, 12, True, False]] 3913.428
 > Model[['add', True, None, 12, False, True]] 3924.487
 > Model[['add', True, None, 12, False, False]] 3913.529
 > Model[['add', False, 'add', 6, True, True]] 3220.532
 > Model[['add', False, 'add', 6, True, False]] 3199.766
 > Model[['add', False, 'add', 6, False, True]] 3243.478
 > Model[['add', False, 'add', 6, False, False]] 3226.955
 > Model[['add', False, 'add', 12, True, True]] 1833.481
 > Model[['add', False, 'add', 12, True, False]] 1833.511
 > Model[['add', False, 'add', 12, False, True]] 1672.554
 > Model[['add', False, 'add', 12, False, False]] 1680.845
 > Model[['add', False, 'mul', 6, True, True]] 3014.447
 > Model[['add', False, 'mul', 6, True, False]] 3016.207
 > Model[['add', False, 'mul', 6, False, True]] 3025.870
 > Model[['add', False, 'mul', 6, False, False]] 3010.015
 > Model[['add', False, 'mul', 12, True, True]] 1982.087
 > Model[['add', False, 'mul', 12, True, False]] 1981.089
 > Model[['add', False, 'mul', 12, False, True]] 1898.045
 > Model[['add', False, 'mul', 12, False, False]] 1894.397
 > Model[['add', False, None, 0, True, True]] 3815.765
 > Model[['add', False, None, 0, True, False]] 3813.234
 > Model[['add', False, None, 0, False, True]] 3805.649
 > Model[['add', False, None, 0, False, False]] 3809.864
 > Model[['add', False, None, 6, True, True]] 3815.765
 > Model[['add', False, None, 6, True, False]] 3813.234
 > Model[['add', False, None, 6, False, True]] 3805.619
 > Model[['add', False, None, 6, False, False]] 3809.846
 > Model[['add', False, None, 12, True, True]] 3815.765
 > Model[['add', False, None, 12, True, False]] 3813.234
 > Model[['add', False, None, 12, False, True]] 3805.638
 > Model[['add', False, None, 12, False, False]] 3809.837
 > Model[['mul', True, 'add', 6, True, False]] 4099.032
 > Model[['mul', True, 'add', 6, False, True]] 3818.567
 > Model[['mul', True, 'add', 6, False, False]] 3745.142
 > Model[['mul', True, 'add', 12, True, True]] 2203.354
 > Model[['mul', True, 'add', 12, True, False]] 2284.172
 > Model[['mul', True, 'add', 12, False, True]] 2842.605
 > Model[['mul', True, 'add', 12, False, False]] 2086.899
done
['add', False, 'add', 12, False, True] 1672.5539372356582
['add', False, 'add', 12, False, False] 1680.845043013083
['add', True, 'add', 12, False, False] 1696.1734099400082
```

## Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.

* **Data Transforms**. Update the framework to support configurable data transforms such as normalization and standardization.
* **Plot Forecast**. Update the framework to re-fit a model with the best configuration and forecast the entire test dataset, then compare the forecast with the actual observations in the test set.
* **Tune Amount of History**. Update the framework to tune the amount of historical data used to fit the model (e.g. in the case of the 10 years of max temperature data).

If you explore any of these extensions, I'd love to know.

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

### Books

* Chapter 7, Exponential Smoothing, [Forecasting: Principles and Practice](https://amzn.to/2xlJsfV), 2013.
* Section 6.4, Introduction to Time Series Analysis, [Engineering Statistics Handbook](https://www.itl.nist.gov/div898/handbook/), 2012.
* [Practical Time Series Forecasting with R](https://amzn.to/2LGKzKm), 2016.

### APIs

* [statsmodels.tsa.holtwinters.ExponentialSmoothing API](http://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.ExponentialSmoothing.html)
* [statsmodels.tsa.holtwinters.HoltWintersResults API](http://www.statsmodels.org/dev/generated/statsmodels.tsa.holtwinters.HoltWintersResults.html)
* [Joblib: running Python functions as pipeline jobs](https://pythonhosted.org/joblib/)

### Articles

* [Exponential smoothing on Wikipedia](https://en.wikipedia.org/wiki/Exponential_smoothing)

## Summary

In this tutorial, you discovered how to develop a framework for grid searching all of the exponential smoothing model hyperparameters for univariate time series forecasting.

Specifically, you learned:

* How to develop a framework for grid searching ETS models from scratch using walk-forward validation.
* How to grid search ETS model hyperparameters for the daily births time series data.
* How to grid search ETS model hyperparameters for monthly time series data for shampoo sales, car sales, and temperature.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.
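As a rough illustration of the "tune amount of history" idea from the Extensions section, the sketch below adds a window length to walk-forward validation and grid searches over it. It is a minimal example under stated assumptions, not the tutorial's implementation: `ses_forecast()` is a hand-written simple exponential smoothing stand-in (not the statsmodels model), and the names `n_history` and `grid_search_history()` are hypothetical helpers introduced only for this example.

```python
from math import sqrt


def ses_forecast(history, alpha=0.5):
    """One-step simple exponential smoothing forecast (stand-in model, an assumption)."""
    level = history[0]
    for value in history[1:]:
        # blend the new observation with the running level
        level = alpha * value + (1.0 - alpha) * level
    return level


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def walk_forward_validation(data, n_test, n_history, forecast=ses_forecast):
    """Walk-forward validation where each fit sees only the last n_history points.

    n_history=None uses all available history, as in the tutorial's framework.
    """
    train, test = list(data[:-n_test]), list(data[-n_test:])
    history = list(train)
    predictions = []
    for observed in test:
        # hypothetical extension: truncate the history before forecasting
        window = history if n_history is None else history[-n_history:]
        predictions.append(forecast(window))
        # add the actual observation for the next step
        history.append(observed)
    return rmse(test, predictions)


def grid_search_history(data, n_test, history_sizes):
    """Score each candidate history length; return (size, rmse) pairs sorted by error."""
    scores = [(n, walk_forward_validation(data, n_test, n)) for n in history_sizes]
    scores.sort(key=lambda pair: pair[1])
    return scores


if __name__ == '__main__':
    # synthetic series with a level shift, so that old history is less useful
    data = [10.0] * 20 + [50.0] * 20
    for size, error in grid_search_history(data, n_test=5, history_sizes=[None, 5, 10, 20]):
        print(size, error)
```

To apply the same idea to the tutorial's framework, one option would be to pass the truncated window to `exp_smoothing_forecast()` in place of the full `history`; the window must stay long enough to cover the seasonal period being modeled.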