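# Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras

> Original: [https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/understanding-stateful-lstm-recurrent-neural-networks-python-keras/)

A powerful and popular recurrent neural network is the Long Short-Term Memory network, or LSTM.

It is widely used because the architecture overcomes the vanishing and exploding gradient problems that plague all recurrent neural networks, allowing very large and very deep networks to be created.

Like other recurrent neural networks, LSTM networks maintain state, and the specifics of how this is implemented in the Keras framework can be confusing.

In this post you will discover exactly how state is maintained in LSTM networks by the Keras deep learning library.

After reading this post you will know:

* How to develop a naive LSTM network for a sequence prediction problem.
* How to carefully manage state through batches and features with an LSTM network.
* How to manually manage state in an LSTM network for stateful prediction.

Let's get started.

* **Update Mar/2017**: Updated the examples for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
* **Update Aug/2018**: Updated the examples for Python 3, and updated the stateful example to achieve 100% accuracy.
* **Update Mar/2019**: Fixed a typo in the stateful example.

![Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras](https://img.kancloud.cn/3f/26/3f264d3dbab085994649c64d42d97d89_640x425.png)

Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras. Photo by [Martin Abegglen](https://www.flickr.com/photos/twicepix/7923674788/), some rights reserved.

## Problem Description: Learn the Alphabet

In this tutorial we are going to develop and contrast a number of different LSTM recurrent neural network models.

The context of these comparisons will be a simple sequence prediction problem of learning the alphabet. That is, given a letter of the alphabet, predict the next letter of the alphabet.

This is a simple sequence prediction problem that, once understood, can be generalized to other sequence prediction problems such as time series prediction and sequence classification.

Let's prepare the problem with some Python code that we can reuse from example to example.

First, let's import all of the classes and functions we plan to use in this tutorial.

```py
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
```

Next, we can seed the random number generator so that the results are the same each time the code is executed.

```py
# fix random seed for reproducibility
numpy.random.seed(7)
```

We can now define our dataset, the alphabet. We define the alphabet in uppercase characters for readability.

Neural networks model numbers, so we need to map the letters of the alphabet to integer values. We can do this easily by creating a dictionary (map) from each character to its index. We can also create a reverse lookup for converting predictions back into characters later.

```py
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
```

Now we need to create our input and output pairs on which to train our neural network. We can do this by defining an input sequence length, then reading sequences from the input alphabet sequence.

For example, we use an input length of 1. Starting at the beginning of the raw input data, we can read off the first letter "A" and the next letter as the prediction "B". We move along one character and repeat until we reach a prediction of "Z".

```py
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
```

We also print out the input pairs as a sanity check.

Running the code to this point will produce the following output, summarizing input sequences of length 1 and a single output character.

```py
A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
```

We need to reshape the NumPy array into the format expected by LSTM networks, that is [_samples, time steps, features_].

```py
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
```

Once reshaped, we can normalize the input integers to the range 0 to 1, the range of the sigmoid activation functions used by the LSTM network.

```py
# normalize
X = X / float(len(alphabet))
```

Finally, we can think of this problem as a sequence classification task, where each of the 26 letters represents a different class. As such, we can convert the output (y) to a one-hot encoding, using the Keras built-in function **to_categorical()**.

```py
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
```

We are now ready to fit different LSTM models.

## Naive LSTM for Learning One-Char to One-Char Mapping

Let's start by designing a simple LSTM to learn how to predict the next character in the alphabet given the context of just one character.

We will frame the problem as a random collection of one-letter input to one-letter output pairs. As we will see, this is a difficult framing of the problem for the LSTM to learn.

Let's define an LSTM network with 32 units and an output layer with a softmax activation function for making predictions. Because this is a multi-class classification problem, we can use the log loss function (called "**categorical_crossentropy**" in Keras) and optimize the network using the ADAM optimization algorithm.

The model is fit over 500 epochs with a batch size of 1.

```py
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
```

After we fit the model, we can evaluate and summarize its performance on the entire training dataset.

```py
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
```

We can then re-run the training data through the network and generate predictions, converting both the input and output pairs back into their original character format to get a visual idea of how well the network learned the problem.

```py
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
```

The entire code listing is provided below for completeness.

```py
# Naive LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
```

Running this example produces the following output.

```py
Model Accuracy: 84.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> W
['V'] -> Y
['W'] -> Z
['X'] -> Z
['Y'] -> Z
```

We can see that this problem is indeed difficult for the network to learn.

The reason is that the poor LSTM units have no context to work with. Each input-output pattern is shown to the network in a random order, and the state of the network is reset after each pattern (that is, after each batch, where each batch contains one pattern).

This is an abuse of the LSTM network architecture, treating it like a standard multilayer Perceptron.

Next, let's try a different framing of the problem in order to give the network more sequence from which to learn.

## Naive LSTM for a Three-Char Feature Window to One-Char Mapping

A popular approach to adding more context to data for multilayer Perceptrons is the window method.

This is where previous steps in the sequence are provided as additional input features to the network. We can try the same trick to provide more context to the LSTM network.

Here, we increase the sequence length from 1 to 3, for example:

```py
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
```

This creates training patterns like the following:

```py
ABC -> D
BCD -> E
CDE -> F
```

Each element in the sequence is then provided as a new input feature to the network. This requires a modification of how the input sequences are reshaped in the data preparation step:

```py
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
```

It also requires a modification of how the sample patterns are reshaped when demonstrating predictions from the model.

```py
x = numpy.reshape(pattern, (1, 1, len(pattern)))
```

The entire code listing is provided below for completeness.

```py
# Naive LSTM to learn three-char window to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
#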
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, 1, len(pattern)))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
```

Running this example produces the following output.

```py
Model Accuracy: 86.96%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> Y
['U', 'V', 'W'] -> Z
['V', 'W', 'X'] -> Z
['W', 'X', 'Y'] -> Z
```

We can see a small lift in performance that may or may not be real. This is a simple problem that we were still not able to learn with LSTMs, even with the window method.

Again, this is a misuse of the LSTM network caused by a poor framing of the problem. The sequence of letters is really a series of time steps of one feature, rather than one time step of separate features. We have given the network more context, but not more sequence as it expects.

In the next section, we will give the network more context in the form of time steps.

## Naive LSTM for a Three-Char Time Step Window to One-Char Mapping

In Keras, the intended use of LSTMs is to provide context in the form of time steps, rather than windowed features as with other network types.

We can take our first example and simply change the sequence length from 1 to 3.

```py
seq_length = 3
```

Again, this creates input-output pairs that look like:

```py
ABC -> D
BCD -> E
CDE -> F
DEF -> G
```

The difference is that the reshaping of the input data takes the sequence as a series of time steps of one feature, rather than a single time step of multiple features.

```py
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
```

This is the correct, intended way of providing sequence context to your LSTM in Keras. The full code example is provided below for completeness.

```py
# Naive LSTM to learn three-char time steps to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
```

Running this example produces the following output.

```py
Model Accuracy: 100.00%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> W
['U', 'V', 'W'] -> X
['V', 'W', 'X'] -> Y
['W', 'X', 'Y'] -> Z
```

We can see that the model learns the problem perfectly, as evidenced by the model evaluation and the example predictions.

But it has learned a simpler problem. Specifically, it has learned to predict the next letter from a sequence of three letters in the alphabet. It can be shown any random sequence of three consecutive letters from the alphabet and predict the next letter.

It cannot actually enumerate the alphabet. I expect that a large enough multilayer Perceptron network might be able to learn the same mapping using the window method.

LSTM networks are stateful. They should be able to learn the whole alphabet sequence, but by default the Keras implementation resets the network state after each training batch.

## LSTM State Within A Batch

The Keras implementation of LSTMs resets the state of the network after each batch.

This suggests that if we had a batch size large enough to hold all input patterns, and if all the input patterns were ordered sequentially, the LSTM could use the context of the sequence within the batch to better learn the sequence.

We can demonstrate this easily by modifying the first example for learning a one-to-one mapping and increasing the batch size from 1 to the size of the training dataset.

Additionally, Keras shuffles the training dataset before each training epoch. To ensure the training data patterns remain sequential, we can disable this shuffling.

```py
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
```

The network will learn the mapping of characters using the within-batch sequence, but this context will not be available to the network when making predictions. We can evaluate the ability of the network to make predictions both randomly and in sequence.

The full code example is provided below for completeness.

```py
# Naive LSTM to learn one-char to one-char mapping with all data in each batch
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# convert list of
# lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=seq_length, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(16, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
# demonstrate predicting random patterns
print("Test a Random Pattern:")
for i in range(0,20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
```

Running the example produces the following output.

```py
Model Accuracy: 100.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> V
['V'] -> W
['W'] -> X
['X'] -> Y
['Y'] -> Z
Test a Random Pattern:
['T'] -> U
['V'] -> W
['M'] -> N
['Q'] -> R
['D'] -> E
['V'] -> W
['T'] -> U
['U'] -> V
['J'] -> K
['F'] -> G
['N'] -> O
['B'] -> C
['M'] -> N
['F'] -> G
['F'] -> G
['P'] -> Q
['A'] -> B
['K'] -> L
['W'] -> X
['E'] -> F
```

As we expected, the network is able to use the within-sequence context to learn the alphabet, achieving 100% accuracy on the training data.

Importantly, the network can make accurate predictions for the next letter in the alphabet for randomly selected characters. Very impressive.

## Stateful LSTM for a One-Char to One-Char Mapping

We have seen that we can break up our raw data into fixed-size sequences and that this representation can be learned by the LSTM, but only to learn random mappings of 3 characters to 1 character.

We have also seen that we can bend the batch size to offer more sequence to the network, but only during training.

Ideally, we want to expose the network to the entire sequence and let it learn the inter-dependencies, rather than defining those dependencies explicitly in the framing of the problem.

We can do this in Keras by making the LSTM layers stateful and manually resetting the state of the network at the end of the epoch, which is also the end of the training sequence.

This is truly how LSTM networks are intended to be used.

We first need to define our LSTM layer as stateful. In doing so, we must explicitly specify the batch size as a dimension of the input shape. This also means that when we evaluate the network or make predictions, we must specify and adhere to this same batch size. This is not a problem now because we are using a batch size of 1. It could introduce difficulties when the batch size is not one, as predictions would need to be made in batches and in sequence.

```py
batch_size = 1
model.add(LSTM(50, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
```

An important difference in training the stateful LSTM is that we train it manually one epoch at a time and reset the state after each epoch. We can do this in a for loop. Again, we do not shuffle the input, preserving the sequence in which the input training data was created.

```py
for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
```

As mentioned, we specify the batch size when evaluating the performance of the network on the entire training dataset.

```py
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
```

Finally, we can demonstrate that the network has indeed learned the entire alphabet. We can seed it with the first letter "A", request a prediction, feed the prediction back in as an input, and repeat the process all the way to "Z".

```py
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()
```

We can also see whether the network can make predictions starting from an arbitrary letter.

```py
# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()
```

The entire code listing is provided below for completeness.

```py
# Stateful LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
    seq_in = alphabet[i:i + seq_length]
    seq_out = alphabet[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
    print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(50, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
for i in range(300):
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
    model.reset_states()
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()
# demonstrate a random
# starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
    x = numpy.reshape(seed, (1, len(seed), 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    print(int_to_char[seed[0]], "->", int_to_char[index])
    seed = [index]
model.reset_states()
```

Running the example produces the following output.

```py
Model Accuracy: 100.00%
A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
New start: K
K -> B
B -> C
C -> D
D -> E
E -> F
```

We can see that the network has memorized the entire alphabet perfectly. It used the context of the samples themselves and learned whatever dependency it needed to predict the next character in the sequence.

We can also see that if we seed the network with the first letter, it can correctly rattle off the rest of the alphabet.

We can also see that it has only learned the full alphabet sequence, and only from a cold start. When asked to predict the next letter from "K", it predicts "B" and falls back into regurgitating the entire alphabet.

To truly predict the letter after "K", the state of the network would need to be warmed up by feeding it the letters from "A" to "J". This tells us that we could achieve the same effect with a "stateless" LSTM by preparing training data like:

```py
---a -> b
--ab -> c
-abc -> d
abcd -> e
```

Here the input sequence length is fixed at 25 (a-to-y to predict z) and patterns are prefixed with zero padding.

Finally, this raises the question of training an LSTM network using variable-length input sequences to predict the next character.

## LSTM with Variable-Length Input to One-Char Output

In the previous section, we discovered that the Keras "stateful" LSTM was really only a shortcut to replaying the first n steps of the sequence, and did not really help us learn a generic model of the alphabet.

In this section we explore a variation of the "stateless" LSTM that learns random subsequences of the alphabet, in an effort to build a model that can be given arbitrary letters or subsequences of letters and predict the next letter in the alphabet.

First, we are changing the framing of the problem. To simplify, we will define a maximum input sequence length and set it to a small value like 5 to speed up training. This defines the maximum length of the subsequences of the alphabet that will be drawn for training. In extensions, this could just as easily be set to the full alphabet (26), or longer if we allow looping back to the start of the sequence.

We also need to define the number of random sequences to create, in this case 1000. This too could be more or less. I expect that fewer patterns are actually required.

```py
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)
```

Running this code in the broader context will create input patterns that look like the following:

```py
PQRST -> U
W -> X
O -> P
OPQ -> R
IJKLM -> N
QRSTU -> V
ABCD -> E
X -> Y
GHIJ -> K
```

The input sequences vary in length between 1 and **max_len** and therefore require zero padding. Here, we use left-hand-side (prefix) padding with the Keras built-in **pad_sequences()** function.

```py
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
```

The trained model is evaluated on randomly selected input patterns. These could just as easily be new randomly generated sequences of characters. I also believe this could be a linear sequence seeded with "A", with the outputs fed back in as single-character inputs.

The full code listing is provided below for completeness.

```py
# LSTM with Variable Length Input Sequences to One Character Output
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
    start = numpy.random.randint(len(alphabet)-2)
    end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
    sequence_in = alphabet[start:end+1]
    sequence_out = alphabet[end + 1]
    dataX.append([char_to_int[char] for char in sequence_in])
    dataY.append(char_to_int[sequence_out])
    print(sequence_in, '->', sequence_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], max_len, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for i in range(20):
    pattern_index = numpy.random.randint(len(dataX))
    pattern = dataX[pattern_index]
    x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
    x = numpy.reshape(x, (1, max_len, 1))
    x = x / float(len(alphabet))
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    print(seq_in, "->", result)
```

Running this code produces the following output:

```py
Model Accuracy: 98.90%
['Q', 'R'] -> S
['W', 'X'] -> Y
['W', 'X'] -> Y
['C', 'D'] -> E
['E'] -> F
['S', 'T', 'U'] -> V
['G', 'H', 'I', 'J', 'K'] -> L
['O', 'P', 'Q', 'R', 'S'] -> T
['C', 'D'] -> E
['O'] -> P
['N', 'O', 'P'] -> Q
['D', 'E', 'F', 'G', 'H'] -> I
['X'] -> Y
['K'] -> L
['M'] -> N
['R'] -> T
['K'] -> L
['E', 'F', 'G'] -> H
['Q'] -> R
['Q', 'R', 'S'] -> T
```

We can see that although the model did not learn the alphabet perfectly from the randomly generated subsequences, it did very well. The model was not tuned and may require more training, a larger network, or both (an exercise for the reader).

This is a good natural extension of the "_all sequential input examples in each batch_" alphabet model learned above, in that it can handle ad hoc queries, but this time of arbitrary sequence length (up to the maximum length).

## Summary

In this post you discovered LSTM recurrent neural networks in Keras and how they manage state.

Specifically, you learned:

* How to develop a naive LSTM network for one-character to one-character prediction.
* How to configure a naive LSTM to learn a sequence across time steps within a sample.
* How to configure an LSTM to learn a sequence across samples by manually managing state.

Do you have any questions about managing LSTM state or about this post? Ask your questions in the comments and I will do my best to answer.
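
As a recap of the variable-length framing, the left (prefix) zero padding and input scaling can be sketched in plain Python, without Keras. The helper names `left_pad` and `encode` are hypothetical and exist only for this illustration; they approximate the default behaviour of `pad_sequences()` (padding and truncating at the front), assuming `maxlen` is a positive integer.

```py
# Pure-Python sketch of left (prefix) zero padding and input scaling.
# left_pad and encode are hypothetical helpers, not part of Keras.

alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
char_to_int = dict((c, i) for i, c in enumerate(alphabet))


def left_pad(sequence, maxlen, value=0):
    """Keep the last maxlen items, then prefix with value up to maxlen.

    Assumes maxlen is a positive integer.
    """
    trimmed = list(sequence)[-maxlen:]
    return [value] * (maxlen - len(trimmed)) + trimmed


def encode(text, maxlen):
    """Map letters to integers, left-pad, and scale by the alphabet size."""
    ints = [char_to_int[char] for char in text]
    return [v / float(len(alphabet)) for v in left_pad(ints, maxlen)]


padded = left_pad([char_to_int[c] for c in "OPQ"], 5)
scaled = encode("OPQ", 5)
print(padded)
print(scaled)
```

One thing this sketch makes visible: because "A" maps to 0, a padded position and the letter "A" carry the same input value in this encoding.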