
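# Understand the Difference Between Return Sequences and Return States for LSTMs in Keras

> Original: [https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/](https://machinelearningmastery.com/return-sequences-and-return-states-for-lstms-in-keras/)

The Keras deep learning library provides an implementation of the Long Short-Term Memory, or LSTM, recurrent neural network.

As part of this implementation, the Keras API provides access to both return sequences and return state. The use of these outputs, and the difference between them, can be confusing when designing sophisticated recurrent neural network models such as the encoder-decoder model.

In this tutorial, you will discover the difference between return sequences and return states for LSTM layers in the Keras deep learning library, and what each one produces.

After completing this tutorial, you will know:

* That return sequences return the hidden state output for each input time step.
* That return state returns the hidden state output and the cell state for the last input time step.
* That return sequences and return state can be used at the same time.

Let's get started.

![Understand the Difference Between Return Sequences and Return States for LSTMs in Keras](img/c1155fe1f0943d49c96799fa94c66cb1.jpg)

Understand the Difference Between Return Sequences and Return States for LSTMs in Keras. Photo by [Adrian Curt Dannemann](https://www.flickr.com/photos/12327992@N06/33431042255/), some rights reserved.

## Tutorial Overview

This tutorial is divided into 4 parts; they are:

1. Long Short-Term Memory
2. Return Sequences
3. Return States
4. Return States and Sequences

## Long Short-Term Memory

The Long Short-Term Memory, or LSTM, is a recurrent neural network that is composed of internal gates.

Unlike other recurrent neural networks, the network's internal gates allow the model to be trained successfully using [backpropagation through time](https://machinelearningmastery.com/gentle-introduction-backpropagation-time/), or BPTT, and to largely avoid the vanishing gradients problem.

In the Keras deep learning library, LSTM layers can be created using the [LSTM() class](https://keras.io/layers/recurrent/#lstm).

Creating a layer of LSTM memory units allows you to specify the number of memory units within the layer.

Each unit, or cell, within the layer has an internal cell state, often abbreviated as "_c_", and outputs a hidden state, often abbreviated as "_h_".

The Keras API allows you to access these data, which can be useful or even required when developing sophisticated recurrent neural network architectures, such as the encoder-decoder model.

For the rest of this tutorial, we will look at the API for accessing these data.

## Return Sequences

Each LSTM cell outputs one hidden state _h_ for each input.

```py
h = LSTM(X)
```

We can demonstrate this in Keras with a very small model that has a single LSTM layer, which itself contains a single LSTM cell.

In this example, we will have one input sample with 3 time steps and one feature observed at each time step:

```py
t1 = 0.1
t2 = 0.2
t3 = 0.3
```

The complete example is listed below.

Note: all examples in this post use the [Keras functional API](https://keras.io/getting-started/functional-api-guide/).

```py
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))
```

Running the example outputs a single hidden state for the input sequence of 3 time steps: by default, the layer returns only the hidden state from the last time step.

Because the LSTM weights are randomly initialized, your specific output values will differ.

```py
[[-0.0953151]]
```

It is possible to access the hidden state output for each input time step.

This can be done by setting the _return_sequences_ argument to _True_ when defining the LSTM layer, as follows:

```py
LSTM(1, return_sequences=True)
```

We can update the previous example with this change.

The full code listing is provided below.

```py
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1 = LSTM(1, return_sequences=True)(inputs1)
model = Model(inputs=inputs1, outputs=lstm1)
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))
```

Running the example returns a sequence of 3 values: one hidden state output for each input time step, for the single LSTM cell in the layer.

```py
[[[-0.02243521]
  [-0.06210149]
  [-0.11457888]]]
```

You must set _return_sequences=True_ when stacking LSTM layers so that the second LSTM layer receives a three-dimensional sequence input. For more details, see the post:

* [Stacked Long Short-Term Memory Networks](https://machinelearningmastery.com/stacked-long-short-term-memory-networks/)

You may also need to access the sequence of hidden state outputs when predicting a sequence of outputs with a _Dense_ output layer wrapped in a TimeDistributed layer. For more details, see the post:

* [How to Use the TimeDistributed Layer for Long Short-Term Memory Networks in Python](https://machinelearningmastery.com/timedistributed-layer-for-long-short-term-memory-networks-in-python/)

## Return States

The output of an LSTM cell, or layer of cells, is called the hidden state.

This is confusing, because each LSTM cell also retains an internal state that is not output, called the cell state, or _c_.

Generally, we do not need to access the cell state unless we are developing sophisticated models where a subsequent layer may need to have its cell state initialized with the final cell state of another layer, such as in an encoder-decoder model.

Keras provides the _return_state_ argument to the LSTM layer, which gives access to the hidden state output (_state_h_) and the cell state (_state_c_). For example:

```py
lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)
```

This may look confusing because both _lstm1_ and _state_h_ refer to the same hidden state output. The reason these two tensors are separate will become clear in the next section.

We can demonstrate access to the hidden state and cell state of the cells in the LSTM layer with the worked example listed below.

```py
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))
```

Running the example returns 3 arrays:

1. The LSTM hidden state output for the last time step.
2. The LSTM hidden state output for the last time step (again).
3. The LSTM cell state for the last time step.

```py
[array([[ 0.10951342]], dtype=float32),
 array([[ 0.10951342]], dtype=float32),
 array([[ 0.24143776]], dtype=float32)]
```

The hidden state and the cell state can, in turn, be used to initialize the states of another LSTM layer with the same number of cells.

## Return States and Sequences

We can access both the sequence of hidden states and the cell state at the same time.

This can be done by configuring the LSTM layer to both return sequences and return states.

```py
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
```

The complete example is listed below.

```py
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from numpy import array
# define model
inputs1 = Input(shape=(3, 1))
lstm1, state_h, state_c = LSTM(1, return_sequences=True, return_state=True)(inputs1)
model = Model(inputs=inputs1, outputs=[lstm1, state_h, state_c])
# define input data
data = array([0.1, 0.2, 0.3]).reshape((1,3,1))
# make and show prediction
print(model.predict(data))
```

Running the example, we can now see why the LSTM output tensor and the hidden state output tensor are declared separately.

The layer returns the hidden state for each input time step, then separately the hidden state output for the last time step and the cell state for the last input time step.

This can be confirmed by checking that the last value in the returned sequence (first array) matches the value in the hidden state (second array).

```py
[array([[[-0.02145359],
        [-0.0540871 ],
        [-0.09228823]]], dtype=float32),
 array([[-0.09228823]], dtype=float32),
 array([[-0.19803026]], dtype=float32)]
```

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

* [Keras Functional API](https://keras.io/getting-started/functional-api-guide/)
* [LSTM API in Keras](https://keras.io/layers/recurrent/#lstm)
* [Long Short-Term Memory](http://www.bioinf.jku.at/publications/older/2604.pdf), 1997.
* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/), 2015.
* [A ten-minute introduction to sequence-to-sequence learning in Keras](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html)

## Summary

In this tutorial, you discovered the difference between return sequences and return states for LSTM layers in the Keras deep learning library, and what each one produces.

Specifically, you learned:

* That return sequences return the hidden state output for each input time step.
* That return state returns the hidden state output and the cell state for the last input time step.
* That return sequences and return state can be used at the same time.

Do you have any questions? Ask your questions in the comments below and I will do my best to answer.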
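
As a supplement to the points above, the relationship between the three outputs (the hidden state sequence, the final hidden state, and the final cell state) can be illustrated without Keras. The following is a minimal NumPy sketch of an LSTM forward pass, not Keras's actual implementation: the function name `lstm_forward`, the random weights `W`, `U`, and `b`, and the gate ordering are assumptions made for this illustration, and the batch dimension is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward(x, W, U, b, units):
    """Run a single (hypothetical) LSTM layer over x of shape (timesteps, features).

    Returns the hidden state at every time step, plus the final
    hidden state and the final cell state.
    """
    h = np.zeros(units)  # hidden state, starts at zero
    c = np.zeros(units)  # cell state, starts at zero
    hidden_sequence = []
    for x_t in x:
        z = x_t @ W + h @ U + b        # pre-activations for all four gates
        i, f, g, o = np.split(z, 4)    # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g              # update the internal cell state
        h = o * np.tanh(c)             # hidden state emitted at this step
        hidden_sequence.append(h)
    return np.stack(hidden_sequence), h, c

# Illustrative random weights for one unit and one input feature.
rng = np.random.default_rng(0)
units, features = 1, 1
W = rng.normal(size=(features, 4 * units))
U = rng.normal(size=(units, 4 * units))
b = np.zeros(4 * units)

# Same toy input as the tutorial: 3 time steps, 1 feature.
x = np.array([0.1, 0.2, 0.3]).reshape(3, 1)
sequence, state_h, state_c = lstm_forward(x, W, U, b, units)

print(sequence.shape)                      # (3, 1)
print(np.allclose(sequence[-1], state_h))  # True
```

In this sketch, `sequence` plays the role of the output with `return_sequences=True`, while `state_h` and `state_c` play the role of the extra values returned with `return_state=True`. The last row of `sequence` equals `state_h`, which mirrors the match between the first and second arrays in the final Keras example.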