
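# How to Define an Encoder-Decoder Sequence-to-Sequence Model for Neural Machine Translation in Keras

> Original: [https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/](https://machinelearningmastery.com/define-encoder-decoder-sequence-sequence-model-neural-machine-translation-keras/)

Encoder-decoder models provide a pattern for using recurrent neural networks to address challenging sequence-to-sequence prediction problems, such as machine translation.

Encoder-decoder models can be developed in the Keras Python deep learning library. An example of a neural machine translation system developed with this model is described on the Keras blog, and its sample code is distributed with the Keras project.

In this post, you will discover how to define an encoder-decoder sequence-to-sequence prediction model for machine translation, as described by the author of the Keras deep learning library.

After reading this post, you will know:

* The neural machine translation example provided with Keras and described on the Keras blog.
* How to correctly define an encoder-decoder LSTM for training a neural machine translation model.
* How to correctly define an inference model for using a trained encoder-decoder model to translate new sequences.

Let's get started.

* **Update Apr/2018**: For an example of applying this complex model, see the post: [How to Develop an Encoder-Decoder Model for Sequence-to-Sequence Prediction in Keras](https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/)

![How to Define an Encoder-Decoder Sequence-to-Sequence Model for Neural Machine Translation in Keras](img/38cae6eb1536c9b0a1ba0bb9d1c6906e.jpg)

How to Define an Encoder-Decoder Sequence-to-Sequence Model for Neural Machine Translation in Keras
Photo by [Tom Lee](https://www.flickr.com/photos/68942208@N02/16012752622/), some rights reserved.

## Sequence-to-Sequence Prediction in Keras

[Francois Chollet](https://twitter.com/fchollet), the author of the Keras deep learning library, recently released a blog post that steps through a code example for developing an encoder-decoder LSTM for sequence-to-sequence prediction, titled "[A ten-minute introduction to sequence-to-sequence learning in Keras](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html)".

The code developed in the blog post has also been added to Keras as an example in the file [lstm_seq2seq.py](https://github.com/fchollet/keras/blob/master/examples/lstm_seq2seq.py).

The post develops a sophisticated implementation of the encoder-decoder LSTM as described in the canonical papers on the topic:

* [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215), 2014.
* [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](https://arxiv.org/abs/1406.1078), 2014.

The model is applied to the problem of machine translation, the same problem addressed in the source papers in which the approach was first described. Technically, the model is a neural machine translation model.

Francois' implementation provides a template for how sequence-to-sequence prediction can be implemented (correctly) in the Keras deep learning library at the time of writing.

In this post, we will take a closer look at exactly how the training and inference models are designed and how they work.

You will be able to use this understanding to develop similar models for your own sequence-to-sequence prediction problems.

## Machine Translation Data

The dataset used in the example involves short French and English sentence pairs used in the flash card software [Anki](https://apps.ankiweb.net/).

The dataset is called "[Tab-delimited Bilingual Sentence Pairs](http://www.manythings.org/anki/)" and is part of the [Tatoeba Project](http://tatoeba.org/home) and listed on the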
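[ManyThings.org](http://www.manythings.org/) site, which is intended to help English as a Second Language students.

The dataset used in the tutorial can be downloaded from here:

* [French - English fra-eng.zip](http://www.manythings.org/anki/fra-eng.zip)

Below is a sample of the first 10 rows of the _fra.txt_ data file that you will see after unzipping the downloaded archive.

```text
Go.	Va !
Run!	Cours !
Run!	Courez !
Wow!	Ça alors !
Fire!	Au feu !
Help!	À l'aide !
Jump.	Saute.
Stop!	Ça suffit !
Stop!	Stop !
Stop!	Arrête-toi !
```

The problem is framed as a sequence prediction problem where the input sequences of characters are in English and the output sequences of characters are in French.

A total of 10,000 of the nearly 150,000 examples in the data file are used in the dataset. Some technical details of the prepared data are as follows:

* **Input Sequences**: Padded to a maximum length of 16 characters with a vocabulary of 71 different characters (10000, 16, 71).
* **Output Sequences**: Padded to a maximum length of 59 characters with a vocabulary of 93 different characters (10000, 59, 93).

The training data is framed such that the input for the model comprises one whole input sequence of English characters and the whole output sequence of French characters. The output of the model is the whole sequence of French characters, but offset forward by one time step.

For example (with minimal padding and without one-hot encoding):

* Input 1: ['G', 'o', '.', '']
* Input 2: ['', 'V', 'a', ' ']
* Output: ['V', 'a', ' ', '!']

## Machine Translation Model

The neural translation model is an encoder-decoder recurrent neural network.

It is comprised of an encoder that reads a variable-length input sequence and a decoder that predicts a variable-length output sequence.

In this section, we will step through each element of the model definition, with code taken directly from the post and from the code example in the Keras project (at the time of writing).

The model is divided into two sub-models: the encoder, responsible for outputting a fixed-length encoding of the input English sequence, and the decoder, responsible for predicting the output sequence, one character per output time step.

The first step is to define the encoder.

The input to the encoder is a sequence of characters, each encoded as a one-hot vector of length _num_encoder_tokens_.

The LSTM layer in the encoder is defined with the _return_state_ argument set to _True_. This returns the hidden state output that LSTM layers return generally, as well as the final hidden state and cell state of the layer. These states are used when defining the decoder.

```py
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
```

Next, we define the decoder.

The decoder input is defined as a sequence of French characters, one-hot encoded to binary vectors of length _num_decoder_tokens_.

The LSTM layer is defined to return both sequences and state. The final hidden and cell states are ignored, and only the output sequence of hidden states is referenced.

Importantly, the final hidden and cell states from the encoder are used to initialize the state of the decoder. This means that every time the encoder model encodes an input sequence, its final internal states are used as the starting point for outputting the first character in the output sequence. It also means that the encoder and decoder LSTM layers must have the same number of cells, in this case 256.

A _Dense_ output layer is used to predict each character. This _Dense_ layer produces every character in the output sequence in a one-shot manner rather than recursively, at least during training. This is possible because the entire target sequence required as input to the model is known during training.

The _Dense_ layer does not need to be wrapped in a _TimeDistributed_ layer.

```py
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well.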
# We don't use the return states in the training model,
# but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
```

Finally, the model is defined with the inputs for the encoder and the decoder and with the output target sequence.

```py
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
```

We can tie all of this together in a standalone example, fix the configuration, and print a graph of the model. The complete code example for defining the model is listed below.

```py
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
# plot_model requires the pydot package and Graphviz to be installed
from keras.utils import plot_model
# configure
num_encoder_tokens = 71
num_decoder_tokens = 93
latent_dim = 256
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
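decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# plot the model
plot_model(model, to_file='model.png', show_shapes=True)
```

Running the example creates a plot of the defined model that may help you better understand how everything hangs together.

Note that the encoder LSTM does not directly pass its outputs as inputs to the decoder LSTM; as noted above, the decoder uses the encoder's final hidden and cell states as its initial state.

Also note that the decoder LSTM only passes the sequence of hidden states to the Dense layer for output, not the final hidden and cell states as the output shape information might suggest.

![Graph of Encoder-Decoder Model For Training](img/2700043ef80aa99a679207e3c43f0a5e.jpg)

Graph of Encoder-Decoder Model For Training

## Neural Machine Translation Inference

Once the defined model is fit, it can be used to make predictions. Specifically, it can output a French translation for English source text.

The model defined for training has learned the weights for this operation, but the structure of the model is not designed to be called recursively to generate one character at a time.

Instead, new models are required for the prediction step: a model for encoding English input sequences of characters, and a model that takes the sequence of French characters generated so far together with the encoding as input and predicts the next character in the sequence.

Defining the inference models requires reference to elements of the model used for training in the example. Alternatively, one could define a new model with the same shapes and load the weights from file.

The encoder model is defined as taking the input layer from the encoder in the trained model (_encoder_inputs_) and outputting the hidden and cell state tensors (_encoder_states_).

```py
# define encoder inference model
encoder_model = Model(encoder_inputs, encoder_states)
```

The decoder is more elaborate.

The decoder requires the hidden and cell states from the encoder as the initial state of the newly defined decoder model. Because the decoder is a separate standalone model, these states will be provided as input to the model, and therefore must first be defined as inputs.

```py
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
```

They can then be specified for use as the initial state of the decoder LSTM layer.

```py
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
```

The encoder is called once for each input sequence, and the decoder is then called recursively for each character that is to be generated in the translated sequence.

On the first call, the hidden and cell states from the encoder are used to initialize the decoder LSTM layer, provided as input to the model directly.

On subsequent recursive calls to the decoder, the last hidden and cell states must be provided to the model. These state values are already within the decoder; nevertheless, we must re-initialize the state on each call, given the way the model was defined, so that it can take the final states from the encoder on the first call.

Therefore, the decoder must output the hidden and cell states along with the predicted character on each call, so that these states can be assigned to a variable and used on each subsequent recursive call for a given input sequence of English text to be translated.

```py
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs,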
    [decoder_outputs] + decoder_states)
```

Given the reuse of some elements, we can tie all of this together into a standalone code example, combined with the definition of the training model from the previous section. The complete code listing is provided below.

```py
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
# plot_model requires the pydot package and Graphviz to be installed
from keras.utils import plot_model
# configure
num_encoder_tokens = 71
num_decoder_tokens = 93
latent_dim = 256
# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]
# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)
# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# plot the model
plot_model(model, to_file='model.png', show_shapes=True)
# define encoder inference model
encoder_model = Model(encoder_inputs, encoder_states)
# define decoder inference model
decoder_state_input_h = Input(shape=(latent_dim,))
decoder_state_input_c = Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] +
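    decoder_states_inputs, [decoder_outputs] + decoder_states)
# summarize model
plot_model(encoder_model, to_file='encoder_model.png', show_shapes=True)
plot_model(decoder_model, to_file='decoder_model.png', show_shapes=True)
```

Running the example defines the training model, the inference encoder, and the inference decoder.

Plots of all three models are then created.

![Graph of Encoder Model For Inference](img/9bf817dc8602143f240fc3fd82f44a17.jpg)

Graph of Encoder Model For Inference

The plot of the encoder is straightforward.

The plot of the decoder shows the three inputs required to decode a single character in the translated sequence: the encoded translation output so far, and the hidden and cell states, which are provided first from the encoder and then from the output of the decoder as the model is called recursively for a given translation.

![Graph of Decoder Model For Inference](img/35033e3c96830fc0f20c05871454d37f.jpg)

Graph of Decoder Model For Inference

## Further Reading

This section provides more resources on the topic if you are looking to go deeper.

* [Francois Chollet on Twitter](https://twitter.com/fchollet)
* [A ten-minute introduction to sequence-to-sequence learning in Keras](https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html)
* [Keras seq2seq Code Example (lstm_seq2seq)](https://github.com/fchollet/keras/blob/master/examples/lstm_seq2seq.py)
* [Keras Functional API](https://keras.io/getting-started/functional-api-guide/)
* [LSTM API for Keras](https://keras.io/layers/recurrent/#lstm)
* [Long Short-Term Memory](http://www.bioinf.jku.at/publications/older/2604.pdf), 1997.
* [Understanding LSTM Networks](http://colah.github.io/posts/2015-08-Understanding-LSTMs/), 2015.
* [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215), 2014.
* [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](https://arxiv.org/abs/1406.1078), 2014.

**Update**

For an example of how to use this model on a standalone problem, see this post:

* [How to Develop an Encoder-Decoder Model for Sequence-to-Sequence Prediction in Keras](https://machinelearningmastery.com/develop-encoder-decoder-model-sequence-sequence-prediction-keras/)

## Summary

In this post, you discovered how to define an encoder-decoder sequence-to-sequence prediction model for machine translation, as described by the author of the Keras deep learning library.

Specifically, you learned:

* The neural machine translation example provided with Keras and described on the Keras blog.
* How to correctly define an encoder-decoder LSTM for training a neural machine translation model.
* How to correctly define an inference model for using a trained encoder-decoder model to translate new sequences.

Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.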