Text Generation With LSTM Recurrent Neural Networks in Python with Keras · Machine Learning Mastery blog post translation

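# Text Generation With LSTM Recurrent Neural Networks in Python with Keras

> Original: [https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/](https://machinelearningmastery.com/text-generation-lstm-recurrent-neural-networks-python-keras/)

Recurrent neural networks can also be used as generative models.

This means that in addition to being used for predictive models (making predictions), they can learn the sequences of a problem and then generate entirely new, plausible sequences for the problem domain.

Generative models like this are useful not only for studying how well a model has learned a problem, but also for learning more about the problem domain itself.

In this post, you will discover how to create a generative model for text, character by character, using LSTM recurrent neural networks in Python with Keras.

After reading this post, you will know:

* Where to download a free corpus of text that you can use to train text generative models.
* How to frame the problem of text sequences for a recurrent neural network generative model.
* How to develop an LSTM to generate plausible text sequences for a given problem.

Let's get started.

**Note**: LSTM recurrent neural networks can be slow to train, and it is highly recommended that you train them on GPU hardware. You can access GPU hardware in the cloud very cheaply using Amazon Web Services; [see the tutorial here](http://machinelearningmastery.com/develop-evaluate-large-deep-learning-models-keras-amazon-web-services/).

* **Update Oct/2016**: Fixed a few minor typos in the code.
* **Update Mar/2017**: Updated the examples for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.

![Text Generation With LSTM Recurrent Neural Networks in Python with Keras](img/ce1bbf908214dba8ac5ef35fd8c2b3e6.jpg)

Text Generation With LSTM Recurrent Neural Networks in Python with Keras. Photo by [Russ Sanderlin](https://www.flickr.com/photos/tearstone/5028273685/), some rights reserved.

## Problem Description: Project Gutenberg

Many of the classic texts are no longer protected under copyright.

This means that you can download all of the text for these books for free and use them in experiments, like creating generative models. Perhaps the best place to get access to free books that are no longer protected by copyright is [Project Gutenberg](https://www.gutenberg.org).

In this tutorial, we are going to use a favorite book from childhood as the dataset: [Alice's Adventures in Wonderland by Lewis Carroll](https://www.gutenberg.org/ebooks/11).

We are going to learn the dependencies between characters and the conditional probabilities of characters in sequences so that we can in turn generate wholly new and original sequences of characters.

This is a lot of fun, and I recommend repeating these experiments with other books from Project Gutenberg; [here is a list of the most popular books on the site](https://www.gutenberg.org/ebooks/search/%3Fsort_order%3Ddownloads).

These experiments are not limited to text. You can also experiment with other ASCII data, such as computer source code, marked up documents in LaTeX, HTML or Markdown, and more.

You can [download the complete text in ASCII format](http://www.gutenberg.org/cache/epub/11/pg11.txt) (Plain Text UTF-8) for this book for free and place it in your working directory with the filename **wonderland.txt**.

Now we need to prepare the dataset for modeling.

Project Gutenberg adds a standard header and footer to each book, and these are not part of the original text. Open the file in a text editor and delete the header and footer.

The header is obvious and ends with the text:

```py
*** START OF THIS PROJECT GUTENBERG EBOOK ALICE'S ADVENTURES IN WONDERLAND ***
```

The footer is all of the text after the line of text that says:

```py
THE END
```

You should be left with a text file that has about 3,330 lines of text.

## Develop a Small LSTM Recurrent Neural Network

In this section, we will develop a simple LSTM network trained on Alice in Wonderland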
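to learn its sequences of characters. In the next section, we will use this model to generate new sequences of characters.

Let's start off by importing the classes and functions we intend to use to train our model.

```py
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
```

Next, we need to load the ASCII text for the book into memory and convert all of the characters to lowercase to reduce the vocabulary that the network must learn.

```py
# load ascii text and convert to lowercase
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()
```

Now that the book is loaded, we must prepare the data for modeling by the neural network. We cannot model the characters directly; instead, we must convert the characters to integers.

We can do this easily by first creating a set of all of the distinct characters in the book, then creating a map of each character to a unique integer.

```py
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
```

For example, the list of unique, sorted, lowercase characters in the book is as follows:

```py
['\n', '\r', ' ', '!', '"', "'", '(', ')', '*', ',', '-', '.', ':', ';', '?', '[', ']', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '\xbb', '\xbf', '\xef']
```

You can see that there may be some characters that we could remove to further clean up the dataset, which would reduce the vocabulary and may improve the modeling process.

Now that the book has been loaded and the mapping prepared, we can summarize the dataset.

```py
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
```

Running the code to this point produces the following output.

```py
Total Characters:  147674
Total Vocab:  47
```

We can see that the book has just under 150,000 characters and that, when converted to lowercase, there are only 47 distinct characters in the vocabulary for the network to learn. That is much more than the 26 in the alphabet.

We now need to define the training data for the network. There is a lot of flexibility in how you choose to break up the text and expose it to the network during training.

In this tutorial, we will split the book text up into subsequences with a fixed length of 100 characters, an arbitrary length. We could just as easily split the data up by sentences, padding the shorter sequences and truncating the longer ones.

Each training pattern of the network is comprised of 100 time steps of one character (X) followed by one character output (y). When creating these sequences, we slide this window along the whole book one character at a time, allowing each character a chance to be learned from the 100 characters that preceded it (except the first 100 characters, of course).

For example, if the sequence length is 5 (for simplicity), then the first two training patterns would be as follows:

```py
CHAPT -> E
HAPTE -> R
```

As we split the book up into these sequences, we convert the characters to integers using the lookup table we prepared earlier.

```py
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)
```

Running the code to this point shows us that when we split the dataset into training data for the network, we have just under 150,000 training patterns. This makes sense because, excluding the first 100 characters, we have one training pattern to predict each of the remaining characters.

```py
Total Patterns:  147574
```

Now that we have prepared our training data, we need to transform it so that it is suitable for use with Keras.

First, we must transform the list of input sequences into the form _[samples, time steps, features]_ expected by an LSTM network.

Next, we need to rescale the integers to the range 0 to 1 to make the patterns easier to learn by the LSTM network, whose gates use sigmoid-type activation functions by default.

Finally, we need to convert the output patterns (single characters converted to integers) into a one hot encoding. This is so that we can configure the network to predict the probability of each of the 47 different characters in the vocabulary (an easier representation) rather than trying to force it to predict precisely the next character. Each y value is converted into a sparse vector with a length of 47, full of zeros except for a 1 in the column for the letter (integer) that the pattern represents.

For example, when "n" (integer value 31) is one hot encoded, it looks as follows:

```py
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.
  0. 0. 0. 0. 0. 0. 0. 0.]
```

We can implement these steps as below.

```py
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
```

We can now define our LSTM model. Here, we define a single hidden LSTM layer with 256 memory units. The network uses dropout with a probability of 20% (0.2). The output layer is a Dense layer using the softmax activation function to output a probability prediction between 0 and 1 for each of the 47 characters.

The problem is really a single character classification problem with 47 classes, and as such it is defined as optimizing the log loss (cross entropy), here using the ADAM optimization algorithm for speed.

```py
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

There is no test dataset. We are modeling the entire training dataset to learn the probability of each character in a sequence.

We are not interested in the most accurate (classification accuracy) model of the training dataset. This would be a model that predicts each character in the training dataset perfectly. Instead, we are interested in a generalization of the dataset that minimizes the chosen loss function. We are seeking a balance between generalization and overfitting, but short of memorization.

The network is slow to train (about 300 seconds per epoch on an Nvidia K520 GPU). Because of the slowness and because of our optimization requirements, we will use model checkpointing to record all of the network weights to file each time an improvement in loss is observed at the end of an epoch. We will use the best set of weights (lowest loss) to instantiate our generative model in the next section.

```py
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
```

We can now fit our model to the data. Here, we use a modest number of 20 epochs and a large batch size of 128 patterns.

```py
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)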
```

The full code listing is provided below for completeness.

```py
# Small LSTM Network to Generate Text for Alice in Wonderland
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
# load ascii text and convert to lowercase
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(X, y, epochs=20, batch_size=128, callbacks=callbacks_list)
```

You will see different results because of the stochastic nature of the model, and because it is hard to fix the random seed for LSTM models to get 100% reproducible results. This is not a concern for this generative model.

After running the example, you should have a number of weight checkpoint files in the local directory.

You can delete them all except the one with the smallest loss value. For example, when I ran this example, below was the checkpoint with the smallest loss that I achieved.

```py
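weights-improvement-19-1.9435.hdf5
```

The network loss decreased almost every epoch, and I expect the network could benefit from training for many more epochs.

In the next section, we will look at using this model to generate new text sequences.

## Generating Text with an LSTM Network

Generating text using the trained LSTM network is relatively straightforward.

First, we load the data and define the network in exactly the same way, except that the network weights are loaded from a checkpoint file and the network does not need to be trained.

```py
# load the network weights
filename = "weights-improvement-19-1.9435.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

Also, when preparing the mapping of unique characters to integers, we must also create a reverse mapping that we can use to convert the integers back to characters so that we can understand the predictions.

```py
int_to_char = dict((i, c) for i, c in enumerate(chars))
```

Finally, we need to actually make predictions.

The simplest way to use the Keras LSTM model to make predictions is to start off with a seed sequence as input, generate the next character, then update the seed sequence by adding the generated character on the end and trimming off the first character. This process is repeated for as long as we want to predict new characters (e.g. a sequence 1,000 characters in length).

We can pick a random input pattern as our seed sequence, then print the generated characters as we generate them.

```py
import sys  # needed for sys.stdout.write below
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = dataX[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")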
```

The full code example for generating text using the loaded LSTM model is listed below for completeness.

```py
# Load LSTM network and generate text
import sys
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
# load ascii text and convert to lowercase
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()
# create mapping of unique chars to integers, and a reverse mapping
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
# load the network weights
filename = "weights-improvement-19-1.9435.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = dataX[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")
```

Running this example first outputs the selected random seed, then each character as it is generated.

For example, below are the results from one run of this text generator. The random seed was:

```py
be no mistake about it: it was neither more nor less than a pig, and she felt that it would be quit
```

The generated text with the random seed (cleaned up for presentation) was:

```py
be no mistake about it: it was neither more nor less than a pig, and she felt that it would be quit e aelin that she was a little want oe toiet ano a grtpersent to the tas a little war th tee the tase oa teettee the had been tinhgtt a little toiee at the cadl in a long tuiee aedun thet sheer was a little tare gereen to be a gentle of the tabdit soenee the gad ouw ie the tay a tirt of toiet at the was a little anonersen, and thiu had been woite io a lott of tueh a tiie and taede bot her aeain she cere thth the bene tith the tere bane to tee toaete to tee the harter was a little tire the same oare cade an anl ano the garee and the was so seat the was a little gareen and the sabdit, and the white rabbit wese tilel an the caoe and the sabbit se teeteer, and the white rabbit wese tilel an the cade in a lonk tfne the sabdi ano aroing to tea the was sf teet whitg the was a little tane oo thete the sabeit she was a little tartig to the tar tf tee the tame of the cagd, and the white rabbit was a little toiee to be anle tite thete ofs and the tabdit was the wiite rabbit, and
```

We can note some observations about the generated text.

* It generally conforms to the line format observed in the original text, with fewer than 80 characters before a new line.
* The characters are separated into word-like groups, and most groups are actual English words (e.g. "the", "little" and "was"), but many are not (e.g. "lott", "tiie" and "taede").
* Some of the words in sequence make sense (e.g. "_and the white rabbit_"), but many do not (e.g. "_wese tilel_").

The fact that this character-based model of the book produces output like this is very impressive. It gives you a sense of the learning capabilities of LSTM networks.

The results are not perfect. In the next section, we look at improving the quality of the results by developing a much larger LSTM network.

## Larger LSTM Recurrent Neural Network

We got results in the previous section, but not excellent results. Now, we can try to improve the quality of the generated text by creating a much larger network.

We will keep the number of memory units the same at 256 but add a second layer.

```py
model = Sequential()
model.add(LSTM(256,
               input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

We will also change the filename of the checkpointed weights so that we can tell the difference between the weights for this network and the previous one (by appending the word "bigger" to the filename).

```py
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
```

Finally, we will increase the number of training epochs from 20 to 50 and decrease the batch size from 128 to 64 to give the network more of an opportunity to be updated and learn.

The full code listing is presented below for completeness.

```py
# Larger LSTM Network to Generate Text for Alice in Wonderland
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
# load ascii text and convert to lowercase
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
# define the checkpoint
filepath="weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
# fit the model
model.fit(X, y, epochs=50, batch_size=64, callbacks=callbacks_list)
```

Running this example takes some time, at least 700 seconds per epoch.

After running this example, you may achieve a loss of about 1.2. For example, the best result I achieved from running this model was stored in a checkpoint file with the name:

```py
weights-improvement-47-1.2219-bigger.hdf5
```

This achieved a loss of 1.2219 at epoch 47.

As in the previous section, we can use this best model from the run to generate text.

The only changes we need to make to the text generation script from the previous section are in the specification of the network topology and in the file from which the network weights are loaded.

The full code listing is provided below for completeness.

```py
# Load Larger LSTM network and generate text
import sys
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils
# load ascii text and convert to lowercase
filename = "wonderland.txt"
raw_text = open(filename).read()
raw_text = raw_text.lower()
# create mapping of unique chars to integers, and a reverse mapping
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
print("Total Characters: ", n_chars)
print("Total Vocab: ", n_vocab)
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100
dataX = []
dataY = []
for i in range(0, n_chars - seq_length, 1):
    seq_in = raw_text[i:i + seq_length]
    seq_out = raw_text[i + seq_length]
    dataX.append([char_to_int[char] for char in seq_in])
    dataY.append(char_to_int[seq_out])
n_patterns = len(dataX)
print("Total Patterns: ", n_patterns)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(256))
model.add(Dropout(0.2))
model.add(Dense(y.shape[1], activation='softmax'))
# load the network weights
filename = "weights-improvement-47-1.2219-bigger.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
# pick a random seed
start = numpy.random.randint(0, len(dataX)-1)
pattern = dataX[start]
print("Seed:")
print("\"", ''.join([int_to_char[value] for value in pattern]), "\"")
# generate characters
for i in range(1000):
    x = numpy.reshape(pattern, (1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x, verbose=0)
    index = numpy.argmax(prediction)
    result = int_to_char[index]
    seq_in = [int_to_char[value] for value in pattern]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print("\nDone.")
```

One example of running this text generation script produces the output below.

The randomly chosen seed text was:

```py
d herself lying on the bank, with her head in the lap of her sister, who was gently brushing away s
```

The generated text with the seed (cleaned up for presentation) was:

```py
herself lying on the bank, with her head in the lap of her sister, who was gently brushing away so siee, and she sabbit said to herself and the sabbit said to herself and the sood way of the was a little that she was a little lad good to the garden, and the sood of the mock turtle said to herself, 'it was a little that the mock turtle said to see it said to sea it said to sea it say it the marge hard sat hn a little that she was so sereated to herself, and she sabbit said to herself, 'it was a little little shated of the sooe of the coomouse it was a little lad good to the little gooder head.
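and said to herself, 'it was a little little shated of the mouse of the good of the courte, and it was a little little shated in a little that the was a little little shated of the thmee said to see it was a little book of the was a little that she was so sereated to hare a little the began sitee of the was of the was a little that she was so seally and the sabbit was a little lad good to the little gooder head of the gad seared to see it was a little lad good to the little good
```

We can see that there are generally fewer spelling mistakes and the text looks more realistic, but it is still quite nonsensical.

For example, the same phrases get repeated again and again, like "_said to herself_" and "_little_". Quotes are opened but not closed.

These are better results, but there is still a lot of room for improvement.

## 10 Extension Ideas to Improve the Model

Below are 10 ideas that you could try in order to further improve the model:

* Predict fewer than 1,000 characters as output for a given seed.
* Remove all punctuation from the source text, and therefore from the model's vocabulary.
* Try a one hot encoding for the input sequences.
* Train the model on padded sentences rather than random sequences of characters.
* Increase the number of training epochs to 100 or many hundreds.
* Add dropout to the visible input layer and consider tuning the dropout percentage.
* Tune the batch size; try a batch size of 1 as a (very slow) baseline and larger sizes from there.
* Add more memory units to the layers and/or more layers.
* Experiment with scale factors ([temperature](https://en.wikipedia.org/wiki/Softmax_function#Reinforcement_learning)) when interpreting the prediction probabilities.
* Change the LSTM layers to be "stateful" to maintain state across batches.

Did you try any of these extensions? Share your results in the comments.

## Resources

This character text model is a popular way of generating text using recurrent neural networks.

Below are some more resources and tutorials on the topic if you are interested in going deeper. Perhaps the most popular is the tutorial by Andrej Karpathy titled "[The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)".

* [Generating Text with Recurrent Neural Networks](http://www.cs.utoronto.ca/~ilya/pubs/2011/LANG-RNN.pdf) [pdf], 2011
* [Keras code example of LSTM for text generation](https://github.com/fchollet/keras/blob/master/examples/lstm_text_generation.py).
* [Lasagne code example of LSTM for text generation](https://github.com/Lasagne/Recipes/blob/master/examples/lstm_text_generation.py).
* [MXNet tutorial for using an LSTM for text generation](http://mxnetjl.readthedocs.io/en/latest/tutorial/char-lstm.html).
* [Auto-Generating Clickbait With Recurrent Neural Networks](https://larseidnes.com/2015/10/13/auto-generating-clickbait-with-recurrent-neural-networks/).

## Summary

In this post, you discovered how you can develop an LSTM recurrent neural network for text generation in Python with the Keras deep learning library.

After reading this post, you know:

* Where to download the ASCII text for classic books for free so that you can use it for training.
* How to train an LSTM network on text sequences and how to use the trained network to generate new sequences.
* How to develop stacked LSTM networks and lift the performance of the model.

Do you have any questions about text generation with LSTM networks or about this post? Ask your questions in the comments below and I will do my best to answer them.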
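
As a starting point for the temperature idea in the extension list above, here is a minimal sketch of how the predicted probabilities could be rescaled before a character is chosen. It is not code from the original tutorial: the helper names `apply_temperature` and `sample_index` are hypothetical, and the toy probability vector stands in for one row of the output of `model.predict`. Temperatures below 1 sharpen the distribution toward the most likely character, and temperatures above 1 flatten it.

```python
import numpy


def apply_temperature(probabilities, temperature=1.0):
    """Rescale a probability vector (hypothetical helper, not part of Keras)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    probabilities = numpy.asarray(probabilities, dtype=numpy.float64)
    # work in log space; the small constant avoids log(0)
    logits = numpy.log(probabilities + 1e-12) / temperature
    # subtract the maximum for numerical stability before exponentiating
    logits = logits - numpy.max(logits)
    scaled = numpy.exp(logits)
    # renormalize so the values sum to 1 again
    return scaled / numpy.sum(scaled)


def sample_index(probabilities, temperature=1.0, rng=None):
    """Draw one character index from the temperature-scaled distribution."""
    if rng is None:
        rng = numpy.random.default_rng()
    scaled = apply_temperature(probabilities, temperature)
    return int(rng.choice(len(scaled), p=scaled))


# toy distribution over three characters, standing in for a model prediction
toy = [0.7, 0.2, 0.1]
for t in (0.1, 1.0, 100.0):
    print(t, apply_temperature(toy, t))
```

To try this in the generation loop, one option is to replace `index = numpy.argmax(prediction)` with `index = sample_index(prediction[0], temperature=0.5)`. The value 0.5 is only an arbitrary starting point to experiment from. Sampling instead of always taking the most likely character may help with the repeated phrases noted earlier, although that is something to verify experimentally.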