Effective TensorFlow · ApacheCN Deep Learning Translation Collection

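# Effective TensorFlow

> Original: [vahidk/EffectiveTensorflow](https://github.com/vahidk/EffectiveTensorflow)
>
> Translators: [FesianXu](https://my.csdn.net/loseinvain), [飛龍](https://github.com/wizardforcel)
>
> License: [CC BY-NC-SA 4.0](http://creativecommons.org/licenses/by-nc-sa/4.0/)

## 1. TensorFlow Basics

The most striking difference between TensorFlow and other numerical computation libraries such as NumPy is that operations in TensorFlow are symbolic. This is a powerful concept that lets TensorFlow do things that imperative libraries such as NumPy cannot, for example automatic differentiation. It is probably also why TensorFlow is harder to grasp. This guide demystifies TensorFlow step by step and offers guidelines and best practices for using it more effectively.

Let's start with a simple example: multiplying two random matrices. First, here is how to do it in NumPy:

```py
import numpy as np

x = np.random.normal(size=[10, 10])
y = np.random.normal(size=[10, 10])
z = np.dot(x, y)

print(z)
```

Now we perform exactly the same computation in TensorFlow:

```py
import tensorflow as tf

x = tf.random_normal([10, 10])
y = tf.random_normal([10, 10])
z = tf.matmul(x, y)

sess = tf.Session()
z_val = sess.run(z)

print(z_val)
```

Unlike NumPy, which performs the computation immediately and copies the result into the output variable `z`, TensorFlow only gives us a handle (a tensor) to a node in the graph that represents the result. If we print `z` directly, we get something like this:

```py
Tensor("MatMul:0", shape=(10, 10), dtype=float32)
```

Because both inputs have a fully defined shape, TensorFlow can infer the shape of the tensor as well as its type. To compute the value of the tensor, we need to create a session and evaluate it with `Session.run`.

To see what symbolic computation buys us, consider another example. Suppose we have samples from a curve (say `f(x) = 5x^2 + 3`) and we want to estimate `f(x)` without knowing its parameters. We define a parametric function `g(x, w) = w0 x^2 + w1 x + w2`, which is a function of the input `x` and the latent parameters `w`. Our goal is to find the latent parameters such that `g(x, w) ≈ f(x)`. This can be done by minimizing the loss function `L(w) = (f(x) - g(x, w))^2`. Although this simple problem has a closed-form solution, we choose a more general approach that applies to any differentiable function: stochastic gradient descent. We compute the average gradient of `L(w)` with respect to `w` over a set of sample points and move in the opposite direction. Here is how it is done in TensorFlow:

```py
import numpy as np
import tensorflow as tf

# Placeholders are used to feed values from Python to TensorFlow ops.
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

# The parameters we want to estimate.
w = tf.get_variable("w", shape=[3, 1])

# Features [x^2, x, 1], so that yhat = w0 x^2 + w1 x + w2.
f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
yhat = tf.squeeze(tf.matmul(f, w), 1)

# L2 loss plus a small regularization term on the parameters.
loss = tf.nn.l2_loss(yhat - y) + 0.1 * tf.nn.l2_loss(w)
train_op = tf.train.AdamOptimizer(0.1).minimize(loss)

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})
    print(loss_val)
print(sess.run([w]))
```

Running this code, you should see a result close to this:

```
[4.9924135, 0.00040895029, 3.4504161]
```

This is a fairly close approximation of our parameters (5, 0 and 3).

This is just the tip of the iceberg of what TensorFlow can do. Many problems, such as optimizing large neural networks with millions of parameters, can be implemented efficiently in TensorFlow in just a few lines of code. TensorFlow also takes care of scaling across multiple devices and threads, and supports a variety of platforms.

## 2. Understanding Static and Dynamic Shapes

A tensor in TensorFlow has a **static shape** attribute that is determined during graph construction. The static shape may be **underspecified**. For example, we can define a tensor of shape `[None, 128]`:

```python
import tensorflow as tf

a = tf.placeholder(tf.float32, [None, 128])
```

This means that the first dimension can be of any size and is determined dynamically during `Session.run()`. You can query the static shape of a tensor as follows:

```python
static_shape = a.shape.as_list()  # returns [None, 128]
```

To get the dynamic shape of a tensor, call the `tf.shape` op, which returns a tensor representing the shape of the given tensor:

```python
dynamic_shape = tf.shape(a)
```

The static shape of a tensor can be set with the `Tensor.set_shape()` method:

```python
a.set_shape([32, 128])  # static shape of a is [32, 128]
a.set_shape([None, 128])  # first dimension of a is determined dynamically
```

You can reshape a tensor dynamically with `tf.reshape()`:

```python
a = tf.reshape(a, [32, 128])
```

It is convenient to have a function that returns the static shape where it is available and the dynamic shape where it is not:

```python
def get_shape(tensor):
    static_shape = tensor.shape.as_list()
    dynamic_shape = tf.unstack(tf.shape(tensor))
    # Prefer the static size of each dimension; fall back to the dynamic one.
    dims = [s[1] if s[0] is None else s[0]
            for s in zip(static_shape, dynamic_shape)]
    return dims
```

Now suppose we want to convert a rank-3 tensor into a rank-2 tensor by collapsing the second and third dimensions into one. We can use the `get_shape()` function defined above:

```python
b = tf.placeholder(tf.float32, [None, 10, 32])
shape = get_shape(b)
b = tf.reshape(b, [shape[0], shape[1] * shape[2]])
```

Note that this works whether the shapes are statically specified or not. In fact, we can write a general-purpose `reshape` function that collapses any list of dimensions:

```python
import tensorflow as tf
import numpy as np

def reshape(tensor, dims_list):
    shape = get_shape(tensor)
    dims_prod = []
    for dims in dims_list:
        if isinstance(dims, int):
            dims_prod.append(shape[dims])
        elif all([isinstance(shape[d], int) for d in dims]):
            # All sizes are known statically: multiply them in Python.
            dims_prod.append(np.prod([shape[d] for d in dims]))
        else:
            # At least one size is dynamic: multiply them in the graph.
            dims_prod.append(tf.reduce_prod([shape[d] for d in dims]))
    tensor = tf.reshape(tensor, dims_prod)
    return tensor
```

Collapsing the second dimension then becomes very easy:

```python
b = tf.placeholder(tf.float32, [None, 10, 32])
b = reshape(b, [0, [1, 2]])
```

## 3. Scopes and When to Use Them

Variables and tensors in TensorFlow have a name attribute that identifies them in the symbolic graph. If you do not specify a name when creating a variable or a tensor, TensorFlow assigns one automatically:

```python
a = tf.constant(1)
print(a.name)  # prints "Const:0"

b = tf.Variable(1)
print(b.name)  # prints "Variable:0"
```

You can override the default name by specifying it explicitly:

```python
a = tf.constant(1, name="a")
print(a.name)  # prints "a:0"

b = tf.Variable(1, name="b")
print(b.name)  # prints "b:0"
```

TensorFlow provides two different context managers that alter the names of tensors and variables. The first is `tf.name_scope`:

```python
with tf.name_scope("scope"):
    a = tf.constant(1, name="a")
    print(a.name)  # prints "scope/a:0"

    b = tf.Variable(1, name="b")
    print(b.name)  # prints "scope/b:0"

    c = tf.get_variable(name="c", shape=[])
    print(c.name)  # prints "c:0"
```

Note that there are two ways to define a new variable in TensorFlow: creating a `tf.Variable()` object or calling `tf.get_variable()`. Calling `tf.get_variable()` with a new name creates a new variable, but if a variable with that name already exists in the current variable scope, it raises a `ValueError`, telling us that re-declaring a variable is not allowed.

`tf.name_scope()` affects the names of tensors and of variables **created with `tf.Variable`**, but **not those of variables created with `tf.get_variable()`**. Unlike `tf.name_scope()`, `tf.variable_scope()` also modifies the names of variables created with `tf.get_variable()`:

```python
with tf.variable_scope("scope"):
    a = tf.constant(1, name="a")
    print(a.name)  # prints "scope/a:0"

    b = tf.Variable(1, name="b")
    print(b.name)  # prints "scope/b:0"

    c = tf.get_variable(name="c", shape=[])
    print(c.name)  # prints "scope/c:0"

with tf.variable_scope("scope"):
    a1 = tf.get_variable(name="a", shape=[])
    a2 = tf.get_variable(name="a", shape=[])  # Disallowed
```

But what if we really do want to reuse a previously declared variable? Variable scopes provide a mechanism for that as well:

```python
with tf.variable_scope("scope"):
    a1 = tf.get_variable(name="a", shape=[])
with tf.variable_scope("scope", reuse=True):
    a2 = tf.get_variable(name="a", shape=[])  # OK

# This comes in handy, for example, when using built-in neural network layers:
features1 = tf.layers.conv2d(image1, filters=32, kernel_size=3)
# Use the same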
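# convolution weights to process the second image:
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    features2 = tf.layers.conv2d(image2, filters=32, kernel_size=3)
```

This syntax may not look very clean. In particular, if you want to share many variables in a model, keeping track of when to define new variables and when to reuse them becomes cumbersome and error-prone. TensorFlow templates are designed to handle variable sharing automatically:

```python
conv3x32 = tf.make_template("conv3x32", lambda x: tf.layers.conv2d(x, 32, 3))
features1 = conv3x32(image1)
features2 = conv3x32(image2)  # Will reuse the convolution weights.
```

You can turn any function into a TensorFlow template. On the first call to a template, the variables defined inside the function are declared; on subsequent calls they are reused automatically.

## 4. Broadcasting: the Good and the Ugly

TensorFlow supports broadcasting of element-wise operations. Normally, when you perform an operation such as addition or multiplication, you need to make sure that the shapes of the operands match. For example, you cannot add a tensor of shape `[3, 2]` to a tensor of shape `[3, 4]`. There is one special case, however: when one of the operands has a dimension of size one, TensorFlow implicitly tiles the tensor across that dimension so that it matches the shape of the other operand. So it is valid to add a tensor of shape `[3, 2]` to a tensor of shape `[3, 1]`.

```python
import tensorflow as tf

a = tf.constant([[1., 2.], [3., 4.]])
b = tf.constant([[1.], [2.]])
# c = a + tf.tile(b, [1, 2])
c = a + b
```

Broadcasting lets us perform implicit tiling, which makes the code shorter and more memory efficient, because we do not need to store the result of the tiling operation. One place where this helps is when combining features of varying length. To concatenate features of varying length, we commonly tile the input tensors, concatenate the result, and apply some nonlinearity. This is a common pattern across a variety of neural network architectures:

```python
a = tf.random_uniform([5, 3, 5])
b = tf.random_uniform([5, 1, 6])

# concat a and b and apply nonlinearity
tiled_b = tf.tile(b, [1, 3, 1])
c = tf.concat([a, tiled_b], 2)
d = tf.layers.dense(c, 10, activation=tf.nn.relu)
```

This can be done more efficiently with broadcasting. We use the fact that `f(m(x + y)) = f(mx + my)`: the linear operations can be applied to each input separately, and broadcasting then performs the concatenation implicitly:

```python
pa = tf.layers.dense(a, 10, activation=None)
pb = tf.layers.dense(b, 10, activation=None)
d = tf.nn.relu(pa + pb)
```

In fact, this piece of code is quite general and can be applied to tensors of arbitrary shape, as long as broadcasting between them is possible:

```python
def merge(a, b, units, activation=tf.nn.relu):
    pa = tf.layers.dense(a, units, activation=None)
    pb = tf.layers.dense(b, units, activation=None)
    c = pa + pb
    if activation is not None:
        c = activation(c)
    return c
```

A slightly more general form of this function appears in the cookbook (Section 14).

So far we have discussed the good part of broadcasting. The ugly part is that implicit assumptions almost always make debugging harder. Consider the following example:

```python
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b)
```

What do you think the value of `c` is after evaluation? If you guessed 6, you were wrong. The answer is 12. When the ranks of two tensors do not match, TensorFlow automatically expands the first dimension of the lower-rank tensor before the element-wise operation, so the result of the addition is `[[2, 3], [3, 4]]`, and reducing over all elements gives 12.

The way to avoid this problem is to be as explicit as possible. Had we specified which dimension to reduce over, catching this bug would have been much easier:

```python
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = tf.reduce_sum(a + b, 0)
```

Here the value of `c` is `[5, 7]`, and the shape of the result immediately suggests that something is wrong. A general rule of thumb is to always specify the dimensions in reduction operations and when using `tf.squeeze`.

## 5. Feeding Data to TensorFlow

**TensorFlow** is designed to work efficiently with large amounts of data, so it is important not to starve your model if you want the best performance. There are several ways to feed data to TensorFlow.

### Constants (`tf.constant`)

The simplest approach is to embed the data in the graph as a constant:

```python
import tensorflow as tf
import numpy as np

actual_data = np.random.normal(size=[100])
data = tf.constant(actual_data)
```

This approach can be very efficient, but it is not flexible. One problem is that to use your model with another dataset you have to rewrite the graph. You also have to load all of your data at once and keep it in memory, so this only works for small datasets.

### Placeholders (`tf.placeholder`)

Placeholders solve both of these problems:

```python
import tensorflow as tf
import numpy as np

data = tf.placeholder(tf.float32)
prediction = tf.square(data) + 1

actual_data = np.random.normal(size=[100])
tf.Session().run(prediction, feed_dict={data: actual_data})
```

The placeholder op returns a tensor whose value is supplied through the `feed_dict` argument of `Session.run`.

### Python ops (`tf.py_func`)

Another approach is to feed data with a Python op:

```python
def py_input_fn():
    # Cast to float32 so that the result matches the declared output type.
    actual_data = np.random.normal(size=[100]).astype(np.float32)
    return actual_data

data = tf.py_func(py_input_fn, [], (tf.float32))
```

Python ops let you turn a regular Python function into a TensorFlow op.

### The Dataset API

The recommended way to feed data to TensorFlow is the Dataset API:

```python
actual_data = np.random.normal(size=[100])
dataset = tf.contrib.data.Dataset.from_tensor_slices(actual_data)
data = dataset.make_one_shot_iterator().get_next()
```

If you need to read data from files, it may be more efficient to convert the files to the `TFRecord` format first:

```python
dataset = tf.contrib.data.TFRecordDataset(path_to_data)
```

See the [official documentation](https://www.TensorFlow.org/api_guides/python/reading_data#Reading_from_files) for an example of how to convert your dataset to the `TFRecord` format.

```python
dataset = ...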
dataset = dataset.cache()
if mode == tf.estimator.ModeKeys.TRAIN:
    dataset = dataset.repeat()
    dataset = dataset.shuffle(batch_size * 5)
dataset = dataset.map(parse, num_threads=8)
dataset = dataset.batch(batch_size)
```

After reading the data, we call `Dataset.cache()` to cache it in memory for improved efficiency. In training mode, we repeat the dataset indefinitely, which lets us process the whole dataset many times. We also shuffle the dataset so that batches have different sample distributions. Next, we use `Dataset.map()` to preprocess the raw records and convert them to a format the model can use. Finally, we create batches of samples with `Dataset.batch()`.

## 6. Take Advantage of the Overloaded Operators

Just like NumPy, TensorFlow overloads a number of Python operators to make building graphs easier and the code more readable.

**Slicing** is one of the overloaded operators, and it makes indexing tensors very easy:

```python
z = x[begin:end]  # z = tf.slice(x, [begin], [end-begin])
```

Be careful when using it, though. The slicing op is very inefficient and is often best avoided, especially when the number of slices is large. To see how inefficient it can be, consider an example in which we manually reduce over the rows of a matrix:

```python
import tensorflow as tf
import time

x = tf.random_uniform([500, 10])

z = tf.zeros([10])
for i in range(500):
    z += x[i]

sess = tf.Session()
start = time.time()
sess.run(z)
print("Took %f seconds." % (time.time() - start))
```

On the author's MacBook Pro this took 2.67 seconds. The reason is that we call the slice op 500 times, which is very slow to run. A better choice is to use `tf.unstack()` to split the matrix into a list of vectors all at once:

```python
z = tf.zeros([10])
for x_i in tf.unstack(x):
    z += x_i
```

This took 0.18 seconds. Of course, the right way to do this simple reduction is `tf.reduce_sum()`:

```python
z = tf.reduce_sum(x, axis=0)
```

This took only 0.008 seconds, about 300 times faster than the original implementation.

Besides slicing, TensorFlow also overloads a range of arithmetic and logical operators:

```python
z = -x  # z = tf.negative(x)
z = x + y  # z = tf.add(x, y)
z = x - y  # z = tf.subtract(x, y)
z = x * y  # z = tf.multiply(x, y)
z = x / y  # z = tf.div(x, y)
z = x // y  # z = tf.floordiv(x, y)
z = x % y  # z = tf.mod(x, y)
z = x ** y  # z = tf.pow(x, y)
z = x @ y  # z = tf.matmul(x, y)
z = x > y  # z = tf.greater(x, y)
z = x >= y  # z = tf.greater_equal(x, y)
z = x < y  # z = tf.less(x, y)
z = x <= y  # z = tf.less_equal(x, y)
z = abs(x)  # z = tf.abs(x)
z = x & y  # z = tf.logical_and(x, y)
z = x | y  # z = tf.logical_or(x, y)
z = x ^ y  # z = tf.logical_xor(x, y)
z = ~x  # z = tf.logical_not(x)
```

You can also use the augmented versions of these operators. For example, `x += y` and `x **= 2` are both valid.

Note that Python does not allow overloading the `and`, `or`, and `not` keywords.
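The restriction on `and`, `or`, and `not` comes from Python itself, so it can be observed without TensorFlow. The following is a minimal sketch that uses NumPy arrays as a stand-in for tensors; it is an illustration added here, not part of the original guide, and the names `p`, `q`, and `keyword_failed` are made up for the example.

```python
import numpy as np

p = np.array([True, False, True])
q = np.array([True, True, False])

# The overloadable operators work element-wise.
both = p & q         # same as np.logical_and(p, q)
either = p | q       # same as np.logical_or(p, q)
neither = ~(p | q)   # same as np.logical_not(p | q)

# The keyword `and` cannot be overloaded: Python first asks for bool(p),
# which NumPy refuses to answer for an array with more than one element.
try:
    p and q
    keyword_failed = False
except ValueError:
    keyword_failed = True

print(both, either, neither, keyword_failed)
```

The same reasoning explains why element-wise logic on tensors has to go through `&`, `|`, `~` or the corresponding `tf.logical_*` functions.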
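TensorFlow also does not allow tensors to be used as booleans, because that is error-prone:

```python
x = tf.constant(1.)
if x:  # This will raise a TypeError.
    ...
```

Use `tf.cond(x, ...)` if you want to branch on the value of the tensor, or `if x is None` if you want to check whether the Python variable itself is set.

Some operators are not supported, namely the equality (`==`) and inequality (`!=`) operators. They are overloaded in NumPy but not in TensorFlow. Use the function versions `tf.equal()` and `tf.not_equal()` instead.

## 7. Understanding Order of Execution and Control Dependencies

As discussed in the first section, TensorFlow is symbolic: it does not run the ops you define immediately, but creates a corresponding node in the graph that can be evaluated with `Session.run()`. This lets TensorFlow optimize at run time, choosing the order of execution and possibly trimming nodes that are not needed. If your graph contains only `tf.Tensor` objects, you do not need to worry about dependencies, but you will most likely have `tf.Variable` objects as well, and they make things considerably harder. The author's advice is to use variables only when tensors cannot do the job. That may not be intuitive, so let's start with an example:

```python
import tensorflow as tf

a = tf.constant(1)
b = tf.constant(2)
a = a + b

tf.Session().run(a)
```

Evaluating `a` returns 3, as expected. Note that we have created three tensors here: two constant tensors and one that stores the result of the addition. You cannot overwrite the value of a tensor. If you want to change it, you have to create a new tensor, as we just did.

> **Tip:** If you do not define a new graph explicitly, TensorFlow creates a default graph for you. You can use `tf.get_default_graph()` to get a handle to the graph and then inspect it, for example by printing all of its tensors:

```python
print(tf.contrib.graph_editor.get_tensors(tf.get_default_graph()))
```

Unlike tensors, variables can be updated. So let's use a variable to do the same thing:

```python
a = tf.Variable(1)
b = tf.constant(2)
assign = tf.assign(a, a + b)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(assign))
```

Again we get 3, as expected. Note that `tf.assign()` returns a tensor that represents the value of the assignment. So far everything looks fine, but let's look at a slightly more complicated example:

```python
a = tf.Variable(1)
b = tf.constant(2)
c = a + b

assign = tf.assign(a, 5)

sess = tf.Session()
for i in range(10):
    sess.run(tf.global_variables_initializer())
    print(sess.run([assign, c]))
```

Note that the tensor `c` does not have a deterministic value here. It may be 3 or 7, depending on whether the addition or the assignment runs first.

The order in which you define ops in your code does not matter to the TensorFlow runtime. The only thing that matters is the **control dependencies**. Control dependencies for tensors are straightforward: every time you use a tensor in an op, that op defines an implicit dependency on the tensor. Things get more complicated with variables, because they can take many values. When dealing with variables, you may need to define the dependencies explicitly with `tf.control_dependencies()`:

```python
a = tf.Variable(1)
b = tf.constant(2)
c = a + b

with tf.control_dependencies([c]):
    assign = tf.assign(a, 5)

sess = tf.Session()
for i in range(10):
    sess.run(tf.global_variables_initializer())
    print(sess.run([assign, c]))
```

This ensures that the assignment is executed after the addition.

## 8. Control Flow Operations: Conditionals and Loops

When building complex models such as recurrent neural networks, you may need to control the flow of operations through conditionals and loops. This section introduces a number of commonly used control flow ops.

Suppose you want to decide, based on a predicate, whether to multiply or to add two given tensors. This can be implemented with `tf.cond`, which acts like a Python `if` statement:

```py
a = tf.constant(1)
b = tf.constant(2)

p = tf.constant(True)

x = tf.cond(p, lambda: a + b, lambda: a * b)

print(tf.Session().run(x))
```

Since the predicate is `True` in this case, the output is the result of the addition, which is 3.

Most of the time when using TensorFlow, you work with large tensors and want to perform operations in batch. A related conditional op is `tf.where`, which, like `tf.cond`, takes a predicate but selects the output element by element based on the condition:

```py
a = tf.constant([1, 1])
b = tf.constant([2, 2])

p = tf.constant([True, False])

x = tf.where(p, a + b, a * b)

print(tf.Session().run(x))
```

This returns `[3, 2]`.

Another widely used control flow op is `tf.while_loop`. It lets you build dynamic loops in TensorFlow that operate on sequences of variable length. Let's see how to generate the Fibonacci sequence with `tf.while_loop`:

```py
n = tf.constant(5)

def cond(i, a, b):
    return i < n

def body(i, a, b):
    return i + 1, b, a + b

i, a, b = tf.while_loop(cond, body, (2, 1, 1))

print(tf.Session().run(b))
```

This prints 5. In addition to the initial values of the loop variables, `tf.while_loop` takes a condition function and a loop body function. The loop variables are updated by repeated calls to the body function until the condition returns `False`.

Now imagine we want to keep the whole Fibonacci sequence. We can update the body to keep a record of the history of values:

```py
n = tf.constant(5)

def cond(i, a, b, c):
    return i < n

def body(i, a, b, c):
    return i + 1, b, a + b, tf.concat([c, [a + b]], 0)

i, a, b, c = tf.while_loop(cond, body, (2, 1, 1, tf.constant([1, 1])))

print(tf.Session().run(c))
```

If you try to run this, TensorFlow complains that the shape of the fourth loop variable is changing. You have to state explicitly that this is intentional:

```py
i, a, b, c = tf.while_loop(
    cond, body, (2, 1, 1, tf.constant([1, 1])),
    shape_invariants=(tf.TensorShape([]),
                      tf.TensorShape([]),
                      tf.TensorShape([]),
                      tf.TensorShape([None])))
```

This is not only ugly but also somewhat inefficient, since we build many intermediate tensors that we never use. TensorFlow has a better solution for this kind of growing array: `tf.TensorArray`. Let's do the same thing with a tensor array this time:

```py
n = tf.constant(5)

c = tf.TensorArray(tf.int32, n)
c = c.write(0, 1)
c = c.write(1, 1)

def cond(i, a, b, c):
    return i < n

def body(i, a, b, c):
    c = c.write(i, a + b)
    return i + 1, b, a + b, c

i, a, b, c = tf.while_loop(cond, body, (2, 1, 1, c))

c = c.stack()

print(tf.Session().run(c))
```

TensorFlow while loops and tensor arrays are essential tools for building complex recurrent neural networks. As an exercise, try using `tf.while_loop` to implement [beam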
search](https://en.wikipedia.org/wiki/Beam_search). Can you make it more efficient with tensor arrays?

## 9. Prototyping Kernels and Advanced Visualization with Python Ops

Operation kernels in TensorFlow are written entirely in C++ for efficiency. But writing a TensorFlow kernel in C++ can be quite painful, so before spending hours implementing a kernel you may want to prototype it quickly, even if the prototype is inefficient. With `tf.py_func()` you can turn any piece of Python code into a TensorFlow op.

For example, this is how to implement a simple ReLU nonlinearity kernel as a Python op:

```py
import numpy as np
import tensorflow as tf
import uuid

def relu(inputs):
    # Define the op in python
    def _relu(x):
        return np.maximum(x, 0.)

    # Define the op's gradient in python
    def _relu_grad(x):
        return np.float32(x > 0)

    # An adapter that defines a gradient op compatible with TensorFlow
    def _relu_grad_op(op, grad):
        x = op.inputs[0]
        x_grad = grad * tf.py_func(_relu_grad, [x], tf.float32)
        return x_grad

    # Register the gradient with a unique id
    grad_name = "MyReluGrad_" + str(uuid.uuid4())
    tf.RegisterGradient(grad_name)(_relu_grad_op)

    # Override the gradient of the custom op
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": grad_name}):
        output = tf.py_func(_relu, [inputs], tf.float32)
    return output
```

To verify that the gradients are correct, you can use TensorFlow's gradient checker:

```py
x = tf.random_normal([10])
y = relu(x * x)

with tf.Session():
    diff = tf.test.compute_gradient_error(x, [10], y, [10])
    print(diff)
```

`compute_gradient_error()` computes the gradient numerically and returns the difference from the provided gradient. What we want is a very small difference.

Note that this implementation is quite inefficient and is only useful for prototyping, since the Python code is not parallelizable and does not run on a GPU. Once you have verified your idea, you will definitely want to rewrite it as a C++ kernel.

In practice, we commonly use Python ops for visualization on TensorBoard. Consider the case where you are building an image classification model and want to visualize its predictions during training. TensorFlow lets you visualize images with the `tf.summary.image()` function:

```py
image = tf.placeholder(tf.float32)
tf.summary.image("image", image)
```

But this only displays the input image. To visualize the predictions, you have to find a way to add annotations to the image, which is almost impossible with the existing ops. An easier way is to do the drawing in Python and wrap it in a Python op:

```py
import io
import matplotlib.pyplot as plt
import numpy as np
import PIL.Image
import tensorflow as tf

def visualize_labeled_images(images, labels, max_outputs=3, name="image"):
    def _visualize_image(image, label):
        # Do the actual drawing in python
        fig = plt.figure(figsize=(3, 3), dpi=80)
        ax = fig.add_subplot(111)
        ax.imshow(image[::-1,...])
        ax.text(0, 0, str(label),
                horizontalalignment="left", verticalalignment="top")
        fig.canvas.draw()

        # Write the plot as a memory file.
        buf = io.BytesIO()
        fig.savefig(buf, format="png")
        plt.close(fig)  # Release the figure so repeated calls do not leak memory.
        buf.seek(0)

        # Read the image and convert to numpy array.
        # PIL reports size as (width, height); the array is (height, width, channels).
        img = PIL.Image.open(buf)
        return np.array(img.getdata()).reshape(img.size[1], img.size[0], -1)

    def _visualize_images(images, labels):
        # Only display the given number of examples in the batch
        outputs = []
        for i in range(max_outputs):
            output = _visualize_image(images[i], labels[i])
            outputs.append(output)
        return np.array(outputs, dtype=np.uint8)

    # Run the python op.
    figs = tf.py_func(_visualize_images, [images, labels], tf.uint8)
    return tf.summary.image(name, figs)
```

Note that since summaries are usually evaluated only once in a while (not at every step), this implementation can be used in practice without worrying about efficiency.

## 10. Multi-GPU Processing with Data Parallelism

If you write software in a language like C++ for a single CPU core, making it run in parallel on multiple GPUs requires rewriting it from scratch. That is not the case with TensorFlow. Because of its symbolic nature, TensorFlow can hide all of that complexity, so scaling a program across many CPUs and GPUs takes very little effort.

Let's start with the simple example of adding two vectors on the CPU:

```py
import tensorflow as tf

with tf.device(tf.DeviceSpec(device_type="CPU", device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b

tf.Session().run(c)
```

The same thing can be done on a GPU:

```py
with tf.device(tf.DeviceSpec(device_type="GPU", device_index=0)):
    a = tf.random_uniform([1000, 100])
    b = tf.random_uniform([1000, 100])
    c = a + b
```

But what if we have two GPUs and want to use both? We can split the data and use a separate GPU for each half:

```py
split_a = tf.split(a, 2)
split_b = tf.split(b, 2)

split_c = []
for i in range(2):
    with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
        split_c.append(split_a[i] + split_b[i])

c = tf.concat(split_c, axis=0)
```

Let's rewrite this in a more general form so that the addition can be replaced with any other set of operations:

```py
def make_parallel(fn, num_gpus, **kwargs):
    # Split every input along the batch dimension, one piece per GPU.
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            # Create variables on the first GPU, reuse them on the others.
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k : v[i] for k, v in in_splits.items()}))

    return tf.concat(out_split, axis=0)


def model(a, b):
    return a + b

c = make_parallel(model, 2, a=a, b=b)
```

You can replace the model with any function that takes a set of tensors as input and returns a tensor as a result, provided that both the inputs and the output are batched. Note that we also added a variable scope and set reuse to true for every split after the first. This makes sure that the same variables are used to process all splits, which will come in handy in the next example.

Let's look at a slightly more practical example. We want to train a neural network on multiple GPUs. During training we need to compute not only the forward pass but also the backward pass (the gradients). How can we parallelize the gradient computation? It turns out to be fairly easy.

Recall from the first section that we wanted to fit a second-degree polynomial to a set of samples. We reorganize the code a bit so that the bulk of the operations happens in the model function:

```py
import numpy as np
import tensorflow as tf

def model(x, y):
    w = tf.get_variable("w", shape=[3, 1])

    f = tf.stack([tf.square(x), x, tf.ones_like(x)], 1)
    yhat = tf.squeeze(tf.matmul(f, w), 1)

    loss = tf.square(yhat - y)
    return loss

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

loss = model(x, y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss))

def generate_data():
    x_val = np.random.uniform(-10.0, 10.0, size=100)
    y_val = 5 * np.square(x_val) + 3
    return x_val, y_val

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for _ in range(1000):
    x_val, y_val = generate_data()
    _, loss_val = sess.run([train_op, loss], {x: x_val, y: y_val})

print(sess.run(tf.contrib.framework.get_variables_by_name("w")))
```

Now let's use the `make_parallel` function we just wrote to parallelize this. We only need to change two lines of the code above:

```py
loss = make_parallel(model, 2, x=x, y=y)

train_op = tf.train.AdamOptimizer(0.1).minimize(
    tf.reduce_mean(loss),
    colocate_gradients_with_ops=True)
```

The only thing needed to parallelize the backward pass is to set the `colocate_gradients_with_ops` flag to `True`. This ensures that the gradient ops run on the same device as the original ops.

## 11. Debugging TensorFlow Models

The symbolic nature of TensorFlow makes TensorFlow code harder to debug than regular Python code. Here we introduce a number of tools that ship with TensorFlow and make debugging easier.

Probably the most common error when using TensorFlow is passing tensors of the wrong shape to ops. Many TensorFlow ops can operate on tensors of different ranks and shapes. This is convenient when using the API, but may cause extra headaches when things go wrong.

For example, consider the `tf.matmul` op, which multiplies two matrices:

```py
a = tf.random_uniform([2, 3])
b = tf.random_uniform([3, 4])
c = tf.matmul(a, b)  # c is a tensor of shape [2, 4]
```

The same function also performs batch matrix multiplication:

```py
a = tf.random_uniform([10, 2, 3])
b = tf.random_uniform([10, 3, 4])
c = tf.matmul(a, b)  # c is a tensor of shape [10, 2, 4]
```
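To make the risk concrete, here is a small sketch of how this kind of shape polymorphism can hide a mistake. It uses `np.matmul` as a stand-in for `tf.matmul` so that it runs without a session; that substitution, and the helper name `checked_matmul`, are assumptions made for this illustration and do not come from the original guide.

```python
import numpy as np

def checked_matmul(a, b):
    """Hypothetical helper: matrix product that insists on rank-2 inputs."""
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError(
            "expected two matrices, got ranks %d and %d" % (a.ndim, b.ndim))
    return np.matmul(a, b)

a = np.random.uniform(size=[2, 3])
b = np.random.uniform(size=[3, 4])
single = checked_matmul(a, b)          # shape (2, 4)

# Suppose a batch of matrices is passed where a single matrix was intended.
a_batch = np.random.uniform(size=[10, 2, 3])

# The permissive op accepts it and silently returns a batched result.
silent = np.matmul(a_batch, b)         # shape (10, 2, 4)

# The explicit rank check turns the silent shape change into an error.
try:
    checked_matmul(a_batch, b)
    caught = False
except ValueError:
    caught = True

print(single.shape, silent.shape, caught)
```

The point of the sketch is only that a permissive op will happily produce a result of an unexpected rank, while an explicit check surfaces the mistake where it happens.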
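Another example, which we discussed earlier in the broadcasting section, is the add op, which supports broadcasting:

```py
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
c = a + b  # c is a tensor of shape [2, 2]
```

### Validating your tensors with `tf.assert*` ops

One way to reduce the chance of unwanted behavior is to explicitly verify the rank or shape of intermediate tensors with `tf.assert*` ops.

```py
a = tf.constant([[1.], [2.]])
b = tf.constant([1., 2.])
# Fails because a has rank 2. The original guide reports this as an
# InvalidArgumentError at run time; a statically known mismatch may
# already be rejected when the graph is built.
check_a = tf.assert_rank(a, 1)
check_b = tf.assert_rank(b, 1)
with tf.control_dependencies([check_a, check_b]):
    c = a + b  # c is a tensor of shape [2, 2]
```

Remember that assertion nodes, like other ops, are part of the graph, and if they are not evaluated they are pruned during `Session.run()`. So make sure to create explicit dependencies on the assertion ops to force TensorFlow to execute them.

You can also use assertions to validate the values of tensors at run time:

```py
check_pos = tf.assert_positive(a)
```

See the official documentation for the [full list of assertion ops](https://www.tensorflow.org/api_guides/python/check_ops).

### Logging tensor values with `tf.Print`

Another useful built-in function for debugging is `tf.Print`, which logs the given tensors to standard error:

```py
input_copy = tf.Print(input, tensors_to_print_list)
```

Note that `tf.Print` returns a copy of its first argument as output. One way to force `tf.Print` to run is to pass its output to another op that gets executed. For example, to print the values of tensors `a` and `b` before adding them, we could do something like this:

```py
a = ...
b = ...
a = tf.Print(a, [a, b])
c = a + b
```

Alternatively, we could define a control dependency manually.

### Checking your gradients with `tf.test.compute_gradient_error`

Not all operations in TensorFlow come with gradients, and it is easy to unintentionally build a graph for which TensorFlow cannot compute the gradients.

Let's look at an example:

```py
import tensorflow as tf

def non_differentiable_entropy(logits):
    probs = tf.nn.softmax(logits)
    return tf.nn.softmax_cross_entropy_with_logits(labels=probs, logits=logits)

w = tf.get_variable("w", shape=[5])
y = -non_differentiable_entropy(w)

opt = tf.train.AdamOptimizer()
train_op = opt.minimize(y)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(10000):
    sess.run(train_op)

print(sess.run(tf.nn.softmax(w)))
```

We use `tf.nn.softmax_cross_entropy_with_logits` to define the entropy of a categorical distribution, and then use the Adam optimizer to find the weights with maximum entropy. If you have taken a course on information theory, you know that the uniform distribution has the maximum entropy, so you would expect the result to be `[0.2, 0.2, 0.2, 0.2, 0.2]`. But if you run this, you may get unexpected results like this:

```py
[ 0.34081486  0.24287023  0.23465775  0.08935683  0.09230034]
```

It turns out that `tf.nn.softmax_cross_entropy_with_logits` has undefined gradients with respect to the labels. But how could we spot this if we did not know it?

Fortunately, TensorFlow comes with a numerical differentiator that can be used to find errors in symbolic gradients. Let's see how to use it:

```py
with tf.Session():
    diff = tf.test.compute_gradient_error(w, [5], y, [])
    print(diff)
```

If you run this, you will see that the difference between the numerical and symbolic gradients is fairly large (`0.06 - 0.1` in the author's attempts).

Now let's fix our function with a differentiable version of the entropy and check again:

```py
import tensorflow as tf
import numpy as np

def entropy(logits, dim=-1):
    probs = tf.nn.softmax(logits, dim)
    nplogp = probs * (tf.reduce_logsumexp(logits, dim, keep_dims=True) - logits)
    return tf.reduce_sum(nplogp, dim)

w = tf.get_variable("w", shape=[5])
y = -entropy(w)

print(w.get_shape())
print(y.get_shape())

with tf.Session() as sess:
    diff = tf.test.compute_gradient_error(w, [5], y, [])
    print(diff)
```

The difference should be around 0.0001, which looks much better.

If you now run the optimizer again with the correct version, you can see that the final weights are:

```py
[ 0.2  0.2  0.2  0.2  0.2]
```

which is exactly what we wanted.

[TensorFlow summaries](https://www.tensorflow.org/api_guides/python/summary) and [tfdbg (the TensorFlow debugger)](https://www.tensorflow.org/api_guides/python/tfdbg) are other tools that can be used for debugging. Please refer to the official documentation to learn more.

## 12. Numerical Stability in TensorFlow

When using any numerical computation library such as NumPy or TensorFlow, it is important to note that writing mathematically correct code does not necessarily lead to correct results. You also need to make sure that the computations are stable.

Let's start with a simple example. From primary school we know that `x * y / y` is equal to `x` for any nonzero value of `y`. But let's see whether that always holds in practice:

```py
import numpy as np

x = np.float32(1)

y = np.float32(1e-50)  # y would be stored as zero
z = x * y / y

print(z)  # prints nan
```

The result is incorrect because `y` is simply too small for the `float32` type. A similar problem occurs when `y` is too large:

```py
y = np.float32(1e39)  # y would be stored as inf
z = x * y / y

print(z)  # prints nan
```

The smallest positive value that the `float32` type can represent is `1.4013e-45`, and anything below that is stored as zero. Also, any number beyond `3.40282e+38` is stored as `inf`.

```py
print(np.nextafter(np.float32(0), np.float32(1)))  # prints 1.4013e-45
print(np.finfo(np.float32).max)  # prints 3.40282e+38
```

To make sure that your computations are stable, you want to avoid values with a very small or very large absolute value. This may sound obvious, but problems of this kind can become extremely hard to debug, especially when doing gradient descent in TensorFlow. You need to make sure not only that all values in the forward pass are within the valid range of your data types, but also that the same holds for the backward pass (during gradient computation).

Let's look at a real example. We want to compute the softmax over a vector of logits. A naive implementation would look something like this:

```py
import tensorflow as tf

def unstable_softmax(logits):
    exp = tf.exp(logits)
    return exp / tf.reduce_sum(exp)

tf.Session().run(unstable_softmax([1000., 0.]))  # prints [ nan, 0.]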
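```

Note that computing the exponential of logits, even for relatively small numbers, produces gigantic results that fall outside the `float32` range. The largest valid logit for our naive softmax implementation is `ln(3.40282e+38) = 88.7`; anything beyond that leads to a `nan` outcome.

But how can we make it more stable? The solution is rather simple. It is easy to see that `exp(x - c) / Σ exp(x - c) = exp(x) / Σ exp(x)`. So we can subtract any constant from the logits and the result stays the same. We choose this constant to be the maximum of the logits. This way, the domain of the exponential function is limited to `[-inf, 0]`, and consequently its range is `[0.0, 1.0]`, which is what we want:

```py
import tensorflow as tf

def softmax(logits):
    exp = tf.exp(logits - tf.reduce_max(logits))
    return exp / tf.reduce_sum(exp)

tf.Session().run(softmax([1000., 0.]))  # prints [ 1., 0.]
```

Let's look at a more complicated case. Consider a classification problem. We use the softmax function to produce probabilities from our logits, and then define the loss as the cross entropy between our predictions and the labels. Recall that the cross entropy for a categorical distribution can be defined simply as `xe(p, q) = -∑ p_i log(q_i)`. So a naive implementation of the cross entropy would look like this:

```py
def unstable_softmax_cross_entropy(labels, logits):
    logits = tf.log(softmax(logits))
    return -tf.reduce_sum(labels * logits)

labels = tf.constant([0.5, 0.5])
logits = tf.constant([1000., 0.])

xe = unstable_softmax_cross_entropy(labels, logits)

print(tf.Session().run(xe))  # prints inf
```

Note that in this implementation, as the softmax output approaches zero, the output of `log` approaches negative infinity, which makes the computation unstable. We can rewrite it by expanding the softmax and simplifying:

```py
def softmax_cross_entropy(labels, logits):
    scaled_logits = logits - tf.reduce_max(logits)
    normalized_logits = scaled_logits - tf.reduce_logsumexp(scaled_logits)
    return -tf.reduce_sum(labels * normalized_logits)

labels = tf.constant([0.5, 0.5])
logits = tf.constant([1000., 0.])

xe = softmax_cross_entropy(labels, logits)

print(tf.Session().run(xe))  # prints 500.0
```

We can also verify that the gradients are computed correctly:

```py
g = tf.gradients(xe, logits)
print(tf.Session().run(g))  # prints [0.5, -0.5]
```

which is correct.

As a reminder, extra care must be taken when doing gradient descent to make sure that the range of your functions, as well as the gradients for each layer, stay within a valid range. Exponential and logarithmic functions are especially problematic when used naively, because they can map small numbers to enormous ones and vice versa.

## 13. Building a Neural Network Training Framework with the Learn API

For simplicity, in most of the examples here we create sessions manually and do not care about saving and loading checkpoints, but that is not how things are usually done in practice. You most likely want to use the learn API to take care of session management and logging. The original author provides a simple but practical framework for training neural networks with TensorFlow, and this section explains how that framework works.

When experimenting with neural network models, you usually have a training/test split. You want to train your model on the training set and, once in a while, evaluate it on the test set and compute some metrics. You also need to store the model parameters as checkpoints, and ideally you want to be able to stop and resume training. TensorFlow's learn API is designed to make this easier, letting us focus on developing the actual model.

The most basic way of using the learn API is to use a `tf.estimator.Estimator` object directly. You need to define a model function that specifies the loss, the training op, one or a set of predictions, and an optional set of metric ops for evaluation:

```py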
import tensorflow as tf

def model_fn(features, labels, mode, params):
    predictions = ...
    loss = ...
    train_op = ...
    metric_ops = ...
    return tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=predictions,
        loss=loss,
        train_op=train_op,
        eval_metric_ops=metric_ops)

params = ...
run_config = tf.contrib.learn.RunConfig(model_dir=FLAGS.output_dir)
estimator = tf.estimator.Estimator(
    model_fn=model_fn, config=run_config, params=params)
```

To train the model, you simply call the `Estimator.train()` function and provide an input function that reads the data:

```py
def input_fn():
    features = ...
    labels = ...
    return features, labels

estimator.train(input_fn=input_fn, max_steps=...)
```

To evaluate the model, simply call `Estimator.evaluate()`:

```py
estimator.evaluate(input_fn=input_fn)
```

The `Estimator` object may be good enough for simple cases, but TensorFlow provides an even higher-level object called `Experiment`, which offers some additional useful functionality. Creating an experiment object is very easy:

```py
experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=train_input_fn,
    eval_input_fn=eval_input_fn,
    eval_metrics=eval_metrics)
```

Now we can call the `train_and_evaluate` function to compute the metrics while training:

```py
experiment.train_and_evaluate()
```

An even higher-level way of running experiments is the `learn_runner.run()` function. Here is what the main function looks like in the provided framework:

```py
import tensorflow as tf

tf.flags.DEFINE_string("output_dir", "", "Optional output dir.")
tf.flags.DEFINE_string("schedule", "train_and_evaluate", "Schedule.")
tf.flags.DEFINE_string("hparams", "", "Hyper parameters.")

FLAGS = tf.flags.FLAGS

def experiment_fn(run_config, hparams):
    estimator = tf.estimator.Estimator(
        model_fn=make_model_fn(),
        config=run_config,
        params=hparams)
    return tf.contrib.learn.Experiment(
        estimator=estimator,
        train_input_fn=make_input_fn(tf.estimator.ModeKeys.TRAIN, hparams),
        eval_input_fn=make_input_fn(tf.estimator.ModeKeys.EVAL, hparams),
        eval_metrics=eval_metrics_fn(hparams))

def main(unused_argv):
    run_config = tf.contrib.learn.RunConfig(model_dir=FLAGS.output_dir)
    hparams = tf.contrib.training.HParams()
    hparams.parse(FLAGS.hparams)

    estimator = tf.contrib.learn.learn_runner.run(
        experiment_fn=experiment_fn,
        run_config=run_config,
        schedule=FLAGS.schedule,
        hparams=hparams)

if __name__ == "__main__":
    tf.app.run()
```

The `schedule` flag decides which member function of the `Experiment` object gets called. So if you set `schedule` to `train_and_evaluate`, then `experiment.train_and_evaluate()` is called.

The input function can return two tensors (or dictionaries of tensors) providing the features and labels to be passed to the model:

```py
def input_fn():
    features = ...
    labels = ...
    return features, labels
```

See [`mnist.py`](https://github.com/vahidk/TensorflowFramework/blob/master/dataset/mnist.py) for an example of how to read your data with the Dataset API. To learn about the various ways of reading data in TensorFlow, refer to Section 5 above.

The framework also comes with a simple convolutional network classifier, in [`cnn_classifier.py`](https://github.com/vahidk/TensorflowFramework/blob/master/model/cnn_classifier.py), which includes an example model.

And that's it! This is all you need to get started with the TensorFlow learn API. It is worth having a look at the framework [source code](https://github.com/vahidk/TensorFlowFramework) and the official Python API to learn more about the learn API.

## 14. TensorFlow Cookbook

This section includes implementations of a set of common operations in TensorFlow.

### Beam search

```py
import tensorflow as tf

def get_shape(tensor):
    """Returns static shape if available and dynamic shape otherwise."""
    static_shape = tensor.shape.as_list()
    dynamic_shape = tf.unstack(tf.shape(tensor))
    dims = [s[1] if s[0] is None else s[0]
            for s in zip(static_shape, dynamic_shape)]
    return dims

def log_prob_from_logits(logits, axis=-1):
    """Normalize the log-probabilities so that probabilities sum to one."""
    return logits - tf.reduce_logsumexp(logits, axis=axis, keep_dims=True)

def batch_gather(tensor, indices):
    """Gather in batch from a tensor of arbitrary size.

    In pseudocode this module will produce the following:
    output[i] = tf.gather(tensor[i], indices[i])

    Args:
      tensor: Tensor of arbitrary size.
      indices: Vector of indices.
    Returns:
      output: A tensor of gathered values.
    """
    shape = get_shape(tensor)
    flat_first = tf.reshape(tensor, [shape[0] * shape[1]] + shape[2:])
    indices = tf.convert_to_tensor(indices)
    offset_shape = [shape[0]] + [1] * (indices.shape.ndims - 1)
    offset = tf.reshape(tf.range(shape[0]) * shape[1], offset_shape)
    output = tf.gather(flat_first, indices + offset)
    return output

def rnn_beam_search(update_fn, initial_state, sequence_length, beam_width,
                    begin_token_id, end_token_id, name="rnn"):
    """Beam-search decoder for recurrent models.

    Args:
      update_fn: Function to compute the next state and logits given the
        current state and ids.
      initial_state: Recurrent model states.
      sequence_length: Length of the generated sequence.
      beam_width: Beam width.
      begin_token_id: Begin token id.
      end_token_id: End token id.
      name: Scope of the variables.
    Returns:
      ids: Output indices.
      logprobs: Output log probabilities.
    """
    batch_size = initial_state.shape.as_list()[0]

    state = tf.tile(tf.expand_dims(initial_state, axis=1), [1, beam_width, 1])

    sel_sum_logprobs = tf.log([[1.] + [0.]
* (beam_width - 1)]) ids = tf.tile([[begin_token_id]], [batch_size, beam_width]) sel_ids = tf.expand_dims(ids, axis=2) mask = tf.ones([batch_size, beam_width], dtype=tf.float32) for i in range(sequence_length): with tf.variable_scope(name, reuse=True if i > 0 else None): state, logits = update_fn(state, ids) logits = log_prob_from_logits(logits) sum_logprobs = ( tf.expand_dims(sel_sum_logprobs, axis=2) + (logits * tf.expand_dims(mask, axis=2))) num_classes = logits.shape.as_list()[-1] sel_sum_logprobs, indices = tf.nn.top_k( tf.reshape(sum_logprobs, [batch_size, num_classes * beam_width]), k=beam_width) ids = indices % num_classes beam_ids = indices // num_classes state = batch_gather(state, beam_ids) sel_ids = tf.concat([batch_gather(sel_ids, beam_ids), tf.expand_dims(ids, axis=2)], axis=2) mask = (batch_gather(mask, beam_ids) * tf.to_float(tf.not_equal(ids, end_token_id))) return sel_ids, sel_sum_logprobs ``` ### 合并 ```py import tensorflow as tf def merge(tensors, units, activation=tf.nn.relu, name=None, **kwargs): """Merge features with broadcasting support. This operation concatenates multiple features of varying length and applies non-linear transformation to the outcome. Example: a = tf.zeros([m, 1, d1]) b = tf.zeros([1, n, d2]) c = merge([a, b], d3) # shape of c would be [m, n, d3]. Args: tensors: A list of tensor with the same rank. units: Number of units in the projection function. """ with tf.variable_scope(name, default_name="merge"): # Apply linear projection to input tensors. projs = [] for i, tensor in enumerate(tensors): proj = tf.layers.dense( tensor, units, activation=None, name="proj_%d" % i, **kwargs) projs.append(proj) # Compute sum of tensors. result = projs.pop() for proj in projs: result = result + proj # Apply nonlinearity. 
        if activation:
            result = activation(result)
    return result
```

### Entropy

```py
import tensorflow as tf


def softmax(logits, dims=-1):
    """Compute softmax over specified dimensions."""
    # Subtract the maximum for numerical stability.
    exp = tf.exp(logits - tf.reduce_max(logits, dims, keep_dims=True))
    return exp / tf.reduce_sum(exp, dims, keep_dims=True)


def entropy(logits, dims=-1):
    """Compute entropy over specified dimensions."""
    probs = softmax(logits, dims)
    # -p * log(p), with log(p) = logits - logsumexp(logits).
    nplogp = probs * (tf.reduce_logsumexp(logits, dims, keep_dims=True) - logits)
    return tf.reduce_sum(nplogp, dims)
```

### KL divergence

```py
import tensorflow as tf


def gaussian_kl(q, p=(0., 0.)):
    """Computes KL divergence between two isotropic Gaussian distributions.

    To ensure numerical stability, this op uses mu, log(sigma^2) to represent
    the distribution. If p is not provided, it's assumed to be unit Gaussian.

    Args:
      q: A tuple (mu, log(sigma^2)) representing a multivariate Gaussian.
      p: A tuple (mu, log(sigma^2)) representing a multivariate Gaussian.
    Returns:
      A tensor representing KL(q, p).
    """
    mu1, log_sigma1_sq = q
    mu2, log_sigma2_sq = p
    return tf.reduce_sum(
        0.5 * (log_sigma2_sq - log_sigma1_sq +
               tf.exp(log_sigma1_sq - log_sigma2_sq) +
               tf.square(mu1 - mu2) / tf.exp(log_sigma2_sq) -
               1), axis=-1)
```

### Parallelization

```py
import tensorflow as tf


def make_parallel(fn, num_gpus, **kwargs):
    """Parallelize given model on multiple gpu devices.

    Args:
      fn: Arbitrary function that takes a set of input tensors and outputs a
        single tensor. First dimension of inputs and output tensor are assumed
        to be batch dimension.
      num_gpus: Number of GPU devices.
      **kwargs: Keyword arguments to be passed to the model.
    Returns:
      A tensor corresponding to the model output.
    """
    # Split every input along the batch dimension, one chunk per GPU.
    in_splits = {}
    for k, v in kwargs.items():
        in_splits[k] = tf.split(v, num_gpus)

    out_split = []
    for i in range(num_gpus):
        with tf.device(tf.DeviceSpec(device_type="GPU", device_index=i)):
            # Reuse variables on every GPU after the first one.
            with tf.variable_scope(tf.get_variable_scope(), reuse=i > 0):
                out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))

    # Concatenate the per-GPU outputs back into a single batch.
    return tf.concat(out_split, axis=0)
```
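The split, apply, and concatenate pattern used by `make_parallel` can be checked without TensorFlow or GPUs. The following is a minimal, framework-free sketch in plain Python: `split_batch`, `make_parallel_sketch`, and `toy_model` are hypothetical names introduced only for illustration, and lists stand in for tensors. It assumes, as `tf.split` with an integer argument does, that the batch size is divisible by the number of parts.

```python
def split_batch(batch, num_parts):
    """Split a list of examples into `num_parts` equal chunks along the batch axis."""
    if len(batch) % num_parts != 0:
        raise ValueError("batch size must be divisible by num_parts")
    size = len(batch) // num_parts
    return [batch[i * size:(i + 1) * size] for i in range(num_parts)]


def make_parallel_sketch(fn, num_parts, **kwargs):
    """Apply `fn` to each chunk of the keyword inputs and concatenate the outputs."""
    in_splits = {k: split_batch(v, num_parts) for k, v in kwargs.items()}
    out_split = []
    for i in range(num_parts):
        # In the TensorFlow version, this call is placed on GPU i via tf.device.
        out_split.append(fn(**{k: v[i] for k, v in in_splits.items()}))
    # Concatenate the per-chunk outputs along the batch dimension.
    return [row for chunk in out_split for row in chunk]


def toy_model(a, b):
    """Hypothetical per-example model: elementwise sum of two inputs."""
    return [x + y for x, y in zip(a, b)]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]

parallel_out = make_parallel_sketch(toy_model, num_parts=2, a=a, b=b)
full_out = toy_model(a=a, b=b)
print(parallel_out == full_out)
```

Because the toy model treats every example independently, the chunked result matches the result on the full batch. The same holds for `make_parallel` only when `fn` has no cross-example dependencies; a loss that is averaged over the batch, for instance, would return one value per chunk rather than one per batch.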