In the previous section we hastily put together the FCN16S network, but the test results were poor and far below those reported in the paper. In this section we track down the causes, bring the network up to the paper's accuracy, and fix the slow data-input problem.
## First, create a new branch in the git repository

Then push it to the remote repository:

From now on, all code changes are made on this branch.
Full code: https://github.com/tangzhenjie/FCN16S/tree/advance
*****
Below we address the problems from the previous section:
* [Problem 1: slow data loading](#第一節)
* [Problem 2: accuracy does not improve](#第二節)
* [Problem 3: visualizing the training process](#第三節)
<h3 id="第一節">Problem 1: slow data loading</h3>
Cause: in the previous section we did not use an input pipeline; all of the data was read directly into memory.
Solution: feed the data through a TensorFlow input pipeline instead of loading it all up front (the linked guide covers tf.data.Dataset; the reader built below uses the queue-runner API to the same effect).
Reference: https://www.tensorflow.org/guide/datasets
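For comparison with the queue-based reader built below, here is a minimal, hypothetical sketch of the same input expressed with `tf.data.Dataset` (the function name, arguments and buffer sizes are assumptions, not the code this section actually uses):
```python
import tensorflow as tf

def make_dataset(image_filepaths, label_filepaths, image_size, batch_size=2):
    """Hypothetical tf.data version of the batch reader (sketch only)."""
    def _parse(image_path, label_path):
        image = tf.image.decode_jpeg(tf.read_file(image_path), channels=3)
        image = tf.image.resize_images(image, [image_size, image_size])
        label = tf.image.decode_png(tf.read_file(label_path), channels=1)
        # nearest neighbour so that class ids are not interpolated
        label = tf.image.resize_images(label, [image_size, image_size],
                                       method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
        return image, label

    dataset = tf.data.Dataset.from_tensor_slices((image_filepaths, label_filepaths))
    dataset = dataset.shuffle(1000).map(_parse, num_parallel_calls=4)
    dataset = dataset.batch(batch_size).repeat().prefetch(2)
    return dataset.make_one_shot_iterator().get_next()
```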
First we modify read_MITSceneParsingData.py; this file prepares the image/label file-path lists from which the dataset is built.

Then delete the previously generated file; it will be regenerated the next time the script runs:
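The screenshot of the modified read_MITSceneParsingData.py is not reproduced here. Judging from how read_dataset is used later (its result is converted to a NumPy array and sliced with [:, 0] for image paths and [:, 1] for label paths), it is expected to return two lists of (image_path, label_path) pairs, one for training and one for validation. A rough, hypothetical sketch (the ADEChallengeData2016 directory layout is an assumption):
```python
import glob
import os

def read_dataset(data_dir):
    """Hypothetical sketch: return [(image_path, label_path), ...] for the
    training and validation splits (directory names are assumptions)."""
    def _pair_list(split):
        image_dir = os.path.join(data_dir, "ADEChallengeData2016", "images", split)
        label_dir = os.path.join(data_dir, "ADEChallengeData2016", "annotations", split)
        records = []
        for image_path in sorted(glob.glob(os.path.join(image_dir, "*.jpg"))):
            name = os.path.splitext(os.path.basename(image_path))[0]
            records.append((image_path, os.path.join(label_dir, name + ".png")))
        return records

    return _pair_list("training"), _pair_list("validation")
```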
Next, create a new file in the project, BatchReader.py, and add the following code:
```python
import tensorflow as tf
import read_MITSceneParsingData as Reader
import numpy as np

# dataset_dir = "D:\pycharm_program\FCN16S\Data_zoo\MIT_SceneParsing\\"
# quick test of the path lists
# train_filepaths, eval_filepaths = Reader.read_dataset(dataset_dir)
# train_filepaths = tf.convert_to_tensor(train_filepaths, dtype=tf.string)
# i = 0
# train_filepaths = np.array(train_filepaths)
# train_filepaths1 = train_filepaths[:, 1]
# print(train_filepaths1[0])


def read_batch_image(image_filepaths, label_filepaths, image_size, batch_size=2):
    """Read a batch of images and labels through a queue-based input pipeline.

    :param image_filepaths: tensor, dtype=string, image file paths
    :param label_filepaths: tensor, dtype=string, label-image file paths
    :param image_size: size the images are cropped/resized to
    :param batch_size: batch size
    :return: tuple (images, labels)
    """
    image, label = tf.train.slice_input_producer([image_filepaths, label_filepaths], shuffle=True)
    # Read images from disk
    image = tf.read_file(image)
    image = tf.image.decode_jpeg(image, channels=3)
    # Resize images to a common size
    image = tf.image.resize_images(image, [image_size, image_size])
    # Normalize (changed later)
    # image = image * 1.0 / 127.5 - 1.0
    # Read labels from disk
    label = tf.read_file(label)
    label = tf.image.decode_png(label, channels=1)
    # Resize labels to a common size (nearest neighbour so that class ids are not interpolated)
    label = tf.image.resize_images(label, [image_size, image_size],
                                   method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    X, Y = tf.train.batch([image, label], batch_size=batch_size, capacity=batch_size * 8, num_threads=4)
    return X, Y
```
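To sanity-check the reader on its own, something like the following can be used; the dataset path reuses the one from the commented-out test above, and the queue-runner boilerplate is the standard TF 1.x pattern:
```python
import numpy as np
import tensorflow as tf
import read_MITSceneParsingData as Reader
import BatchReader

train_filepaths, _ = Reader.read_dataset("D:\pycharm_program\FCN16S\Data_zoo\MIT_SceneParsing\\")
train_filepaths = np.array(train_filepaths, dtype=np.string_)
images, labels = BatchReader.read_batch_image(train_filepaths[:, 0], train_filepaths[:, 1],
                                              image_size=224, batch_size=2)

with tf.Session() as sess:
    # the queue runners must be started, otherwise tf.train.batch blocks forever
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    img_batch, label_batch = sess.run([images, labels])
    print(img_batch.shape, label_batch.shape)  # (2, 224, 224, 3) (2, 224, 224, 1)
    coord.request_stop()
    coord.join(threads)
```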
Now we replace the data-input method in FCN16S.py.
First import the new module and remove the ones that are no longer used:

Then we rewrite FCN16S.py as follows:
```python
from __future__ import print_function
import tensorflow as tf
import numpy as np
import TensorflowUtils as utils
from six.moves import xrange  # python2 / python3 compatibility
import read_MITSceneParsingData as DatasetReader
import BatchReader as BatchReader

# Network parameters (each one can be overridden as a command-line flag)
FLAGS = tf.flags.FLAGS
# batch size
tf.flags.DEFINE_integer("batch_size", "2", "batch size for training")
# logs directory
tf.flags.DEFINE_string("logs_dir", "D:\pycharm_program\FCN16S\Logs\\", "path to logs directory")
# dataset directory
tf.flags.DEFINE_string("data_dir", "D:\pycharm_program\FCN16S\Data_zoo\MIT_SceneParsing\\", "path to the dataset")
# learning rate
tf.flags.DEFINE_float("learning_rate", "1e-4", "learning rate for Adam Optimizer")
# location of the VGG16 .mat file (we use VGG16 weights trained in MATLAB)
tf.flags.DEFINE_string("model_dir", "D:\pycharm_program\FCN16S\Model_zoo\\", "Path to vgg model mat")
# debug mode (extra information is saved to the summaries when enabled)
tf.flags.DEFINE_bool("debug", "True", "Model Debug:True/ False")
# run mode (train / test / visualize)
tf.flags.DEFINE_string("mode", "train", "Mode: train/ test/ visualize")
# checkpoint directory
tf.flags.DEFINE_string("checkpoint_dir", "D:\pycharm_program\FCN16S\Checkpoint\\", "path to the checkpoint")
# directory for saving validation result images
tf.flags.DEFINE_string("image_dir", "D:\pycharm_program\FCN16S\Image\\", "path to the checkpoint")

# URL of the pre-trained VGG16 model
MODEL_URL = "http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-16.mat"
# maximum number of iterations
MAX_ITERATION = int(1e5 + 1)
# number of classes in the MIT dataset
NUM_OF_CLASSES = 151
# VGG16 takes 224*224 input (although in principle this network accepts any image size)
IMAGE_SIZE = 224

"""
首先定義該網絡與VGG16相同的部分
:param weight 從.mat中獲得的權重
image 網絡輸入的圖像
:return 包括相同部分所有輸出的數組
"""
def vgg_net(weights, image):
# 首先我們定義FCN16S中使用VGG16層中的名字,用來生成相同的網絡
layers = (
"conv1_1", "relu1_1", "conv1_2", "relu1_2", "pool1",
"conv2_1", "relu2_1", "conv2_2", "relu2_2", "pool2",
"conv3_1", "relu3_1", "conv3_2", "relu3_2", "conv3_3", "relu3_3", "pool3",
"conv4_1", "relu4_1", "conv4_2", "relu4_2", "conv4_3", "relu4_3", "pool4",
"conv5_1", "relu5_1", "conv5_2", "relu5_2", "conv5_3", "relu5_3", "pool5"
)
# 生成的公有層的所有接口
net = {}
# 當前輸入
current = image
for i, name in enumerate(layers):
# 獲取前面層名字的前四個字符
kind = name[:4]
if kind == "conv":
kernels = weights[i][0][0][0][0][0]
bias = weights[i][0][0][0][0][1]
print(weights[i][0][0][0][0][0].shape)
print(weights[i][0][0][0][0][1].shape)
# matconvnet: weights are [width, height, in_channels, out_channels]
# tensorflow: weights are [height, width, in_channels, out_channels]
# 生成變量
kernels = utils.get_variable(np.transpose(kernels, (1, 0, 2, 3)), name=name + "_w")
bias = utils.get_variable(bias.reshape(-1), name=name + "_b")
current = utils.conv2d_basic(current, kernels, bias)
elif kind == "relu":
current = tf.nn.relu(current, name=name)
if FLAGS.debug:
utils.add_activation_summary(current)
elif kind == "pool":
current = utils.max_pool_2x2(current)
net[name] = current
return net
"""
構建FCN16S
:param image 網絡輸入的圖像 [batch, height, width, channels]
:return 輸出與image大小相同的tensor
"""
def fcn16s_net(image, keep_prob):
# 轉換數據類型
# 首先我們獲取相同部分構造的模型權重
model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)
weights = model_data["layers"][0]
mean = model_data['normalization'][0][0][0]
mean_pixel = np.mean(mean, axis=(0, 1))
image = utils.process_image(image, mean_pixel)
# 首先我們padding圖片
image = utils.pading(image, 100)
with tf.variable_scope("VGG16"):
vgg16net_dict = vgg_net(weights, image)
with tf.variable_scope("FCN16S"):
pool5 = vgg16net_dict["pool5"]
# 創建fc6層
w6 = utils.weight_variable([7, 7, 512, 4096], name="w6")
b6 = utils.bias_variable([4096], name="b6")
conv6 = tf.nn.conv2d(pool5, w6, [1, 1, 1, 1], padding="VALID")
conv_bias6 = tf.nn.bias_add(conv6, b6)
relu6 = tf.nn.relu(conv_bias6, name="relu6")
if FLAGS.debug:
utils.add_activation_summary(relu6)
relu_dropout6 = tf.nn.dropout(relu6, keep_prob=keep_prob)
# 創建fc7層
w7 = utils.weight_variable([1, 1, 4096, 4096], name="w7")
b7 = utils.bias_variable([4096], name="b7")
conv7 = utils.conv2d_basic(relu_dropout6, w7, b7)
relu7 = tf.nn.relu(conv7, name="relu7")
if FLAGS.debug:
utils.add_activation_summary(relu7)
conv_dropout7 = tf.nn.dropout(relu7, keep_prob=keep_prob)
# 定義score_fr層
w8 = utils.weight_variable([1, 1, 4096, NUM_OF_CLASSES], name="w8")
b8 = utils.bias_variable([NUM_OF_CLASSES], name="b8")
score_fr = utils.conv2d_basic(conv_dropout7, w8, b8)
# 定義upscore2層
w9 = utils.weight_variable([4, 4, NUM_OF_CLASSES, NUM_OF_CLASSES], name="w9")
b9 = utils.bias_variable([NUM_OF_CLASSES], name="b9")
upscore2 = utils.conv2d_transpose_strided(score_fr, w9, b9)
# 定義score_pool4
pool4_shape = vgg16net_dict["pool4"].get_shape()
w10 = utils.weight_variable([1, 1, pool4_shape[3].value, NUM_OF_CLASSES], name="w10")
b10 = utils.bias_variable([NUM_OF_CLASSES], name="b10")
score_pool4 = utils.conv2d_basic(vgg16net_dict["pool4"], w10, b10)
# 定義score_pool4c
upscore2_shape = upscore2.get_shape()
upscore2_target_height = upscore2_shape[1].value
upscore2_target_width = upscore2_shape[2].value
score_pool4c = tf.image.crop_to_bounding_box(score_pool4, 5, 5, upscore2_target_height, upscore2_target_width)
# 定義fuse_pool4
fuse_pool4 = tf.add(upscore2, score_pool4c, name="fuse_pool4")
# 定義upscore16
fuse_pool4_shape = fuse_pool4.get_shape()
w11 = utils.weight_variable([32, 32, NUM_OF_CLASSES, NUM_OF_CLASSES], name="w11")
b11 = utils.bias_variable([NUM_OF_CLASSES], name="b11")
output_shape = tf.stack([tf.shape(fuse_pool4)[0], fuse_pool4_shape[1].value * 16, fuse_pool4_shape[2].value * 16, NUM_OF_CLASSES])
upscore16 = utils.conv2d_transpose_strided(fuse_pool4, w11, b11, output_shape=output_shape , stride=16)
# 定義score層
image_shape = image.get_shape()
score_target_height = image_shape[1].value - 200 # 因為輸入網絡的圖片需要先padding100,所以減去200
score_target_width = image_shape[2].value - 200 # 因為輸入網絡的圖片需要先padding100,所以減去200
score = tf.image.crop_to_bounding_box(upscore16, 27, 27, score_target_height, score_target_width)
annotation_pred = tf.argmax(score, dimension=3, name="prediction")
return tf.expand_dims(annotation_pred, dim=3), score
def train(loss_val, var_list):
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    grads = optimizer.compute_gradients(loss_val, var_list=var_list)
    if FLAGS.debug:
        for grad, var in grads:
            utils.add_gradient_summary(grad, var)
    return optimizer.apply_gradients(grads)

def main(argv=None):
    ########################## build the network ####################
    # define the network inputs
    keep_probability = tf.placeholder(tf.float32, name="keep_probability")
    train_filepaths, eval_filepaths = DatasetReader.read_dataset(FLAGS.data_dir)
    if FLAGS.mode == "train":
        train_filepaths = np.array(train_filepaths, dtype=np.string_)
        image_filepaths = train_filepaths[:, 0]
        label_filepaths = train_filepaths[:, 1]
    else:
        eval_filepaths = np.array(eval_filepaths, dtype=np.string_)
        image_filepaths = eval_filepaths[:, 0]
        label_filepaths = eval_filepaths[:, 1]
    images, labels = BatchReader.read_batch_image(image_filepaths, label_filepaths, IMAGE_SIZE, FLAGS.batch_size)
    labels = tf.cast(labels, tf.int64)

    tf.summary.image("images", images, max_outputs=3)
    tf.summary.image("labels", tf.cast(labels, tf.uint8), max_outputs=3)
    pred_annotation, logits = fcn16s_net(images, keep_probability)
    tf.summary.image("pre", tf.cast(pred_annotation, tf.uint8), max_outputs=3)

    # loss function
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=tf.squeeze(labels, squeeze_dims=[3])), name="entropy")
    if FLAGS.debug:
        tf.summary.scalar("loss", loss)

    # mean IoU (tf.metrics.mean_iou returns the metric tensor and its update op)
    m_iou, confusion_matrix = tf.metrics.mean_iou(labels=tf.squeeze(labels, squeeze_dims=[3]), predictions=tf.squeeze(pred_annotation, squeeze_dims=[3]), num_classes=NUM_OF_CLASSES)
    if FLAGS.debug:
        tf.summary.scalar("m_iou", m_iou)

    # variables to train
    trainable_var = tf.trainable_variables()
    train_op = train(loss, trainable_var)

    # tensorboard op
    summary = tf.summary.merge_all()
    ################# the network is now fully built #################

    ################### set up the session ##################
    sess = tf.Session()
    print("Setting up Saver.....")
    saver = tf.train.Saver()
    # initialize the variables before training / validation
    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())
    train_summary_writer = tf.summary.FileWriter(FLAGS.logs_dir + "\\train", sess.graph)

    # restore from a checkpoint if one exists
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Model restored .....")

    # Start the data queue
    tf.train.start_queue_runners(sess=sess)

    # train or validate
    if FLAGS.mode == "train":
        feed_dict = {keep_probability: 0.5}
        for itr in xrange(MAX_ITERATION):
            _, loss_value, mIOU, _ = sess.run([train_op, loss, m_iou, confusion_matrix], feed_dict=feed_dict)
            print("the %d time loss: %g" % (itr, loss_value))
            print("the %d time m_iou: %g" % (itr, mIOU))
            # periodically save information that reflects the training progress
            if itr % 500 == 0:
                saver.save(sess, FLAGS.checkpoint_dir + "model.ckpt", itr)
                print("model saved")
                summary_str = sess.run(summary, feed_dict={keep_probability: 1.0})
                train_summary_writer.add_summary(summary_str, itr)
                train_summary_writer.flush()
                print("summary saved")
    elif FLAGS.mode == "visualize":
        feed_dict = {keep_probability: 1.0}
        loss_value, mIOU, _ = sess.run([loss, m_iou, confusion_matrix], feed_dict=feed_dict)
        print("validate loss: %g" % loss_value)
        print("validate m_iou: %g" % mIOU)


if __name__ == "__main__":
    tf.app.run()
```
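The pad-by-100 and the crop offsets (5 for score_pool4c, 27 for the final score) follow the reference FCN implementation. As a quick sanity check for IMAGE_SIZE = 224, the sketch below traces the spatial sizes through the network; it assumes SAME max-pooling rounds sizes up and that conv2d_transpose_strided doubles the spatial size by default (and multiplies it by 16 when stride=16), as in the utils used here:
```python
import math

def trace_fcn16s_sizes(h=224, pad=100):
    """Trace the spatial sizes through the padded FCN16S graph (sketch only)."""
    x = h + 2 * pad                  # 424 after padding by 100 on each side
    pools = []
    for _ in range(5):               # pool1 .. pool5 each halve the size (rounding up)
        x = math.ceil(x / 2)
        pools.append(x)              # 212, 106, 53, 27, 14
    fc6 = pools[-1] - 7 + 1          # 7x7 VALID convolution: 8
    upscore2 = fc6 * 2               # stride-2 deconvolution: 16
    assert 5 + upscore2 <= pools[3]  # cropping score_pool4 at offset 5 fits inside 27
    upscore16 = upscore2 * 16        # stride-16 deconvolution: 256
    assert 27 + h <= upscore16       # cropping at offset 27 recovers the 224x224 output
    return pools, fc6, upscore2, upscore16

print(trace_fcn16s_sizes())          # ([212, 106, 53, 27, 14], 8, 16, 256)
```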
Test result:

> Running it now, you will find that it is much faster.
<h3 id="第二節">Problem 2: accuracy does not improve</h3>
After checking, the code itself was fine; when we ran the training on Huawei Cloud we could clearly see m_iou rising steadily.

After increasing the batch size, m_iou improved faster.

<h3 id="第三節">Problem 3: visualizing the training process</h3>
In the rewritten code above we also added the summary ops that record the training process (input images, labels, predictions, loss and m_iou); point TensorBoard at the logs directory to view them.
### Results
After training, the results are as follows:
