5.1.1 FCN: Implementing FCN16S · Hands-On Machine Learning

The paper *Fully Convolutional Networks for Semantic Segmentation* is a milestone in image segmentation.

Paper: [https://people.eecs.berkeley.edu/~jonlong/long\_shelhamer\_fcn.pdf](https://people.eecs.berkeley.edu/~jonlong/long_shelhamer_fcn.pdf)

Open-source Caffe code from the FCN paper: [https://github.com/shelhamer/fcn.berkeleyvision.org](https://github.com/shelhamer/fcn.berkeleyvision.org)

TensorFlow implementation of FCN16S for this tutorial: [https://github.com/tangzhenjie/FCN16S](https://github.com/tangzhenjie/FCN16S)

## Preface

We do not cover the content of the FCN paper here; read the paper itself or one of the blog posts written about it. Everything below assumes you already understand the paper. The goal of this section is to walk you through reproducing the paper's FCN 16s experiment. The code released with the paper is written in Caffe, so we reimplement the experiment in TensorFlow.

## FCN 16S experiment steps

* [Part 1: Prepare the data](#%E7%AC%AC%E4%B8%80%E8%8A%82)
* [Part 2: Define the network structure](#%E7%AC%AC%E4%BA%8C%E8%8A%82)
* [Part 3: Define the loss function](#%E7%AC%AC%E4%B8%89%E8%8A%82)
* [Part 4: Optimization algorithm](#%E7%AC%AC%E5%9B%9B%E8%8A%82)
* [Part 5: Results](#%E7%AC%AC%E4%BA%94%E8%8A%82)

<h3 id="第一節">Part 1: Prepare the data</h3>

We use the Scene Parsing Challenge dataset provided by MIT: [http://sceneparsing.csail.mit.edu/](http://sceneparsing.csail.mit.edu/)

### **Create the project**

First create a project named **FCN16S** on GitHub, as shown below:

![](https://box.kancloud.cn/f25a541abda6946789b983aeda426a9d_1920x866.png)

Then open PyCharm and clone the project, as shown below:

![](https://box.kancloud.cn/a57bf0d0d5e1e33e760ab0a7c680c256_783x488.png)

![](https://box.kancloud.cn/8897b75877cef21ecb05401bde3b2363_783x488.png)

Change the project's runtime environment:

![](https://box.kancloud.cn/453530285fb5e07c893b39b0746bd15d_521x543.png)

![](https://box.kancloud.cn/7e38f98df4d892e004d3a21fd4c4c72e_1046x721.png)

### **We now have an empty project with its runtime environment configured. Next we write the project code step by step.**

#### First create the main project file, FCN16S.py, and add it to version control, as shown below:

![](https://box.kancloud.cn/2299f6b55d3190dd1591c0492a525693_736x258.png)

You can run the following code to check whether the TensorFlow environment is installed correctly:

```
import tensorflow as tf

hello = tf.constant('hello,tensorf')
sess = tf.Session()
print(sess.run(hello))
# If this runs and prints b'hello,tensorf', TensorFlow is installed correctly.
```

Next create the data-preparation file, read\_MITSceneParsingData.py, and add it to version control, as shown below:

![](https://box.kancloud.cn/81af6551ebe17ff70470e4b1fd86ede8_386x269.png)

> First, note that the dataset we use is the Scene Parsing Challenge dataset. Training set: 20,210 images. Validation set: 2,000
images

First, define a function in read\_MITSceneParsingData.py:

```
__author__ = 'tangzhenjie'
import os

# Dataset download URL
DATA_URL = 'http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip'

"""
  Read the training-set and validation-set file-name lists from the .pickle file
  param:
        data_dir: folder where the files are stored
  return: training-set and validation-set file-name lists (tuple)
"""
def read_dataset(data_dir):
    pickle_filename = "MITSceneParsing.pickle"
    pickle_filepath = os.path.join(data_dir, pickle_filename)
    # If the pickle file does not exist, download the dataset
    if not os.path.exists(pickle_filepath):
        pass  # the download call is added in the next step
```

We now need to download the file. To keep the code readable, we create a separate file, TensorflowUtils.py, to handle the download, and add the following code to it:

```
__author__ = 'tangzhenjie'
import os, sys
from six.moves import urllib
import tarfile
import zipfile
import scipy.io
import tensorflow as tf
import scipy.misc as misc

"""
   Download the file at the given url
   param:
         dir_path: where to download and extract the file
         url_name: url of the file to download
         is_tarfile: whether it is a tar file
         is_zipfile: whether it is a zip file
"""
def maybe_download_and_extract(dir_path, url_name, is_tarfile=False, is_zipfile=False):
    # Make sure the download/extract folder exists
    if not os.path.exists(dir_path):
        os.makedirs(dir_path)
    # Download only if the file is not already there
    file_name = url_name.split('/')[-1]
    file_path = os.path.join(dir_path, file_name)
    if not os.path.exists(file_path):
        # Progress callback shown while downloading
        def _progress(count, block_size, total_size):
            sys.stdout.write(
                '\r>> Downloading %s %.1f%%' % (file_name, float(count * block_size) / float(total_size) * 100.0)
            )
            # Flush the output
            sys.stdout.flush()

        file_path, _ = urllib.request.urlretrieve(url_name, file_path, reporthook=_progress)
        # Get file info
        statinfo = os.stat(file_path)
        print('Successfully downloaded', file_name, statinfo.st_size, 'bytes.')
        if is_tarfile:
            tarfile.open(file_path, 'r:gz').extractall(dir_path)
        if is_zipfile:
            with zipfile.ZipFile(file_path) as zf:
                zip_dir = zf.namelist()[0]
                zf.extractall(dir_path)
```

Then call this function from read\_MITSceneParsingData.py and test it. At this point read\_MITSceneParsingData.py contains:

```
__author__ = 'tangzhenjie'
import os
import TensorflowUtils as utils

# Dataset download URL
DATA_URL = 'http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip'

"""
  Read the training-set and validation-set file-name lists from the .pickle file
  param:
        data_dir: folder where the files are stored
  return: training-set and validation-set file-name lists (tuple)
"""
def read_dataset(data_dir):
    pickle_filename = "MITSceneParsing.pickle"
    pickle_filepath = os.path.join(data_dir, pickle_filename)
    # If the pickle file does not exist, download the dataset
    if not os.path.exists(pickle_filepath):
        utils.maybe_download_and_extract(data_dir, DATA_URL, is_zipfile=True)

read_dataset("\\")
```

Output like the following means the code is correct:

![](https://box.kancloud.cn/b26f81f914dcd7c97194259e21f962f4_1345x328.png)

Now add code to read\_MITSceneParsingData.py that builds the training-set and validation-set file-name lists:

```
__author__ = 'tangzhenjie'
import os
from tensorflow.python.platform import gfile
from six.moves import cPickle as pickle
import glob
import random
import TensorflowUtils as utils

# Dataset download URL
DATA_URL = 'http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip'

"""
  Read the training-set and validation-set file-name lists from the .pickle file
  param:
        data_dir: folder where the files are stored
  return: training-set and validation-set file-name lists (tuple)
"""
def read_dataset(data_dir):
    pickle_filename = "MITSceneParsing.pickle"
    pickle_filepath = os.path.join(data_dir, pickle_filename)
    # If the pickle file does not exist, download the dataset
    if not os.path.exists(pickle_filepath):
        utils.maybe_download_and_extract(data_dir, DATA_URL, is_zipfile=True)
        # After downloading and extracting, build the training/validation file-name lists
        SceneParsing_folder = os.path.splitext(DATA_URL.split("/")[-1])[0]
        result = create_image_lists(os.path.join(data_dir, SceneParsing_folder))
        print("Pickling ...")
        with open(pickle_filepath, 'wb') as f:
            pickle.dump(result, f, pickle.HIGHEST_PROTOCOL)
    else:
        print ("Found pickle file!")
    with open(pickle_filepath, 'rb') as f:
        result = pickle.load(f)
        training_records = result['training']
        validation_records = result['validation']
        del result
    return training_records, validation_records

def create_image_lists(image_dir):
    if not gfile.Exists(image_dir):
        print("Image directory '" + image_dir + "' not found.")
        return None
    directories = ['training', 'validation']
    image_list = {}

    for directory in directories:
        file_list = []
        image_list[directory] = []
        file_glob = os.path.join(image_dir, "images", directory, '*.'
                                 + 'jpg')
        file_list.extend(glob.glob(file_glob))

        if not file_list:
            print('No files found')
        else:
            for f in file_list:
                # os.path.basename handles both "\\" and "/" separators
                filename = os.path.splitext(os.path.basename(f))[0]
                annotation_file = os.path.join(image_dir, "annotations", directory, filename + '.png')
                if os.path.exists(annotation_file):
                    record = {'image': f, 'annotation': annotation_file, 'filename': filename}
                    image_list[directory].append(record)
                else:
                    print("Annotation file not found for %s - Skipping" % filename)

        random.shuffle(image_list[directory])
        no_of_images = len(image_list[directory])
        print ('No. of %s files: %d' % (directory, no_of_images))

    return image_list

# My downloaded and extracted files are in D:\dataSet\MIT
test, val = read_dataset(r"D:\dataSet\MIT")
```

Set a breakpoint and run under the debugger. The results are:

1. On the first run, check that the MITSceneParsing.pickle file is generated.

![](https://box.kancloud.cn/139d8f93cc98b04d145497b498d64ac7_791x276.png)

2. Check that the result is what you expect.

![](https://box.kancloud.cn/6df46078cabc94b273393d68142f6a9f_991x519.png)

Delete the test statements:

```
# My downloaded and extracted files are in D:\dataSet\MIT
test, val = read_dataset(r"D:\dataSet\MIT")
end = 2
```

**We now have the training-set and validation-set file-name lists.**

**The next step is to prepare the image data that is fed to the network.** Create a new file, BatchDatsetReader.py, with the following code:

```
"""
Code ideas from https://github.com/Newmu/dcgan and tensorflow mnist dataset reader
"""
import numpy as np
import scipy.misc as misc

# Test code
import read_MITSceneParsingData as Reader
# Test code

class BatchDatset:
    files = []             # image file records
    images = []            # image data arrays
    annotations = []       # annotation (label) image data
    image_options = {}     # image transformation options
    batch_offset = 0       # offset at which the next batch starts
    epochs_completed = 0   # number of completed epochs

    # Constructor
    def __init__(self, record_list, image_options = {}):
        print("Initializing Batch Dataset Reader...")
        print(image_options)
        self.files = record_list
        self.image_options = image_options
        self._read_images()

    def _read_images(self):
        self._channels = True
        self.images = np.array([self._transform(filename['image']) for filename in self.files])
        self._channels = False
        # Annotations are 2-D [height, width]; add a trailing channel axis
        self.annotations = np.array([np.expand_dims(self._transform(filename['annotation']), axis=-1) for filename in self.files])
        print(self.images.shape)
        print(self.annotations.shape)

    def _transform(self, filename):
        # Read the image into an ndarray
        # (scipy.misc.imread/imresize need an older SciPy; they were removed from newer releases)
        image = misc.imread(filename)
        # Make sure the image has 3 channels: stack grayscale into [height, width, 3]
        if self._channels and len(image.shape) < 3:
            image = np.stack([image] * 3, axis=-1)

        if self.image_options.get("resize", False) and self.image_options["resize"]:
            resize_size = int(self.image_options["resize_size"])
            resize_image = misc.imresize(image, [resize_size, resize_size], interp='nearest')
        else:
            resize_image = image
        return np.array(resize_image)

    # Get all images and annotation images
    def get_records(self):
        return self.images, self.annotations

    # Change the offset
    def reset_batch_offset(self, offset=0):
        self.batch_offset = offset

    # Get the next batch
    def next_batch(self, batch_size):
        # Start position
        start = self.batch_offset
        # Start of the next batch (also the end of this one)
        self.batch_offset += batch_size
        # Check whether we ran past the end of the data
        if self.batch_offset > self.images.shape[0]:
            # Running past the end means one epoch is complete
            self.epochs_completed += 1
            print("****************** Epochs completed: " + str(self.epochs_completed) + "******************")
            # Prepare the next pass over the data
            # First shuffle the data
            perm = np.arange(self.images.shape[0])
            np.random.shuffle(perm)
            self.images = self.images[perm]
            self.annotations = self.annotations[perm]
            # Start the next epoch
            start = 0
            self.batch_offset = batch_size
        # Produce the data
        end = self.batch_offset
        return self.images[start:end], self.annotations[start:end]

    # Get a random batch
    def get_random_batch(self, batch_size):
        indexs = np.random.randint(0, self.images.shape[0], size=batch_size).tolist()
        return self.images[indexs], self.annotations[indexs]

# Test code
record_lists = Reader.read_dataset(r"D:\dataSet\MIT")
BatchDatsetObject = BatchDatset(record_lists[0][0:1000], {})
BatchData = BatchDatsetObject.next_batch(10)
i = 0
# Test code
```

The test result is shown below. Because the dataset is large, we test on a subset. Note that this way of reading data is poor because it uses too much memory; later we will switch to TensorFlow's built-in data-reading methods to solve this. Remember to delete the test code.

![](https://box.kancloud.cn/071c243d3480d2868f9cd2541e5a3179_1841x911.png)

**We have now finished the data-preparation part.**

<h3 id="第二節">Part 2: Define the network structure</h3>

### A small network visualization tool shows the network structure clearly: [https://dgschwend.github.io/netscope/](https://dgschwend.github.io/netscope/)

You can first look at the detailed structure of the network:

1. 
   Open [https://dgschwend.github.io/netscope/](https://dgschwend.github.io/netscope/) and click the button shown below.
2. ![](https://box.kancloud.cn/a3cb8be385ad43fba9d5d6e1a55722df_1069x361.png)
3. ![](https://box.kancloud.cn/b54315b5899b65898539881877c032ff_730x210.png)
4. Enter the contents of the file [https://github.com/tangzhenjie/FCN16S/blob/master/ppt/FCN16S.txt](https://github.com/tangzhenjie/FCN16S/blob/master/ppt/FCN16S.txt). You will see the official FCN16S structure diagram, which is what we implement.

Now we write the network structure. Go back to FCN16S.py, created at the start, and fill in the code. First define the parameters the network needs and the packages to import:

```
from __future__ import print_function
import tensorflow as tf
import numpy as np
import TensorflowUtils as utils
import read_MITSceneParsingData as scene_parsing
import datetime
import BatchDatsetReader as dataset
from six.moves import xrange  # Python 2 and Python 3 compatibility

# Parameters needed by the network (can be overridden with optional command-line arguments)
FLAGS = tf.flags.FLAGS
# Batch size
tf.flags.DEFINE_integer("batch_size", "2", "batch size for training")
# Log directory
tf.flags.DEFINE_string("logs_dir", "D:\\pycharm_program\\FCN16S\\Logs\\", "path to logs directory")
# Directory holding the image dataset
tf.flags.DEFINE_string("data_dir", "D:\\pycharm_program\\FCN16S\\Data_zoo\\MIT_SceneParsing\\", "path to the dataset")
# Learning rate
tf.flags.DEFINE_float("learning_rate", "1e-4", "learning rate for Adam Optimizer")
# Location of the VGG16 model .mat file (we use VGG16 parameters trained with MATLAB)
tf.flags.DEFINE_string("model_dir", "D:\\pycharm_program\\FCN16S\\Model_zoo\\", "Path to vgg model mat")
# Debug mode (extra information is saved when debugging)
tf.flags.DEFINE_bool("debug", "False", "Model Debug:True/ False")
# Run mode (train, test, visualize)
tf.flags.DEFINE_string("mode", "train", "Mode: train/ test/ visualize")
# Checkpoint directory
tf.flags.DEFINE_string("checkpoint_dir", "D:\\pycharm_program\\FCN16S\\Checkpoint\\", "path to the checkpoint")
# Directory where validation result images are saved
tf.flags.DEFINE_string("image_dir", "D:\\pycharm_program\\FCN16S\\Image\\", "path to the saved result images")

# Model URL
MODEL_URL = "http://www.vlfeat.org/matconvnet/models/beta16/imagenet-vgg-verydeep-16.mat"
```

The next step is to look at the structure of the downloaded pretrained VGG16 weights. First we download the model, so add the following function to TensorflowUtils.py:

```
import scipy.io
"""
   Get the model data
   :param dir_path  download location
          model_url  URL of the model
"""
def get_model_data(dir_path, model_url):
    maybe_download_and_extract(dir_path, model_url)
    # Check that the file was downloaded
    filename = model_url.split("/")[-1]
    file_path = os.path.join(dir_path, filename)
    if not os.path.exists(file_path):
        raise IOError("VGG16 model not found")
    data = scipy.io.loadmat(file_path)
    return data
```

Write the following test code in FCN16S.py:

```
# Test code
model_data = utils.get_model_data("D:\\pycharm_program\\FCN16S\\VGG16MODEL", MODEL_URL)
# Test code
```

The result of the first run:

![](https://box.kancloud.cn/5285fb086fd62a26efa3c2897415efc5_1087x378.png)

Then look at what the data stored in the .mat file looks like:

![](https://box.kancloud.cn/f3d348c49c5837fd59efb6f2ae79beee_974x370.png)

We only care about the information in layers, so first check what layers contains. Add more test code to FCN16S.py:

> Reference: [https://zhuanlan.zhihu.com/p/40492866](https://zhuanlan.zhihu.com/p/40492866)

```
# Test code
model_data = utils.get_model_data("D:\\pycharm_program\\FCN16S\\VGG16MODEL", MODEL_URL)
layers = model_data["layers"]
vgg_layers = model_data["layers"][0]  # shape 1*37 (37 layers)
for element in xrange(0, 37):
    layer = vgg_layers[element]
    struct = layer[0][0]
    number = len(struct)
    if number == 5:
        # weights pad type name stride
        print(struct[3])
    if number == 2:
        # relu layer info
        print(struct[1])
    if number == 6:
        # pool layer info, or the last layer
        print(struct[0])
# Test code
```

The result (too long for a full screenshot; run it yourself):

![](https://box.kancloud.cn/8aa3211def28fcc161fdbb9347b158ac_765x318.png)

> Explanation: this prints the name of every layer. To build the network we only need the convolution-layer weights, so we just need to know how to get W and B.

Add the following test code to get W and B:

```
# Layer 0 is a convolution layer; this is where its w and b live
layer0 = vgg_layers[0]
# w
w_shape = layer0[0][0][0][0][0].shape
b_shape = layer0[0][0][0][0][1].shape
print(w_shape)
print(b_shape)
```

The result:

![](https://box.kancloud.cn/adc2e04a5dbf83598fa02c56692ebae5_421x146.png)

> Explanation: from the network structure we can see that the first layer's kernels are 3\*3 with 3 input channels and 64 output channels.

**We now know what is in the .mat file and where it lives.** Now we start building the network. FCN16S leaves the early convolution layers unchanged, so we build those first. Go back to FCN16S.py. Before writing the network, add several helper functions to TensorflowUtils.py:

```
# Create a variable in the graph from given initial weights
def get_variable(weights, name):
    # Constant initializer
    init = tf.constant_initializer(weights, dtype=tf.float32)
    # Create the variable (the call continues on the next line)
    var = \
        tf.get_variable(name=name, initializer=init, shape=weights.shape)
    return var

# Create a variable of the given shape from a truncated normal distribution (mean 0, stddev 0.02)
def weight_variable(shape, stddev=0.02, name=None):
    initial = tf.truncated_normal(shape, stddev=stddev)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)

# Create a bias variable
def bias_variable(shape, name=None):
    initial = tf.constant(0.0, shape=shape)
    if name is None:
        return tf.Variable(initial)
    else:
        return tf.get_variable(name, initializer=initial)

#################### Operations #########################
# Convolution whose output has the same spatial size as its input (channels may change)
def conv2d_basic(x, W, bias):
    # stride 1 with SAME padding keeps the output size equal to the input size
    conv = tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
    return tf.nn.bias_add(conv, bias)

# Convolution whose output is half the size of its input
def conv2d_strided(x, W, bias):
    conv = tf.nn.conv2d(x, W, strides=[1, 2, 2, 1], padding="SAME")
    return tf.nn.bias_add(conv, bias)

# Max pooling that halves the image size
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2 , 1], strides=[1, 2, 2, 1], padding="SAME")

# Average pooling that halves the image size
def avg_pool_2x2(x):
    return tf.nn.avg_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

###################### Image processing #######################
def process_image(image, mean_pixel):
    return image - mean_pixel

def unprocess_image(image, mean_pixel):
    return image + mean_pixel

####################### Padding ####################
# The official Caffe code pads the input by 100 first
def pading(image, paddingdata):
    if len(image.shape) == 3:
        # tensor shape is [height, width, channels]
        target_height = image.shape[0] + paddingdata * 2
        target_width = image.shape[1] + paddingdata * 2
        return tf.image.pad_to_bounding_box(image,offset_height=paddingdata, offset_width=paddingdata, target_height=target_height,target_width=target_width)
    elif len(image.shape) == 4:
        # [batch, height, width, channels]
        target_height = image.shape[1] + paddingdata * 2
        target_width = image.shape[2] + paddingdata * 2
        return tf.image.pad_to_bounding_box(image, offset_height=paddingdata,
                                            offset_width=paddingdata, target_height=target_height,target_width=target_width)
    else:
        raise ValueError("image tensor shape error")

# Save an image
def save_image(image, save_dir, name, mean=None):
    """
    Save image by unprocessing if mean given else just save
    :param image:
    :param save_dir:
    :param name:
    :param mean:
    :return:
    """
    if mean:
        image = unprocess_image(image, mean)
    misc.imsave(os.path.join(save_dir, name + ".png"), image)
```

**With these helper functions we continue building the network.** Add the following code to FCN16S.py to complete the vgg\_net function:

```
def vgg_net(weights, image):
    # Names of the VGG16 layers used by FCN16S, used to build the same network
    layers = (
        "conv1_1", "relu1_1", "conv1_2", "relu1_2", "pool1",
        "conv2_1", "relu2_1", "conv2_2", "relu2_2", "pool2",
        "conv3_1", "relu3_1", "conv3_2", "relu3_2", "conv3_3", "relu3_3", "pool3",
        "conv4_1", "relu4_1", "conv4_2", "relu4_2", "conv4_3", "relu4_3", "pool4",
        "conv5_1", "relu5_1", "conv5_2", "relu5_2", "conv5_3", "relu5_3", "pool5"
    )
    # Outputs of every shared layer, keyed by layer name
    net = {}
    # Current input
    current = image
    for i, name in enumerate(layers):
        # First four characters of the layer name
        kind = name[:4]
        if kind == "conv":
            kernels = weights[i][0][0][0][0][0]
            bias = weights[i][0][0][0][0][1]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            # Create the variables
            kernels = utils.get_variable(np.transpose(kernels, (1, 0, 2, 3)), name=name + "_w")
            bias = utils.get_variable(bias.reshape(-1), name=name + "_b")
            current = utils.conv2d_basic(current, kernels, bias)
        elif kind == "relu":
            current = tf.nn.relu(current, name=name)
            if FLAGS.debug:
                utils.add_activation_summary(current)
        elif kind == "pool":
            current = utils.avg_pool_2x2(current)
        net[name] = current
    return net
```

We have now written the first five blocks of VGG16. The snippets from here on use the module-level constants IMAGE_SIZE, NUM_OF_CLASSES and MAX_ITERATION, which are not shown in this walkthrough; define them in FCN16S.py before running (the tests here feed 224\*224 images, so IMAGE_SIZE is 224). To check that the network is correct, add the following test code:

```
####################### Test code ################################
# Build the graph
model_data = utils.get_model_data("D:\\pycharm_program\\FCN16S\\VGG16MODEL", MODEL_URL)
weights = model_data["layers"][0]
image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
net = \
    vgg_net(weights, image)

# Get the data
training_records, validation_records = scene_parsing.read_dataset(r"D:\dataSet\MIT")
datsetObject = dataset.BatchDatset(validation_records, {"resize":True, "resize_size": 224})
batchdataset = datsetObject.get_random_batch(2)
imagedata = batchdataset[0]
feed_dict = {image: imagedata}

# Run the graph
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(net["pool5"], feed_dict=feed_dict).shape)
########################## Test code ###########################
```

Result:

![](https://box.kancloud.cn/37717f6b300c37b9679f1fe82a9289b5_1001x919.png)

> Explanation: the convolution layers keep the image size and each pool operation halves it, so 224\*224 becomes 7\*7 after 5 pools.

**So far we have implemented the part of FCN16S that matches VGG16. Next we build the complete FCN16S network.** Enter the following code in FCN16S.py:

```
"""
   Build FCN16S
   :param image  network input image [batch, height, width, channels]
   :return a tensor of the same spatial size as image
"""
def fcn16s_net(image, keep_prob):
    # First pad the image
    image = utils.pading(image, 100)
    # Get the pretrained weights for the part shared with VGG16
    model_data = utils.get_model_data(FLAGS.model_dir, MODEL_URL)
    weights = model_data["layers"][0]
    with tf.variable_scope("VGG16"):
        vgg16net_dict = vgg_net(weights, image)
    with tf.variable_scope("FCN16S"):
        pool5 = vgg16net_dict["pool5"]
        # Create the fc6 layer
        w6 = utils.weight_variable([7, 7, 512, 4096], name="w6")
        b6 = utils.bias_variable([4096], name="b6")
        conv6 = tf.nn.conv2d(pool5, w6, [1, 1, 1, 1], padding="VALID")
        conv_bias6 = tf.nn.bias_add(conv6, b6)
        relu6 = tf.nn.relu(conv_bias6, name="relu6")
        if FLAGS.debug:
            utils.add_activation_summary(relu6)
        relu_dropout6 = tf.nn.dropout(relu6, keep_prob=keep_prob)

        # Create the fc7 layer
        w7 = utils.weight_variable([1, 1, 4096, 4096], name="w7")
        b7 = utils.bias_variable([4096], name="b7")
        conv7 = utils.conv2d_basic(relu_dropout6, w7, b7)
        relu7 = tf.nn.relu(conv7, name="relu7")
        if FLAGS.debug:
            utils.add_activation_summary(relu7)
        conv_dropout7 = tf.nn.dropout(relu7, keep_prob=keep_prob)

        # Define the score_fr layer
        w8 = utils.weight_variable([1, 1, 4096, NUM_OF_CLASSES], name="w8")
        b8 = utils.bias_variable([NUM_OF_CLASSES], name="b8")
        score_fr = \
            utils.conv2d_basic(conv_dropout7, w8, b8)

        # Define the upscore2 layer
```

Because we need a deconvolution (transposed convolution) layer, first add the following helper to TensorflowUtils.py:

```
# Deconvolution (transposed convolution)
def conv2d_transpose_strided(x, w, b, output_shape=None, stride=2):
    if output_shape is None:
        # By default the output is twice the input size, with the kernel's output channels
        tmp_shape = x.get_shape().as_list()
        tmp_shape[1] *= 2
        tmp_shape[2] *= 2
        x_shape = tf.shape(x)
        output_shape = tf.stack([x_shape[0], tmp_shape[1], tmp_shape[2], w.get_shape().as_list()[2]])
    conv = tf.nn.conv2d_transpose(x, w, output_shape, strides=[1, stride, stride, 1], padding="SAME")
    return tf.nn.bias_add(conv, b)
```

> For an explanation of TensorFlow's deconvolution, see: [https://blog.csdn.net/mao\_xiao\_feng/article/details/71713358](https://blog.csdn.net/mao_xiao_feng/article/details/71713358)

Add the following test code to TensorflowUtils.py to test the deconvolution:

```
########### Test code ############
# Convolution
conv_image = tf.zeros([1, 12, 12, 3], dtype=tf.float32)
conv_kernel = tf.Variable(initial_value=tf.ones([2, 2, 3, 2], dtype=tf.float32))
out_image = tf.nn.conv2d(conv_image, conv_kernel, [1,2,2,1], padding="SAME")

# Deconvolution
transpose_kernel = tf.Variable(initial_value=tf.ones([2,2,3,2], dtype=tf.float32))
transpose_b = tf.Variable(initial_value=tf.zeros([3], dtype=tf.float32))
image = conv2d_transpose_strided(out_image, transpose_kernel, transpose_b)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(image).shape)
########### Test code ############
```

The correct result:

![](https://box.kancloud.cn/0ca8bed347b39be967c70906c66a9052_1423x822.png)

> Deconvolution reverses the shape change of a convolution: the kernel, stride and padding arguments stay the same, while the input image and the bias change.

Delete the test code and go back to FCN16S.py to continue building the network:

```
        # Define the upscore2 layer
        w9 = utils.weight_variable([4, 4, NUM_OF_CLASSES, NUM_OF_CLASSES], name="w9")
        b9 = utils.bias_variable([NUM_OF_CLASSES], name="b9")
        upscore2 = utils.conv2d_transpose_strided(score_fr, w9, b9)

        # Define score_pool4
        pool4_shape = vgg16net_dict["pool4"].get_shape()
        w10 = utils.weight_variable([1, 1, pool4_shape[3].value, NUM_OF_CLASSES], name="w10")
        b10 = utils.bias_variable([NUM_OF_CLASSES], name="b10")
        score_pool4 = \
            utils.conv2d_basic(vgg16net_dict["pool4"], w10, b10)

        # Define score_pool4c: crop score_pool4 to the size of upscore2
        upscore2_shape = upscore2.get_shape()
        upscore2_target_height = upscore2_shape[1].value
        upscore2_target_width = upscore2_shape[2].value
        score_pool4c = tf.image.crop_to_bounding_box(score_pool4, 5, 5, upscore2_target_height, upscore2_target_width)

        # Define fuse_pool4
        fuse_pool4 = tf.add(upscore2, score_pool4c, name="fuse_pool4")

        # Define upscore16
        fuse_pool4_shape = fuse_pool4.get_shape()
        w11 = utils.weight_variable([32, 32, NUM_OF_CLASSES, NUM_OF_CLASSES], name="w11")
        b11 = utils.bias_variable([NUM_OF_CLASSES], name="b11")
        output_shape = tf.stack([tf.shape(fuse_pool4)[0], fuse_pool4_shape[1].value * 16, fuse_pool4_shape[2].value * 16, NUM_OF_CLASSES])
        upscore16 = utils.conv2d_transpose_strided(fuse_pool4, w11, b11, output_shape=output_shape , stride=16)

        # Define the score layer
        image_shape = image.get_shape()
        score_target_height = image_shape[1].value - 200  # the input was padded by 100 on each side, so subtract 200
        score_target_width = image_shape[2].value - 200  # the input was padded by 100 on each side, so subtract 200
        score = tf.image.crop_to_bounding_box(upscore16, 27, 27, score_target_height, score_target_width)

        annotation_pred = tf.argmax(score, dimension=3, name="prediction")

    return tf.expand_dims(annotation_pred, dim=3), score
```

> Note: deconvolution in TensorFlow differs from deconvolution in Caffe, so the outputs of the intermediate deconvolution steps here may differ from the original network. This should not affect the final performance of the network, as we will see at the end.

That completes the fcn16s\_net function. The network now takes an image through convolution, pooling, up-convolution and cropping, and produces a feature map with the same spatial size as the original image. Let's test it first. Add the following code to FCN16S.py:

```
####################### Test code ################################
# Build the graph
image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
predict, score = fcn16s_net(image, 0.5)

# Get the data
training_records, validation_records = scene_parsing.read_dataset(r"D:\dataSet\MIT")
datsetObject = dataset.BatchDatset(validation_records, {"resize":True, "resize_size": 224})
batchdataset = datsetObject.get_random_batch(2)
imagedata = batchdataset[0]
feed_dict = {image: imagedata}

# Run the graph
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(score, feed_dict=feed_dict).shape)
########################## Test code ###########################
```

> Note: remember to change the value of model\_dir, otherwise you will have to download the model data again (it is fairly large).

The test result:

![](https://box.kancloud.cn/04511fb12c3de566aaa672ac6e4c8ef1_671x291.png)

**We have now finished defining the network structure.**

<h3 id="第三節">Part 3: Define the loss function</h3>

In this part we implement the first piece of the training code. Start with the main function:

```
def main(argv=None):
    # Build the network
    # First define the network inputs
    keep_probability = tf.placeholder(tf.float32, name="keep_probability")
    image = tf.placeholder(tf.float32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 3], name="input_image")
    annotation = tf.placeholder(tf.int32, shape=[None, IMAGE_SIZE, IMAGE_SIZE, 1], name="annotation")

    pred_annotation, logits = fcn16s_net(image, keep_probability)
    # Save the images we want to inspect and the generated results
    tf.summary.image("input_image", image, max_outputs=2)
    tf.summary.image("ground_truth", tf.cast(annotation, tf.uint8), max_outputs=2)
    # The first argument of tf.summary.image is the summary name (a string)
    tf.summary.image("pred_annotation", tf.cast(pred_annotation, tf.uint8), max_outputs=2)
    # Define the loss function
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits,
                                                                         labels=tf.squeeze(annotation, squeeze_dims=[3])),
                          name="entropy")
    # Record the loss
    loss_summary = tf.summary.scalar("entropy", loss)

    # Get the variables to train
    trainable_var = tf.trainable_variables()

    # In debug mode, also record the variables
    if FLAGS.debug:
        for var in trainable_var:
            utils.add_to_regularization_and_summary(var)
```

<h3 id="第四節">Part 4: Optimization algorithm</h3>

With the loss function in place, we now use an optimization algorithm to reduce the loss. Add the following function to FCN16S.py:

```
def train(loss_val, var_list):
    optimizer = tf.train.AdamOptimizer(FLAGS.learning_rate)
    grads = optimizer.compute_gradients(loss_val, var_list=var_list)
    if FLAGS.debug:
        for grad, var in grads:
            utils.add_gradient_summary(grad, var)
    return optimizer.apply_gradients(grads)
```

With the optimizer in place, we continue building the network in the main function:

> Reference for learning TensorBoard: [https://jhui.github.io/2017/03/12/TensorBoard-visualize-your-learning/](https://jhui.github.io/2017/03/12/TensorBoard-visualize-your-learning/)

```
    # In debug mode, also record the variables
    if FLAGS.debug:
        for var in trainable_var:
            utils.add_to_regularization_and_summary(var)
    train_op = train(loss, trainable_var)

    # Create the op that merges all summaries (so they can be written to file)
    print("Setting up summary op....")
    summary_op = tf.summary.merge_all()
    ################# The network is now fully built #################

    ################# Next we set up the data ##########
    print("Setting up image reader...")
    train_records, valid_records = scene_parsing.read_dataset(FLAGS.data_dir)
    # Print the record counts to check they are correct
    print(len(train_records))
    print(len(valid_records))

    print("Setting up dataset reader...")
    image_options = {'resize':True, 'resize_size':IMAGE_SIZE}
    if FLAGS.mode == "train":
        train_dataset_reader = dataset.BatchDatset(train_records, image_options)
    validation_dataset_reader = dataset.BatchDatset(valid_records, image_options)
    ################# Data setup complete ####################################

    ################### Set up the session ##################
    sess = tf.Session()

    print("Setting up Saver.....")
    saver = tf.train.Saver()

    # create two summary writers to show training loss and validation loss in the same graph
    # need to create two folders 'train' and 'validation' inside FLAGS.logs_dir
    train_writer = tf.summary.FileWriter(FLAGS.logs_dir + "/train", sess.graph)
    validation_writer = tf.summary.FileWriter(FLAGS.logs_dir + "validation")

    # Initialize the variables before training or validation
    sess.run(tf.global_variables_initializer())

    # Restore from a checkpoint if one exists
    ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        saver.restore(sess, ckpt.model_checkpoint_path)
        print("Model restored .....")

    # Start training or validation
    if FLAGS.mode == "train":
        for itr in xrange(MAX_ITERATION):
            # Produce a batch of data
            train_images, train_annotation = train_dataset_reader.next_batch(FLAGS.batch_size)
            feed_dict = {image: train_images, annotation: train_annotation, keep_probability:0.85}
            # Run one training step
            sess.run(train_op, feed_dict=feed_dict)

            # Record information that reflects training progress
            if itr % 10 == 0:
                train_loss, summary_str = sess.run([loss, loss_summary], feed_dict=feed_dict)
                # %g keeps the fractional part of the loss (%d would truncate it)
                print("Step: %d, Train_loss: %g" % (itr, train_loss))
                train_writer.add_summary(summary_str, itr)
                train_writer.flush()
            if itr % 500 == 0:
                valid_images, valid_annotations = validation_dataset_reader.next_batch(FLAGS.batch_size)
                valid_loss, summary_sva = sess.run([loss, loss_summary], feed_dict={image: valid_images, annotation: valid_annotations,
                                                       keep_probability: 1.0})
                print("%s------> Validation_loss: %g" % (datetime.datetime.now(), valid_loss))
                saver.save(sess, FLAGS.checkpoint_dir + "model.ckpt", itr)
                # add validation loss to TensorBoard
                validation_writer.add_summary(summary_sva, itr)
                validation_writer.flush()
    elif FLAGS.mode == "visualize":
        valid_images, valid_annotations = validation_dataset_reader.get_random_batch(FLAGS.batch_size)
        pred = sess.run(pred_annotation, feed_dict={image: valid_images, annotation: valid_annotations,
                                                    keep_probability: 1.0})
        valid_annotations = np.squeeze(valid_annotations, axis=3)
        pred = np.squeeze(pred, axis=3)
        # Save the results
        for itr in range(FLAGS.batch_size):
            utils.save_image(valid_images[itr].astype(np.uint8), FLAGS.image_dir, name="inp_" + str(5+itr))
            utils.save_image(valid_annotations[itr].astype(np.uint8), FLAGS.image_dir, name="gt_" + str(5+itr))
            utils.save_image(pred[itr].astype(np.uint8), FLAGS.image_dir, name="pred_" + str(5+itr))
            print("Saved image: %d" % itr)
```

The main function is now complete, and we can run the network. Add the entry point:

```
if __name__ == "__main__":
    tf.app.run()
```

Now for the moment of truth. Running FCN16S.py gives the result shown below:

![](https://box.kancloud.cn/87d7a09b0f2c4424033aa20c75d0bd3f_965x712.png)

> Note: we have now fully implemented the FCN16S network. Be aware that the code above uses a great deal of memory at run time, because it first reads the entire dataset into memory. Later we will switch to TensorFlow's own data-reading methods to solve this.

<h3 id="第五節">Part 5: Testing the results</h3>

Add a node that computes m\_iou to the code, then test:

```
# Compute m_iou
re_shape = tf.stack([tf.shape(pred_annotation)[0], IMAGE_SIZE * IMAGE_SIZE, 1])
annotation_new = tf.reshape(annotation, re_shape)
pred_annotation_new = tf.reshape(pred_annotation, re_shape)
mean_iou, endarray = tf.metrics.mean_iou(annotation_new, pred_annotation_new, NUM_OF_CLASSES)
```

Then add the following to the training code:

```
# tf.metrics.mean_iou creates local variables, so initialize them as well
sess.run(tf.local_variables_initializer())

# miou
m_iou, array_end = \
    sess.run([mean_iou, endarray], feed_dict={image: train_images, annotation: train_annotation, keep_probability:1.0})
print(m_iou)
print(array_end)
```

The results, however, are poor. In the next section we change the data-loading method and debug the network until its results match the paper. Finally, the code implemented so far is available here: [https://github.com/tangzhenjie/FCN16S](https://github.com/tangzhenjie/FCN16S)
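
To make clearer what the `mean_iou` node in Part 5 measures, here is a minimal NumPy sketch of mean intersection-over-union computed from a confusion matrix. It is an illustration only, not the code used in the repository: the function name `mean_iou_numpy` and the toy arrays are hypothetical, and skipping classes that appear in neither array is an assumption made for this sketch (check the TensorFlow documentation for the exact behaviour of `tf.metrics.mean_iou`).

```python
import numpy as np


def mean_iou_numpy(labels, predictions, num_classes):
    """Mean IoU over the classes that occur in labels or predictions.

    Hypothetical helper for illustration; not part of the FCN16S repository.
    """
    labels = np.asarray(labels).reshape(-1)
    predictions = np.asarray(predictions).reshape(-1)
    # Confusion matrix: rows are true classes, columns are predicted classes.
    index = labels * num_classes + predictions
    confusion = np.bincount(index, minlength=num_classes ** 2)
    confusion = confusion.reshape(num_classes, num_classes)
    # Per class: intersection is the diagonal, union is row + column - diagonal.
    intersection = np.diag(confusion).astype(np.float64)
    union = confusion.sum(axis=0) + confusion.sum(axis=1) - intersection
    present = union > 0  # assumption: skip classes that appear nowhere
    if not present.any():
        return 0.0
    return float(np.mean(intersection[present] / union[present]))


# Toy example with 3 classes on a 2*3 "image".
truth = np.array([[0, 0, 1], [1, 2, 2]])
guess = np.array([[0, 1, 1], [1, 2, 0]])
print(mean_iou_numpy(truth, guess, num_classes=3))
```

Unlike this one-shot function, `tf.metrics.mean_iou` accumulates its confusion matrix across runs of the update op, which is why the local variables have to be initialized before it is evaluated.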