COCO animals dataset and preprocessing images · Mastering TensorFlow 1.x

# COCO animals dataset and preprocessing images

For our examples we will use the COCO animals dataset, a small subset of the COCO dataset made available by researchers at Stanford University at the following link: [http://cs231n.stanford.edu/coco-animals.zip](http://cs231n.stanford.edu/coco-animals.zip). The COCO animals dataset has 800 training images and 200 test images across eight animal classes: bear, bird, cat, dog, giraffe, horse, sheep, and zebra. The images are downloaded and preprocessed for the VGG16 and Inception models.

For the VGG model, the image size is 224 x 224 and the preprocessing steps are as follows:

1. Resize the image to 224 x 224 with a function similar to TensorFlow's `tf.image.resize_image_with_crop_or_pad`. We implemented this function as follows:

   ```py
   import PIL.Image

   def resize_image(self, in_image: PIL.Image.Image, new_width,
                    new_height, crop_or_pad=True):
       img = in_image
       if crop_or_pad:
           # crop a new_width x new_height box around the image centre
           half_width = img.size[0] // 2
           half_height = img.size[1] // 2
           half_new_width = new_width // 2
           half_new_height = new_height // 2
           img = img.crop((half_width - half_new_width,
                           half_height - half_new_height,
                           half_width + half_new_width,
                           half_height + half_new_height))
       img = img.resize(size=(new_width, new_height))
       return img
   ```

2. After resizing, convert the image from `PIL.Image` to a NumPy array and check whether it has a depth channel, because some images in the dataset are greyscale only.

   ```py
   img = self.pil_to_nparray(img)
   if len(img.shape) == 2:
       # greyscale or no channels: stack the image into three channels
       img = np.dstack([img] * 3)
   ```

3. Subtract the VGG dataset means from the image to center the data. We center the data of the new training images so that the features have a range similar to that of the initial data used to train the model. Keeping the features in a similar range ensures that the gradients during retraining do not become too high or too low. Centering the data also makes learning faster, because the gradients become uniform for each channel centered on zero mean.

   ```py
   means = np.array([[[123.68, 116.78, 103.94]]])  # shape=[1, 1, 3]
   img = img - means
   ```

The complete preprocessing function is as follows:

```py
def preprocess_for_vgg(self, incoming):
    # accept either a file path or an already loaded PIL image
    if isinstance(incoming, six.string_types):
        img = self.load_image(incoming)
    else:
        img = incoming
    img_size = vgg.vgg_16.default_image_size
    height = img_size
    width = img_size
    img = self.resize_image(img, height, width)
    img = self.pil_to_nparray(img)
    if len(img.shape) == 2:
        # greyscale or no channels: stack the image into three channels
        img = np.dstack([img] * 3)
    means = np.array([[[123.68, 116.78, 103.94]]])  # shape=[1, 1, 3]
    try:
        img = img - means
    except Exception as ex:
        print('Error preprocessing ', incoming)
        print(ex)
    return img
```

For the Inception model, the image size is 299 x 299 and the preprocessing steps are as follows:

1. Resize the image to 299 x 299 with a function similar to TensorFlow's `tf.image.resize_image_with_crop_or_pad`. We reuse the function defined earlier in the VGG preprocessing steps.

2. Scale the image to the range `(-1, +1)` with the following code:

   ```py
   img = ((img/255.0) - 0.5) * 2.0
   ```

The complete preprocessing function is as follows:

```py
def preprocess_for_inception(self, incoming):
    img_size = inception.inception_v3.default_image_size
    height = img_size
    width = img_size
    # accept either a file path or an already loaded PIL image
    if isinstance(incoming, six.string_types):
        img = self.load_image(incoming)
    else:
        img = incoming
    img = self.resize_image(img, height, width)
    img = self.pil_to_nparray(img)
    if len(img.shape) == 2:
        # greyscale or no channels: stack the image into three channels
        img = np.dstack([img] * 3)
    img = ((img/255.0) - 0.5) * 2.0
    return img
```

Let us load the COCO animals dataset:

```py
from datasetslib.coco import coco_animals
coco = coco_animals()
x_train_files, y_train, x_val_files, x_val = coco.load_data()
```

We take one image from each class in the validation set to build the list `x_test`, and preprocess those images to build the list `images_test`:

```py
x_test = [x_val_files[25*x] for x in range(8)]
images_test = np.array([coco.preprocess_for_vgg(x) for x in x_test])
```

We use the following helper function to display the images and the probabilities of the top five classes associated with each image:

```py
# helper function
def disp(images, id2label=None, probs=None, n_top=5, scale=False):
    if scale:
        # undo the VGG mean subtraction and rescale to [0, 1] for display
        imgs = np.abs(images + np.array([[[[123.68, 116.78, 103.94]]]]))/255.0
    else:
        imgs = images
    ids = {}
    for j in range(len(images)):
        if scale:
            plt.figure(figsize=(5, 5))
            plt.imshow(imgs[j])
        else:
            plt.imshow(imgs[j].astype(np.uint8))
        plt.show()
        if probs is not None:
            # class indices sorted by descending probability
            ids[j] = [i[0] for i in sorted(enumerate(-probs[j]),
                                           key=lambda x: x[1])]
            for k in range(n_top):
                class_id = ids[j][k]
                print('Probability {0:1.2f}% of[{1:}]'
                      .format(100*probs[j, class_id], id2label[class_id]))
```

The following line in the preceding function reverses the effect of preprocessing, so that the original image is displayed instead of the preprocessed one:

```py
imgs = np.abs(images + np.array([[[[123.68, 116.78, 103.94]]]]))/255.0
```

For the Inception model, the code that reverses the preprocessing is as follows:

```py
imgs = (images / 2.0) + 0.5
```

You can view the test images with the following code:

```py
# keep a plain list: the raw images have different sizes
images = [mpimg.imread(x) for x in x_test]
disp(images)
```

Follow the code in the Jupyter notebook to see the images. They all appear to have different dimensions, so let us print their original dimensions:

```py
print([x.shape for x in images])
```

The dimensions are:

```py
[(640, 425, 3), (373, 500, 3), (367, 640, 3), (427, 640, 3), (428, 640, 3), (426, 640, 3), (480, 640, 3), (612, 612, 3)]
```

Let us preprocess the test images and check the dimensions:

```py
images_test = np.array([coco.preprocess_for_vgg(x) for x in x_test])
print(images_test.shape)
```

The dimensions are:

```py
(8, 224, 224, 3)
```

For Inception, the dimensions are:

```py
(8, 299, 299, 3)
```

The preprocessed images for Inception are not shown here, but let us display the preprocessed images for VGG to get an idea of how they look:

```py
disp(images_test)
```

| | |
| --- | --- |
| ![](https://img.kancloud.cn/d8/1b/d81b0224b5f651c612f32686060c8d1e_587x578.png) | ![](https://img.kancloud.cn/48/08/4808a07e973dee42ebfcbab632fa57ba_587x578.png) |
| ![](https://img.kancloud.cn/45/a9/45a96740489b443f9bd5b4d3ec1d9ef8_587x578.png) | ![](https://img.kancloud.cn/b0/35/b03562699889663e76726aee549a908e_587x578.png) |
| ![](https://img.kancloud.cn/f4/87/f487fd20120c690a84f38cb6db0933df_587x578.png) | ![](https://img.kancloud.cn/dc/9b/dc9b25ed80cfe35ea0e4e7fdb935de33_587x578.png) |
| ![](https://img.kancloud.cn/71/45/7145da5edfdecba49c23c339412f8c10_587x578.png) | ![](https://img.kancloud.cn/90/78/907824e28a2e67733d1fb84efd9d3951_587x578.png) |

The images have in fact been cropped. We can see how they look when we reverse the preprocessing while keeping the crop:

| | |
| --- | --- |
| ![](https://img.kancloud.cn/c4/66/c4669ed0842c81e97029556b3a36aca4_315x306.png) | ![](https://img.kancloud.cn/12/42/12426863efb94a00d851da28d5f64417_315x306.png) |
| ![](https://img.kancloud.cn/b3/8a/b38ad24f3d9b8c1e2f04635e3f5b50aa_315x306.png) | ![](https://img.kancloud.cn/d8/1b/d81b914c2b4e2e73d0d077a3ba283dc6_315x306.png) |
| ![](https://img.kancloud.cn/eb/0e/eb0e0bc6e52b1827c631f68c782d92b4_315x306.png) | ![](https://img.kancloud.cn/e8/f1/e8f1ff8a3616f1445b1db02acd502693_315x306.png) |
| ![](https://img.kancloud.cn/17/b8/17b87919a2fe4e27cd3456cec9f42635_315x306.png) | ![](https://img.kancloud.cn/81/80/8180c0af9bcc1d84bfbd8d6644917bbf_315x306.png) |

Now that we have the labels from ImageNet as well as the images and labels from the COCO images dataset, let us try the transfer learning examples.
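As a recap of the two preprocessing schemes described in this section, the following is a minimal, self-contained sketch that uses only NumPy and a random array in place of a real COCO image. The function names (`to_three_channels`, `vgg_forward`, `vgg_inverse`, `inception_forward`, `inception_inverse`) are hypothetical and are not part of `datasetslib`. The sketch also assumes valid pixel values in 0..255, so it leaves out the `np.abs` call that `disp` uses.

```python
import numpy as np

# Per-channel RGB means quoted in this section, shape [1, 1, 3].
VGG_MEANS = np.array([[[123.68, 116.78, 103.94]]])


def to_three_channels(img):
    """Stack a 2-D greyscale array into three identical channels."""
    if img.ndim == 2:
        return np.dstack([img] * 3)
    return img


def vgg_forward(img):
    """Center the image by subtracting the per-channel means."""
    return img.astype(np.float64) - VGG_MEANS


def vgg_inverse(img):
    """Undo the mean subtraction and rescale to [0, 1] for display."""
    return (img + VGG_MEANS) / 255.0


def inception_forward(img):
    """Scale pixel values from [0, 255] to [-1, +1]."""
    return ((img / 255.0) - 0.5) * 2.0


def inception_inverse(img):
    """Map values from [-1, +1] back to [0, 1] for display."""
    return (img / 2.0) + 0.5


# Hypothetical stand-in for an already cropped 224 x 224 RGB image.
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# Both inverses should recover the original pixels scaled to [0, 1].
print(np.allclose(vgg_inverse(vgg_forward(fake)), fake / 255.0))
print(np.allclose(inception_inverse(inception_forward(fake)), fake / 255.0))
```

Both round trips return the original pixels scaled to `[0, 1]`, which is the range `plt.imshow` expects for floating-point images.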