(1) Basic Introduction · Deep Learning and Computer Vision

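Authors: [寒小陽](http://blog.csdn.net/han_xiaoyang?viewmode=contents) & [龍心塵](http://blog.csdn.net/longxinchen_ml?viewmode=contents). Date: November 2015. Source: [http://blog.csdn.net/han_xiaoyang/article/details/49876119](http://blog.csdn.net/han_xiaoyang/article/details/49876119). Notice: all rights reserved; please credit the source when reposting. Thank you.

### 1. Background

[Computer vision](https://en.wikipedia.org/wiki/Computer_vision) (Chinese: [計算機視覺](http://library.kiwix.org/wikipedia_zh_all/A/html/%E8%AE%A1/%E7%AE%97/%E6%9C%BA/%E8%A7%86/%E8%AE%A1%E7%AE%97%E6%9C%BA%E8%A7%86%E8%A7%89.html)) has been a hot topic for years. The reason it keeps heating up is simple: it has a huge number of applications in search, image content understanding, medicine, map recognition, and other fields, and everyone shares one vision: "let a computer 'look at' an image the way a person does, and even 'understand' it". There are several important computer vision tasks, such as image classification, object recognition, and object localization and detection. In recent years, [neural networks / deep learning](https://en.wikipedia.org/wiki/Deep_learning) have greatly improved accuracy on these tasks. Having recently worked on a few modest computer vision projects, and never one to miss out on the excitement, this blogger does not intend to pass up the field either, so I am taking notes and writing things up as I learn. If anything here is incorrect, corrections are welcome.

### 2. Basics

To keep things simple and readable, the vast majority of the code in this series is written in Python. Here is a brief introduction to some basics of Python and NumPy/SciPy (**the scientific computing packages for Python**).

#### 2.1 Python basics

Python is a highly readable programming language that looks a lot like pseudocode. It has many advantages: the code is very readable and easy to write, ideas can be turned into working code right away, the community is extremely active, and a great many packages are available. The drawback: its execution speed is mediocre.

#### 2.1.1 Basic data types

The most commonly used are numbers, booleans, and strings.

- Numbers: support simple arithmetic, as follows:

~~~
x = 5
print type(x) # Prints "<type 'int'>"
print x       # Prints "5"
print x + 1   # Addition; prints "6"
print x - 1   # Subtraction; prints "4"
print x * 2   # Multiplication; prints "10"
print x ** 2  # Exponentiation; prints "25"
x += 1        # In-place addition
print x       # Prints "6"
x *= 2        # In-place multiplication
print x       # Prints "12"
y = 2.5
print type(y) # Prints "<type 'float'>"
print y, y + 1, y * 2, y ** 2 # Prints "2.5 3.5 5.0 6.25"
~~~

PS: Python has no `x++` or `x--` operators.

- Booleans: `True` and `False`, with the usual AND, OR, and NOT operations:

~~~
t = True
f = False
print type(t) # Prints "<type 'bool'>"
print t and f # Logical AND; prints "False"
print t or f  # Logical OR; prints "True"
print not t   # Logical NOT; prints "False"
print t != f  # Logical XOR; prints "True"
~~~

- Strings: can be declared with single, double, or triple quotes:

~~~
hello = 'hello'
world = "world"
print hello       # Prints "hello"
print len(hello)  # String length; prints "5"
hw = hello + ' ' + world  # String concatenation
print hw          # Prints "hello world"
hw2015 = '%s %s %d' % (hello, world, 2015)  # String formatting
print hw2015      # Prints "hello world 2015"
~~~

String objects have many useful methods:

~~~
s = "hello"
print s.capitalize()  # Capitalize the first letter; prints "Hello"
print s.upper()       # All uppercase; prints "HELLO"
print s.rjust(7)      # Right-justify to width 7, padding on the left with spaces; prints "  hello"
print s.center(7)     # Center, padding with spaces; prints " hello "
print s.replace('l', '(ell)')  # Substring replacement; prints "he(ell)(ell)o"
print ' world '.strip()  # Strip leading and trailing whitespace; prints "world"
~~~

#### 2.1.2 Basic containers

**List**

Similar to an array, but a list can hold elements of different types and can be resized.

~~~
xs = [3, 1, 2]    # Create a list
print xs, xs[2]   # Prints "[3, 1, 2] 2"
print xs[-1]      # Index -1 is the last element; prints "2"
xs[2] = 'foo'     # Indices start at 0, so this is the third element
print xs          # Elements can have different types; prints "[3, 1, 'foo']"
xs.append('bar')  # Append an element to the end
print xs          # Prints "[3, 1, 'foo', 'bar']"
x = xs.pop()      # Remove and return the last element
print x, xs       # Prints "bar [3, 1, 'foo']"
~~~

The most common list operations are:

**Slicing**

Taking a subsequence, that is, a portion of the elements:

~~~
nums = range(5)    # The integers from 0 to 4 (a list in Python 2)
print nums         # Prints "[0, 1, 2, 3, 4]"
print nums[2:4]    # Elements at indices 2 to 4-1; prints "[2, 3]"
print nums[2:]     # Elements from index 2 to the end; prints "[2, 3, 4]"
print nums[:2]     # Elements from the start to index 2-1; prints "[0, 1]"
print nums[:]      # The whole list; prints "[0, 1, 2, 3, 4]"
print nums[:-1]    # From the start up to, but not including, the last element; prints "[0, 1, 2, 3]"
nums[2:4] = [8, 9] # Assign to a slice
print nums         # Prints "[0, 1, 8, 9, 4]"
~~~

**Loops**

Iterating over the whole list and doing something with each element:

~~~
animals = ['cat', 'dog', 'monkey']
for animal in animals:
    print animal  # Prints "cat", "dog", "monkey", each on its own line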
~~~

`enumerate` gives you the index along with each element:

~~~
animals = ['cat', 'dog', 'monkey']
for idx, animal in enumerate(animals):
    print '#%d: %s' % (idx + 1, animal)
# Prints "#1: cat", "#2: dog", "#3: monkey", each on its own line
~~~

**List comprehension**

This is extremely useful. When building a list, it is more concise than an explicit for loop and usually faster as well:

~~~
# for loop
nums = [0, 1, 2, 3, 4]
squares = []
for x in nums:
    squares.append(x ** 2)
print squares  # Prints "[0, 1, 4, 9, 16]"

# list comprehension
nums = [0, 1, 2, 3, 4]
squares = [x ** 2 for x in nums]
print squares  # Prints "[0, 1, 4, 9, 16]"
~~~

And guess what, a list comprehension can also include conditions:

~~~
nums = [0, 1, 2, 3, 4]
even_squares = [x ** 2 for x in nums if x % 2 == 0]
print even_squares  # Prints "[0, 4, 16]"
~~~

**Dict**

The same idea as a `Map` in Java, used to store key-value pairs:

~~~
d = {'cat': 'cute', 'dog': 'furry'}  # Create a dict
print d['cat']       # Look up a value by key; prints "cute"
print 'cat' in d     # Check whether the key 'cat' exists; prints "True"
d['fish'] = 'wet'    # Add an entry
print d['fish']      # Prints "wet"
# print d['monkey']  # KeyError: 'monkey' is not a key of this dict
print d.get('monkey', 'N/A')  # Returns the value if the key exists, otherwise the default; prints "N/A"
print d.get('fish', 'N/A')    # Prints "wet"
del d['fish']        # Delete a key and its value
print d.get('fish', 'N/A')    # Prints "N/A"
~~~

The operations shown for lists have counterparts for dicts:

**Loops**

~~~
# for loop
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal in d:
    legs = d[animal]
    print 'A %s has %d legs' % (animal, legs)
# Prints "A person has 2 legs", "A spider has 8 legs", "A cat has 4 legs" (the order is not guaranteed)

# Using iteritems
d = {'person': 2, 'cat': 4, 'spider': 8}
for animal, legs in d.iteritems():
    print 'A %s has %d legs' % (animal, legs)
# Prints "A person has 2 legs", "A spider has 8 legs", "A cat has 4 legs" (the order is not guaranteed)
~~~

~~~
# Dictionary comprehension
nums = [0, 1, 2, 3, 4]
even_num_to_square = {x: x ** 2 for x in nums if x % 2 == 0}
print even_num_to_square  # Prints "{0: 0, 2: 4, 4: 16}"
~~~

**Tuple**

A tuple is an immutable ordered sequence of values. It behaves much like a list, but because it cannot be modified it can be used as a dict key:

~~~
d = {(x, x + 1): x for x in range(10)}  # Create a dict with tuple keys
t = (5, 6)       # Create a tuple
print type(t)    # Prints "<type 'tuple'>"
print d[t]       # Prints "5"
print d[(1, 2)]  # Prints "1"
~~~

#### 2.1.3 Functions

Use `def` to define a function:

~~~
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print sign(x)
# Prints "negative", "zero", "positive"
~~~

~~~
def hello(name, loud=False):
    if loud:
        print 'HELLO, %s!' % name.upper()
    else:
        print 'Hello, %s' % name

hello('Bob')              # Prints "Hello, Bob"
hello('Fred', loud=True)  # Prints "HELLO, FRED!"
~~~

**Class**

Class definitions in Python are direct and concise:

~~~
class Greeter:

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print 'HELLO, %s!' % self.name.upper()
        else:
            print 'Hello, %s' % self.name

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"
~~~

#### 2.2 NumPy basics

NumPy is a core library for scientific computing in Python. It provides a high-performance multidimensional array (matrix) object and many operations on it. Many computations in machine learning run very efficiently once the data has been vectorized.

#### 2.2.1 Arrays

A NumPy array is a grid of elements that all have the same type. It can be initialized from a list or from nested lists:

~~~
import numpy as np

a = np.array([1, 2, 3])  # 1-D NumPy array
print type(a)            # Prints "<type 'numpy.ndarray'>"
print a.shape            # Prints "(3,)"
print a[0], a[1], a[2]   # Prints "1 2 3"
a[0] = 5                 # Reassign an element
print a                  # Prints "[5 2 3]"

b = np.array([[1,2,3],[4,5,6]])   # 2-D NumPy array
print b.shape                     # Prints "(2, 3)"
print b[0, 0], b[0, 1], b[1, 0]   # Prints "1 2 4"
~~~

There are dedicated functions for creating some special NumPy arrays (matrices):

~~~
import numpy as np

a = np.zeros((2,2))   # 2x2 array of all zeros
print a               # Prints "[[ 0.  0.]
                      #          [ 0.  0.]]"

b = np.ones((1,2))    # Array of all ones
print b               # Prints "[[ 1.  1.]]"

c = np.full((2,2), 7) # Array filled with a constant value
print c               # Prints "[[ 7.  7.]
                      #          [ 7.  7.]]"

d = np.eye(2)         # 2x2 identity matrix
print d               # Prints "[[ 1.  0.]
                      #          [ 0.  1.]]"

e = np.random.random((2,2))  # 2x2 array of random values
print e                      # Output is random
~~~

#### 2.2.2 Indexing NumPy arrays

You can take out the part you need with slicing, just as with a list.

~~~
import numpy as np

# Create the following 3x4 NumPy array
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to take the first two rows and the columns at indices 1 and 2:
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# Note that b is a view: it shares its data with this part of a.
print a[0, 1]   # Prints "2"
b[0, 0] = 77    # b[0, 0] and a[0, 1] are the same piece of data
print a[0, 1]   # a is modified too; prints "77"
~~~

~~~
import numpy as np

a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

row_r1 = a[1, :]    # Second row of a, as a rank-1 array
row_r2 = a[1:2, :]  # The same row, as a rank-2 array
print row_r1, row_r1.shape  # Prints "[5 6 7 8] (4,)"
print row_r2, row_r2.shape  # Prints "[[5 6 7 8]] (1, 4)"

col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print col_r1, col_r1.shape  # Prints "[ 2  6 10] (3,)"
print col_r2, col_r2.shape  # Prints "[[ 2]
                            #          [ 6]
                            #          [10]] (3, 1)"
~~~

You can also index like this:

~~~
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

# Take the values at positions (0,0), (1,1), (2,0)
print a[[0, 1, 2], [0, 1, 0]]  # Prints "[1 4 5]"

# Same as above
print np.array([a[0, 0], a[1, 1], a[2, 0]])  # Prints "[1 4 5]"

# Take the value at position (0,1) twice
print a[[0, 0], [1, 1]]  # Prints "[2 2]"

# Same as above
print np.array([a[0, 1], a[0, 1]])  # Prints "[2 2]"
~~~

We can also build a boolean NumPy array from a condition and then use it to pick out the values that satisfy the condition:

~~~
import numpy as np

a = np.array([[1,2], [3, 4], [5, 6]])

bool_idx = (a > 2)  # Boolean array telling which elements of a are greater than 2

print bool_idx      # Prints "[[False False]
                    #          [ True  True]
                    #          [ True  True]]"

# Use bool_idx to pick out the values we want
print a[bool_idx]   # Prints "[3 4 5 6]"

# Or do it all in one step
print a[a > 2]      # Prints "[3 4 5 6]"
~~~

#### NumPy array data types

~~~
import numpy as np

x = np.array([1, 2])
print x.dtype         # Prints "int64"

x = np.array([1.0, 2.0])
print x.dtype         # Prints "float64"

x = np.array([1, 2], dtype=np.int64)  # Force a particular type
print x.dtype                         # Prints "int64"
~~~

#### 2.2.3 Array math

Addition, subtraction, square root, and elementwise multiplication and division work as follows:

~~~
import numpy as np

x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum
# [[ 6.0  8.0]
#  [10.0 12.0]]
print x + y
print np.add(x, y)

# Elementwise difference
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print x - y
print np.subtract(x, y)

# Elementwise product
# [[ 5.0 12.0]
#  [21.0 32.0]]
print x * y
print np.multiply(x, y)

# Elementwise division
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print x / y
print np.divide(x, y)

# Elementwise square root
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print np.sqrt(x)
~~~

Inner products and matrix products are computed with `dot`:

~~~
import numpy as np

x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9,10])
w = np.array([11, 12])

# Inner product of two vectors; gives 219
print v.dot(w)
print np.dot(v, w)

# Matrix-vector product; gives [29 67]
print x.dot(v)
print np.dot(x, v)

# Matrix-matrix product
# [[19 22]
#  [43 50]]
print x.dot(y)
print np.dot(x, y)
~~~

A particularly useful operation is `sum`, which can also sum along a given axis:

~~~
import numpy as np

x = np.array([[1,2],[3,4]])

print np.sum(x)          # Sum of all elements; prints "10"
print np.sum(x, axis=0)  # Sum of each column; prints "[4 6]"
print np.sum(x, axis=1)  # Sum of each row; prints "[3 7]"
~~~

Another frequently used operation is the matrix transpose, which NumPy arrays expose as `.T`:

~~~
import numpy as np

x = np.array([[1,2], [3,4]])
print x    # Prints "[[1 2]
           #          [3 4]]"
print x.T  # Prints "[[1 3]
           #          [2 4]]"

# Taking .T of a rank-1 array does nothing:
v = np.array([1,2,3])
print v    # Prints "[1 2 3]"
print v.T  # Prints "[1 2 3]"
~~~

#### 2.2.4 Broadcasting

NumPy has another very powerful mechanism. Suppose you have a large matrix and a small one, and you want to apply the small one to the large one many times. For example, say you want to add a vector of length n to every row of an m x n matrix:

~~~
# With a for loop it looks like this (y is used because we do not want to modify the original x)
import numpy as np

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = np.empty_like(x)   # Create an array y with the same shape as x

# Add v to each row
for i in range(4):
    y[i, :] = x[i, :] + v

# Now y is what we wanted
# [[ 2  2  4]
#  [ 5  5  7]
#  [ 8  8 10]
#  [11 11 13]]
print y
~~~

~~~
# The loop above gets slow when there are many rows, so here is an improved version
import numpy as np

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
vv = np.tile(v, (4, 1))  # Repeat v and stack the copies
print vv                 # Prints "[[1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]
                         #          [1 0 1]]"
y = x + vv  # Elementwise addition
print y     # Prints "[[ 2  2  4]
            #          [ 5  5  7]
            #          [ 8  8 10]
            #          [11 11 13]]"
~~~

~~~
# Thanks to NumPy broadcasting, you can simply do this
import numpy as np

x = np.array([[1,2,3], [4,5,6], [7,8,9], [10, 11, 12]])
v = np.array([1, 0, 1])
y = x + v  # Just add!
print y    # Prints "[[ 2  2  4]
           #          [ 5  5  7]
           #          [ 8  8 10]
           #          [11 11 13]]"
~~~

More broadcasting examples:

~~~
import numpy as np

v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# First reshape v into a column vector of shape (3, 1);
# broadcasting it against w gives a result of shape (3, 2):
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print np.reshape(v, (3, 1)) * w

# Add v to each row of x
x = np.array([[1,2,3], [4,5,6]])
# Gives:
# [[2 4 6]
#  [5 7 9]]
print x + v

# Add w to each column of x: transpose x, add w to each row, then transpose back:
# [[ 5  6  7]
#  [ 9 10 11]]
print (x.T + w).T

# This gives the same result
print x + np.reshape(w, (2, 1))
~~~

### 2.3 SciPy

NumPy provides a convenient multidimensional array object together with basic operations on it. SciPy builds on NumPy and offers a large number of functions that directly carry out the array operations you need. If you are interested, browse the [SciPy reference index](http://docs.scipy.org/doc/scipy/reference/index.html) for details. There are too many functions to memorize, so look them up as you need them.

#### Distances between vectors

One thing worth singling out is computing distances between vectors, for which SciPy provides a nice interface, [scipy.spatial.distance.pdist](http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist):

~~~
import numpy as np
from scipy.spatial.distance import pdist, squareform

# [[0 1]
#  [1 0]
#  [2 0]]
x = np.array([[0, 1], [1, 0], [2, 0]])
print x

# Compute the Euclidean distance between every pair of rows of x.
# d[i, j] is the distance between x[i, :] and x[j, :].
# Result:
# [[ 0.          1.41421356  2.23606798]
#  [ 1.41421356  0.          1.        ]
#  [ 2.23606798  1.          0.        ]]
d = squareform(pdist(x, 'euclidean'))
print d
~~~

### 2.4 Matplotlib

This is a plotting package for Python. If you are familiar with MATLAB syntax, it should feel natural. See [matplotlib.pyplot.plot](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot) for more plotting settings and parameters.

~~~
import numpy as np
import matplotlib.pyplot as plt

# Compute x and the corresponding sine values as y
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)

# Plot the curve with matplotlib
plt.plot(x, y)
plt.show()  # Nothing is displayed until plt.show() is called
~~~

The result:

![sin plot](https://box.kancloud.cn/2016-03-16_56e90acedf8dc.png "")

~~~
# Draw two curves in one figure
import numpy as np
import matplotlib.pyplot as plt

# Compute the sine and cosine values for x
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# Plot with matplotlib
plt.plot(x, y_sin)
plt.plot(x, y_cos)
plt.xlabel('x axis label')
plt.ylabel('y axis label')
plt.title('Sine and Cosine')
plt.legend(['Sine', 'Cosine'])
plt.show()
~~~

![sin and cos](https://box.kancloud.cn/2016-03-16_56e90acef2b5e.png "")

~~~
# Use subplot to split the curves into separate panels
import numpy as np
import matplotlib.pyplot as plt

# Compute the sine and cosine values for x
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)

# A 2x1 grid of subplots; select the first position
plt.subplot(2, 1, 1)

# Draw the first subplot
plt.plot(x, y_sin)
plt.title('Sine')

# Draw the second subplot
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')

plt.show()
~~~

![subplot](https://box.kancloud.cn/2016-03-16_56e90acf11c87.png "")

#### 2.5 Simple image reading and display

You can use `imshow` to display an image.

~~~
import numpy as np
from scipy.misc import imread, imresize
import matplotlib.pyplot as plt

img = imread('/Users/HanXiaoyang/Comuter_vision/computer_vision.jpg')
img_tinted = img * [1, 0.95, 0.9]  # Scale the R, G, B channels separately

# Show the original image
plt.subplot(1, 2, 1)
plt.imshow(img)

# Show the tinted image
plt.subplot(1, 2, 2)
plt.imshow(np.uint8(img_tinted))
plt.show()
~~~

![computer_vision](https://box.kancloud.cn/2016-03-16_56e90acf25547.png "")
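
The tinting step above, `img * [1, 0.95, 0.9]`, is itself an instance of the broadcasting described in section 2.2.4: an image of shape (H, W, 3) is multiplied by a vector of length 3, which scales the red, green, and blue channels separately. Below is a minimal sketch of that one step on a small synthetic image, so it runs without an image file. The helper name `tint`, the 2x2 image, and the pixel value 200 are assumptions made for illustration, and the sketch uses Python 3 `print()` calls rather than the Python 2 statements used elsewhere in this post.

```python
import numpy as np

def tint(img, factors):
    """Scale each colour channel of an (H, W, 3) uint8 image by its own factor.

    `tint` is a hypothetical helper for illustration, not part of any library.
    """
    factors = np.asarray(factors, dtype=np.float64)  # shape (3,)
    # Broadcasting: (H, W, 3) * (3,) multiplies every pixel's channels by factors.
    scaled = img.astype(np.float64) * factors
    # Round and clip to 0..255 before the cast so values cannot wrap around.
    # (The snippet above casts with np.uint8 directly, which truncates instead.)
    return np.clip(np.rint(scaled), 0, 255).astype(np.uint8)

# Synthetic 2x2 RGB image with every channel set to 200 (made-up data).
img = np.full((2, 2, 3), 200, dtype=np.uint8)
img_tinted = tint(img, [1, 0.95, 0.9])

print(img_tinted.shape)  # (2, 2, 3)
print(img_tinted[0, 0])  # [200 190 180]
```

Clipping before the cast is a design choice of this sketch: with a factor above 1, a direct conversion to `uint8` could overflow and wrap around instead of saturating at 255.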