大話題小函數(2) · 零基礎學Python（第一版）

上一講和本講的標題是“大話題小函數”，所謂大話題，就是這些函數如果溯源，都會找到聽起來更高大上的東西。這種思維方式絕對我堅定地繼承了中華民族的優良傳統的。自從天朝的臣民看到英國人開始踢足球，一直到現在所謂某國勃起了，都一直在試圖論證足球起源于該朝的前前前朝的某國時代，并且還搬出了那時候的一個叫做高俅的球星來論證，當然了，勃起的某國是擋不住該國家隊在世界杯征程上的陽痿，只能用高俅來意淫一番了。這種思維方式，我是堅定地繼承，因為在我成長過程中，它一直被奉為優良傳統。阿Q本來是姓趙的，和趙老爺是本家，比秀才要長三輩，雖然被趙老爺打了嘴。廢話少說，書接前文，已經研究了map，下面來看reduce。忍不住還得來點廢話。不知道看官是不是聽說過MapReduc，如果沒有，那么Hadoop呢？如果還沒有，就google一下。下面是我從[維基百科](http://zh.wikipedia.org/wiki/MapReduce)上抄下來的，共賞之。 > MapReduce是Google提出的一個軟件架構，用于大規模數據集（大于1TB）的并行運算。概念“Map（映射）”和“Reduce（化簡）”，及他們的主要思想，都是從函數式編程語言借來的，還有從矢量編程語言借來的特性。不用管是不是看懂，總之又可以用開頭的思想意淫一下了，原來今天要鼓搗的這個reduce還跟大數據有關呀。不管怎么樣，你有夢一般的感覺就行。 ## reduce 回到現實，清醒一下，繼續敲代碼： ~~~ >>> reduce(lambda x,y: x+y,[1,2,3,4,5]) 15 ~~~ 請看官仔細觀察，是否能夠看出是如何運算的呢？畫一個圖： [![](https://box.kancloud.cn/2015-07-07_559b39bc304d1.png)](https://github.com/qiwsir/ITArticles/blob/master/Pictures/21001.png) 還記得map是怎么運算的嗎？忘了？看代碼： ~~~ >>> list1 = [1,2,3,4,5,6,7,8,9] >>> list2 = [9,8,7,6,5,4,3,2,1] >>> map(lambda x,y: x+y, list1,list2) [10, 10, 10, 10, 10, 10, 10, 10, 10] ~~~ 看官對比一下，就知道兩個的區別了。原來map是上下運算，reduce是橫著逐個元素進行運算。權威的解釋來自官網： > reduce(function, iterable[, initializer]) > > Apply function of two arguments cumulatively to the items of iterable, from left to right, so as to reduce the iterable to a single value. For example, reduce(lambda x, y: x+y, [1, 2, 3, 4, 5]) calculates ((((1+2)+3)+4)+5). The left argument, x, is the accumulated value and the right argument, y, is the update value from the iterable. If the optional initializer is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If initializer is not given and iterable contains only one item, the first item is returned. Roughly equivalent to: ~~~ def reduce(function, iterable, initializer=None): it = iter(iterable) if initializer is None: try: initializer = next(it) except StopIteration: raise TypeError('reduce() of empty sequence with no initial value') accum_value = initializer for x in it: accum_value = function(accum_value, x) return accum_value ~~~ 如果用我們熟悉的for循環來做上面reduce的事情，可以這樣來做： ~~~ >>> lst = range(1,6) >>> lst [1, 2, 3, 4, 5] >>> r = 0 >>> for i in range(len(lst)): ... r += lst[i] ... >>> r 15 ~~~ for普世的，reduce是簡潔的。為了鍛煉思維，看這么一個問題，有兩個list，a = [3,9,8,5,2],b=[1,4,9,2,6],計算：a[0]_b[0]+a[1]_b[1]+...的結果。 ~~~ >>> a [3, 9, 8, 5, 2] >>> b [1, 4, 9, 2, 6] >>> zip(a,b) #復習一下zip，下面的方法中要用到 [(3, 1), (9, 4), (8, 9), (5, 2), (2, 6)] >>> sum(x*y for x,y in zip(a,b)) #解析后直接求和 133 >>> new_list = [x*y for x,y in zip(a,b)] #可以看做是上面方法的分布實施 >>> #這樣解析也可以：new_tuple = (x*y for x,y in zip(a,b)) >>> new_list [3, 36, 72, 10, 12] >>> sum(new_list) #或者:sum(new_tuple) 133 >>> reduce(lambda sum,(x,y): sum+x*y,zip(a,b),0) #這個方法是在耍酷呢嗎？ 133 >>> from operator import add,mul #耍酷的方法也不止一個 >>> reduce(add,map(mul,a,b)) 133 >>> reduce(lambda x,y: x+y, map(lambda x,y: x*y, a,b)) #map,reduce,lambda都齊全了，更酷嗎？ 133 ~~~ ## filter filter的中文含義是“過濾器”，在python中，它就是起到了過濾器的作用。首先看官方說明： > filter(function, iterable) > > Construct a list from those elements of iterable for which function returns true. iterable may be either a sequence, a container which supports iteration, or an iterator. If iterable is a string or a tuple, the result also has that type; otherwise it is always a list. If function is None, the identity function is assumed, that is, all elements of iterable that are false are removed. > > Note that filter(function, iterable) is equivalent to [item for item in iterable if function(item)] if function is not None and [item for item in iterable if item] if function is None. 這次真的不翻譯了（好像以往也沒有怎么翻譯呀），而且也不解釋要點了。請列位務必自己閱讀上面的文字，并且理解其含義。英語，無論怎么強調都是不過分的，哪怕是做乞丐，說兩句英語，沒準還可以討到英鎊美元呢。通過下面代碼體會： ~~~ >>> numbers = range(-5,5) >>> numbers [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4] >>> filter(lambda x: x>0, numbers) [1, 2, 3, 4] >>> [x for x in numbers if x>0] #與上面那句等效 [1, 2, 3, 4] >>> filter(lambda c: c!='i', 'qiwsir') #能不能對應上面文檔說明那句話呢？ 'qwsr' #“If iterable is a string or a tuple, the result also has that type;” ~~~ 至此，用兩此介紹了幾個小函數，這些函數在對程序的性能提高上，并沒有顯著或者穩定預期，但是，在代碼的簡潔上，是有目共睹的。有時候是可以用來秀一秀，彰顯python的優雅和自己耍酷。