Sequences · Clojure Introductory Tutorial

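## Sequences

Sequences provide a logical view of a collection. Many things can be treated as sequences, including Java collections, Clojure collections, strings, streams, directory structures and XML trees.

Many Clojure functions return a lazy sequence (`LazySeq`). The elements of such a sequence are not actual data but deferred computations that run only when the data is actually needed. One benefit is that when you create the sequence you do not need to worry about how many elements it will eventually contain. Functions and macros that return lazy sequences include `cache-seq` (found only in very early Clojure versions), `concat`, `cycle`, `distinct`, `drop`, `drop-last`, `drop-while`, `filter`, `for`, `interleave`, `interpose`, `iterate`, `lazy-cat`, `lazy-seq`, `line-seq`, `map`, `partition`, `range`, `re-seq`, `remove`, `repeat`, `replicate` (deprecated in favor of `repeat`), `take`, `take-nth`, `take-while` and `tree-seq`.

Lazy sequences are a common source of confusion for people new to Clojure. For example, what do you think the following code outputs?

```
(map #(println %) [1 2 3])
```

When run in a REPL, it prints 1, 2 and 3 on separate lines, along with three `nil`s (the return values of the three `println` calls). The REPL always fully evaluates the result of every expression entered so that it can print it. When run as part of a script, however, this code outputs nothing, because `map` returns a lazy sequence and nothing ever consumes it.

There are several ways to force the evaluation of items in a lazy sequence. Functions that extract a single item, such as `first`, `second`, `nth` and `last`, do this. Items are evaluated in order, so asking for the last item evaluates every item in the sequence. If the head of a lazy sequence is held in a binding, then once an item has been evaluated its value is cached, and it is not evaluated again the next time it is requested.

The `dorun` and `doall` functions force the evaluation of the items in a single lazy sequence. The `doseq` macro, discussed in the "Iteration" section, forces the evaluation of items in one or more lazy sequences. The `for` macro, also discussed in that section, does not force evaluation; it returns another lazy sequence.

If you only want to force evaluation for the sake of side effects, `doseq` or `dorun` is enough. The results are not retained, so less memory is used. Both return `nil`. If you want the results to be retained, use `doall`.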
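The difference between building a lazy sequence and forcing it can be observed directly. The following is a minimal sketch, assuming a hypothetical helper `noisy-inc` and an atom `call-count` (neither comes from the tutorial) that count how often the mapped function actually runs:

```clojure
;; Counts how many times the mapped function is really invoked.
(def call-count (atom 0))

(defn noisy-inc
  "Hypothetical helper: increments x and records that it was called."
  [x]
  (swap! call-count inc)
  (inc x))

;; Building the lazy sequence does not run noisy-inc at all.
(def lazy-result (map noisy-inc [1 2 3]))
(def calls-before @call-count)

;; doall walks the whole sequence and returns it fully evaluated.
(def forced (doall lazy-result))
(def calls-after @call-count)

;; Walking the same sequence again reuses the cached values,
;; so noisy-inc is not called a second time.
(def total (reduce + lazy-result))
(def calls-final @call-count)

(println calls-before calls-after calls-final)
(println forced total)
```

Using a counter rather than `println` makes the laziness visible as data: the count stays at zero until something consumes the sequence, and it does not grow when the already evaluated sequence is traversed again.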
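The table below summarizes the ways to force the evaluation of items in a lazy sequence.

| | Retain evaluation results | Discard results; evaluate only for side effects |
| --- | --- | --- |
| Operate on a single sequence | `doall` | `dorun` |
| Operate on any number of sequences with list comprehension syntax | N/A | `doseq` |

The `doseq` macro is generally preferred over the `dorun` function because the code is easier to read. It is also more efficient, because the `map` call passed to `dorun` builds an additional sequence. For example, the following two lines produce the same output:

```
(dorun (map #(println %) [1 2 3]))
(doseq [i [1 2 3]] (println i))
```

If a function creates a lazy sequence whose evaluation has side effects, in most cases it should force the evaluation with `doall` and return the result. This makes the timing of the side effects predictable. Otherwise the side effects happen whenever a caller first consumes the sequence, and a caller that rebuilds the sequence triggers them again, which can lead to wrong results.

Each of the following expressions prints 1, 2 and 3 on separate lines, but they have different return values. The `do` special form is used here to write an anonymous function that does more than one thing: it prints the value and then returns it.

```
(doseq [item [1 2 3]] (println item)) ; -> nil
(dorun (map #(println %) [1 2 3])) ; -> nil
(doall (map #(do (println %) %) [1 2 3])) ; -> (1 2 3)
```

Lazy sequences make it possible to create infinite sequences, because items are computed only when they are needed. For example:

```
(defn f
  "square the argument and divide by 2"
  [x]
  (println "calculating f of" x)
  (/ (* x x) 2.0))

; Create an infinite sequence of results from the function f
; for the values 0 through infinity.
; Note that the head of this sequence is being held in the binding "f-seq".
; This will cause the values of all evaluated items to be cached.
(def f-seq (map f (iterate inc 0)))

; Force evaluation of the first item in the infinite sequence, (f 0).
(println "first is" (first f-seq)) ; -> 0.0

; Force evaluation of the first three items in the infinite sequence.
; Since (f 0) has already been evaluated,
; only (f 1) and (f 2) will be evaluated.
(doall (take 3 f-seq))

(println (nth f-seq 2)) ; uses cached result -> 2.0
```

The following code differs from the previous example in that the head of the lazy sequence is not held in a binding. Each call to `f-seq` builds a new sequence, so the values of evaluated items are not cached between calls. This uses less memory, but it is less efficient when the same item is requested more than once.

```
(defn f-seq [] (map f (iterate inc 0)))
(println (first (f-seq))) ; evaluates (f 0), but doesn't cache result
(println (nth (f-seq) 2)) ; evaluates (f 0), (f 1) and (f 2)
```

Another way to avoid holding the head of a lazy sequence is to pass the sequence directly to a function:

```
(defn consumer [coll]
  ; Since coll is a local binding, the evaluated items in it
  ; are cached while in this function and then garbage collected.
  (println (first coll)) ; evaluates (f 0)
  (println (nth coll 2))) ; evaluates (f 1) and (f 2)

(consumer (map f (iterate inc 0)))
```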
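The trade-off between holding the head and rebuilding the sequence can be checked by recording which arguments get evaluated. This is only a sketch under stated assumptions: `half-square`, `half-squares` and the atom `evaluated` are hypothetical names introduced here, with `half-square` doing the same arithmetic as `f` but logging its argument instead of printing.

```clojure
;; Records every argument the function is actually called with.
(def evaluated (atom []))

(defn half-square
  "Hypothetical stand-in for f: squares x, divides by 2, and logs x."
  [x]
  (swap! evaluated conj x)
  (/ (* x x) 2.0))

;; A function, not a def, so every call builds a fresh infinite sequence
;; and no head is retained between calls.
(defn half-squares [] (map half-square (iterate inc 0)))

;; Only the items up to index 2 are evaluated, even though the
;; underlying sequence is infinite.
(def third-value (nth (half-squares) 2))
(def seen-once @evaluated)

;; A second call starts over: the first three items are evaluated again.
(def first-three (doall (take 3 (half-squares))))
(def seen-twice @evaluated)

(println third-value first-three)
(println seen-once seen-twice)
```

The log grows on the second call because nothing was cached between the two sequences; binding the sequence once with `def` or `let` and reusing it would avoid the repeated work at the cost of keeping the evaluated items in memory.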