Word Ladder · 數據結構與算法/leetcode/lintcode題解

# Word Ladder ### Source - leetcode: [Word Ladder | LeetCode OJ](https://leetcode.com/problems/word-ladder/) - lintcode: [(120) Word Ladder](http://www.lintcode.com/en/problem/word-ladder/) ### Problem Given two words (*start* and *end*), and a dictionary, find the length ofshortest transformation sequence from *start* to *end*, such that: 1. Only one letter can be changed at a time 1. Each intermediate word must exist in the dictionary #### Example Given:*start* = `"hit"`*end* = `"cog"`*dict* = `["hot","dot","dog","lot","log"]` As one shortest transformation is `"hit" -> "hot" -> "dot" -> "dog" -> "cog"`,return its length `5`. #### Note - Return 0 if there is no such transformation sequence. - All words have the same length. - All words contain only lowercase alphabetic characters. ### 題解咋一看還以為是 Edit Distance 的變體，仔細審題后發現和動態規劃沒啥關系。題中有兩大關鍵點：一次只能改動一個字符；改動的中間結果必須出現在詞典中。那么大概總結下來共有四種情形： 1. start 和 end 相等。 1. end 在 dict 中，且 start 可以轉換為 dict 中的一個單詞。 1. end 不在 dict 中，但可由 start 或者 dict 中的一個單詞轉化而來。 1. end 無法由 start 轉化而來。由于中間結果也必須出現在詞典中，故此題相當于圖搜索問題，將 start, end, dict 中的單詞看做圖中的節點，節點與節點（單詞與單詞）可通過一步轉化得到，可以轉換得到的節點相當于邊的兩個節點，邊的權重為1（都是通過1步轉化）。到這里問題就比較明確了，相當于搜索從 start 到 end 兩點間的最短距離，即 Dijkstra 最短路徑算法。**通過 [BFS](# "Breadth-First Search, 廣度優先搜索") 和哈希表實現。** 首先將 start 入隊，隨后彈出該節點，比較其和 end 是否相同；再從 dict 中選出所有距離為1的單詞入隊，并將所有與當前節點距離為1且未訪問過的節點（需要使用哈希表）入隊，方便下一層遍歷時使用，直至隊列為空。 ### Java ~~~ public class Solution { /** * @param start, a string * @param end, a string * @param dict, a set of string * @return an integer */ public int ladderLength(String start, String end, Set<String> dict) { if (start == null && end == null) return 0; if (start.length() == 0 && end.length() == 0) return 0; assert(start.length() == end.length()); if (dict == null || dict.size() == 0) { return 0; } int ladderLen = 1; dict.add(end); // add end to dict, important! Queue<String> q = new LinkedList<String>(); Set<String> hash = new HashSet<String>(); q.offer(start); hash.add(start); while (!q.isEmpty()) { ladderLen++; int qLen = q.size(); for (int i = 0; i < qLen; i++) { String strTemp = q.poll(); for (String nextWord : getNextWords(strTemp, dict)) { if (nextWord.equals(end)) return ladderLen; // filter visited word in the dict if (hash.contains(nextWord)) continue; q.offer(nextWord); hash.add(nextWord); } } } return 0; } private Set<String> getNextWords(String curr, Set<String> dict) { Set<String> nextWords = new HashSet<String>(); for (int i = 0; i < curr.length(); i++) { char[] chars = curr.toCharArray(); for (char c = 'a'; c <= 'z'; c++) { chars[i] = c; String temp = new String(chars); if (dict.contains(temp)) { nextWords.add(temp); } } } return nextWords; } } ~~~ ### 源碼分析 #### `getNextWords`的實現首先分析給定單詞`curr`并從 dict 中選出所有距離為1 的單詞。常規的思路可能是將`curr`與 dict 中的單詞逐個比較，并遍歷每個字符串，返回距離為1的單詞組。這種找距離為1的節點的方法復雜度為 l(length?of?word)×n(size?of?dict)×m(queue?length)=O(lmn)l(length\ of\ word) \times n(size\ of\ dict)\times m(queue\ length) = O(lmn)l(length?of?word)×n(size?of?dict)×m(queue?length)=O(lmn). 在 dict 較長時會 [TLE](# "Time Limit Exceeded 的簡稱。你的程序在 OJ 上的運行時間太長了，超過了對應題目的時間限制。"). 其實根據 dict 的數據結構特點，比如查找任一元素的時間復雜度可認為是 O(1)O(1)O(1). 根據哈希表和單個單詞長度通常不會太長這一特點，我們就可以根據給定單詞構造到其距離為一的單詞變體，然后查詢其是否在 dict 中，這種實現的時間復雜度為 O(26(a?to?z)×l×m)=O(lm)O(26(a\ to\ z) \times l \times m) = O(lm)O(26(a?to?z)×l×m)=O(lm), 與 dict 長度沒有太大關系，大大優化了時間復雜度。經驗教訓：根據給定數據結構特征選用合適的實現，遇到哈希表時多用其查找的 O(1)O(1)O(1) 特性。 #### [BFS](# "Breadth-First Search, 廣度優先搜索") 和哈希表的配合使用 [BFS](# "Breadth-First Search, 廣度優先搜索") 用作搜索，哈希表用于記錄已經訪問節點。在可以改變輸入數據的前提下，需要將 end 加入 dict 中，否則對于不在 dict 中出現的 end 會有問題。 ### 復雜度分析主要在于`getNextWords`方法的時間復雜度，時間復雜度 O(lmn)O(lmn)O(lmn)。使用了隊列存儲中間處理節點，空間復雜度平均條件下應該是常量級別，當然最壞條件下可能惡化為 O(n)O(n)O(n), 即 dict 中某個點與其他點距離均為1. ### Reference - [Word Ladder 參考程序 Java/C++/Python](http://www.jiuzhang.com/solutions/word-ladder/) - [Java Solution using Dijkstra's algorithm, with explanation - Leetcode Discuss](https://leetcode.com/discuss/50930/java-solution-using-dijkstras-algorithm-with-explanation)