# 0820. Short Encoding of Words

## Problem Link (820. Short Encoding of Words)

<https://leetcode-cn.com/problems/short-encoding-of-words/>

## Problem Description

```
Given a list of words, we encode the list as an index string S and a list of indexes A.

For example, if the list is ["time", "me", "bell"], we can represent it as S = "time#bell#" and indexes = [0, 2, 5].

For each index, we recover the original word by reading from that position in S until we reach "#".

What is the length of the shortest string S that successfully encodes the given list of words?

Example:

Input: words = ["time", "me", "bell"]
Output: 10
Explanation: S = "time#bell#", indexes = [0, 2, 5].

Constraints:

1 <= words.length <= 2000
1 <= words[i].length <= 7
Each word consists of lowercase letters only.
```

## Prerequisites

- Trie (prefix tree)

## Companies

- Alibaba
- ByteDance

## Approach

This is really a suffix problem: a word needs no space of its own in S when it is a suffix of another word. Reversing every word turns shared suffixes into shared prefixes. For example, `["time", "me", "bell"]` reversed is `["emit", "em", "lleb"]`, and the answer is the length of "emit" + the length of "lleb" + the length of "##", that is 4 + 4 + 2 = 10 ("em" and "emit" share a prefix, so only the longer one is counted).

The intuitive idea is therefore to insert the reversed words into a trie, which simulates a suffix tree.

The code below looks complicated, but I use this template for many problems and only need to adjust a few details to get accepted. I have summarized it in a [trie topic write-up](https://github.com/azl397985856/leetcode/blob/master/thinkings/trie.md).

![](https://img.kancloud.cn/45/42/4542ce4997102301000b97cd786562cb_850x252.jpg)

The main trie APIs are:

- `insert(word)`: insert a word
- `search(word)`: check whether a word exists
- `startWith(word)`: check whether any word has `word` as a prefix

`startWith` is the core use of a trie, and it is where the name "prefix tree" comes from. Start with problem 208 to get familiar with tries, then try the other problems.

A trie looks roughly like this:

![](https://img.kancloud.cn/08/e0/08e038e4c5e4db4840ad43b89c7ddde5_827x602.jpg)

As the figure shows, each node stores one character plus a flag that marks whether it ends a word. Real implementations differ in small details, but not by much.

This problem has an edge case to handle: the list may contain duplicates, such as `["time", "time", "me", "bell"]`. Here I use a hash set to remove them. Without deduplication, the solution below would count "time" twice.

## Key Points

- Trie
- Deduplication

## Code

```
from typing import List


class Trie:
    def __init__(self):
        """
        Initialize your data structure here.
        """
        self.Trie = {}

    def insert(self, word):
        """
        Inserts a word into the trie.
        :type word: str
        :rtype: None
        """
        curr = self.Trie
        for w in word:
            if w not in curr:
                curr[w] = {}
            curr = curr[w]
        # '#' marks the end of a word
        curr['#'] = 1

    def search(self, word):
        """
        Returns whether the word ends at a leaf, i.e. it is not a proper
        prefix of any other inserted word. Assumes the word was inserted.
        :type word: str
        :rtype: bool
        """
        curr = self.Trie
        for w in word:
            curr = curr[w]
        # len(curr) == 1 means the node holds only '#'.
        # When we search 'em' (the reverse of 'me'), len(curr) > 1
        # because curr looks like {'#': 1, 'i': {...}}.
        return len(curr) == 1


class Solution:
    def minimumLengthEncoding(self, words: List[str]) -> int:
        trie = Trie()
        cnt = 0
        # Deduplicate so that a repeated word is counted only once
        words = set(words)
        for word in words:
            trie.insert(word[::-1])
        for word in words:
            # Only words that are not a suffix of another word take space
            if trie.search(word[::-1]):
                cnt += len(word) + 1  # +1 for the trailing '#'
        return cnt
```

***Complexity Analysis***

- Time complexity: $O(N)$, where N is the total number of characters across all words. For `["time", "me"]`, that is 4 + 2 = 6.
- Space complexity: $O(N)$, with N defined as above.

## Related Problems

- [0208.implement-trie-prefix-tree](208.implement-trie-prefix-tree.html)
- [0211.add-and-search-word-data-structure-design](211.add-and-search-word-data-structure-design.html)
- [0212.word-search-ii](212.word-search-ii.html)
- [0472.concatenated-words](472.concatenated-words.html)
- [0820.short-encoding-of-words](https://github.com/azl397985856/leetcode/blob/master/problems/820.short-encoding-of-words.md)
- [1032.stream-of-characters](https://github.com/azl397985856/leetcode/blob/master/problems/1032.stream-of-characters.md)
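
As a cross-check on the trie solution for problem 820, here is a minimal sketch of a different technique: plain suffix elimination with a hash set, no trie. It is not the approach used in this write-up, and the function name `min_encoding_length_by_suffix_set` is made up for illustration. It relies on words being short (at most 7 characters per the constraints), so enumerating every proper suffix is cheap.

```python
from typing import List


def min_encoding_length_by_suffix_set(words: List[str]) -> int:
    """Hypothetical helper: set-based cross-check, not the trie solution."""
    # Start from the deduplicated words; every survivor costs len + 1.
    remaining = set(words)
    for word in words:
        # Any proper suffix of `word` is already covered by `word` itself,
        # so it does not need its own slot in the encoded string.
        for start in range(1, len(word)):
            remaining.discard(word[start:])
    # Each remaining word contributes its letters plus one '#'.
    return sum(len(word) + 1 for word in remaining)


print(min_encoding_length_by_suffix_set(["time", "me", "bell"]))  # 10
```

Running both versions on the same inputs, including a list with duplicates such as `["time", "time", "me", "bell"]`, is a quick way to confirm that the deduplication step in the trie version behaves as intended.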