Distinct Subsequences · Data Structures and Algorithms/leetcode/lintcode Solutions

# Distinct Subsequences

### Source

- leetcode: [Distinct Subsequences | LeetCode OJ](https://leetcode.com/problems/distinct-subsequences/)
- lintcode: [(118) Distinct Subsequences](http://www.lintcode.com/en/problem/distinct-subsequences/)

~~~
Given a string S and a string T, count the number of distinct subsequences of T in S.

A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ACE" is a subsequence of "ABCDE" while "AEC" is not).

Example
Given S = "rabbbit", T = "rabbit", return 3.

Challenge
Do it in O(n^2) time and O(n) memory.

O(n^2) memory is also acceptable if you do not know how to optimize memory.
~~~

### Solution 1

First, distinguish a subsequence from a substring: a subsequence does not have to be contiguous. The task is to count the subsequences of S that equal T.

Ignoring implementation details, a natural idea is to compare the first characters of S and T one at a time:

- when they are equal, delete both;
- when they differ, delete the first character of S;
- continue until every character of T has been deleted.

This simple idea has a flaw. The question asks how many such subsequences exist, not whether one exists, so characters of S cannot be discarded so casually. In particular, when the first characters match we cannot just consume them, because leaving that character of S unused may lead to other valid matches. Instead, we keep a copy of S and also compare T against the string obtained by removing the first character of S, so that both branches are counted. This is somewhat like the branching in a depth-first search.

Concretely, the code below tries every position `i` with `S[i] == T[0]`. For each one it recursively counts the occurrences of `T[1:]` in `S[i + 1:]`.

### Python

~~~
class Solution:
    # @param S, T: Two strings.
    # @return: Count the number of distinct subsequences
    def numDistinct(self, S, T):
        if S is None or T is None:
            return 0
        if len(S) < len(T):
            return 0
        if len(T) == 0:
            return 1

        num = 0
        # Try every position in S that can be paired with T[0]
        for i, Si in enumerate(S):
            if Si == T[0]:
                num += self.numDistinct(S[i + 1:], T[1:])
        return num
~~~

### C++

~~~
#include <string>
using std::string;

class Solution {
public:
    /**
     * @param S, T: Two strings.
     * @return: Count the number of distinct subsequences
     */
    int numDistinct(string &S, string &T) {
        if (S.size() < T.size()) return 0;
        if (T.empty()) return 1;

        int num = 0;
        // Try every position in S that can be paired with T[0]
        for (size_t i = 0; i < S.size(); ++i) {
            if (S[i] == T[0]) {
                string Si = S.substr(i + 1);
                string t = T.substr(1);
                num += numDistinct(Si, t);
            }
        }
        return num;
    }
};
~~~

### Java

~~~
public class Solution {
    /**
     * @param S, T: Two strings.
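     * @return: Count the number of distinct subsequences
     */
    public int numDistinct(String S, String T) {
        if (S == null || T == null) return 0;
        if (S.length() < T.length()) return 0;
        if (T.length() == 0) return 1;

        int num = 0;
        for (int i = 0; i < S.length(); i++) {
            if (S.charAt(i) == T.charAt(0)) {
                // T.length() >= 1, so T.substring(1) will not throw an index error
                num += numDistinct(S.substring(i + 1), T.substring(1));
            }
        }
        return num;
    }
}
~~~

### Code Analysis

1. Handle null input. In C++, assigning NULL to a `string` is already an error, so the function cannot handle that case internally, and the C++ version omits this check.
1. If S is shorter than T, T cannot be a subsequence of S, so return 0.
1. If T has length 0, it is a subsequence of S, so return 1.

The `for` loop is only entered when `T.length() >= 1`. So when T has length 1, `T.substring(1)` in Java returns the empty string `""` instead of throwing an index-out-of-bounds exception.

### Complexity Analysis

In the best case, S contains no character equal to `T[0]`, and the time complexity is $O(n)$.

In the worst case, every character of S and T is the same. Drawing the recursion call stack shows that it looks very much like a depth-first search, with the recurrence $f(n) = \sum_{i=1}^{n-1} f(i)$. Subtracting the same recurrence for $f(n-1)$ gives $f(n) = 2f(n-1)$ for $n \ge 3$. The running time therefore grows exponentially, much faster than Fibonacci.

### Solution 2 - Dynamic Programming

The complexity analysis of Solution 1 shows many overlapping subproblems: the same substrings are compared many times. This suggests optimizing with dynamic programming. How should the three key elements of the DP be set up? Because the problem concerns the relation between two strings, we can use the two-sequence ([DP_Two_Sequence](# "Usually two arrays or two strings whose matching relation is computed. Typically `f[i][j]` denotes the relation between the first i elements of the first sequence and the first j elements of the second.")) DP approach.

Define `f[i][j]` as the number of subsequences of `S[0:i]` (the first i characters of S) that equal `T[0:j]` (the first j characters of T). The transition should come from `f[i-1][j]`, `f[i-1][j-1]` and `f[i][j-1]`. The key is the relation between the last characters of the two prefixes, `S[i-1]` and `T[j-1]`:

1. `S[i-1] == T[j-1]`: the last characters are equal, which gives two choices.
    - Pair `S[i-1]` with `T[j-1]`: the count is `f[i-1][j-1]`.
    - Leave them unpaired and match `T[j-1]` with some character of `S[0:i-1]`: the count is `f[i-1][j]`.

    Combining both choices, `f[i][j] = f[i-1][j-1] + f[i-1][j]`.
1. `S[i-1] != T[j-1]`: `S[i-1]` cannot be paired with `T[j-1]`, so `f[i][j] = f[i-1][j]`.

To let the counts accumulate correctly when the first character matches, initialize `f[i][0] = 1` and all other entries to 0. The empty prefix of T occurs exactly once in any prefix of S. For an empty S or T, returning 0 is defensible, and so is returning 1. The code returns 1 when T is empty, which is consistent with `f[i][0] = 1`.

### Python

~~~
class Solution:
    # @param S, T: Two strings.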
    # @return: Count the number of distinct subsequences
    def numDistinct(self, S, T):
        if S is None or T is None:
            return 0
        if len(S) < len(T):
            return 0
        if len(T) == 0:
            return 1

        # f[i][j]: number of subsequences of S[:i] equal to T[:j]
        f = [[0] * (len(T) + 1) for _ in range(len(S) + 1)]
        for i, Si in enumerate(S):
            f[i][0] = 1
            for j, Tj in enumerate(T):
                if Si == Tj:
                    f[i + 1][j + 1] = f[i][j + 1] + f[i][j]
                else:
                    f[i + 1][j + 1] = f[i][j + 1]
        return f[len(S)][len(T)]
~~~

### C++

~~~
#include <string>
#include <vector>
using std::string;
using std::vector;

class Solution {
public:
    /**
     * @param S, T: Two strings.
     * @return: Count the number of distinct subsequences
     */
    int numDistinct(string &S, string &T) {
        if (S.size() < T.size()) return 0;
        if (T.empty()) return 1;

        // f[i][j]: number of subsequences of S[0:i] equal to T[0:j]
        vector<vector<int> > f(S.size() + 1, vector<int>(T.size() + 1, 0));
        for (size_t i = 0; i < S.size(); ++i) {
            f[i][0] = 1;
            for (size_t j = 0; j < T.size(); ++j) {
                if (S[i] == T[j]) {
                    f[i + 1][j + 1] = f[i][j + 1] + f[i][j];
                } else {
                    f[i + 1][j + 1] = f[i][j + 1];
                }
            }
        }
        return f[S.size()][T.size()];
    }
};
~~~

### Java

~~~
public class Solution {
    /**
     * @param S, T: Two strings.
     * @return: Count the number of distinct subsequences
     */
    public int numDistinct(String S, String T) {
        if (S == null || T == null) return 0;
        if (S.length() < T.length()) return 0;
        if (T.length() == 0) return 1;

        // f[i][j]: number of subsequences of S[0:i] equal to T[0:j]
        int[][] f = new int[S.length() + 1][T.length() + 1];
        for (int i = 0; i < S.length(); i++) {
            f[i][0] = 1;
            for (int j = 0; j < T.length(); j++) {
                if (S.charAt(i) == T.charAt(j)) {
                    f[i + 1][j + 1] = f[i][j + 1] + f[i][j];
                } else {
                    f[i + 1][j + 1] = f[i][j + 1];
                }
            }
        }
        return f[S.length()][T.length()];
    }
}
~~~

### Code Analysis

Exception handling is the same as in Solution 1. Both dimensions of the table get one extra element, so that row 0 and column 0 represent empty prefixes. This simplifies the indexing.

### Complexity Analysis

Overlapping subproblems are no longer recomputed. The two nested `for` loops therefore give a time complexity of $O(n^2)$, more precisely $O(|S| \cdot |T|)$. The 2D table takes $O(n^2)$ space.

The space can be reduced to $O(|T|)$ with a rolling array; see [Dynamic Programming](http://algorithm.yuanbin.me/zh-cn/dynamic_programming/index.html). Row `i + 1` depends only on row `i`, so a single array of length `|T| + 1` suffices. The catch is that `j` must be iterated from right to left, so that `f[j]` still holds the previous row's value when `f[j + 1]` is updated. The space-optimized code is as follows:

#### Java

~~~
public class Solution {
    /**
     * @param S, T: Two strings.
     * @return: Count the number of distinct subsequences
     */
    public int numDistinct(String S, String T) {
        if (S == null || T == null) return 0;
        if (S.length() < T.length()) return 0;
        if (T.length() == 0) return 1;

        // f[j]: number of subsequences of the processed prefix of S equal to T[0:j]
        int[] f = new int[T.length() + 1];
        f[0] = 1;
        for (int i = 0; i < S.length(); i++) {
            // iterate j backwards so f[j] still holds the value from the previous row
            for (int j = T.length() - 1; j >= 0; j--) {
                if (S.charAt(i) == T.charAt(j)) {
                    f[j + 1] += f[j];
                }
            }
        }
        return f[T.length()];
    }
}
~~~
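Solution 1 recomputes the same pair of suffixes many times, and that overlap is exactly what Solution 2 removes. An intermediate step is to memoize the recursion instead of building a table. The following is a minimal sketch of that idea in Python 3, not one of the original solutions:

- It caches on the start indices `(i, j)` instead of slicing strings.
- It rephrases the loop over matching positions as a per-character "skip `s[i]` or pair it with `t[j]`" choice, the same choice the DP transition makes.
- `count_distinct_memo` is a hypothetical name, and the use of `functools.lru_cache` is an assumption of this sketch.

~~~python
from functools import lru_cache


def count_distinct_memo(s: str, t: str) -> int:
    """Count subsequences of s equal to t (hypothetical helper, memoized)."""

    @lru_cache(maxsize=None)
    def go(i: int, j: int) -> int:
        # All of t[j:] has been matched: exactly one way to finish.
        if j == len(t):
            return 1
        # Not enough characters left in s[i:] to match t[j:].
        if len(s) - i < len(t) - j:
            return 0
        # Choice 1: leave s[i] unused.
        ways = go(i + 1, j)
        # Choice 2: pair s[i] with t[j] when they are equal.
        if s[i] == t[j]:
            ways += go(i + 1, j + 1)
        return ways

    return go(0, 0)


print(count_distinct_memo("rabbbit", "rabbit"))  # 3
~~~

Each `(i, j)` pair is evaluated at most once, so the time is $O(|S| \cdot |T|)$, the same as the table version. The recursion depth grows with `|S|`, however, so under Python's default recursion limit this sketch only suits short strings.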
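The transitions of Solution 2 are easier to follow on the example from the problem statement. The sketch below fills the same `(len(S) + 1) x (len(T) + 1)` table and prints one row per prefix of S, with `-` labelling the empty prefix. Assumptions of this sketch:

- The table is indexed as `f[i][j]` and compares `s[i-1]` with `t[j-1]`, exactly as in the transition rules.
- `build_table` is a hypothetical helper name.
- Unlike the solutions above, it sets `f[i][0] = 1` for every row, including the last.

~~~python
def build_table(s, t):
    """Return f with f[i][j] = number of subsequences of s[:i] equal to t[:j]."""
    f = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        f[i][0] = 1  # the empty prefix of t occurs exactly once
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            f[i][j] = f[i - 1][j]  # leave s[i-1] unused
            if s[i - 1] == t[j - 1]:
                f[i][j] += f[i - 1][j - 1]  # pair s[i-1] with t[j-1]
    return f


S, T = "rabbbit", "rabbit"
table = build_table(S, T)
print("   " + "  ".join("-" + T))  # column labels
for label, row in zip("-" + S, table):
    print(label + "  " + "  ".join(str(v) for v in row))
print(table[len(S)][len(T)])  # 3
~~~

The bottom-right entry is the answer.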
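One way to gain confidence in the rolling-array version is to compare it against the definition of a subsequence on many small random inputs. This is a minimal sketch of such a check, with these assumptions:

- `count_rolling` is a hypothetical Python port of the space-optimized Java code above.
- `count_brute_force` is a hypothetical test oracle. It relies on `itertools.combinations`, which yields index tuples in increasing order, so each tuple is one candidate subsequence.
- The alphabet `"ab"` and the length bounds are arbitrary choices for the test.

~~~python
import random
from itertools import combinations


def count_brute_force(s, t):
    """Count index tuples i1 < i2 < ... < ik whose characters spell t."""
    return sum(
        1
        for idx in combinations(range(len(s)), len(t))
        if all(s[p] == c for p, c in zip(idx, t))
    )


def count_rolling(s, t):
    """1-D DP: f[j] = ways to form t[:j] from the part of s seen so far."""
    f = [0] * (len(t) + 1)
    f[0] = 1
    for ch in s:
        # Right to left, so f[j] still holds the previous row's value.
        for j in range(len(t) - 1, -1, -1):
            if ch == t[j]:
                f[j + 1] += f[j]
    return f[len(t)]


random.seed(0)
for _ in range(200):
    s = "".join(random.choice("ab") for _ in range(random.randint(0, 8)))
    t = "".join(random.choice("ab") for _ in range(random.randint(0, 4)))
    assert count_brute_force(s, t) == count_rolling(s, t), (s, t)
print("all random checks passed")
~~~

The right-to-left order matters. If the inner loop ran left to right, `f[j]` would already include the current character, so one character of S could be paired with two consecutive equal characters of T. For example, S = "b", T = "bb" would then give 1 instead of 0. The brute-force oracle enumerates all C(|S|, |T|) index sets, so it is only practical for very short strings.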