# Distinct Subsequences
### Source
- leetcode: [Distinct Subsequences | LeetCode OJ](https://leetcode.com/problems/distinct-subsequences/)
- lintcode: [(118) Distinct Subsequences](http://www.lintcode.com/en/problem/distinct-subsequences/)
~~~
Given a string S and a string T, count the number of distinct subsequences of T in S.
A subsequence of a string is a new string
which is formed from the original string by deleting some (can be none) of the characters
without disturbing the relative positions of the remaining characters.
(ie, "ACE" is a subsequence of "ABCDE" while "AEC" is not).
Example
Given S = "rabbbit", T = "rabbit", return 3.
Challenge
Do it in O(n^2) time and O(n) memory.
O(n^2) memory is also acceptable if you do not know how to optimize memory.
~~~
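### Solution 1
First, distinguish a subsequence from a substring: a subsequence does not have to be contiguous. The problem asks how many times T occurs as a subsequence of S. Setting implementation aside, one natural idea is to compare the first characters of S and T one by one: when they are equal, delete both; when they differ, delete the first character of S; then keep comparing the remaining characters until T has been consumed. This simple idea has a flaw. The problem asks for the number of subsequences, not whether one exists, so characters of S cannot be discarded so casually. How, then, can the number of subsequences be obtained?
To count the distinct subsequences, we cannot simply drop the first character of S and move on. Instead, for every position of S whose character equals the first character of T, we take a copy of the rest of S after that position and compare it with the rest of T, then add up the results. This is somewhat similar to how the pruning function is handled in depth-first search.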
### Python
~~~
class Solution:
    # @param S, T: two strings.
    # @return: Count the number of distinct subsequences
    def numDistinct(self, S, T):
        if S is None or T is None:
            return 0
        if len(S) < len(T):
            return 0
        if len(T) == 0:
            return 1
        num = 0
        for i, Si in enumerate(S):
            if Si == T[0]:
                num += self.numDistinct(S[i + 1:], T[1:])
        return num
~~~
### C++
~~~
#include <string>
using namespace std;

class Solution {
public:
    /**
     * @param S, T: two strings.
     * @return: Count the number of distinct subsequences
     */
    int numDistinct(string &S, string &T) {
        if (S.size() < T.size()) return 0;
        if (T.empty()) return 1;
        int num = 0;
        for (int i = 0; i < S.size(); ++i) {
            if (S[i] == T[0]) {
                string Si = S.substr(i + 1);
                string t = T.substr(1);
                num += numDistinct(Si, t);
            }
        }
        return num;
    }
};
~~~
### Java
~~~
public class Solution {
    /**
     * @param S, T: two strings.
     * @return: Count the number of distinct subsequences
     */
    public int numDistinct(String S, String T) {
        if (S == null || T == null) return 0;
        if (S.length() < T.length()) return 0;
        if (T.length() == 0) return 1;
        int num = 0;
        for (int i = 0; i < S.length(); i++) {
            if (S.charAt(i) == T.charAt(0)) {
                // T.length() >= 1, T.substring(1) will not throw index error
                num += numDistinct(S.substring(i + 1), T.substring(1));
            }
        }
        return num;
    }
}
~~~
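### Source Code Analysis
1. Handle null input (in C++, assigning NULL to a string is an error, and the function cannot handle that case internally).
1. If S is shorter than T, T cannot be a subsequence of S, so return 0.
1. If T has length 0, T is a subsequence of S, so return 1.
Because the for loop is entered only when `T.length() >= 1`, when T has length 1 the Java call `T.substring(1)` produces the empty string `""` and does not throw an index-out-of-bounds exception.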
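### Complexity Analysis
In the best case, S shares no character with T and the time complexity is $O(n)$. In the worst case, all characters of S and T are identical. Drawing the recursive call stack shows a structure very similar to depth-first search, with the relation $f(n) = \sum_{i=1}^{n-1} f(i)$. Subtracting the same relation for $n - 1$ gives $f(n) = 2f(n-1)$, so the cost grows exponentially, which is far worse than the complexity of the naive Fibonacci recursion.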
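To make the worst case concrete, the sketch below instruments the brute-force recursion with a call counter. The names `count_naive`, `measure`, and the `stats` dictionary are illustrative and not part of the original solution, and the input family `S = 'a' * 2k`, `T = 'a' * k` is only an assumed hard case chosen because every character matches.

```python
def count_naive(S, T, stats):
    """Brute-force recursion of Solution 1, with a call counter added."""
    stats["calls"] += 1
    if len(S) < len(T):
        return 0
    if len(T) == 0:
        return 1
    num = 0
    for i, ch in enumerate(S):
        if ch == T[0]:
            # Try pairing S[i] with T[0], then solve the remainder.
            num += count_naive(S[i + 1:], T[1:], stats)
    return num


def measure(k):
    """Return (answer, number of calls) for S = 'a' * 2k and T = 'a' * k."""
    stats = {"calls": 0}
    answer = count_naive("a" * (2 * k), "a" * k, stats)
    return answer, stats["calls"]


if __name__ == "__main__":
    for k in range(1, 8):
        answer, calls = measure(k)
        print(k, answer, calls)
```

Every unit added to the answer comes from a separate base-case call, so the number of calls is never smaller than the answer itself, and the answer for this input family is the binomial coefficient C(2k, k).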
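### Solution 2 - Dynamic Programming
The complexity analysis of Solution 1 shows many overlapping subproblems (the same substrings are compared many times), which suggests optimizing with dynamic programming. But how should the three elements of the DP be set up? Since this problem is about the relationship between two strings, we can try the two-sequence ([DP_Two_Sequence](# "Usually there are two arrays or two strings whose matching relationship is computed. Typically `f[i][j]` denotes the relationship between the first i elements of the first array and the first j elements of the second.")) DP approach.
Define `f[i][j]` as the number of subsequences of S[0:i] that equal T[0:j]. Next, look for the state transition, which should come from f[i-1][j], f[i-1][j-1], and f[i][j-1]. The key to finding it is the relationship between S[i] and T[j], where S[i] and T[j] here denote the last characters of the two prefixes.
1. `S[i] == T[j]`: the last characters of the two strings are equal. We can pair S[i] with T[j], which gives f[i-1][j-1] ways. Alternatively, we can leave S[i] unpaired and pair some earlier character of S[0:i-1] with T[j], which gives f[i-1][j] ways. Combining the two choices, when `S[i] == T[j]` we have f[i][j] = f[i-1][j-1] + f[i-1][j].
1. `S[i] != T[j]`: the last characters differ, so S[i] cannot be paired with T[j], hence f[i][j] = f[i-1][j].
To handle the state where the first characters match (so that the counts can accumulate), initialize f[i][0] to 1 and everything else to 0. For null input the code returns 0, although returning 1 could also be justified; for an empty T it returns 1.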
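The recurrence above can also be read top-down, which may make the link to the recursion of Solution 1 easier to see. The following is a minimal sketch under the assumption that `f(i, j)` counts occurrences of the first `j` characters of T inside the first `i` characters of S; the function name `num_distinct_memo` is hypothetical, and `functools.lru_cache` is used only as a convenient memo table.

```python
from functools import lru_cache


def num_distinct_memo(S, T):
    """Top-down form of the recurrence: f(i, j) counts T[:j] inside S[:i]."""

    @lru_cache(maxsize=None)
    def f(i, j):
        if j == 0:
            return 1              # the empty target occurs exactly once
        if i < j:
            return 0              # S[:i] is too short to contain T[:j]
        count = f(i - 1, j)       # leave S[i-1] unpaired
        if S[i - 1] == T[j - 1]:
            count += f(i - 1, j - 1)  # pair S[i-1] with T[j-1]
        return count

    return f(len(S), len(T))


if __name__ == "__main__":
    print(num_distinct_memo("rabbbit", "rabbit"))
```

Each `(i, j)` pair is evaluated once, so the work is proportional to the product of the two lengths. The recursion depth grows with `len(S)`, so for long inputs the bottom-up table is the safer choice.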
### Python
~~~
class Solution:
    # @param S, T: two strings.
    # @return: Count the number of distinct subsequences
    def numDistinct(self, S, T):
        if S is None or T is None:
            return 0
        if len(S) < len(T):
            return 0
        if len(T) == 0:
            return 1
        # range works in both Python 2 and Python 3
        f = [[0 for i in range(len(T) + 1)] for j in range(len(S) + 1)]
        for i, Si in enumerate(S):
            f[i][0] = 1
            for j, Tj in enumerate(T):
                if Si == Tj:
                    f[i + 1][j + 1] = f[i][j + 1] + f[i][j]
                else:
                    f[i + 1][j + 1] = f[i][j + 1]
        return f[len(S)][len(T)]
~~~
### C++
~~~
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    /**
     * @param S, T: two strings.
     * @return: Count the number of distinct subsequences
     */
    int numDistinct(string &S, string &T) {
        if (S.size() < T.size()) return 0;
        if (T.empty()) return 1;
        vector<vector<int> > f(S.size() + 1, vector<int>(T.size() + 1, 0));
        for (int i = 0; i < S.size(); ++i) {
            f[i][0] = 1;
            for (int j = 0; j < T.size(); ++j) {
                if (S[i] == T[j]) {
                    f[i + 1][j + 1] = f[i][j + 1] + f[i][j];
                } else {
                    f[i + 1][j + 1] = f[i][j + 1];
                }
            }
        }
        return f[S.size()][T.size()];
    }
};
~~~
### Java
~~~
public class Solution {
    /**
     * @param S, T: two strings.
     * @return: Count the number of distinct subsequences
     */
    public int numDistinct(String S, String T) {
        if (S == null || T == null) return 0;
        if (S.length() < T.length()) return 0;
        if (T.length() == 0) return 1;
        int[][] f = new int[S.length() + 1][T.length() + 1];
        for (int i = 0; i < S.length(); i++) {
            f[i][0] = 1;
            for (int j = 0; j < T.length(); j++) {
                if (S.charAt(i) == T.charAt(j)) {
                    f[i + 1][j + 1] = f[i][j + 1] + f[i][j];
                } else {
                    f[i + 1][j + 1] = f[i][j + 1];
                }
            }
        }
        return f[S.length()][T.length()];
    }
}
~~~
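### Source Code Analysis
Input validation is the same as in Solution 1. The table is allocated with one extra element in each dimension, which makes the boundary states easier to handle.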
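To see what the extra row and column hold, it can help to print the whole table for the example from the problem statement. The sketch below is illustrative only: `build_table` is a hypothetical helper, and it uses 1-based prefix lengths so that row `i` describes the first `i` characters of S.

```python
def build_table(S, T):
    """Return the full (len(S) + 1) x (len(T) + 1) DP table."""
    rows, cols = len(S) + 1, len(T) + 1
    f = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        f[i][0] = 1  # the empty prefix of T occurs once in every prefix of S
    for i in range(1, rows):
        for j in range(1, cols):
            f[i][j] = f[i - 1][j]             # S[i-1] left unpaired
            if S[i - 1] == T[j - 1]:
                f[i][j] += f[i - 1][j - 1]    # S[i-1] paired with T[j-1]
    return f


if __name__ == "__main__":
    S, T = "rabbbit", "rabbit"
    for prefix_len, row in enumerate(build_table(S, T)):
        # One row per prefix of S, one column per prefix of T.
        print(repr(S[:prefix_len]).ljust(10), row)
```

Row 0 and column 0 correspond to empty prefixes, which is why no special-casing is needed inside the double loop. The bottom-right cell is the answer.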
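### Complexity Analysis
Because overlapping subproblems are no longer recomputed, the double for loop gives a time complexity of $O(n^2)$. A 2D matrix stores the states, so the space complexity is $O(n^2)$. The space can be reduced with a rolling array; see [Dynamic Programming](http://algorithm.yuanbin.me/zh-cn/dynamic_programming/index.html) for details.
The code after the space optimization is as follows: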
#### Java
~~~
public class Solution {
    /**
     * @param S, T: two strings.
     * @return: Count the number of distinct subsequences
     */
    public int numDistinct(String S, String T) {
        if (S == null || T == null) return 0;
        if (S.length() < T.length()) return 0;
        if (T.length() == 0) return 1;
        int[] f = new int[T.length() + 1];
        f[0] = 1;
        for (int i = 0; i < S.length(); i++) {
            for (int j = T.length() - 1; j >= 0; j--) {
                if (S.charAt(i) == T.charAt(j)) {
                    f[j + 1] += f[j];
                }
            }
        }
        return f[T.length()];
    }
}
~~~
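For readers following the Python versions, here is a sketch of the same one-dimensional optimization. The function name `num_distinct_1d` is an assumption for illustration; the logic mirrors the Java code above.

```python
def num_distinct_1d(S, T):
    """Count occurrences of T as a subsequence of S using O(len(T)) memory."""
    if S is None or T is None:
        return 0
    f = [0] * (len(T) + 1)
    f[0] = 1  # the empty prefix of T always occurs once
    for ch in S:
        # Walk j from right to left so that f[j] still holds the value from
        # the previous character of S when it is added into f[j + 1].
        for j in range(len(T) - 1, -1, -1):
            if ch == T[j]:
                f[j + 1] += f[j]
    return f[len(T)]


if __name__ == "__main__":
    print(num_distinct_1d("rabbbit", "rabbit"))
```

Iterating `j` in increasing order would let a single character of S be paired with more than one position of T within the same pass, which would overcount; the reverse order avoids that.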
### Reference
- [LeetCode: Distinct Subsequences(不同子序列的個數) - 亦忘卻_亦紀念](http://blog.csdn.net/abcbc/article/details/8978146)
- The Distinct Subsequences section of soulmachine's leetcode-cpp
- [Distinct Subsequences | Training dragons the hard way](http://traceformula.blogspot.com/2015/08/distinct-subsequences.html)