生信大本營 · Solving Bioinformatics Problems with Python

<a id = "header"></a>

This topic is a showcase of all kinds of bioinformatics and biostatistics problems. The problems come from varied backgrounds and call for flexible methods, but all of them stay grounded in the life sciences and their applications. They are divided here into 16 subtopics, which broadly cover the mainstream areas of life science and bioinformatics research.

>[info] Below is an index of the subtopics and their problems. Note that some problems are listed under **more than one subtopic**. Click a subtopic link below to jump to the corresponding place in the index:

- [Alignment](#alignment); [Combinatorics](#combinatorics); [Computational Mass Spectrometry](#comp-mass-spec); [Divide-and-Conquer](#divide-conquer);
- [Dynamic Programming](#dyn-prog); [Genome Assembly](#genome-asse); [Genome Rearrangements](#genome-rear); [Graph Algorithms](#graph-algo);
- [Graphs](#graphs); [Heredity](#heredity); [Phylogeny](#phylogeny); [Population Dynamics](#popu-dyna);
- [Probability](#probability); [Set Theory](#set-theory); [Sorting](#sorting); [String Algorithms](#string-algo)

Click any subtopic heading to return to the top of this introduction page; to visit other topics, go back to the [welcome page](../README.md).

&emsp;

<a id = "alignment"></a>

## [Alignment](#header)

Aligning one sequence against another (gaps are allowed in the sequences) to represent the insertions, deletions, and substitutions between them.

- [Counting Point Mutations](HAMM.md)
- Pairwise Global Alignment
- Suboptimal Local Alignment
- Transitions and Transversions
- Global Multiple Alignment
- Creating a Distance Matrix
- Edit Distance
- Edit Distance Alignment
- Counting Optimal Alignments
- Global Alignment with Scoring Matrix
- Global Alignment with Constant Gap Penalty
- Local Alignment with Scoring Matrix
- Maximizing the Gap Symbols of an Optimal Alignment
- Multiple Alignment
- Global Alignment with Scoring Matrix and Affine Gap Penalty
- Overlap Alignment
- Semiglobal Alignment
- Local Alignment with Affine Gap Penalty
- Isolating Symbols in Alignments

&emsp;

<a id = "combinatorics"></a>

## [Combinatorics](#header)

The mathematics of counting objects.

- [Rabbits and Recurrence Relations](FIB.md)
- [Mortal Fibonacci Rabbits](FIBD.md)
- Inferring mRNA from Protein
- Open Reading Frames
- Enumerating Gene Orders
- Perfect Matchings and RNA Secondary Structures
- Partial Permutations
- Enumerating Oriented Gene Orderings
- Catalan Numbers and RNA Secondary Structures
- Counting Phylogenetic Ancestors
- Maximum Matchings and RNA Secondary Structures
- Reversal Distance
- Counting Subsets
- Introduction to Alternative Splicing
- Motzkin Numbers and RNA Secondary Structures
- Sorting by Reversals
- Wobble Bonding and RNA Secondary Structures
- Counting Optimal Alignments
- Counting Unrooted Binary Trees
- Counting Quartets
- Enumerating Unrooted Binary Trees
- Counting Rooted Binary Trees

&emsp;

<a id = "comp-mass-spec"></a>

## [Computational Mass Spectrometry](#header)

Mass spectrometry: a technique for identifying molecules by breaking them into fragments and analyzing the chemical properties of those fragments.

- Calculating Protein Mass
- Inferring Protein from Spectrum
- Comparing Spectra with the Spectral Convolution
- Inferring Peptide from Full Spectrum
- Matching a Spectrum to a Protein
- Using the Spectrum Graph to Infer Peptides

&emsp;

<a id = "divide-conquer"></a>

## [Divide-and-Conquer](#header)

- Binary Search
- Merge Sort

&emsp;

<a id = "dyn-prog"></a>

## [Dynamic Programming](#header)

Dynamic programming: algorithms that build up a solution by solving progressively larger cases of a problem.

- [Rabbits and Recurrence Relations](FIB.md)
- [Mortal Fibonacci Rabbits](FIBD.md)
- Longest Increasing Subsequence
- Perfect Matchings and RNA Secondary Structures
- Catalan Numbers and RNA Secondary Structures
- Finding a Shared Spliced Motif
- Maximum Matchings and RNA Secondary Structures
- Edit Distance
- Motzkin Numbers and RNA Secondary Structures
- Interleaving Two Motifs
- Edit Distance Alignment
- Finding Disjoint Motifs in a Gene
- Wobble Bonding and RNA Secondary Structures
- Global Alignment with Scoring Matrix
- Global Alignment with Constant Gap Penalty
- Local Alignment with Scoring Matrix
- Maximizing the Gap Symbols of an Optimal Alignment
- Multiple Alignment
- Global Alignment with Scoring Matrix and Affine Gap Penalty
- Overlap Alignment
- Semiglobal Alignment
- Local Alignment with Affine Gap Penalty
- Isolating Symbols in Alignments

&emsp;

<a id = "genome-asse"></a>

## [Genome Assembly](#header)

Algorithms for reconstructing long contiguous stretches of a chromosome from short DNA fragments.

- Genome Assembly as Shortest Superstring
- Error Correction in Reads
- Constructing a De Bruijn Graph
- Genome Assembly with Perfect Coverage
- Genome Assembly Using Reads
- Assessing Assembly Quality with N50 and N75
- Genome Assembly with Perfect Coverage and Repeats

&emsp;

<a id = "genome-rear"></a>

## [Genome Rearrangements](#header)

Large-scale mutations that affect entire intervals of nucleic acid.

- Enumerating Gene Orders
- Longest Increasing Subsequence
- Partial Permutations
- Enumerating Oriented Gene Orderings
- Reversal Distance
- Sorting by Reversals

&emsp;

<a id = "graph-algo"></a>

## [Graph Algorithms](#header)

Algorithms for interpreting and processing networks, or graphs.

- Overlap Graphs
- Completing a Tree
- Introduction to Pattern Matching
- Finding the Longest Multiple Repeat
- Wobble Bonding and RNA Secondary Structures
- Genome Assembly with Perfect Coverage
- Using the Spectrum Graph to Infer Peptides
- Encoding Suffix Trees
- Genome Assembly Using Reads
- Identifying Maximal Repeats
- Genome Assembly with Perfect Coverage and Repeats

&emsp;

<a id = "graphs"></a>

## [Graphs](#header)

Graphs: networks made up of a set of nodes, with pairs of nodes connected by edges.

- Degree Array
- Double-Degree Array
- Breadth-First Search
- Connected Components
- Testing Bipartiteness
- Testing Acyclicity
- Dijkstra's Algorithm
- Square in a Graph
- Bellman-Ford Algorithm
- Shortest Cycle Through a Given Edge
- Topological Sorting
- Hamiltonian Path in DAG
- Negative Weight Cycle
- Strongly Connected Components
- 2-Satisfiability
- General Sink
- Semi-Connected Graph
- Shortest Paths in DAG

&emsp;

<a id = "heredity"></a>

## [Heredity](#header)

The scientific study of how traits are inherited.

- [Mendel's First Law (Law of Segregation)](IPRB.md)
- Calculating Expected Offspring
- Independent Alleles
- Independent Segregation of Chromosomes
- Inferring Genotype from a Pedigree
- Sex-Linked Inheritance

&emsp;

<a id = "phylogeny"></a>

## [Phylogeny](#header)

Phylogenetic trees: models of an evolutionary scenario in which a collection of species descends from presumed ancestors.

- Completing a Tree
- Counting Phylogenetic Ancestors
- Creating a Distance Matrix
- Distances in Trees
- Creating a Character Table
- Newick Format with Edge Weights
- Creating a Character Table from Genetic Strings
- Counting Unrooted Binary Trees
- Quartets
- Character-Based Phylogeny
- Counting Quartets
- Enumerating Unrooted Binary Trees
- Inferring Genotype from a Pedigree
- Counting Rooted Binary Trees
- Phylogeny Comparison with Split Distance
- Alignment-Based Phylogeny
- Fixing an Inconsistent Character Set
- Quartet Distance
- Identifying Reversing Substitutions

&emsp;

<a id = "popu-dyna"></a>

## [Population Dynamics](#header)

- Counting Disease Carriers
- The Wright-Fisher Model of Genetic Drift
- The Founder Effect and Genetic Drift

&emsp;

<a id = "probability"></a>

## [Probability](#header)

Probability: the mathematical study of how likely random events are to occur, or how likely a particular event is to happen.

- [Mendel's First Law (Law of Segregation)](IPRB.md)
- Calculating Expected Offspring
- Independent Alleles
- Introduction to Random Strings
- Matching Random Motifs
- Expected Number of Restriction Sites
- Independent Segregation of Chromosomes
- Counting Disease Carriers
- Inferring Genotype from a Pedigree
- Sex-Linked Inheritance
- The Wright-Fisher Model of Genetic Drift
- Wright-Fisher's Expected Behavior
- The Founder Effect and Genetic Drift

&emsp;

<a id = "set-theory"></a>

## [Set Theory](#header)

Set theory: the mathematical study of sets and their properties.

- Counting Subsets
- Introduction to Set Operations
- Creating a Restriction Map

&emsp;

<a id = "sorting"></a>

## [Sorting](#header)

Sorting problems: finding the fewest operations needed to turn an unordered structure into an ordered one.

- Insertion Sort
- Majority Element
- Merge Two Sorted Arrays
- 2SUM
- Building a Heap
- Merge Sort
- 2-Way Partition
- 3SUM
- Heap Sort
- Counting Inversions
- 3-Way Partition
- Median
- Partial Sort
- Quick Sort

&emsp;

<a id = "string-algo"></a>

## [String Algorithms](#header)

Algorithms concerned with manipulating strings and their properties.

- [Counting DNA Nucleotides](DNA.md)
- [Transcribing DNA into RNA](RNA.md)
- [Complementing a Strand of DNA](REVC.md)
- [Computing GC Content](GC.md)
- [Translating RNA into Protein](PROT.md)
- [Finding a Motif in DNA](SUBS.md)
- [Consensus and Profile](CONS.md)
- Finding a Shared Motif
- Locating Restriction Sites
- RNA Splicing
- Enumerating k-mers Lexicographically
- Perfect Matchings and RNA Secondary Structures
- Finding a Spliced Motif
- Catalan Numbers and RNA Secondary Structures
- k-Mer Composition
- Speeding Up Motif Finding
- Finding a Shared Spliced Motif
- Ordering Strings of Varying Length Lexicographically
- Maximum Matchings and RNA Secondary Structures
- Motzkin Numbers and RNA Secondary Structures
- Interleaving Two Motifs
- Introduction to Pattern Matching
- Finding Disjoint Motifs in a Gene
- Encoding Suffix Trees
- Linguistic Complexity of a Genome
- Identifying Maximal Repeats
- Finding All Similar Motifs
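
To give a flavor of the string problems indexed above, here is a minimal Python sketch of four of them: counting nucleotides, transcription, reverse complement, and GC content. The function names are hypothetical and chosen only for illustration; the linked solution pages may take a different approach. The sketch assumes the input is an uppercase DNA string over the alphabet A, C, G, T.

```python
from collections import Counter

# Hypothetical helper names for illustration only; they are not taken from
# the linked solution pages. Input is assumed to be uppercase A/C/G/T.

def count_nucleotides(dna: str) -> tuple:
    """Return the counts of A, C, G and T, in that order."""
    counts = Counter(dna)
    return counts["A"], counts["C"], counts["G"], counts["T"]


def transcribe(dna: str) -> str:
    """Transcribe a DNA coding strand into RNA by replacing T with U."""
    return dna.replace("T", "U")


# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement every base, then reverse the strand."""
    return dna.translate(_COMPLEMENT)[::-1]


def gc_content(dna: str) -> float:
    """Return the percentage of bases that are G or C (0.0 for empty input)."""
    if not dna:
        return 0.0
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


if __name__ == "__main__":
    sample = "AAAACCCGGT"
    print(count_nucleotides(sample))
    print(transcribe(sample))
    print(reverse_complement(sample))
    print(gc_content(sample))
```

Each function is a single pass over the string, which is usually enough for problems of this size; real sequence data would typically also need handling for lowercase letters and ambiguity codes such as `N`.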