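# Heap Sort
Heap sort is usually implemented on top of a [**binary heap**](http://algorithm.yuanbin.me/zh-cn/basics_data_structure/heap.html). Taking a max-heap as the example, it consists of two alternating steps. First, remove the root of the max-heap, which is the largest element currently in the heap. Second, because a node has been removed, restore the heap property among the remaining elements. Repeat until every node has been removed, at which point the array is sorted. That is the basic idea, though the implementation involves a few tricks.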
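### Heap Operations
Taking a max-heap as the example, the common heap operations are as follows.
1. Max-heapify (Max_Heapify): adjust a node and its descendants so that no child is larger than its parent
1. Build max-heap (Build_Max_Heap): rearrange all the data into a heap
1. Heap sort (HeapSort): remove the root node, which holds the first element, then repeat Max_Heapify on the remaining elements
Operation 1 is the building block used by operations 2 and 3.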
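
As a point of comparison, the three operations above map fairly directly onto the C++ standard library's heap algorithms. The sketch below is only an illustration of that mapping, not the implementation developed later on this page; the function name `heap_sort_with_std` is made up for this example.

```cpp
#include <algorithm>
#include <vector>

// Sorts ascending by building a max-heap and repeatedly moving the root
// to the end of a shrinking range.
std::vector<int> heap_sort_with_std(std::vector<int> nums) {
    // Build_Max_Heap: rearrange the whole range into a max-heap.
    std::make_heap(nums.begin(), nums.end());
    // HeapSort: each pop_heap moves the current maximum to position end - 1
    // and restores the heap property on [begin, end - 1) (the Max_Heapify step).
    for (auto end = nums.end(); end != nums.begin(); --end) {
        std::pop_heap(nums.begin(), end);
    }
    return nums;
}
```

Calling `heap_sort_with_std` on the sample array used in the C++ section returns the same values in ascending order. `std::sort_heap` performs the loop in one call.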

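A heap can be built top-down or bottom-up; the analysis here starts with the bottom-up approach. Think of the second half of the array as the bottom-level nodes of the heap. Each of them is a single node, so it trivially satisfies the binary-heap definition. We can therefore start from the middle node and work toward the front, building the heap step by step and ensuring at each step that every subtree rooted at a later index is already a heap. By the time we reach the first node, the whole array is a binary heap. The C++ section below implements this as a class.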
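
The top-down alternative mentioned above is not implemented on this page. A minimal sketch of one common way to do it, assuming a max-heap stored in a `std::vector<int>`, is to insert the elements one at a time and sift each new element up toward the root. The names `sift_up` and `build_max_heap_top_down` are hypothetical.

```cpp
#include <utility>
#include <vector>

// Move nums[k] up while it is larger than its parent (max-heap order).
void sift_up(std::vector<int> &nums, int k) {
    while (k > 0) {
        int p = (k - 1) / 2;  // parent index
        if (nums[p] >= nums[k]) {
            break;  // heap property already holds here
        }
        std::swap(nums[p], nums[k]);
        k = p;
    }
}

// Top-down build: treat nums[0..i-1] as a heap and insert nums[i] into it.
void build_max_heap_top_down(std::vector<int> &nums) {
    for (int i = 1; i < static_cast<int>(nums.size()); ++i) {
        sift_up(nums, i);
    }
}
```

Each insertion can climb the full height of the heap, so this variant costs $O(N \log N)$ in the worst case, whereas the bottom-up build is linear.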
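Heap sort is particularly useful where memory is tight (embedded devices and phones), since it sorts in place. On modern systems, however, it makes poor use of the cache: array elements are rarely compared with their neighbours, so cache misses are far more likely than in algorithms that compare adjacent elements. Heaps regain their importance for very large data sets, because they guarantee logarithmic running time in dynamic scenarios that mix insertions with delete-the-maximum operations, as in TopM problems.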
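
To make the dynamic scenario concrete, here is a rough sketch of a TopM-style query: keep the `m` largest values seen so far while reading a long input. It assumes `std::priority_queue` configured as a min-heap, so the operation being mixed with insertion is delete-the-minimum rather than delete-the-maximum; the function name `top_m` is invented for this example.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Returns the m largest values of `stream` in ascending order.
std::vector<int> top_m(const std::vector<int> &stream, std::size_t m) {
    // Min-heap: the smallest of the current candidates sits on top.
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : stream) {
        heap.push(x);  // O(log m) insertion
        if (heap.size() > m) {
            heap.pop();  // O(log m) removal of the smallest candidate
        }
    }
    std::vector<int> result;
    while (!heap.empty()) {
        result.push_back(heap.top());
        heap.pop();
    }
    return result;
}
```

The heap never holds more than `m + 1` elements, so memory stays bounded by `m` regardless of how long the input is.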
### C++
~~~
#include <iostream>
#include <vector>
using namespace std;

class HeapSort {
    // get the parent node index
    int parent(int i) {
        return (i - 1) / 2;
    }

    // get the left child node index
    int left(int i) {
        return 2 * i + 1;
    }

    // get the right child node index
    int right(int i) {
        return 2 * i + 2;
    }

    // build max heap bottom-up, starting from the middle of the array
    void build_max_heapify(vector<int> &nums, int heap_size) {
        for (int i = heap_size / 2; i >= 0; --i) {
            max_heapify(nums, i, heap_size);
        }
        print_heap(nums, heap_size);
    }

    // build min heap bottom-up, starting from the middle of the array
    void build_min_heapify(vector<int> &nums, int heap_size) {
        for (int i = heap_size / 2; i >= 0; --i) {
            min_heapify(nums, i, heap_size);
        }
        print_heap(nums, heap_size);
    }

    // sift nums[k] down so that the subtree rooted at k is a max-heap
    void max_heapify(vector<int> &nums, int k, int len) {
        while (k < len) {
            int max_index = k;
            // compare with the left child
            int l = left(k);
            if (l < len && nums[l] > nums[max_index]) {
                max_index = l;
            }
            // compare with the right child
            int r = right(k);
            if (r < len && nums[r] > nums[max_index]) {
                max_index = r;
            }
            // the subtree rooted at k is already a max-heap
            if (k == max_index) {
                break;
            }
            // keep the largest value at the root of this subtree
            int temp = nums[k];
            nums[k] = nums[max_index];
            nums[max_index] = temp;
            // continue sifting down from the child that was swapped
            k = max_index;
        }
    }

    // sift nums[k] down so that the subtree rooted at k is a min-heap
    void min_heapify(vector<int> &nums, int k, int len) {
        while (k < len) {
            int min_index = k;
            // compare with the left child
            int l = left(k);
            if (l < len && nums[l] < nums[min_index]) {
                min_index = l;
            }
            // compare with the right child
            int r = right(k);
            if (r < len && nums[r] < nums[min_index]) {
                min_index = r;
            }
            // the subtree rooted at k is already a min-heap
            if (k == min_index) {
                break;
            }
            // keep the smallest value at the root of this subtree
            int temp = nums[k];
            nums[k] = nums[min_index];
            nums[min_index] = temp;
            // continue sifting down from the child that was swapped
            k = min_index;
        }
    }

public:
    // heap sort (ascending order, using a max-heap)
    void heap_sort(vector<int> &nums) {
        int len = nums.size();
        // init heap structure
        build_max_heapify(nums, len);
        // heap sort
        for (int i = len - 1; i > 0; --i) {
            // put the largest number in the last position of the heap
            int temp = nums[0];
            nums[0] = nums[i];
            nums[i] = temp;
            // only the root is out of place, so sift it down within nums[0..i-1]
            max_heapify(nums, 0, i);
        }
        print_heap(nums, len);
    }

    // print heap between [0, heap_size - 1]
    void print_heap(vector<int> &nums, int heap_size) {
        for (int i = 0; i < heap_size; ++i) {
            cout << nums[i] << ", ";
        }
        cout << endl;
    }
};

int main(int argc, char *argv[])
{
    int A[] = {19, 1, 10, 14, 16, 4, 7, 9, 3, 2, 8, 5, 11};
    vector<int> nums(A, A + sizeof(A) / sizeof(A[0]));

    HeapSort sort;
    sort.print_heap(nums, nums.size());
    sort.heap_sort(nums);

    return 0;
}
~~~
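### Complexity Analysis
The code shows that heap sort spends most of its time maintaining the binary heap: building it once and then restoring it after each extraction.
Both the max-heap and the min-heap above are built bottom-up. Building the heap takes $O(2N)$ time, and restoring the heap during the sorting phase takes $O(2N \log N)$ in total, so the overall time complexity is $O(N \log N)$.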
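Consider the heap-building phase first. Drawing it out (say, with 8 nodes) shows that in the worst case every step has to adjust nodes that were already part of a heap. That means about half of the nodes are compared downward once, a quarter twice, an eighth three times, and so on. The total is the sum of an arithmetico-geometric series; the links in the Reference section walk through the details.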
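
As a sketch of that sum under the counting used above (an upper bound, since the series is extended to infinity), write $S$ for the per-node factor:

$$
S = \sum_{k \ge 1} \frac{k}{2^k} = \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \cdots
$$

Subtracting half of the series from itself leaves a plain geometric series:

$$
S - \frac{S}{2} = \sum_{k \ge 1} \frac{k}{2^k} - \sum_{k \ge 2} \frac{k-1}{2^k} = \sum_{k \ge 1} \frac{1}{2^k} = 1
\quad\Longrightarrow\quad S = 2
$$

So the number of downward comparisons is at most $N \cdot S = 2N$, which is where the $O(2N) = O(N)$ bound for building the heap comes from.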
### Reference
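- [Heap sort - Wikipedia, the free encyclopedia (Chinese)](http://zh.wikipedia.org/wiki/%E5%A0%86%E6%8E%92%E5%BA%8F)
- [Priority Queues](http://algs4.cs.princeton.edu/24pq/) - Robert Sedgewick's classic, with a detailed explanation of heap operations.
- [Summary and Implementation of Classic Sorting Algorithms | Jark's Blog](http://wuchong.me/blog/2014/02/09/algorithm-sort-summary/) - explains heap sort very well.
- *Algorithms* - Robert Sedgewick
- [How is the O(n) time complexity of building the heap in heap sort derived?](http://www.zhihu.com/question/20729324)
- [*Big Talk Data Structures*, Chapter 9 Sorting, 9.7 Heap Sort (Part 1) - 伍迷 - cnblogs](http://www.cnblogs.com/cj723/archive/2011/04/21/2024261.html)
- [*Big Talk Data Structures*, Chapter 9 Sorting, 9.7 Heap Sort (Part 2) - 伍迷 - cnblogs](http://www.cnblogs.com/cj723/archive/2011/04/22/2024269.html)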