A HashMap Walkthrough, One Year Late · My Android Wiki

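Original link: [遲到一年HashMap解讀](http://dandanlove.com/2017/10/27/late-one-year-hashmap/)

---

# Preface

HashMap and List are two of the collection classes we use most often when programming in Java. "Know what it does, and more importantly know why it does it." HashMap and I have known each other for years, and it has always worked hard for me on the job. Since last year I had been meaning to pay it a visit at home, but for one reason or another the plan kept getting lost along the way. (Most of the source code in this article is based on JDK 1.7.)

[![Map&Set](http://upload-images.jianshu.io/upload_images/1319879-f243ffd6da9f703c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "Map&Set")](http://upload-images.jianshu.io/upload_images/1319879-f243ffd6da9f703c.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "Map&Set")

# HashMap overview

HashMap is a hash-table-based collection of key-value pairs. It extends AbstractMap and is an unsynchronized implementation of the Map interface. It provides all of the optional map operations and permits null values and the null key. The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order stays constant over time.

Because of this storage structure, a lookup first runs the key through a hash computation to find its position in the hash table, and then needs only a few comparisons to find the element. This is what makes HashMap lookups fast.

# HashMap characteristics

* Before JDK 1.8 the underlying structure is an array plus linked lists. From JDK 1.8 on it is still an array plus linked lists, but a list that grows too long is converted into a red-black tree.
* Keys are unique, like the elements of a Set, so duplicates are not allowed (one null key is permitted).
* Elements are stored unordered, and their positions may change on every resize.
* Insertion and lookup are essentially O(1), provided the hash function spreads the elements evenly.
* Two key parameters: initial capacity and load factor.

# HashMap data structure

~~~
public class HashMap<K,V> extends AbstractMap<K,V>
        implements Map<K,V>, Cloneable, Serializable {
    // Default initial capacity; must be a power of two
    static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
    // Maximum capacity; must be a power of two
    static final int MAXIMUM_CAPACITY = 1 << 30;
    // Load factor used when none is given to the constructor
    static final float DEFAULT_LOAD_FACTOR = 0.75f;
    // Shared empty table used before the real table is allocated
    static final Entry<?,?>[] EMPTY_TABLE = {};
    // The bucket array; its length is always a power of two
    transient Entry<K,V>[] table = (Entry<K,V>[]) EMPTY_TABLE;
    // Number of key-value mappings in the map
    transient int size;
    // The size at which the next resize happens (capacity * load factor)
    int threshold;
    final float loadFactor;
    // Number of structural modifications (used by fail-fast iterators)
    transient int modCount;
    static final int ALTERNATIVE_HASHING_THRESHOLD_DEFAULT = Integer.MAX_VALUE;
    /********** some code omitted **********/
    static class Entry<K,V> implements Map.Entry<K,V> {
        final K key;
        V value;
        Entry<K,V> next; // next entry in the same bucket
        int hash;
        /********** some code omitted **********/
    }
    /********** some code omitted **********/
}
~~~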
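To make the array-plus-linked-list layout concrete, here is a minimal sketch of the same idea. It is not the JDK code: the class `ToyChainedMap`, its `Node` type, the fixed table size and the restriction to non-null `String` keys are all assumptions made for illustration. New nodes are inserted at the head of the bucket's list, as JDK 1.7 does.

```java
class ToyChainedMap {
    // One node of a bucket's singly linked list (same four fields as Entry).
    static final class Node {
        final int hash;
        final String key;
        int value;
        Node next;

        Node(int hash, String key, int value, Node next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // Fixed power-of-two table; this sketch never resizes.
    final Node[] table = new Node[16];
    int size;

    // Inserts or replaces a mapping; key must be non-null in this sketch.
    void put(String key, int value) {
        int hash = key.hashCode();
        int i = hash & (table.length - 1);
        for (Node e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                e.value = value; // existing key: replace the value
                return;
            }
        }
        table[i] = new Node(hash, key, value, table[i]); // head insertion
        size++;
    }

    // Returns the mapped value, or null if the key is absent.
    Integer get(String key) {
        int hash = key.hashCode();
        for (Node e = table[hash & (table.length - 1)]; e != null; e = e.next) {
            if (e.hash == hash && e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ToyChainedMap map = new ToyChainedMap();
        map.put("Aa", 1); // "Aa" and "BB" have the same hashCode,
        map.put("BB", 2); // so they end up chained in one bucket
        System.out.println(map.get("Aa") + " " + map.get("BB") + " " + map.size);
    }
}
```

The strings "Aa" and "BB" are used because they share a hashCode, so the second `put` has to chain onto the first.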
HashMap mainly holds an array of Entry objects called table. Entry is the element type of that array. It implements Map.Entry, so each Entry is a key-value pair that also holds a reference to the next element, which forms a linked list. (In Java 8 Entry was renamed Node, because a bucket there can hold a tree as well as a linked list; the tree node class is TreeNode.)

[![HashMap data structure](http://upload-images.jianshu.io/upload_images/1319879-7966cdd20c44eff4.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "HashMap data structure")](http://upload-images.jianshu.io/upload_images/1319879-7966cdd20c44eff4.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "HashMap data structure")

# HashMap construction

~~~
/**
 * Constructs an empty HashMap with the specified initial
 * capacity and load factor.
 *
 * @param initialCapacity the initial capacity
 * @param loadFactor the load factor
 * @throws IllegalArgumentException if the initial capacity is negative
 *         or the load factor is nonpositive
 */
public HashMap(int initialCapacity, float loadFactor) {
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal initial capacity: " +
                                           initialCapacity);
    if (initialCapacity > MAXIMUM_CAPACITY)
        initialCapacity = MAXIMUM_CAPACITY;
    if (loadFactor <= 0 || Float.isNaN(loadFactor))
        throw new IllegalArgumentException("Illegal load factor: " +
                                           loadFactor);
    this.loadFactor = loadFactor;
    // The table is not allocated yet; threshold temporarily holds the requested capacity
    threshold = initialCapacity;
    init();
}

public HashMap(int initialCapacity) {
    this(initialCapacity, DEFAULT_LOAD_FACTOR);
}

public HashMap() {
    this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR);
}

public HashMap(Map<? extends K, ? extends V> m) {
    this(Math.max((int) (m.size() / DEFAULT_LOAD_FACTOR) + 1,
                  DEFAULT_INITIAL_CAPACITY), DEFAULT_LOAD_FACTOR);
    inflateTable(threshold);
    putAllForCreate(m);
}
~~~

There are two main parameters: initialCapacity (the initial capacity) and loadFactor (the load factor). Both have defaults in the class definition, 16 and 0.75 respectively. The table array defaults to EMPTY_TABLE; when an element is added, the code checks whether table is still EMPTY_TABLE and, if so, calls inflateTable. At construction time threshold is simply set to the initial capacity. When the constructor takes a Map, it calls inflateTable(threshold) to size the table array:

~~~
/**
 * Inflates the table.
 */
private void inflateTable(int toSize) {
    // Find a power of 2 >= toSize
    int capacity = roundUpToPowerOf2(toSize);
    // Update the threshold
    threshold = (int) Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1);
    table = new Entry[capacity];
    initHashSeedAsNeeded(capacity);
}
~~~

roundUpToPowerOf2 returns the smallest power of two that is greater than or equal to number. If number is already a power of two it is returned unchanged. The minimum result is 1 and the maximum is MAXIMUM_CAPACITY (2^30).

~~~
private static int roundUpToPowerOf2(int number) {
    // assert number >= 0 : "number must be non-negative";
    return number >= MAXIMUM_CAPACITY
            ? MAXIMUM_CAPACITY
            : (number > 1) ? Integer.highestOneBit((number - 1) << 1) : 1;
}
~~~

## highestOneBit

Returns the largest power of two that is not greater than i (for positive i).

~~~
public static int highestOneBit(int i) {
    // HD, Figure 3-1
    i |= (i >>  1); // the top 2 bits, counted from the highest set bit, are now 1
    i |= (i >>  2); // the top 4 bits are now 1
    i |= (i >>  4); // the top 8 bits are now 1
    i |= (i >>  8); // the top 16 bits are now 1
    i |= (i >> 16); // up to 32 bits are covered
    // After these five shift-and-OR steps, every bit below the highest set bit is 1
    return i - (i >>> 1); // subtract all the lower 1 bits, leaving only the highest 1
}
~~~

Why must the capacity of the bucket array be a power of two? As discussed later, it helps reduce hash collisions when elements are added.

# How HashMap stores and retrieves

## The put method

> * Get the key's hashCode
> * Apply the secondary hash
> * Use the hash to find the bucket index
> * Insert into the linked list

~~~
// Adds an element to the HashMap
public V put(K key, V value) {
    // The table has not been allocated yet: allocate it with inflateTable first
    if (table == EMPTY_TABLE) {
        inflateTable(threshold);
    }
    // Store the mapping whose key is null
    if (key == null)
        return putForNullKey(value);
    // Compute the hash of the key
    int hash = hash(key);
    // Compute the bucket index in the table array
    int i = indexFor(hash, table.length);
    // Walk the linked list at table[i], following next until it is null
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        // The key already exists: replace its value
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No Entry with this key: this is a structural modification, so bump modCount
    modCount++;
    // Add the key and value to the table
    addEntry(hash, key, value, i);
    return null;
}

// Adds an Entry
void addEntry(int hash, K key, V value, int bucketIndex) {
    // If the number of entries has reached the threshold and the target bucket
    // is not empty, the table has to grow
    if ((size >= threshold) && (null != table[bucketIndex])) {
        // Double the table
        resize(2 *
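                table.length);
        // Recompute the hash
        hash = (null != key) ? hash(key) : 0;
        // Recompute the bucket index for the entry being inserted
        bucketIndex = indexFor(hash, table.length);
    }
    // Create the Entry at bucketIndex
    createEntry(hash, key, value, bucketIndex);
}

// Creates an Entry
void createEntry(int hash, K key, V value, int bucketIndex) {
    // Current head of the linked list in this bucket
    Entry<K,V> e = table[bucketIndex];
    // Create a new Entry and insert it at the head of the list at bucketIndex
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
~~~

## Getting the key's hashCode and applying the secondary hash

~~~
final int hash(Object k) {
    int h = hashSeed;
    if (0 != h && k instanceof String) {
        return sun.misc.Hashing.stringHash32((String) k);
    }

    h ^= k.hashCode();

    // This function ensures that hashCodes that differ only by
    // constant multiples at each bit position have a bounded
    // number of collisions (approximately 8 at default load factor).
    h ^= (h >>> 20) ^ (h >>> 12);
    return h ^ (h >>> 7) ^ (h >>> 4);
}
~~~

Why apply a secondary hash like this? The only goal is to spread the resulting hash codes evenly. I looked for explanations of this function online, and what follows is the analysis from the article I found most convincing. Assume h ^ key.hashCode() is 0x7FFFFFFF and table.length is the default 16. The algorithm above then runs as follows:

[![image.png](http://upload-images.jianshu.io/upload_images/1319879-e14ca1e958bf33cf.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1000 "image.png")](http://upload-images.jianshu.io/upload_images/1319879-e14ca1e958bf33cf.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1000 "image.png")

This gives i = 15. In the figure, the bit labels for h^(h>>>7)^(h>>>4) are drawn as if h>>>7 were h>>>8. In other words, after h^(h>>>8)^(h>>>4), each 4-bit group of the hash (numbered 1 to 8 from lowest to highest) is composed as follows:

~~~
> 8=8
> 7=7^8
> 6=6^7^8
> 5=5^8^7^6
> 4=4^7^6^5^8
> 3=3^8^6^5^8^4^7 --> 3^4^5^6^7
> 2=2^7^5^4^7^3^8^6 --> 2^3^4^5^6^8
> 1=1^6^4^3^8^6^2^7^5 --> 1^2^3^4^5^7^8
> The algorithm uses (h>>>7) rather than (h>>>8), presumably to deal with the repeated terms that cancel under ^ in groups 1, 2 and 3. This way all 8 groups of the original hashCode take part in the ^ for the lowest group, so with the default table.length of 16 a change in almost any bit of the hashCode is reflected in the final bucket index. In this case only a change in the highest bit of group 3 of the original hashCode is not reflected in the result, i.e. 0x7FFFF7FF also gives i = 15.
~~~

Looking at the whole analysis, the secondary hash is produced by several shifts and XOR operations. Almost any change in a key's hashCode changes the low bits of the result, which minimizes collisions when the bucket index is derived from the hash in the next step.

## Finding the bucket index for a hash

> To spread hashes evenly across the buckets, the first idea that comes to mind is to take the hashCode modulo the array length, which distributes elements fairly evenly. The modulo operation is relatively expensive, though. The JDK instead does a single "&" between the hash and the array length minus 1.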
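The claim that the mask does the same job as a modulo can be checked with a few values. This is a small sketch, assuming a power-of-two length; the class name `MaskIndexDemo` is made up for the example, and `Math.floorMod` is used as the reference because it stays non-negative for negative hashes.

```java
class MaskIndexDemo {
    // Same expression the JDK uses; length is assumed to be a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int length = 16;
        int[] hashes = {0, 5, 17, 31, 0x7FFFFFFF, -7};
        for (int h : hashes) {
            // For a power-of-two length, both columns print the same index
            System.out.println(h + " -> mask " + indexFor(h, length)
                    + ", floorMod " + Math.floorMod(h, length));
        }
    }
}
```

If length were not a power of two, length - 1 would contain 0 bits, some indexes could never be produced, and the two columns would no longer agree.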
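~~~
// Finds the bucket position for the given hash
static int indexFor(int h, int length) {
    // assert Integer.bitCount(length) == 1 : "length must be a non-zero power of 2";
    return h & (length-1);
}
~~~

Why is index = h & (length-1)? Because the array length in HashMap is always a power of two, length - 1 is exactly the largest valid index, and all of its low bits are 1 in binary. A single AND, which keeps the low bits of the secondary hash, therefore maps the hash onto a bucket in table.

## The get method

> Once put is understood, get is simple. Roughly:
>
> * If the first node in the bucket matches, it is a direct hit.
> * If there is a collision, look for the matching entry with key.equals(k):
> * in a tree (Java 8), the lookup via key.equals(k) is O(log n);
> * in a linked list, the lookup via key.equals(k) is O(n).

~~~
// HashMap get
public V get(Object key) {
    // Return the value stored under the null key
    if (key == null)
        return getForNullKey();
    // Find the Entry for this key
    Entry<K,V> entry = getEntry(key);
    return null == entry ? null : entry.getValue();
}

// Finds the Entry
final Entry<K,V> getEntry(Object key) {
    if (size == 0) {
        return null;
    }
    // Compute the hash of the key
    int hash = (key == null) ? 0 : hash(key);
    // Use indexFor to find the bucket for this hash, then walk that bucket's linked list
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        // Match when the stored hash equals the key's hash and the keys are identical or equal
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}
~~~

# Resizing a HashMap

> As more and more elements are added to a HashMap, the array length stays fixed, so hash collisions become more likely and the linked lists in the buckets grow longer. Lookups then take correspondingly longer. To keep lookups fast, the HashMap's array has to be enlarged (a common operation; ArrayList grows its array too). Resizing is also where HashMap pays its highest cost: every entry in the old array has to have its position in the new array recomputed and be moved there. This is resize.
>
> The array is enlarged when the number of elements in the HashMap exceeds the threshold. The threshold equals the array length multiplied by loadFactor, and the default loadFactor of 0.75 is a compromise value. By default the array length is 16, so once the number of elements exceeds 16*0.75=12 the array is enlarged to 2*16=32, that is, doubled, and the position of every element in the array is recomputed.

~~~
// HashMap resize
void resize(int newCapacity) {
    // Keep a reference to the old table
    Entry[] oldTable = table;
    // Length of the old table
    int oldCapacity = oldTable.length;
    // Already at the maximum capacity: cannot grow any further
    if (oldCapacity == MAXIMUM_CAPACITY) {
        threshold = Integer.MAX_VALUE;
        return;
    }
    // Create the new table with the requested capacity
    Entry[] newTable = new Entry[newCapacity];
    // Move every entry into the new table, rehashing if the hash seed changed
    transfer(newTable, initHashSeedAsNeeded(newCapacity));
    // Point table at the new array
    table = newTable;
    // Recompute the threshold
    threshold = (int)Math.min(newCapacity * loadFactor, MAXIMUM_CAPACITY + 1);
}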
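// Initializes the hashSeed field.
// hashSeed feeds into the key's hash: it is XORed with the key's hashCode.
// It is a random value tied to the instance, mainly used to reduce hash collisions.
final boolean initHashSeedAsNeeded(int capacity) {
    boolean currentAltHashing = hashSeed != 0;
    boolean useAltHashing = sun.misc.VM.isBooted() &&
            (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD);
    boolean switching = currentAltHashing ^ useAltHashing;
    if (switching) {
        hashSeed = useAltHashing
            ? sun.misc.Hashing.randomHashSeed(this)
            : 0;
    }
    return switching;
}

// Moves the entries into the new table
void transfer(Entry[] newTable, boolean rehash) {
    // Length of the new table
    int newCapacity = newTable.length;
    for (Entry<K,V> e : table) {
        // Walk the linked list in each bucket
        while (null != e) {
            Entry<K,V> next = e.next;
            // Recompute the hash only if the hash seed changed
            if (rehash) {
                e.hash = null == e.key ? 0 : hash(e.key);
            }
            // Find this Entry's bucket in the new table
            int i = indexFor(e.hash, newCapacity);
            // Link the Entry in front of the current head of bucket i
            e.next = newTable[i];
            // Make the Entry the new head of bucket i
            newTable[i] = e;
            // Continue with the next node of the old list
            e = next;
        }
    }
}
~~~

> * Assume the hash function is simply key mod table.length (the length of the array).
> * The top of the figure shows the old hash table. Its size is 2, so the keys 3, 7 and 5 all collide in table[1] after mod 2.
> * The next three steps show the hash table being resized to 4 and all of the entries being moved into the new table.

[![resize](http://upload-images.jianshu.io/upload_images/1319879-4303ce1a45ea22a0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "resize")](http://upload-images.jianshu.io/upload_images/1319879-4303ce1a45ea22a0.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "resize")

> One thing you may have noticed about resizing: the hash values of the entries normally do not need to be recomputed (only when rehash is true). Because hash & (length - 1) always takes the low bits of the hash, each doubling simply takes one more bit to the left.

# Thoughts on HashMap

> * When would you use a HashMap? What are its characteristics? It is an implementation of the Map interface that stores key-value pairs, accepts a null key and null values, and is unsynchronized. A HashMap stores Entry(hash, key, value, next) objects.
> * Do you know how HashMap works? It is based on hashing, and objects are stored and retrieved with put and get. When we pass a key and value to put, it calls hashCode to compute the hash, derives the bucket position from it and stores the pair there. HashMap adjusts its capacity automatically according to how full it is (once the load factor is exceeded, it resizes to twice the previous capacity). When we pass a key to get, it calls hashCode to compute the hash, derives the bucket position and then calls equals() to identify the right key-value pair. When collisions occur, HashMap organizes the colliding elements in a linked list. In Java 8, if the number of colliding elements in one bucket exceeds a limit (8 by default), the linked list is replaced by a red-black tree to speed things up.
> * Do you know how get and put work? What are equals() and hashCode() for? The key's hashCode() is hashed, and the index is computed as (n - 1) & hash, which gives the bucket position. If there is a collision, key.equals() is used to find the matching node in the linked list or tree.
> * Do you know how hash is implemented? Why is it implemented this way? In Java 8 the high 16 bits of hashCode() are XORed into the low 16 bits. This is a trade-off between speed, utility and quality: even when the bucket count n is small, both the high and the low bits take part in the hash computation, and the overhead stays small.
> * What happens when the size of a HashMap exceeds the capacity defined by the load factor? Once the load factor (0.75 by default) is exceeded, the HashMap resizes to twice its previous length and redistributes the entries into the new buckets.

# JDK 1.8 improvements to HashMap

## Differences in the code

~~~
// Bucket size at which a linked list is converted into a red-black tree
static final int TREEIFY_THRESHOLD = 8;
// Bucket size at which a red-black tree is converted back into a linked list
static final int UNTREEIFY_THRESHOLD = 6;
// Minimum table capacity for treeification; below it, the table is resized instead
static final int MIN_TREEIFY_CAPACITY = 64;

// Red-black tree node
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // red-black tree links
    TreeNode<K,V> left;
    TreeNode<K,V> right;
    TreeNode<K,V> prev;    // needed to unlink next upon deletion
    boolean red;
    /************* some code omitted *****************/
}

// Gets the key's hashCode and applies the secondary hash.
// The secondary hash only XORs the high 16 bits of the hashCode into the low 16 bits.
static final int hash(Object key) {
    int h;
    return (key == null) ?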
            0 : (h = key.hashCode()) ^ (h >>> 16);
}

// resize: colliding entries in a bucket may be held in a red-black tree
final Node<K,V>[] resize() {
    /************* some code omitted *****************/
}
~~~

## Performance gains

> Hash collisions can be disastrous for HashMap performance. When several hashCode() values fall into the same bucket, the entries are stored in a linked list. In the worst case every key maps to the same bucket and the HashMap degenerates into a linked list, so lookup time goes from O(1) to O(n). With a red-black tree in place of the linked list, lookup time becomes O(log n).

Reference: [主題：HashMap hash方法分析](http://www.iteye.com/topic/709945)

That is the end of the article. If there is anything else you would like to discuss, feel free to leave a comment! To read more of my articles, see my [personal blog](http://dandanlove.com/) and my official account:

[![振興書城](http://upload-images.jianshu.io/upload_images/1319879-612c4c66d40ce855.jpg?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "振興書城")](http://upload-images.jianshu.io/upload_images/1319879-612c4c66d40ce855.jpg?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240 "振興書城")

Article title: [遲到一年HashMap解讀](http://dandanlove.com/2017/10/27/late-one-year-hashmap/)
Author: [振興](http://dandanlove.com/ "Back to home")
Published: 2017-10-27, 11:35:00
Last updated: 2017-11-19, 16:58:40
Original link: [http://yoursite.com/2017/10/27/late-one-year-hashmap/](http://dandanlove.com/2017/10/27/late-one-year-hashmap/ "遲到一年HashMap解讀")
License: ["Attribution-NonCommercial-ShareAlike 4.0"](http://creativecommons.org/licenses/by-nc-sa/4.0/ "CC BY-NC-SA 4.0 International"). Please keep the original link and author when reposting.