原理一、Java中的HashMap的實作-有解無憂

文章從JDK1.7和JDK1.8兩個版本決議HashMap的實作原理及其中常見的面試題(兩個版本HashMap最大的區別，1.7版HashMap=陣列+鏈表，1.8版HashMap=陣列+紅黑樹+鏈表)

一、先講講哈希表

又叫散串列，是為了加快查找陣列元素的速度，將每個要存進陣列的數值進行哈希計算，從而獲得另外一個唯一對應的數，將該數作為目標數值存進陣列的索引，以后每次查詢該數，只要再進行一次哈希計算，可以找到對應的索引，取值，

所以在不考慮哈希沖突的情況下，哈希表的增刪改查都為O(1)
解決哈希沖突的方法：1、開放地址法；2、再哈希法；3、公共溢位區；4、鏈地址法（HashMap所采用的）

開放地址法：發生沖突時，向后查找一個空位插入

再哈希法：采用另外一個散列函式

二、JDK1.7

1、原理

HashMap底層為陣列，加鏈表用于解決哈希沖突，并且鏈表的插入用的是頭插法

后插入的值被查詢的概率更高，效率更高
頭插法擴容時鏈表順序倒置，可能導致鏈表成環問題

2、初始化

/**
     * Constructs an empty <tt>HashMap</tt> with the specified initial
     * capacity and load factor.
     *
     * @param  initialCapacity the initial capacity
     * @param  loadFactor      the load factor
     * @throws IllegalArgumentException if the initial capacity is negative
     *         or the load factor is nonpositive
     */
    public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " +
                                               initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " +
                                               loadFactor);

        // Find a power of 2 >= initialCapacity
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;

        this.loadFactor = loadFactor;
        threshold = (int)(capacity * loadFactor);
        table = new Entry[capacity];
        init();
    }

以上為HashMap的初始化，有幾個引數需要注意

capacity，容量，即陣列長度，默認為16，計算index時與15進行異或計算，剛好保留hashcode后四位

初始化時，Map的容量必須都為2^n(a power of 2)，為的是使得哈希計算求index的值盡可能不同，減少哈希沖突，哈希更均勻
capacity <<= 1是位運算，即二進制數值左移一位，回圈遞增，確保初始化后的容量為2^n

loadFactor，裝載因子(默認為0.75)

threhold，閾值(capacity * loadFactor) 存放數值(size)達到閾值時進行擴容
0.75確保了不會存的值太少，空間利用率低，存的值太多，效率低

table，陣列的長度 new Entry[]為鍵值對

3、插入元素

    static int hash(int h) {
        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
      /**
     * Returns index for hash code h.
     */
     static int indexFor(int h, int length) {
        return h & (length-1); //異或 比 取模更快                                                                                                                    
    }
      /**
     * Associates the specified value with the specified key in this map.
     * If the map previously contained a mapping for the key, the old
     * value is replaced.
     *
     * @param key key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     * @return the previous value associated with <tt>key</tt>, or
     *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
     *         (A <tt>null</tt> return can also indicate that the map
     *         previously associated <tt>null</tt> with <tt>key</tt>.)
     */
    public V put(K key, V value) {
        if (key == null)
            return putForNullKey(value);  //key為空，存在陣列的第一位
        int hash = hash(key.hashCode());  // hashcode()后得到二進制數需要右移，保證高位參與運算，減少哈希計算的沖突
        int i = indexFor(hash, table.length); //指定到陣列對應的索引，采用異或運算，速度更快
        for (Entry<K,V> e = table[i]; e != null; e = e.next) {  //陣列對應索引上不為空時，進行遍歷
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
                V oldValue = https://www.cnblogs.com/gg12138/p/e.value;
                e.value = value;
                e.recordAccess(this);
                return oldValue; //替換新的value，訪問oldValue
            }
        }

        modCount++;  
        addEntry(hash, key, value, i); //索引為空
        return null;
    }

4、獲取元素

/**
     * Returns the value to which the specified key is mapped,
     * or {@code null} if this map contains no mapping for the key.
     *
     * <p>More formally, if this map contains a mapping from a key
     * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
     * key.equals(k))}, then this method returns {@code v}; otherwise
     * it returns {@code null}.  (There can be at most one such mapping.)
     *
     * <p>A return value of {@code null} does not <i>necessarily</i>
     * indicate that the map contains no mapping for the key; it's also
     * possible that the map explicitly maps the key to {@code null}.
     * The {@link #containsKey containsKey} operation may be used to
     * distinguish these two cases.
     *
     * @see #put(Object, Object)
     */
    public V get(Object key) {
        if (key == null)
            return getForNullKey();
        int hash = hash(key.hashCode());
        for (Entry<K,V> e = table[indexFor(hash, table.length)];
             e != null;
             e = e.next) {
            Object k;
            if (e.hash == hash && ((k = e.key) == key || key.equals(k))) //equals()方法放在最后可以提高效率
                return e.value;
        }
        return null;

洗掉元素時，不能用for回圈，要用迭代器進行元素洗掉

5、擴容

/**
     * Adds a new entry with the specified key, value and hash code to
     * the specified bucket.  It is the responsibility of this
     * method to resize the table if appropriate.
     *
     * Subclass overrides this to alter the behavior of put method.
     */
    void addEntry(int hash, K key, V value, int bucketIndex) {
        Entry<K,V> e = table[bucketIndex];
        table[bucketIndex] = new Entry<>(hash, key, value, e);
        if (size++ >= threshold)
            resize(2 * table.length); //全部元素個數超過閾值時，而且陣列對應索引上必須有值，陣列才會擴容兩倍  
    }

擴容時，原陣列的全部資料，進行重新計算（陣列長度改變）插入到新的陣列(最后插入的元素會轉化為最先插入的)，比較消耗性能

向HashMap添加1000個元素是怎么擴容的？1000，初始化1024大小的陣列，達到閾值1024*0.75=768時進行擴容
所以在新建HashMap時最好自定義初始化陣列的長度，減少擴容消耗性能

三、JDK1.8

1、原理

JDK1.8后的HashMap底層采用陣列+部分鏈表+部分紅黑樹的組合，并采用尾插法

鏈表遍歷的時間復雜度為O(n)，紅黑樹為O(log n)，提升了效率
當一個索引上要存盤的元素個數超過8個，并且陣列的長度大于64時，鏈表就會樹化成紅黑樹

2、插入元素

 /**
     * Implements Map.put and related methods
     *
     * @param hash hash for key
     * @param key the key
     * @param value the value to put
     * @param onlyIfAbsent if true, don't change existing value
     * @param evict if false, the table is in creation mode.
     * @return previous value, or null if none
     */
    final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
                   boolean evict) {
        Node<K,V>[] tab; Node<K,V> p; int n, i;
        if ((tab = table) == null || (n = tab.length) == 0)
            n = (tab = resize()).length;
        if ((p = tab[i = (n - 1) & hash]) == null)
            tab[i] = newNode(hash, key, value, null);
        else {
            Node<K,V> e; K k;
            if (p.hash == hash &&
                ((k = p.key) == key || (key != null && key.equals(k))))
                e = p;
            else if (p instanceof TreeNode)
                e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
            else {
                for (int binCount = 0; ; ++binCount) {
                    if ((e = p.next) == null) {
                        p.next = newNode(hash, key, value, null);
                        if (binCount >= TREEIFY_THRESHOLD - 1) //  TREEIFY_THRESHOLD=8
                            treeifyBin(tab, hash);
                        break;
                    }
                    if (e.hash == hash &&
                        ((k = e.key) == key || (key != null && key.equals(k))))
                        break;
                    p = e;
                }
            }
            if (e != null) { // existing mapping for key
                V oldValue = https://www.cnblogs.com/gg12138/p/e.value;
                if (!onlyIfAbsent || oldValue == null)
                    e.value = value;
                afterNodeAccess(e);
                return oldValue;
            }
        }
        ++modCount;
        if (++size > threshold)
            resize();
        afterNodeInsertion(evict);
        return null;
    }

3、擴容

JDK8，HashMap的擴容大小與之前的一致，主要區別在于擴容后紅黑樹如何插入到新的陣列

（原始碼過于復雜就不放上來了）紅黑樹里的元素遷移，不需要像1.7那樣進行重新計算
因為陣列長度整加了一倍，可以直接分為兩組資料，一組保留原位置，另一組分到新增加的陣列

遷移后的紅黑樹，如果同一陣列上的元素小于6個，就會自動去樹化成鏈表

四、jdk7和8的區別

JDK8鏈表會變成紅黑樹，加快查詢
新節點插入的鏈表的方式不同，JDK7是頭插法，JDK8是尾插法因為需要遍歷鏈表變成紅黑樹
JDK8的hash演算法進行了簡化
resize的邏輯修改，JDK7可能會出現死鎖
JDK7鍵值對Entry，在構造方法時創建；JDK8稱為Node，在put第一個元素時創建

五、equals和hashcode

equals繼承Object，對參考物件的比較是通過物件的記憶體地址，
hashcode用于存入陣列索引的計算，get和put的時候都會呼叫equals進行判斷
所以想要保證，equals相等hashcode也相等，物件不同的時候hashcode也要不同

六、安全的HashMap

Collections.synchronizedMap
- 通過構造器傳入的mutex引數作為互斥鎖，底層代碼是通過synchronized同步代碼塊實作的
Hashtable
- get put方法都會加synchronized鎖
- 初始容量為11，擴容為翻倍+1
- 不能存放null值，因為沒有使用集合類的fail-fast安全機制(modcount標志來判斷遍歷的程序中是否遭到了修改，所以很多的集合類都不支持多執行緒)
ConcurrentHashMap
- JDK7
  - 由segment陣列和HashEntry組成，hashEntry同樣為陣列+鏈表
  - 其中HashEntry使用了Volatile修飾資料
  - segment繼承了reentrantLock，即每個執行緒訪問一個segment，只鎖定該segment，不會影響到其他，所以并發度高，理論上，并發度就是segment陣列的容量，
  - put方法：找到對應的segment，嘗試加鎖，存在競爭就scanAndLockForPut()自旋獲取鎖，達到一定次的會改為互斥鎖，
  - get方法：因為HashEntry的value是使用了volatile修飾的，保證了每次讀到的值都是最新，所以不用加鎖，而且效率高
- JDK8
  - 同HashMap，也會有紅黑樹，Node鍵值對用volatile修飾，保證了可見性
  - 放棄了segment分段鎖，采用CAS+synchronized實作并發
  - put方法：
    - 計算hashcode，判斷是否為空，需要初始化
    - 定位是否為null，null使用CAS寫入
    - 不null存在元素了，判斷是否擴容，不擴容再使用synchronize關鍵字加鎖寫入
  - get方法，同JDK7
- 為什么更改為CAS+synchronized
  - 使用reentrantLock需要繼續AQS類，增加了記憶體開銷，synchronized屬于JDK級別，性能會隨著升級
  - 擴容的時候并不會對segment陣列進行擴容，擴容的是HashEntry陣列，所以隨著元素越來越多，鎖的粒度是變大的

六、常見面試題

HashMap的底層資料結構
HashMap的存取原理
- 繼承了Map介面是以鍵值對的形式保存資料，保存資料時，是將key的hashcode進行哈希計算得到的數，作為放在陣列上的索引，如果計算之后的索引相同，就在那個節點后面加一個鏈表或者紅黑樹；取資料的時候，就只要進行一次哈希計算就可以確定索引的位置，遍歷該索引上的節點就可以找打，
為啥會出現執行緒不安全
- JDK7，擴容時會出現環形鏈表的情況，因為擴容轉移時鏈表的順序會調換
- JDK8，多執行緒會出現資料覆寫的情況
有什么執行緒安全的類替換
- currentHashMap、hashTable因為性能低，只是簡單地在方法上加synchronized鎖
默認初始化大小是多少？為啥是這么多？為啥大小都是2的冪？
- 16，為了讓哈希計算結果的分布更均勻，求索引的時候是用hashcode和陣列的長度-1進行異或運算，15的二進制剛好為1111，只要hashcode的分布是均勻的，異或運算之后的數值也是均勻的
HashMap的擴容方式？負載因子是多少？為什是這么多？
- 1.7擴容需要存盤的個數大于閾值且存放新的值時剛好發生了哈希沖突，這時才會觸發擴容機制，擴容后長度為原來的兩倍，遍歷原來的entry陣列，將節點重新hash后復制到新的陣列
- 0.75，確保存的值不會太少，空間利用率低；也不會太多，導致遍歷效率低
hash的計算規則
- 將hashcode右移16位和原值進行異或運算，保證高16位和低16位參與計算，使回傳的值足夠均勻，再和陣列的長度-1進行異或計算，得到索引（異或計算：相同為1不同為0）
為什么長度為2^n
- 是為了讓哈希計算后索引的分布更均勻，減少哈希沖突，我記得原始碼里面的索引的計算是key的hashcode和陣列長度-1進行異或運算，如果長度是2^n-1,二進制就全是11111，比如16就是四個一、和hascode進行進行異或運算，只要hashcode是均勻的，計算出來的索引也會是均勻的
為什么長度超過8就會自動轉為紅黑樹
- 是根據泊松分布，負載因子為0.75時，單個hash槽內出現8個元素的概率已經很小了，就可以減少鏈表轉換為紅黑樹這種比較耗時的操作，

總結

HashMap是一種利用key的hashcode來進行存盤的復雜資料結構

轉載請註明出處，本文鏈接：https://www.uj5u.com/houduan/181814.html

標籤：Java

上一篇：Java的前世今生

下一篇：99 道 Java 多執行緒面試題，看完我跪了！