事務隔離級別遺留問題：

在讀已提交的級別下，事務B可以讀到事務A持有寫鎖的的記錄，且讀到的是未更新前的，為何寫讀沒有沖突？
可重復讀級別，事務B可以更新事務A理論上應該已經獲取讀鎖的記錄，且更新后，事務A依然可以讀到資料，為何讀-寫-讀沒有沖突？
在可重復讀級別，幻讀沒有產生

　　其中，前兩個問題就是因為mvcc機制（讀鎖的一種優化機制），通過不加讀鎖，避免讀寫沖突，進而提高了性能，

為什么要有MVCC機制？

在讀已提交的級別下，由于是給讀加鎖來保證讀已提交，如果事務A持有寫鎖，為了保證讀已提交，事務B必須等待事務A提交之后才可以讀；其他的讀事務也是這樣的情況，效率太低
在可重復讀級別，為了保證可重復讀，如果事務A持有讀鎖，為了第二次讀到的一樣，其他所有寫事務必須等待讀完才可以，同樣效率低

那么很自然的想到，無論讀事務是先產生還是后產生，如果這個時候還存在寫事務沒有執行，或者需要執行；那么就應該讓讀事務讀到目前最新的值，且寫事務可以更新；只不過讀事務在寫事務提交更新后，依據隔離級別是否可見最新更新即可，這就是MVCC機制的核心能力，將讀鎖干掉，

MVCC機制核心組件

MVCC機制由版本鏈、undolog、readview三大核心構成

版本鏈

猜測很多人第一次看到MVCC的版本都是和我一樣在各種各樣的博客文章上，或者可能是在一些課程專欄或者《高性能mysql》這本書的mvcc部分看到的，那么在你的理解中，版本的底層是什么樣子呢？ innodb引擎資料庫中的每一條記錄上，我們都可以認為上面有3個隱藏欄位，分別是DB_ROW_ID(不在此次討論范圍),DB_TRX_ID和DB_ROLL_PTR,如下圖一樣

在我的理解中， DB_TRX_ID就是插入或者更新時，當前事務的trx_id，由全域事務管理器分配的遞增的一個id； DB_ROLL_PTR存盤的undolog中當前記錄上一個版本的指標，先姑且記住這是一個指標， 當插入一條記錄時 在這條記錄的DB_TRX_ID填入當前事務的id，由于沒有歷史版本，所以DB_ROLL_PTR為空 當更新一條記錄時 由于這個時候存在歷史版本，所以需要將老版本的資料寫到undolog里，然后構建指標，將DB_TRX_ID更新為當前事務的id,將DB_ROLL_PTR更新為剛才構建的指標，以及更新需要更新的欄位， 當洗掉一條記錄時（這個不太確定，主觀猜測）猜測是將老記錄寫到undolog，然后構建指標，新記錄DB_TRX_ID更新為當前事務的id,將DB_ROLL_PTR更新為剛才構建的指標，但是沒有需要更新的欄位，而且mysql不會立即洗掉，記錄上有一個info_bits欄位，會標記上洗掉標識(REC_INFO_DELETED_FLAG)，后續由purge執行緒（不了解，姑且認為是個scheduleTask吧)洗掉這樣，當多次更新之后，新記錄存盤的永遠都是最新操作的事務id，并通過指標指向了老版本，老版本還指向了更老的版本...等等，最終構成了一個版本鏈

Readview

理論：

在周志明老師的鳳凰架構（或者極客時間的‘周志明的軟體架構課’）中對mvcc簡單介紹到

隔離級別是可重復讀：總是讀取 CREATE_VERSION 小于或等于當前事務 ID 的記錄，在這個前提下，如果資料仍有多個版本，則取最新（事務 ID 最大）的，隔離級別是讀已提交：總是取最新的版本即可，即最近被 Commit 的那個版本的資料記錄，

在mysql官網中是這么描述的

If the transaction isolation level is REPEATABLE READ (the default level), all consistent reads within the same transaction read the snapshot established by the first such read in that transaction. You can get a fresher snapshot for your queries by committing the current transaction and after that issuing new queries. With READ COMMITTED isolation level, each consistent read within a transaction sets and reads its own fresh snapshot.

翻譯：隔離級別是可重復讀：在同一個事務中，一致性讀總是去讀在該事務第一次讀取時生成的快照，隔離級別是讀已提交：事務中的每次讀取都取自己新生成的快照，

相比之下，周老師形容的更貼近隔離級別的概念上，官方的描述則是底層的具體實作邏輯，兩者結合一下就是可重復讀：通過在每個事物只讀取第一次select時生成的快照和undolog比較，根據一個可見性規則判斷，是否可以讀當前版本的記錄，可以就回傳，不行就繼續比較再上一個版本，直到最老的版本； 讀已提交：除了每次讀取都會使用最新的快照，后面的都和可重復讀的邏輯一樣， 為什么我這里說的是可見性規則呢？ 是因為周老師描述里“總是讀取 CREATE_VERSION 小于或等于當前事務 ID 的記錄” 很容易錯誤的理解為當前版本記錄里的trx_id<=快照創建時的事務id(create_trx_id)就都可見，真正的判斷邏輯并不只是一個create_trx_id就能搞定的，但這里先不展開講，自己想一下為什么不行，下面的圖可能會給你一點靈感，接下來我們先去讀一下“可見性規則”的底層原始碼，

可見性規則底層實作

ReadView類

storage/innobase/include/read0types.h:47
//ReadView類
class ReadView {
...
private:
  /** trx id of creating transaction, set to TRX_ID_MAX for free
  views. */
  //創建快照的時候，快照對應的事務id，只有含有寫操作的才會分配真正的事務id
  trx_id_t m_creator_trx_id;
/** Set of RW transactions that was active when this snapshot
  was taken */
  //活躍的讀寫事務id串列，從trx_sys->rw_trx_ids抄過來的
  ids_t m_ids;
/** The read should not see any transaction with trx id >= this
  value. In other words, this is the "high water mark". */
  //賦值是即將分配的下一個事務id，所以大于等于這個id的記錄對當前事務來說都是不可見的
  trx_id_t m_low_limit_id;

/** The read should see all trx ids which are strictly
  smaller (<) than this value.  In other words, this is the
  low water mark". */
  //m_ids不為空就是ids.get(0)，為空則是m_low_limit_id,所以小于這個事務id的就代表著快照建立的時候
  //已經不是活躍事務了，即已經提交了，所以一定可以看到這些事務的改動記錄
  trx_id_t m_up_limit_id;
....
  }

初始化賦值的時候

//read0read.cc
//row_search_mvcc -> trx_assign_read_view -> MVCC::view_open ->

void ReadView::prepare(trx_id_t id) {
  ut_ad(trx_sys_mutex_own());

  m_creator_trx_id = id;

  m_low_limit_no = trx_get_serialisation_min_trx_no();

  m_low_limit_id = trx_sys_get_next_trx_id_or_no();

  ut_a(m_low_limit_no <= m_low_limit_id);

  if (!trx_sys->rw_trx_ids.empty()) {
    copy_trx_ids(trx_sys->rw_trx_ids);
  } else {
    m_ids.clear();
  }

/* The first active transaction has the smallest id. */
  m_up_limit_id = !m_ids.empty() ? m_ids.front() : m_low_limit_id;

  ut_a(m_up_limit_id <= m_low_limit_id);

  ut_d(m_view_low_limit_no = m_low_limit_no);
  m_closed = false;
}

判斷某個版本的記錄是否可見？

//read0types.h
 bool changes_visible(trx_id_t id,
                                   const table_name_t &name) const {
  ut_ad(id > 0);
//如果當前版本記錄上的事務id（DB_TRX_ID）小于低水位或者等于當前事務，
//那么要么就是自己更改的，要么就是歷史上已經提交了的，所以可以讀到
  if (id < m_up_limit_id || id == m_creator_trx_id) {
    return (true);
  }

  check_trx_id_sanity(id, name);
//如果當前版本記錄上的事務id（DB_TRX_ID）大于高水位，那么就是在當前快照生成后生成的事務，一律看不到
  if (id >= m_low_limit_id) {
    return (false);
  //這一步我沒有理解，
  } else if (m_ids.empty()) {
    return (true);
  }

  const ids_t::value_type *p = m_ids.data();
//二分查找，如果活躍的事務里面沒有，那么就回傳true
//這里我是這么理解的，[低水位,高水位]包含活水和死水，即活躍的事務和已經提交的事務
//假如存在事務1是活躍的，事物2是已提交的，事務3是活躍的，我們在事務4的時候開啟快照，很明顯我們只能讀到事務2或者事務4的變更
//假如正在判斷的是事務2，因為已經經過了上面的校驗，
//所以我們知道當前版本記錄的事務m_low_limit_id（高水位）>id>=m_up_limit_id（低水位)，且不是當前事務;
//所以就需要判斷事務只要不是活躍的，那么就一定是已經提交的事務，那么就可讀
  return (!std::binary_search(p, p + m_ids.size(), id));
}

在了解可見性規則之后我們知道，快照建立的時候會初始化幾個屬性，在查詢的時候會通過changes_visible方法來判斷是否可見,而呼叫這個方法的上層就是下面這兩段邏輯，和我先前描述的類似，判斷是否可以讀當前版本的記錄，可以就回傳，不行就繼續比較再上一個版本，直到最老的版本

//row0sel.cc#row_search_mvcc
if (srv_force_recovery < 5 &&
    !lock_clust_rec_cons_read_sees(rec, index, offsets,
                                   trx_get_read_view(trx))) {
  rec_t *old_vers;
  /* The following call returns 'offsets' associated with 'old_vers' */
  err = row_sel_build_prev_vers_for_mysql(
      trx->read_view, clust_index, prebuilt, rec, &offsets, &heap,
      &old_vers, need_vrow ? &vrow : nullptr, &mtr,
      prebuilt->get_lob_undo());

  if (err != DB_SUCCESS) {
    goto lock_wait_or_error;
  }

  if (old_vers == nullptr) {
    /* The row did not exist yet in
    the read view */

    goto next_rec;
  }

  rec = old_vers;
  prev_rec = rec;
  ut_d(prev_rec_debug = row_search_debug_copy_rec_order_prefix(
           pcur, index, prev_rec, &prev_rec_debug_n_fields,
           &prev_rec_debug_buf, &prev_rec_debug_buf_size));
}
//lock0lock.cc#lock_clust_rec_cons_read_sees
bool lock_clust_rec_cons_read_sees(
    const rec_t *rec,     /*!< in: user record which should be read or
                          passed over by a read cursor */
    dict_index_t *index,  /*!< in: clustered index */
    const ulint *offsets, /*!< in: rec_get_offsets(rec, index) */
    ReadView *view)       /*!< in: consistent read view */
{
  ut_ad(index->is_clustered());
  ut_ad(page_rec_is_user_rec(rec));
  ut_ad(rec_offs_validate(rec, index, offsets));

  /* Temp-tables are not shared across connections and multiple
  transactions from different connections cannot simultaneously
  operate on same temp-table and so read of temp-table is
  always consistent read. */
  if (srv_read_only_mode || index->table->is_temporary()) {
    ut_ad(view == nullptr || index->table->is_temporary());
    return (true);
  }

  /* NOTE that we call this function while holding the search
  system latch. */

  trx_id_t trx_id = row_get_rec_trx_id(rec, index, offsets);

  return (view->changes_visible(trx_id, index->table->name));
}

事務的trx_id

在我還沒開始看mysql原始碼，只是跟著博客學習寫用例測驗的時候，我發現，開啟事務進行了第一次查詢之后，確實有生成事務id，但后面我執行了一條更新陳述句之后，原來的事務id變了；就像下面這個圖一樣，最開始只有查詢的時候是比較長的這個id,但執行了一條update陳述句后，事務id變成了一個短的，

這個時候我就產生了很多疑問？同一個事務里，事務id怎么還能變呢？變的話changes_visible里面的比較怎么算？搜索了一下之后了解到，只讀事務是不會生成事務id的，是假的！于是我又疑惑，那這個假id怎么參與changes_visible呢？也就是這個時候，我才下定決心去看原始碼，也借此理解了高低水位的設計，并認識到自己之前的理解是錯誤的，先上結論 只讀事務不會分配真正的事務id，他的值是0； 只讀事務參與change_visable的時候，create_trx_id也確實是0，是通過m_up_limit_id（低水位）來判斷是否可見的，只有在變成讀寫事務時，create_trx_id才會起效并應用； 因為值是0，所以在通過下面sql查詢的時候，那串id只是展示的時候特殊處理的

select * from information_schema.INNODB_TRX;

//trx0trx.cc#trx_start_low
 //這里可以看到只有讀寫事務才真正分配了id
 else {
trx->id = 0;

if (!trx_is_autocommit_non_locking(trx)) {
    /* If this is a read-only transaction that is writing
    to a temporary table then it needs a transaction id
    to write to the temporary table. */

    if (read_write) {
      trx_sys_mutex_enter();

      ut_ad(!srv_read_only_mode);

      trx->state.store(TRX_STATE_ACTIVE, std::memory_order_relaxed);

      trx->id = trx_sys_allocate_trx_id();

      trx_sys->rw_trx_ids.push_back(trx->id);

      trx_sys_mutex_exit();

      trx_sys_rw_trx_add(trx);

    } else {
      trx->state.store(TRX_STATE_ACTIVE, std::memory_order_relaxed);
    }
  } else {
    ut_ad(!read_write);
    trx->state.store(TRX_STATE_ACTIVE, std::memory_order_relaxed);
  }
}
//trx0trx.ic
//這里是在展示的時候對只讀事務的id做了處理
/** Retreieves the transaction ID.
In a given point in time it is guaranteed that IDs of the running
transactions are unique. The values returned by this function for readonly
transactions may be reused, so a subsequent RO transaction may get the same ID
as a RO transaction that existed in the past. The values returned by this
function should be used for printing purposes only.
@param[in]      trx     transaction whose id to retrieve
@return transaction id */
static inline trx_id_t trx_get_id_for_print(const trx_t *trx) {
  /* Readonly and transactions whose intentions are unknown (whether
  they will eventually do a WRITE) don't have trx_t::id assigned (it is
  0 for those transactions). Transaction IDs in
  information_schema.innodb_trx.trx_id,
  performance_schema.data_locks.engine_transaction_id,
  performance_schema.data_lock_waits.requesting_engine_transaction_id,
  performance_schema.data_lock_waits.blocking_engine_transaction_id
  should match because those tables
  could be used in an SQL JOIN on those columns. Also trx_t::id is
  printed by SHOW ENGINE INNODB STATUS, and in logs, so we must have the
  same value printed everywhere consistently. */

  /* DATA_TRX_ID_LEN is the storage size in bytes. */
  static const trx_id_t max_trx_id = (1ULL << (DATA_TRX_ID_LEN * CHAR_BIT)) - 1;

  ut_ad(trx->id <= max_trx_id);

  /* on some 32bit architectures casting trx_t* (4 bytes) directly to
  trx_id_t (8 bytes unsigned) does sign extension and the resulting value
  has highest 32 bits set to 1, so the number is unnecessarily huge.
  Also there is no guarantee that we will obtain the same integer each time.
  Casting to uintptr_t first, and then extending to 64 bits keeps the highest
  bits clean. */

  return (trx->id != 0
              ? trx->id
              : trx_id_t{reinterpret_cast<uintptr_t>(trx)} | (max_trx_id + 1));
}

生成快照時機（不太確定）

可重復讀：只生成一次，后面繼續使用

ReadView *trx_assign_read_view(trx_t *trx) /*!< in/out: active transaction */
{
  ut_ad(trx_can_be_handled_by_current_thread_or_is_hp_victim(trx));
  ut_ad(trx->state.load(std::memory_order_relaxed) == TRX_STATE_ACTIVE);

  if (srv_read_only_mode) {
    ut_ad(trx->read_view == nullptr);
    return (nullptr);

  } else if (!MVCC::is_view_active(trx->read_view)) {
    trx_sys->mvcc->view_open(trx->read_view, trx);
  }

  return (trx->read_view);
}

讀已提交：用完就關，所以每次再獲取就得新開，但是這里的關有兩個地方調

ha_innodb.cc#store_lock 和ha_innodb.cc#external_lock
if (lock_type != TL_IGNORE && trx->n_mysql_tables_in_use == 0) {
  trx->isolation_level =
      innobase_trx_map_isolation_level(thd_get_trx_isolation(thd));

  if (trx->isolation_level <= TRX_ISO_READ_COMMITTED &&
      MVCC::is_view_active(trx->read_view)) {
    /* At low transaction isolation levels we let
    each consistent read set its own snapshot */

    mutex_enter(&trx_sys->mutex);

    trx_sys->mvcc->view_close(trx->read_view, true);

    mutex_exit(&trx_sys->mutex);
  }
}

在學習了解MVCC機制中遇到的問題：

為什么更新操作必須使用當前讀？

更新操作后的如果不回滾那沒有事，如果要回滾，應該回滾到最新一次的提交，所以undo log里必須是最新的視圖

只讀事務突然更新的話，因為更新必須使用當前讀，那是否需要重新生成事務id?

不算做重新，是只有在觸發鎖操作時會分配真正的事務id，只讀事務分配的id其實就是0

只讀事務分配的事務id是什么東西？如何參與運作？

沒有作為判斷條件，作為判斷條件的是up_limit記錄的是最小活躍id，小于他的才能讀，正規的事務id分配的在readview結構里其實是creator_trxid，用來判斷當前是否能被當前事務看見

readview的范圍

有這個疑問還是因為最開始對change_visable掌握的不夠清晰，之前想的場景是在事務A里，如果存在兩條不同記錄甚至不同表的查詢，而事務B在第事務A兩條查詢中間的時候對第二條查詢的記錄做了更改并提交，那應該查到的是新的還是舊的，其實應該是舊的，因為就算是第二條查詢，最新版本的改動的事務id會大于等于事務A的高水位，因此只能查詢到更老的undolog里的記錄

知道了mvcc底層是undolog和readview后，怎么理解“版本”這個概念

創建版本肯定就是欄位里那個隱藏欄位，洗掉版本應該是回滾指標

在只讀視圖能查到其他事務已經洗掉并且提交的記錄嗎？

經測驗是可以的

怎么解決的幻讀？

在只讀事務下，如上文所說的事務1讀不到事務2的更新是因為事務2的版本號要大于當前快照的高水位，那對于新增的記錄來說，其版本號也是同樣的道理，因此事務1讀不到比當前快照里的高水位高的，也就避免了幻讀這種情況，

參考資料：

MySQL 8.0 MVCC 原始碼決議 - 掘金 https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html MySQL事務ID的分配時機_mysql事務id什么時候分配_哲學長的博客-CSDN博客 MYSQL innodb中的只讀事物以及事物id的分配方式_ITPUB博客 Mysql如何實作隔離級別 - 可重復讀和讀提交原始碼分析_mysql 可重復度原始碼_擇維士的博客-CSDN博客

本文來自博客園，作者：起司啊，轉載請注明原文鏈接：https://www.cnblogs.com/qisi/p/mvcc.html

轉載請註明出處，本文鏈接：https://www.uj5u.com/shujuku/555144.html

標籤：其他

上一篇：如何成功實施一個資料治理專案？實施步驟有哪些？

下一篇：返回列表

MySql的MVCC機制