MySQL 8.0: Improvements to INFORMATION_SCHEMA

Introduction

Author: Gopal Shankar

Translator: 徐晨亮

Original article: https://mysqlserverteam.com/mysql-8-0-improvements-to-information_schema/

Coinciding with the new native data dictionary in MySQL 8.0, we have made a number of useful enhancements to our INFORMATION_SCHEMA subsystem design in MySQL 8.0. In this post I will first go through our legacy implementation as it has stood since MySQL 5.1, and then cover what’s changed.

Background

INFORMATION_SCHEMA was first introduced in MySQL 5.0 as a standards-compliant way of retrieving metadata from a running MySQL server. Over the history of INFORMATION_SCHEMA there have been a number of complaints about the performance of certain queries, particularly when there are many database objects (schemas, tables, etc.).

In an effort to address these reported issues, since MySQL 5.1 we have made a number of performance optimizations to speed up the execution of INFORMATION_SCHEMA queries. The optimizations are described in the MySQL manual (http://dev.mysql.com/doc/refman/5.7/en/information-schema-optimization.html), and apply when the user provides an explicit schema name or table name in the query.

Alas, despite these improvements, INFORMATION_SCHEMA performance is still a major pain point for many of our users. The key reason behind these performance issues in the current INFORMATION_SCHEMA implementation is that INFORMATION_SCHEMA tables are implemented as temporary tables that are created on-the-fly during query execution. These temporary tables are populated via:

- Metadata from files, e.g. table definitions from .FRM files.
- Details from storage engines, e.g. dynamic table statistics.
- Data from global data structures in the MySQL server.

For a MySQL server with hundreds of databases, each containing hundreds of tables, an INFORMATION_SCHEMA query would end up doing a lot of I/O, reading each individual FRM file from the file system. It would also end up using more CPU cycles to open each table and prepare the related in-memory data structures. It does attempt to use the MySQL server table cache (the system variable 'table_definition_cache'); however, in large server instances it is very rare to have a table cache that is large enough to accommodate all of these tables.

One can easily run into the performance issue described above if the INFORMATION_SCHEMA query does not use the optimization. For example, let us consider the two queries below:

mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE
    -> TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: TABLE_SCHEMA,TABLE_NAME
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Using where; Skip_open_table; Scanned 0 databases
1 row in set, 1 warning (0.00 sec)

mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE
    -> TABLE_SCHEMA like 'test%' AND TABLE_NAME like 't%'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: TABLES
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: NULL
     filtered: NULL
        Extra: Using where; Skip_open_table; Scanned all databases
1 row in set, 1 warning (0.00 sec)

As the EXPLAIN output shows, the former query uses the values provided in the WHERE clause for the TABLE_SCHEMA and TABLE_NAME fields as a key to read the desired FRM files from the file system. The latter query, however, ends up reading all the FRM files in the entire data directory, which is very costly and does not scale.
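The difference between the two access patterns can be sketched with a toy model in Python. This is not how the server is implemented: the DATADIR dictionary, the helper names and the "visited" counter are all hypothetical, and only serve to show why constant predicates touch one entry while pattern predicates touch every entry.

```python
import fnmatch

# Hypothetical stand-in for a data directory: {schema: [table, ...]}.
DATADIR = {
    "test": ["t1", "t2", "orders"],
    "test_archive": ["t1", "users"],
    "prod": ["t1", "events"],
}


def sql_like_to_glob(pattern):
    """Translate the LIKE wildcards % and _ to glob wildcards.

    Simplified on purpose: escape sequences and literal glob
    characters in the pattern are not handled.
    """
    return pattern.replace("%", "*").replace("_", "?")


def lookup_exact(schema, table):
    """Constant predicates act as a key: go straight to one entry."""
    if table in DATADIR.get(schema, []):
        return [(schema, table)], 1
    return [], 0


def lookup_like(schema_pattern, table_pattern):
    """Pattern predicates: visit every entry, then filter."""
    schema_glob = sql_like_to_glob(schema_pattern)
    table_glob = sql_like_to_glob(table_pattern)
    visited = 0
    matches = []
    for schema, tables in DATADIR.items():
        for table in tables:
            visited += 1
            if (fnmatch.fnmatchcase(schema, schema_glob)
                    and fnmatch.fnmatchcase(table, table_glob)):
                matches.append((schema, table))
    return matches, visited


exact_matches, exact_visited = lookup_exact("test", "t1")
like_matches, like_visited = lookup_like("test%", "t%")
print(exact_matches, exact_visited)
print(like_matches, like_visited)
```

In this sketch the exact lookup visits a single entry, while the pattern lookup visits all seven entries regardless of how many of them match. That is the scaling problem described above.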

Changes in MySQL 8.0

One of the major changes in 8.0 is the introduction of a native data dictionary based on InnoDB. This change has enabled us to get rid of the file-based metadata store (FRM files) and also helps MySQL move towards supporting transactional DDL. For more details on the introduction of the data dictionary feature in 8.0 and its benefits, please see Staale’s post (http://mysqlserverteam.com/a-new-data-dictionary-for-mysql/).

Now that the metadata of all database tables is stored in transactional data dictionary tables, it enables us to design an INFORMATION_SCHEMA table as a database VIEW over the data dictionary tables. This eliminates costs such as the creation of temporary tables for each INFORMATION_SCHEMA query during execution on-the-fly, and also scanning file-system directories to find FRM files. It is also now possible to utilize the full power of the MySQL optimizer to prepare better query execution plans using indexes on data dictionary tables.
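The idea of a metadata view over indexed dictionary tables can be illustrated with a small, self-contained analogy using Python's built-in sqlite3 module. Everything here is an assumption made for illustration: the table names (schemata, tables), the view name (is_tables) and the columns are hypothetical and far simpler than the real MySQL data dictionary, and SQLite's planner is not the MySQL optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified "dictionary" tables.
    CREATE TABLE schemata (
        id   INTEGER PRIMARY KEY,
        name TEXT UNIQUE
    );
    CREATE TABLE tables (
        id        INTEGER PRIMARY KEY,
        schema_id INTEGER REFERENCES schemata(id),
        name      TEXT,
        engine    TEXT,
        UNIQUE (schema_id, name)
    );
    -- The metadata "table" is an ordinary view: a join over the
    -- dictionary tables, with no temporary table to populate.
    CREATE VIEW is_tables AS
        SELECT sch.name   AS table_schema,
               tbl.name   AS table_name,
               tbl.engine AS engine
        FROM schemata AS sch
        JOIN tables AS tbl ON tbl.schema_id = sch.id;
""")
conn.executemany(
    "INSERT INTO schemata (id, name) VALUES (?, ?)",
    [(1, "test"), (2, "prod")],
)
conn.executemany(
    "INSERT INTO tables (schema_id, name, engine) VALUES (?, ?, ?)",
    [(1, "t1", "InnoDB"), (1, "t2", "InnoDB"), (2, "t1", "InnoDB")],
)

query = ("SELECT table_name FROM is_tables "
         "WHERE table_schema = ? AND table_name = ?")
rows = conn.execute(query, ("test", "t1")).fetchall()

# Ask the planner how it would run the query against the view.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("test", "t1")).fetchall()
print(rows)
for step in plan:
    print(step[-1])  # human-readable description of each plan step
```

Because the view is just a join, the planner is free to push the WHERE predicates down to the underlying tables and pick indexes on them; the printed plan steps show what it chose on your SQLite version.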

The following diagram explains the difference in design in MySQL 5.7 and 8.0.

If we revisit the example from the Background section, we see that the optimizer now plans to use indexes on data dictionary tables in both cases.

mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA = 'test' AND TABLE_NAME = 't1';
+--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
|id|select_type|table||type  |possible_keys     |key       ||ref                   |rows|filtered|Extra                             |
+--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
| 1|SIMPLE     |cat  ||index |PRIMARY           |name      ||NULL                  |   1|  100.00|Using index                       |
| 1|SIMPLE     |sch  ||eq_ref|PRIMARY,catalog_id|catalog_id||mysql.cat.id,const    |   1|  100.00|Using index                       |
| 1|SIMPLE     |tbl  ||eq_ref|schema_id         |schema_id ||mysql.sch.id,const    |   1|   10.00|Using index condition; Using where|
| 1|SIMPLE     |col  ||eq_ref|PRIMARY           |PRIMARY   ||mysql.tbl.collation_id|   1|  100.00|Using index                       |
+--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+

mysql> EXPLAIN SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA like 'test%' AND TABLE_NAME like 't%';
+--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
|id|select_type|table||type  |possible_keys     |key       ||ref                   |rows|filtered|Extra                             |
+--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+
| 1|SIMPLE     |cat  ||index |PRIMARY           |name      ||NULL                  |   1|  100.00|Using index                       |
| 1|SIMPLE     |sch  ||ref   |PRIMARY,catalog_id|catalog_id||mysql.cat.id          |   6|   16.67|Using where; Using index          |
| 1|SIMPLE     |tbl  ||ref   |schema_id         |schema_id ||mysql.sch.id          |  26|    1.11|Using index condition; Using where|
| 1|SIMPLE     |col  ||eq_ref|PRIMARY           |PRIMARY   ||mysql.tbl.collation_id|   1|  100.00|Using index                       |
+--+-----------+-----++------+------------------+----------++----------------------+----+--------+----------------------------------+

Looking at the performance gain from the new INFORMATION_SCHEMA design in 8.0, we see that it is much more efficient than MySQL 5.7. As an example, the following query (Q1) is now ~100 times faster (with 100 databases of 50 tables each). A separate blog post will describe the performance of INFORMATION_SCHEMA in 8.0 in more detail.

SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, ENGINE, ROW_FORMAT
FROM information_schema.tables
WHERE TABLE_SCHEMA LIKE 'db%';

Sources of Metadata

Not all the INFORMATION_SCHEMA tables are implemented as a VIEW over the data dictionary tables in 8.0. Currently we have the following INFORMATION_SCHEMA tables designed as views:

SCHEMATA
TABLES
COLUMNS
VIEWS
CHARACTER_SETS
COLLATIONS
COLLATION_CHARACTER_SET_APPLICABILITY
STATISTICS
KEY_COLUMN_USAGE
TABLE_CONSTRAINTS

Upcoming MySQL 8.0 versions aim to provide the following INFORMATION_SCHEMA tables as views as well:

EVENTS
TRIGGERS
ROUTINES
REFERENTIAL_CONSTRAINTS

To describe the INFORMATION_SCHEMA queries that are not directly implemented as views over data dictionary tables, let me first explain the two types of metadata that are presented in INFORMATION_SCHEMA tables:

- Static table metadata. For example: TABLE_SCHEMA, TABLE_NAME, TABLE_TYPE, ENGINE. This metadata is read directly from the data dictionary.
- Dynamic table metadata. For example: AUTO_INCREMENT, AVG_ROW_LENGTH, DATA_FREE. Dynamic metadata changes frequently (for example, the auto_increment value advances after each insert). In many cases the dynamic metadata is also costly to calculate accurately on demand, and that accuracy may not matter for the typical query. Consider the DATA_FREE statistic, which shows the number of free bytes in a table: a cached value is usually sufficient.

In MySQL 8.0, dynamic table metadata is cached by default. This is configurable via the information_schema_stats setting (default: cached), which can be changed to information_schema_stats=latest in order to always retrieve the dynamic information directly from the storage engine (at the cost of slightly higher query execution time).

As an alternative, the user can also execute ANALYZE TABLE on the table, to update the cached dynamic statistics.
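The trade-off between the cached and latest modes, and the role of ANALYZE TABLE, can be sketched with a toy Python model. The DynamicStats class and its method names are hypothetical and do not correspond to any server internals. Note also that the setting name is the one given in this post; later 8.0 releases are documented with information_schema_stats_expiry instead, so check the manual for your version.

```python
class DynamicStats:
    """Toy model of cached versus latest dynamic table statistics."""

    def __init__(self, engine_stats, mode="cached"):
        self.engine_stats = engine_stats  # live values in the "storage engine"
        self.mode = mode                  # "cached" or "latest"
        self.cache = {}
        self.engine_reads = 0             # how often the engine was consulted

    def _read_engine(self, table):
        self.engine_reads += 1
        return dict(self.engine_stats[table])  # snapshot of the live values

    def get(self, table):
        if self.mode == "latest":
            # Always pay the cost of asking the storage engine.
            return self._read_engine(table)
        if table not in self.cache:
            self.cache[table] = self._read_engine(table)
        return self.cache[table]

    def analyze(self, table):
        """Refresh the cached entry, as ANALYZE TABLE is described to do."""
        self.cache[table] = self._read_engine(table)


engine = {"t1": {"AUTO_INCREMENT": 1}}
stats = DynamicStats(engine, mode="cached")

first = stats.get("t1")["AUTO_INCREMENT"]
engine["t1"]["AUTO_INCREMENT"] = 5            # rows were inserted meanwhile
stale = stats.get("t1")["AUTO_INCREMENT"]     # served from the cache
stats.analyze("t1")
refreshed = stats.get("t1")["AUTO_INCREMENT"]

stats.mode = "latest"
engine["t1"]["AUTO_INCREMENT"] = 6
latest = stats.get("t1")["AUTO_INCREMENT"]
print(first, stale, refreshed, latest, stats.engine_reads)
```

In cached mode the second read returns the old value without touching the engine; the refresh and the latest mode both return current values at the price of an extra engine read each time.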

Conclusion

The INFORMATION_SCHEMA design in 8.0 is a step forward enabling:

- A simple and maintainable implementation.
- Us to get rid of numerous INFORMATION_SCHEMA legacy bugs.
- Proper use of the MySQL optimizer for INFORMATION_SCHEMA queries.
- INFORMATION_SCHEMA queries to execute ~100 times faster, compared to 5.7, when retrieving static table metadata, as shown in query Q1.

There is more to discuss about INFORMATION_SCHEMA in 8.0. The new implementation comes with a few changes in behavior when compared to the old INFORMATION_SCHEMA implementation. Please check the MySQL manual for more details about it.

Thanks for using MySQL!

Link 1: http://dev.mysql.com/doc/refman/5.7/en/information-schema-optimization.html

Link 2: http://mysqlserverteam.com/a-new-data-dictionary-for-mysql/
