Optimization Methods for Common Fuzzy Queries in a Data Warehouse

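Abstract: This article explains common performance optimizations for fuzzy queries on GaussDB(DWS). Creating suitable indexes can speed up fuzzy query statements in several scenarios.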

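This article is shared from the Huawei Cloud community post "GaussDB(DWS) 模糊查詢性能優化" (GaussDB(DWS) Fuzzy Query Performance Optimization), by 黎明的風.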

When using GaussDB(DWS), fuzzy queries written with LIKE can sometimes be slow.

(1) LIKE fuzzy queries

A typical query looks like this:

select * from t1 where c1 like 'A123%';

When table t1 holds a large amount of data, this LIKE query runs very slowly.

Use explain to view the query plan generated for the statement:

test=# explain select * from t1 where c1 like 'A123%';
                                 QUERY PLAN
-----------------------------------------------------------------------------
  id |          operation           | E-rows | E-memory | E-width | E-costs
 ----+------------------------------+--------+----------+---------+---------
   1 | ->  Streaming (type: GATHER) |      1 |          |       8 | 16.25
   2 |    ->  Seq Scan on t1        |      1 | 1MB      |       8 | 10.25

 Predicate Information (identified by plan id)
 ---------------------------------------------
   2 --Seq Scan on t1
         Filter: (c1 ~~ 'A123%'::text)

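The plan shows a full table scan (Seq Scan) on t1, so execution is slow when t1 is large.

The pattern 'A123%' in the query above has its wildcard at the end; we call this trailing fuzzy matching (in effect, a prefix match). For this case, a B-tree index can improve query performance.

When creating the index, specify the operator class that matches the column's data type: text_pattern_ops for text, varchar_pattern_ops for varchar, and bpchar_pattern_ops for char.

In the example above, column c1 is of type text, so the index is created with text_pattern_ops: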

CREATE INDEX ON t1 (c1 text_pattern_ops);

After adding the index, print the query plan again:

test=# explain select * from t1 where c1 like 'A123%';
                                       QUERY PLAN
----------------------------------------------------------------------------------------
  id |                operation                | E-rows | E-memory | E-width | E-costs
 ----+-----------------------------------------+--------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)            |      1 |          |       8 | 14.27
   2 |    ->  Index Scan using t1_c1_idx on t1 |      1 | 1MB      |       8 | 8.27

             Predicate Information (identified by plan id)
 ----------------------------------------------------------------------
   2 --Index Scan using t1_c1_idx on t1
         Index Cond: ((c1 ~>=~ 'A123'::text) AND (c1 ~<~ 'A124'::text))
         Filter: (c1 ~~ 'A123%'::text)

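With the index in place, the statement uses it (Index Scan using t1_c1_idx): the LIKE condition is turned into a range condition on the index, and execution becomes faster.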
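For varchar and char columns, the same pattern applies with the other two operator classes listed earlier. The following is a minimal sketch only; the table t2 and its columns are hypothetical and do not appear in the original example.

```sql
-- Hypothetical table, used only for illustration.
create table t2 (id int, c_varchar varchar(32), c_char char(16));

-- varchar column: use varchar_pattern_ops.
create index on t2 (c_varchar varchar_pattern_ops);

-- char column: use bpchar_pattern_ops.
create index on t2 (c_char bpchar_pattern_ops);

-- Trailing-wildcard queries that these indexes are intended to serve.
explain select * from t2 where c_varchar like 'A123%';
explain select * from t2 where c_char like 'A123%';
```

Whether the planner actually picks the index still depends on table size and statistics, so it is worth confirming with explain on real data.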

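The query above has its wildcard at the end of the pattern. If the wildcard is at the beginning instead (leading fuzzy matching, in effect a suffix match), check whether the plan still uses the index: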

test=# explain select * from t1 where c1 like '%A123';
                                 QUERY PLAN
-----------------------------------------------------------------------------
  id |          operation           | E-rows | E-memory | E-width | E-costs
 ----+------------------------------+--------+----------+---------+---------
   1 | ->  Streaming (type: GATHER) |      1 |          |       8 | 16.25
   2 |    ->  Seq Scan on t1        |      1 | 1MB      |       8 | 10.25

 Predicate Information (identified by plan id)
 ---------------------------------------------
   2 --Seq Scan on t1
         Filter: (c1 ~~ '%A123'::text)

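As the plan above shows, once the wildcard moves to the front of the pattern, the index created earlier can no longer be used and the query falls back to a full table scan.

In this case, we can build an index on the reverse function to support leading fuzzy matching. The index is created with: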

CREATE INDEX ON t1 (reverse(c1) text_pattern_ops);

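Rewrite the query condition with the reverse function and print the query plan. Note that the pattern literal has to be reversed as well: c1 like '%A123' corresponds to reverse(c1) like '321A%'. The plan below was captured with the literal 'A123%', which matches values of c1 that end in '321A'.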

test=# explain select * from t1 where reverse(c1) like 'A123%';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
  id |           operation           | E-rows | E-memory | E-width | E-costs
 ----+-------------------------------+--------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)  |      5 |          |       8 | 14.06
   2 |    ->  Bitmap Heap Scan on t1 |      5 | 1MB      |       8 | 8.06
   3 |       ->  Bitmap Index Scan   |      5 | 1MB      |       0 | 4.28

                      Predicate Information (identified by plan id)
 ----------------------------------------------------------------------------------------
   2 --Bitmap Heap Scan on t1
         Filter: (reverse(c1) ~~ 'A123%'::text)
   3 --Bitmap Index Scan
         Index Cond: ((reverse(c1) ~>=~ 'A123'::text) AND (reverse(c1) ~<~ 'A124'::text))

After the rewrite, the statement can use the index, and query performance improves.
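The rewrite can be sketched side by side as follows. This assumes the index on reverse(c1) from above exists; the third form relies on the planner evaluating reverse('%A123') to a constant before planning, which is an assumption worth confirming with explain.

```sql
-- Original condition: values of c1 that end with 'A123' (cannot use the index).
select * from t1 where c1 like '%A123';

-- Rewritten condition: reverse both the column and the pattern, so the
-- wildcard moves to the end and the index on reverse(c1) can apply.
select * from t1 where reverse(c1) like '321A%';

-- Same condition, letting the database reverse the pattern literal.
-- Assumption: the planner folds reverse('%A123') into the constant '321A%'.
select * from t1 where reverse(c1) like reverse('%A123');
```

This simple reversal only works for patterns whose single wildcard is at the front; patterns with wildcards on both sides are the case that the GIN approach later in the article addresses.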

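(2) Creating the index with a specified collation

With the default index operator class, a B-tree index supports fuzzy queries only if collate "C" is specified both when creating the index and in the query.

Note: the index is used only when the index and the query condition have the same collation.

The index is created with: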

CREATE INDEX ON t1 (c1 collate "C");

The where condition of the query must also specify the collation:

test=# explain select * from t1 where c1 like 'A123%' collate "C";
                                       QUERY PLAN
----------------------------------------------------------------------------------------
  id |                operation                | E-rows | E-memory | E-width | E-costs
 ----+-----------------------------------------+--------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)            |      1 |          |       8 | 14.27
   2 |    ->  Index Scan using t1_c1_idx on t1 |      1 | 1MB      |       8 | 8.27

           Predicate Information (identified by plan id)
 ------------------------------------------------------------------
   2 --Index Scan using t1_c1_idx on t1
         Index Cond: ((c1 >= 'A123'::text) AND (c1 < 'A124'::text))
         Filter: (c1 ~~ 'A123%'::text COLLATE "C")
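To check the note about matching collations, the two plans can be compared directly. This is only a sketch under two assumptions that are not stated in the original: the collate "C" index is the only index on c1, and the database's default collation is something other than C.

```sql
-- No explicit collation: the condition uses the column's default collation,
-- which does not match the collate "C" index, so a Seq Scan is expected.
explain select * from t1 where c1 like 'A123%';

-- Explicit collate "C": matches the index definition, so the plan is
-- expected to show an Index Cond with the range c1 >= 'A123' and c1 < 'A124'.
explain select * from t1 where c1 like 'A123%' collate "C";
```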

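(3) GIN inverted index

GIN (Generalized Inverted Index) is designed for cases where the indexed items are composite values and queries need to find specific element values that appear inside them. For example, a document consists of many words, and a query needs to find the documents that contain a particular word.

The following example shows how to use a GIN index: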

create table gin_test_data(id int, chepai varchar(10), shenfenzheng varchar(20), duanxin text) distribute by hash (id);
create index chepai_idx on gin_test_data using gin(to_tsvector('ngram', chepai)) with (fastupdate=on);

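The statements above create a GIN inverted index on the license plate column (chepai).

To run a fuzzy query on the license plate, use a statement like this: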

select count(*) from gin_test_data where to_tsvector('ngram', chepai) @@ to_tsquery('ngram', '湘F');

The query plan for this statement is:

test=# explain select count(*) from gin_test_data where to_tsvector('ngram', chepai) @@ to_tsquery('ngram', '湘F');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
  id |                   operation                    | E-rows | E-memory | E-width | E-costs
 ----+------------------------------------------------+--------+----------+---------+---------
   1 | ->  Aggregate                                  |      1 |          |       8 | 18.03
   2 |    ->  Streaming (type: GATHER)                |      1 |          |       8 | 18.03
   3 |       ->  Aggregate                            |      1 | 1MB      |       8 | 12.03
   4 |          ->  Bitmap Heap Scan on gin_test_data |      1 | 1MB      |       0 | 12.02
   5 |             ->  Bitmap Index Scan              |      1 | 1MB      |       0 | 8.00

                         Predicate Information (identified by plan id)
 ----------------------------------------------------------------------------------------------
   4 --Bitmap Heap Scan on gin_test_data
         Recheck Cond: (to_tsvector('ngram'::regconfig, (chepai)::text) @@ '''湘f'''::tsquery)
   5 --Bitmap Index Scan
         Index Cond: (to_tsvector('ngram'::regconfig, (chepai)::text) @@ '''湘f'''::tsquery)

The query uses the inverted index (Bitmap Index Scan), so it performs relatively well.
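The same approach should carry over to the other character columns of gin_test_data. In the sketch below, the index name shenfenzheng_idx, the search string '4301', and the sample plate value are made up for illustration and do not come from the original article.

```sql
-- GIN index on the ID-number column, built the same way as chepai_idx.
create index shenfenzheng_idx on gin_test_data
    using gin(to_tsvector('ngram', shenfenzheng)) with (fastupdate=on);

-- The where expression must match the indexed expression
-- (to_tsvector('ngram', shenfenzheng)) for the index to be considered.
select count(*) from gin_test_data
where to_tsvector('ngram', shenfenzheng) @@ to_tsquery('ngram', '4301');

-- Inspect how the ngram parser splits a value into tokens.
select to_tsvector('ngram', '湘F12345');
```

Looking at the to_tsvector output for a sample value is a quick way to see which substrings the index can match.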
