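Abstract: This article covers common techniques for optimizing fuzzy (LIKE) query performance on GaussDB(DWS). Creating a suitable index can speed up fuzzy queries in several scenarios.
This article is shared from the Huawei Cloud community post "GaussDB(DWS) Fuzzy Query Performance Optimization" (《GaussDB(DWS) 模糊查詢性能優化》), author: 黎明的風.
When using GaussDB(DWS), fuzzy queries written with LIKE sometimes run slowly.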
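(1) LIKE fuzzy queries
A typical query looks like this:
select * from t1 where c1 like 'A123%';
When table t1 holds a large amount of data, this LIKE query is very slow.
Use explain to view the query plan generated for the statement: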
test=# explain select * from t1 where c1 like 'A123%';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 id |          operation           | E-rows | E-memory | E-width | E-costs
----+------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER) |      1 |          |       8 | 16.25
  2 |    ->  Seq Scan on t1        |      1 | 1MB      |       8 | 10.25

 Predicate Information (identified by plan id)
---------------------------------------------
   2 --Seq Scan on t1
         Filter: (c1 ~~ 'A123%'::text)
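The plan shows a full table scan (Seq Scan) on t1, which is why the query is slow when t1 is large.
The pattern 'A123%' has its wildcard at the end; we call this a trailing-wildcard (prefix) match. In this scenario, a B-tree index can improve query performance.
When creating the index, specify the operator class that matches the column's data type: text_pattern_ops for text, varchar_pattern_ops for varchar, and bpchar_pattern_ops for char.
In the example above, column c1 is of type text, so the index is created with text_pattern_ops: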
CREATE INDEX ON t1 (c1 text_pattern_ops);
After adding the index, print the query plan again:
test=# explain select * from t1 where c1 like 'A123%';
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 id |                operation                | E-rows | E-memory | E-width | E-costs
----+-----------------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)            |      1 |          |       8 | 14.27
  2 |    ->  Index Scan using t1_c1_idx on t1 |      1 | 1MB      |       8 | 8.27

            Predicate Information (identified by plan id)
----------------------------------------------------------------------
   2 --Index Scan using t1_c1_idx on t1
         Index Cond: ((c1 ~>=~ 'A123'::text) AND (c1 ~<~ 'A124'::text))
         Filter: (c1 ~~ 'A123%'::text)
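With the index in place, the statement uses it (Index Scan using t1_c1_idx on t1). The Index Cond shows that the prefix pattern is turned into a range condition, from 'A123' up to but not including 'A124', so execution is faster.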
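The Index Cond in the plan above bounds the scan to keys from 'A123' up to, but not including, 'A124'. The following Python sketch illustrates why such a range is equivalent to a prefix match on a sorted structure. It is only a model of the idea, not GaussDB's implementation: the function names are hypothetical, ordering is assumed to be plain code-point order, and the prefix is assumed to be non-empty and free of wildcard characters.

```python
import bisect


def prefix_range(prefix):
    """Return (low, high) such that low <= s < high holds exactly for
    strings that start with prefix, under code-point ordering.

    Assumes a non-empty prefix whose last character is not the
    maximum code point.
    """
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high


def prefix_scan(sorted_keys, prefix):
    """Range-scan a sorted list, standing in for a B-tree serving LIKE 'prefix%'."""
    low, high = prefix_range(prefix)
    start = bisect.bisect_left(sorted_keys, low)
    stop = bisect.bisect_left(sorted_keys, high)
    return sorted_keys[start:stop]


keys = sorted(["A1229", "A123", "A1230", "A123Z", "A124", "B123"])
print(prefix_range("A123"))       # ('A123', 'A124')
print(prefix_scan(keys, "A123"))  # ['A123', 'A1230', 'A123Z']
```

Only the keys inside the range are visited, which is the reason an index scan avoids reading the whole table.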
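The query above puts the wildcard at the end of the pattern. If the wildcard is at the start instead (a leading-wildcard, or suffix, match), check whether the plan still uses the index: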
test=# explain select * from t1 where c1 like '%A123';
                                 QUERY PLAN
-----------------------------------------------------------------------------
 id |          operation           | E-rows | E-memory | E-width | E-costs
----+------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER) |      1 |          |       8 | 16.25
  2 |    ->  Seq Scan on t1        |      1 | 1MB      |       8 | 10.25

 Predicate Information (identified by plan id)
---------------------------------------------
   2 --Seq Scan on t1
         Filter: (c1 ~~ '%A123'::text)
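As the plan above shows, once the wildcard moves to the start of the pattern, the index created earlier can no longer be used, and the query falls back to a full table scan.
In this case, an index on the reversed column value, built with the reverse function, can support leading-wildcard queries. The statement is: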
CREATE INDEX ON t1 (reverse(c1) text_pattern_ops);
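Rewrite the query condition with the reverse function so that it matches the index expression, then print the query plan. Note that the pattern must be reversed as well: the equivalent of c1 like '%A123' is reverse(c1) like '321A%'. The statement below uses 'A123%' only to show that the expression index is used: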
test=# explain select * from t1 where reverse(c1) like 'A123%';
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 id |           operation           | E-rows | E-memory | E-width | E-costs
----+-------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)  |      5 |          |       8 | 14.06
  2 |    ->  Bitmap Heap Scan on t1 |      5 | 1MB      |       8 | 8.06
  3 |       ->  Bitmap Index Scan   |      5 | 1MB      |       0 | 4.28

                     Predicate Information (identified by plan id)
----------------------------------------------------------------------------------------
   2 --Bitmap Heap Scan on t1
         Filter: (reverse(c1) ~~ 'A123%'::text)
   3 --Bitmap Index Scan
         Index Cond: ((reverse(c1) ~>=~ 'A123'::text) AND (reverse(c1) ~<~ 'A124'::text))
After the rewrite, the statement can use the index and query performance improves.
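The reverse-index idea can be modeled in a few lines of Python: store each value reversed, keep the entries sorted, and a suffix search becomes a prefix range scan over the reversed keys. This is an illustrative sketch only. The function names and sample values are hypothetical, ordering is assumed to be code-point order, and the suffix is assumed to be non-empty.

```python
import bisect


def build_reversed_index(values):
    """Sorted (reversed value, original value) pairs, standing in for
    an index on reverse(c1)."""
    return sorted((v[::-1], v) for v in values)


def suffix_scan(index, suffix):
    """Find values that end with suffix, i.e. c1 LIKE '%suffix'."""
    prefix = suffix[::-1]  # '%A123' becomes the prefix pattern '321A%'
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    # One-element tuples sort before any (key, value) pair with the same key,
    # so they work as lower and upper bounds for the range.
    start = bisect.bisect_left(index, (prefix,))
    stop = bisect.bisect_left(index, (high,))
    return [original for _, original in index[start:stop]]


values = ["XA123", "A123", "A123X", "YYA123", "A124"]
index = build_reversed_index(values)
print(suffix_scan(index, "A123"))  # ['A123', 'XA123', 'YYA123']
```

The search term is reversed before the lookup, which mirrors the need to reverse the LIKE pattern when the query is rewritten against reverse(c1).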
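(2) Creating the index with a specified collation
With the default index operator class, a B-tree index supports fuzzy queries only if COLLATE "C" is specified both when creating the index and in the query.
Note: the index is used only when the collation of the index and that of the query condition are the same.
The statement to create the index is: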
CREATE INDEX ON t1 (c1 collate "C");
The WHERE clause of the query must also specify the collation:
test=# explain select * from t1 where c1 like 'A123%' collate "C";
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 id |                operation                | E-rows | E-memory | E-width | E-costs
----+-----------------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)            |      1 |          |       8 | 14.27
  2 |    ->  Index Scan using t1_c1_idx on t1 |      1 | 1MB      |       8 | 8.27

          Predicate Information (identified by plan id)
------------------------------------------------------------------
   2 --Index Scan using t1_c1_idx on t1
         Index Cond: ((c1 >= 'A123'::text) AND (c1 < 'A124'::text))
         Filter: (c1 ~~ 'A123%'::text COLLATE "C")
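One way to see why the collation matters: a prefix match can be served by a single range scan only if all keys sharing the prefix sit next to each other in the index order. The Python sketch below compares code-point order, which is how COLLATE "C" sorts, with a case-folded order used here purely as a toy stand-in for a linguistic collation. The helper name and sample keys are hypothetical, and real locale rules are more involved than case folding.

```python
def is_contiguous(sorted_keys, prefix):
    """True if all keys starting with prefix are adjacent in sorted_keys."""
    hits = [i for i, k in enumerate(sorted_keys) if k.startswith(prefix)]
    return not hits or hits[-1] - hits[0] + 1 == len(hits)


keys = ["A123", "a124", "A125", "B100"]

c_order = sorted(keys)                         # code-point order, like COLLATE "C"
folded_order = sorted(keys, key=str.casefold)  # toy stand-in for a linguistic collation

print(c_order, is_contiguous(c_order, "A12"))
# ['A123', 'A125', 'B100', 'a124'] True
print(folded_order, is_contiguous(folded_order, "A12"))
# ['A123', 'a124', 'A125', 'B100'] False
```

In the second ordering, 'a124' falls between two keys that match the prefix 'A12', so a simple lower and upper bound no longer selects exactly the matching rows.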
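(3) GIN inverted index
GIN (Generalized Inverted Index) is designed for cases where the indexed items are composite values and queries need to find the items that contain specific element values. For example, a document consists of many words, and a query needs to find the documents that contain a particular word.
The following example shows how to use a GIN index: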
create table gin_test_data(id int, chepai varchar(10), shenfenzheng varchar(20), duanxin text) distribute by hash (id);
create index chepai_idx on gin_test_data using gin(to_tsvector('ngram', chepai)) with (fastupdate=on);
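The statements above create a table and build a GIN inverted index on the license plate column (chepai).
To run a fuzzy query on the license plate, use a statement like this: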
select count(*) from gin_test_data where to_tsvector('ngram', chepai) @@ to_tsquery('ngram', '湘F');
The query plan for this statement is:
test=# explain select count(*) from gin_test_data where to_tsvector('ngram', chepai) @@ to_tsquery('ngram', '湘F');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 id |                   operation                    | E-rows | E-memory | E-width | E-costs
----+------------------------------------------------+--------+----------+---------+---------
  1 | ->  Aggregate                                  |      1 |          |       8 | 18.03
  2 |    ->  Streaming (type: GATHER)                |      1 |          |       8 | 18.03
  3 |       ->  Aggregate                            |      1 | 1MB      |       8 | 12.03
  4 |          ->  Bitmap Heap Scan on gin_test_data |      1 | 1MB      |       0 | 12.02
  5 |             ->  Bitmap Index Scan              |      1 | 1MB      |       0 | 8.00

                        Predicate Information (identified by plan id)
----------------------------------------------------------------------------------------------
   4 --Bitmap Heap Scan on gin_test_data
         Recheck Cond: (to_tsvector('ngram'::regconfig, (chepai)::text) @@ '''湘f'''::tsquery)
   5 --Bitmap Index Scan
         Index Cond: (to_tsvector('ngram'::regconfig, (chepai)::text) @@ '''湘f'''::tsquery)
The query uses the inverted index (Bitmap Index Scan), so it performs well.
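To make the inverted-index idea concrete, here is a minimal Python model: each text is split into overlapping two-character tokens, and the index maps every token to the rows that contain it. It is a sketch under assumptions, not a description of GaussDB internals. The two-character, lower-cased tokens are assumed from the '湘f' token visible in the plan above, the function names and sample rows are hypothetical, and the recheck step is simplified to a plain substring test.

```python
from collections import defaultdict


def bigrams(text):
    """Split text into lower-cased, overlapping 2-character tokens."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_inverted_index(rows):
    """Map each bigram to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for gram in bigrams(text):
            index[gram].add(row_id)
    return index


def search(index, rows, term):
    """Intersect the posting sets of the term's bigrams, then recheck each
    candidate row (here with a simple substring test)."""
    grams = bigrams(term)
    if not grams:
        return []
    candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(r for r in candidates if term.lower() in rows[r].lower())


rows = {1: "湘F12345", 2: "湘A54321", 3: "粤F12345"}
index = build_inverted_index(rows)
print(search(index, rows, "湘F"))  # [1]
print(search(index, rows, "123"))  # [1, 3]
```

Because the lookup starts from tokens rather than from the beginning of the string, the match position inside the value does not matter, which is what makes this kind of index suitable for patterns with wildcards on both sides.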
Please credit the source when reposting. Article link: https://www.uj5u.com/shujuku/534214.html
