When unnest() is used in a subquery, PostgreSQL (13.4) fails to come up with a query plan that uses run-time partition pruning.
Given these tables:
CREATE TABLE users (
user_id uuid,
channel_id uuid,
CONSTRAINT user_pk PRIMARY KEY(user_id, channel_id)
)
PARTITION BY hash(user_id);
CREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 1);
CREATE TABLE channels (
channel_id uuid,
user_ids uuid[],
CONSTRAINT channel_pk PRIMARY KEY(channel_id)
) PARTITION BY hash(channel_id);
CREATE TABLE channels_0 PARTITION OF channels FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channels_1 PARTITION OF channels FOR VALUES WITH (MODULUS 2, REMAINDER 1);
Insert some data:
INSERT INTO users(user_id, channel_id) VALUES('0861180b-c972-42fe-9fb3-3b55e652f893', '45205876-7270-4e06-ab8d-b5f669298422');
INSERT INTO channels(channel_id, user_ids) VALUES('45205876-7270-4e06-ab8d-b5f669298422', '{0861180b-c972-42fe-9fb3-3b55e652f893}');
INSERT INTO users
SELECT
gen_random_uuid() as user_id,
gen_random_uuid() as channel_id
FROM generate_series(1, 100);
INSERT INTO channels
SELECT
(SELECT max(channel_id::text) FROM (SELECT channel_id FROM users ORDER BY random()*generate_series LIMIT 1) c)::uuid as channel_id,
(SELECT array_agg(DISTINCT user_id::text) FROM (SELECT user_id FROM users ORDER BY random()*generate_series
LIMIT 1) u)::uuid[] as user_ids
FROM (SELECT * FROM generate_series(1, 100)) g
ON conflict DO NOTHING;
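To see how the rows ended up distributed across the hash partitions, a quick check along these lines may help. This is only a sketch: tableoid is a system column present on every table, and the column aliases are illustrative.

```sql
-- Count rows per partition of users; tableoid identifies the physical
-- partition in which each row is stored.
SELECT tableoid::regclass AS partition_name, count(*) AS row_count
FROM users
GROUP BY tableoid
ORDER BY tableoid;

-- Show which partition holds the hand-inserted user_id.
SELECT tableoid::regclass AS partition_name, user_id
FROM users
WHERE user_id = '0861180b-c972-42fe-9fb3-3b55e652f893';
```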
The following query:
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
SELECT unnest(user_ids) FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
returns a query plan that scans all partitions:
Hash Semi Join (cost=8.45..37.28 rows=8 width=32) (actual time=0.208..0.387 rows=1 loops=1)
Hash Cond: (users.user_id = (unnest(channels.user_ids)))
-> Append (cost=0.00..28.71 rows=8 width=32) (actual time=0.037..0.134 rows=1 loops=1)
-> Seq Scan on users_0 users_1 (cost=0.00..27.00 rows=7 width=32) (actual time=0.021..0.041 rows=1 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 45
-> Seq Scan on users_1 users_2 (cost=0.00..1.68 rows=1 width=32) (actual time=0.018..0.027 rows=0 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 54
-> Hash (cost=8.33..8.33 rows=10 width=16) (actual time=0.131..0.172 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> ProjectSet (cost=0.15..8.23 rows=10 width=16) (actual time=0.060..0.114 rows=1 loops=1)
-> Index Scan using channels_0_pkey on channels_0 channels (cost=0.15..8.17 rows=1 width=32) (actual time=0.040..0.059 rows=1 loops=1)
Index Cond: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Planning Time: 0.363 ms
Execution Time: 0.515 ms
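I expected PostgreSQL to run the subquery and look at the returned user_ids to determine which partitions the data would be in. Instead, PostgreSQL looks at all partitions for that data. I tried using one row per user_id in the channels table, and that works perfectly: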
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
SELECT user_id FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
With that variant, PostgreSQL does not run any steps on the partitions that cannot hold any matching data.
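The schema for this one-row-per-user_id variant is not shown in the question. A minimal sketch of what it might look like, assuming a hypothetical table name channels_flat (the question reuses the name channels) and the same hash partitioning on channel_id:

```sql
-- Hypothetical normalized variant: one row per (channel_id, user_id)
-- instead of a uuid[] column. Table and constraint names are illustrative.
CREATE TABLE channels_flat (
    channel_id uuid,
    user_id    uuid,
    CONSTRAINT channels_flat_pk PRIMARY KEY (channel_id, user_id)
) PARTITION BY hash(channel_id);
CREATE TABLE channels_flat_0 PARTITION OF channels_flat FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channels_flat_1 PARTITION OF channels_flat FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Populate it by expanding the arrays from the original channels table.
INSERT INTO channels_flat (channel_id, user_id)
SELECT channel_id, unnest(user_ids)
FROM channels
ON CONFLICT DO NOTHING;
```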
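It seems that unnest() prevents run-time partition pruning from working. Why is that?
Solution: I can confirm jjanes's solution. After adding 100k rows to the table, the query using unnest() performs partition pruning at run time.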
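Answer:
Your example shows only what it did for one query, not everything it is capable of doing.
Your tables are ridiculously small. Put another 10,000 rows into the users table, so that the index actually matters, and see what it does.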
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=1.85..151.99 rows=2 width=32) (actual time=0.040..0.042 rows=1 loops=1)
-> HashAggregate (cost=1.57..1.67 rows=10 width=16) (actual time=0.022..0.023 rows=1 loops=1)
Group Key: unnest(channels.user_ids)
Batches: 1 Memory Usage: 24kB
-> ProjectSet (cost=0.00..1.44 rows=10 width=16) (actual time=0.016..0.019 rows=1 loops=1)
-> Seq Scan on channels_0 channels (cost=0.00..1.39 rows=1 width=37) (actual time=0.013..0.016 rows=1 loops=1)
Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Rows Removed by Filter: 30
-> Append (cost=0.28..15.01 rows=2 width=32) (actual time=0.016..0.017 rows=1 loops=1)
-> Index Only Scan using users_0_pkey on users_0 users_1 (cost=0.28..7.50 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
Index Cond: ((user_id = (unnest(channels.user_ids))) AND (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid))
Heap Fetches: 1
-> Index Only Scan using users_1_pkey on users_1 users_2 (cost=0.28..7.50 rows=1 width=32) (never executed)
Index Cond: ((user_id = (unnest(channels.user_ids))) AND (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid))
Heap Fetches: 0
Planning Time: 0.470 ms
Execution Time: 0.087 ms
The "(never executed)" on the second Index Only Scan is due to run-time pruning.
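A sketch for reproducing this on the tables from the question, assuming the same gen_random_uuid() approach the question uses for filler rows. The ANALYZE and SHOW steps are additions for illustration, not something the answer states. The PostgreSQL documentation describes execution-time pruning as using values from subqueries and from parameterized nested loop joins, which is consistent with the pruning appearing only once the planner picks a Nested Loop instead of the Hash Semi Join.

```sql
-- Add enough rows that an index scan on users becomes attractive.
INSERT INTO users (user_id, channel_id)
SELECT gen_random_uuid(), gen_random_uuid()
FROM generate_series(1, 10000);

-- Refresh planner statistics so the new row counts are taken into account.
ANALYZE users;
ANALYZE channels;

-- Run-time pruning requires this setting to be on (it is on by default).
SHOW enable_partition_pruning;

-- Same query as in the question.
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT unnest(user_ids) FROM channels
    WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422';
```

If the plan prunes at run time, one of the scans under the Append node should be marked "(never executed)", as in the output above. The exact costs and node choices will vary with the random data.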
