PostgreSQL partition pruning does not work when using unnest() in a subquery

When unnest() is used in a subquery, PostgreSQL (13.4) fails to come up with a query plan that uses run-time partition pruning.

Given these tables:

CREATE TABLE users (
    user_id uuid, 
    channel_id uuid, 
    CONSTRAINT user_pk PRIMARY KEY(user_id, channel_id)
) 
PARTITION BY hash(user_id);
CREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    
CREATE TABLE channels (
    channel_id uuid, 
    user_ids uuid[],
    CONSTRAINT channel_pk PRIMARY KEY(channel_id)
) PARTITION BY hash(channel_id);

CREATE TABLE channels_0 PARTITION OF channels FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channels_1 PARTITION OF channels FOR VALUES WITH (MODULUS 2, REMAINDER 1);

Insert some data:

INSERT INTO users(user_id, channel_id) VALUES('0861180b-c972-42fe-9fb3-3b55e652f893', '45205876-7270-4e06-ab8d-b5f669298422');
INSERT INTO channels(channel_id, user_ids) VALUES('45205876-7270-4e06-ab8d-b5f669298422', '{0861180b-c972-42fe-9fb3-3b55e652f893}');

INSERT INTO users 
SELECT 
    gen_random_uuid() as user_id,
    gen_random_uuid() as channel_id
FROM generate_series(1, 100);

INSERT INTO channels
SELECT
    (SELECT max(channel_id::text) FROM (SELECT channel_id FROM users ORDER BY random()*generate_series LIMIT 1) c)::uuid as channel_id,
    (SELECT array_agg(DISTINCT user_id::text) FROM (SELECT user_id FROM users ORDER BY random()*generate_series 
    LIMIT 1) u)::uuid[] as user_ids
FROM (SELECT * FROM generate_series(1, 100)) g
ON conflict DO NOTHING;

The following query:

EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT unnest(user_ids) FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'

returns a query plan that scans all partitions of users:

Hash Semi Join  (cost=8.45..37.28 rows=8 width=32) (actual time=0.208..0.387 rows=1 loops=1)
  Hash Cond: (users.user_id = (unnest(channels.user_ids)))
  ->  Append  (cost=0.00..28.71 rows=8 width=32) (actual time=0.037..0.134 rows=1 loops=1)
        ->  Seq Scan on users_0 users_1  (cost=0.00..27.00 rows=7 width=32) (actual time=0.021..0.041 rows=1 loops=1)
              Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
              Rows Removed by Filter: 45
        ->  Seq Scan on users_1 users_2  (cost=0.00..1.68 rows=1 width=32) (actual time=0.018..0.027 rows=0 loops=1)
              Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
              Rows Removed by Filter: 54
  ->  Hash  (cost=8.33..8.33 rows=10 width=16) (actual time=0.131..0.172 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  ProjectSet  (cost=0.15..8.23 rows=10 width=16) (actual time=0.060..0.114 rows=1 loops=1)
              ->  Index Scan using channels_0_pkey on channels_0 channels  (cost=0.15..8.17 rows=1 width=32) (actual time=0.040..0.059 rows=1 loops=1)
                    Index Cond: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
Planning Time: 0.363 ms
Execution Time: 0.515 ms

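I would expect PostgreSQL to run the subquery first and look at the returned user_ids to determine which partitions can contain the data. Instead, PostgreSQL scans every partition. I also tried storing one row per user_id in the channels table, and that works perfectly: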
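To check the expectation above, it can help to see which partition actually stores the matching row and whether pruning is enabled at all. The following is a minimal sketch that assumes the tables and sample data from this question; `tableoid` is a system column and `enable_partition_pruning` is a standard setting, while the column alias `partition_name` is only illustrative.

```sql
-- Partition pruning is controlled by this setting; it is on by default.
SHOW enable_partition_pruning;

-- tableoid identifies the table that physically stores each row;
-- casting it to regclass prints the partition name instead of an OID.
SELECT tableoid::regclass AS partition_name, user_id, channel_id
FROM users
WHERE user_id = '0861180b-c972-42fe-9fb3-3b55e652f893';
```

Judging by the plan above, where only the scan on users_0 returns a row, this row should be reported as living in users_0, so an ideal plan would never need to touch users_1.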

EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT user_id FROM channels WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422'

In that case, PostgreSQL does not execute any plan steps against partitions that cannot hold matching data.
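The one-row-per-user_id layout used for that test is not shown in the question. A sketch of what it could look like follows; the table name `channel_members`, its partitions, and its constraint name are hypothetical and are not the author's actual schema.

```sql
-- Hypothetical normalized variant: one row per (channel_id, user_id)
-- instead of a uuid[] column. The primary key includes the partition key,
-- as PostgreSQL requires for partitioned tables.
CREATE TABLE channel_members (
    channel_id uuid,
    user_id uuid,
    CONSTRAINT channel_members_pk PRIMARY KEY (channel_id, user_id)
) PARTITION BY hash(channel_id);

CREATE TABLE channel_members_0 PARTITION OF channel_members FOR VALUES WITH (MODULUS 2, REMAINDER 0);
CREATE TABLE channel_members_1 PARTITION OF channel_members FOR VALUES WITH (MODULUS 2, REMAINDER 1);

-- Expand each array from the original channels table into individual rows.
INSERT INTO channel_members (channel_id, user_id)
SELECT c.channel_id, u.user_id
FROM channels AS c
CROSS JOIN LATERAL unnest(c.user_ids) AS u(user_id)
ON CONFLICT DO NOTHING;

-- Same shape as the query above, but without unnest() in the subquery.
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT user_id FROM channel_members
    WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422';
```

Keeping this in a separate table leaves the original channels table untouched, so both variants can be compared side by side.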

It seems that unnest() prevents run-time partition pruning from working. Why is that?

Solution: I can confirm jjanes's answer. After adding 100k rows to the table, the query using unnest() does perform partition pruning at run time.

Answer:

Your example only shows what the planner does for one particular query, not everything it is capable of doing.

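Your tables are ridiculously small. Put another 10,000 rows into the users table, so that the index actually matters, and see what the planner does then: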
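A minimal sketch of that experiment, reusing the tables and the query from the question. The ANALYZE step is an assumption added here so that the planner sees up-to-date statistics right away; the answer itself does not mention it.

```sql
-- Add 10,000 more rows with random ids (gen_random_uuid() is built in as of PostgreSQL 13).
INSERT INTO users (user_id, channel_id)
SELECT gen_random_uuid(), gen_random_uuid()
FROM generate_series(1, 10000);

-- Refresh planner statistics for the partitioned table and its partitions.
ANALYZE users;

-- Re-run the original query and inspect the new plan.
EXPLAIN ANALYZE
SELECT * FROM users
WHERE user_id IN (
    SELECT unnest(user_ids) FROM channels
    WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422';
```

Because the inserted ids are random, exact costs, row counts, and timings will differ from the plan shown in this answer.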

                                                                   QUERY PLAN                                                                   
------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1.85..151.99 rows=2 width=32) (actual time=0.040..0.042 rows=1 loops=1)
   ->  HashAggregate  (cost=1.57..1.67 rows=10 width=16) (actual time=0.022..0.023 rows=1 loops=1)
         Group Key: unnest(channels.user_ids)
         Batches: 1  Memory Usage: 24kB
         ->  ProjectSet  (cost=0.00..1.44 rows=10 width=16) (actual time=0.016..0.019 rows=1 loops=1)
               ->  Seq Scan on channels_0 channels  (cost=0.00..1.39 rows=1 width=37) (actual time=0.013..0.016 rows=1 loops=1)
                     Filter: (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid)
                     Rows Removed by Filter: 30
   ->  Append  (cost=0.28..15.01 rows=2 width=32) (actual time=0.016..0.017 rows=1 loops=1)
         ->  Index Only Scan using users_0_pkey on users_0 users_1  (cost=0.28..7.50 rows=1 width=32) (actual time=0.014..0.015 rows=1 loops=1)
               Index Cond: ((user_id = (unnest(channels.user_ids))) AND (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid))
               Heap Fetches: 1
         ->  Index Only Scan using users_1_pkey on users_1 users_2  (cost=0.28..7.50 rows=1 width=32) (never executed)
               Index Cond: ((user_id = (unnest(channels.user_ids))) AND (channel_id = '45205876-7270-4e06-ab8d-b5f669298422'::uuid))
               Heap Fetches: 0
 Planning Time: 0.470 ms
 Execution Time: 0.087 ms

The (never executed) on the users_1 index scan is due to run-time pruning.
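Run-time pruning of this kind relies on the partition scans being parameterized by values that are only known during execution, as with the nested loop and index scans in the plan above. With the hash semi join from the question, the Append over users is scanned once, without reference to any particular user_id. To experiment with this on small tables, the planner can be discouraged from choosing the alternatives. The sketch below uses standard planner settings inside a transaction; whether it reproduces the same plan depends on the data and statistics, so treat it as an exploration aid rather than a fix.

```sql
BEGIN;

-- SET LOCAL only lasts until the end of this transaction.
SET LOCAL enable_hashjoin = off;
SET LOCAL enable_mergejoin = off;
SET LOCAL enable_seqscan = off;

-- Look for "(never executed)" on one of the partition scans under Append.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT * FROM users
WHERE user_id IN (
    SELECT unnest(user_ids) FROM channels
    WHERE channel_id = '45205876-7270-4e06-ab8d-b5f669298422'
)
AND channel_id = '45205876-7270-4e06-ab8d-b5f669298422';

ROLLBACK;
```

These settings only penalize the named plan types, they do not forbid them, and they are not meant for production queries.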
