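I'm querying a table with roughly 1.5 million records. I have indexes, and it performs well.
However, I want a COUNT of the distinct values of one column (which has many duplicates). When I add DISTINCT, the query is 10 times slower than without it.
Here is the query: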
SELECT
    created_at,
    SUM(amount) AS total,
    COUNT(DISTINCT partner_id) AS count_partners
FROM
    consumption
WHERE
    is_official = true
    AND (is_processed = true OR is_deferred = true)
GROUP BY created_at;
This takes 2.5 seconds.
If I do this instead:
COUNT(partner_id) as count_partners
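it takes 230 ms. But that isn't what I want.
I want the set of unique partners for each group (date), along with the sum of the amount they consumed in that period.
I don't understand why this is so much slower. PostgreSQL seems to build an array containing all the duplicates very quickly, so why does simply adding DISTINCT ruin its performance?
Query plan: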
GroupAggregate  (cost=85780.70..91461.63 rows=12 width=24) (actual time=1019.428..2641.434 rows=13 loops=1)
  Output: created_at, sum(amount), count(DISTINCT partner_id)
  Group Key: p.created_at
  Buffers: shared hit=16487
  ->  Sort  (cost=85780.70..87200.90 rows=568081 width=16) (actual time=865.599..945.674 rows=568318 loops=1)
        Output: created_at, amount, partner_id
        Sort Key: p.created_at
        Sort Method: quicksort  Memory: 62799kB
        Buffers: shared hit=16487
        ->  Seq Scan on public.consumption p  (cost=0.00..31484.26 rows=568081 width=16) (actual time=0.020..272.126 rows=568318 loops=1)
              Output: created_at, amount, partner_id
              Filter: (p.is_official AND (p.is_deferred OR p.is_processed))
              Rows Removed by Filter: 931408
              Buffers: shared hit=16487
Planning Time: 0.191 ms
Execution Time: 2647.629 ms
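For comparison, a plan for the faster non-DISTINCT variant could be captured the same way. This is only a sketch: the EXPLAIN options are assumed from the Output and Buffers lines in the plan above, and the table alias p that appears in that plan is left out here.

```sql
-- Assumed options: ANALYZE runs the query, VERBOSE adds the Output lines,
-- BUFFERS adds the "Buffers: shared hit=..." lines.
EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT
    created_at,
    SUM(amount) AS total,
    COUNT(partner_id) AS count_partners  -- counts duplicates, unlike COUNT(DISTINCT ...)
FROM
    consumption
WHERE
    is_official = true
    AND (is_processed = true OR is_deferred = true)
GROUP BY created_at;
```

Putting the two plans side by side shows which node accounts for the extra time.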
Indexes:
CREATE INDEX IF NOT EXISTS i_pid ON consumption (partner_id);
CREATE INDEX IF NOT EXISTS i_processed ON consumption (is_processed);
CREATE INDEX IF NOT EXISTS i_official ON consumption (is_official);
CREATE INDEX IF NOT EXISTS i_deferred ON consumption (is_deferred);
CREATE INDEX IF NOT EXISTS i_created ON consumption (created_at);
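Answer:
The following query should be able to benefit from the indexes. It first aggregates per (created_at, partner_id) in a subquery, so the outer COUNT(DISTINCT partner_id) only has to process one row per partner per date.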
SELECT
    created_at,
    SUM(amount) AS total,
    COUNT(DISTINCT partner_id) AS count_partners
FROM
    (SELECT
        created_at,
        SUM(amount) AS amount,
        partner_id
    FROM consumption
    WHERE is_official = true
        AND (is_processed = true OR is_deferred = true)
    GROUP BY
        created_at,
        partner_id
    ) AS c
GROUP BY created_at;
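Because the subquery already returns one row per (created_at, partner_id), the DISTINCT in the outer query is arguably redundant. The variant below is a sketch under that assumption, not something measured on the original data: it uses a plain COUNT(partner_id), which, like COUNT(DISTINCT partner_id), ignores NULL partner_id values.

```sql
SELECT
    created_at,
    SUM(amount) AS total,
    -- Each partner appears at most once per created_at after the inner
    -- GROUP BY, so a plain COUNT gives the number of distinct partners.
    COUNT(partner_id) AS count_partners
FROM
    (SELECT
        created_at,
        partner_id,
        SUM(amount) AS amount
    FROM consumption
    WHERE is_official = true
        AND (is_processed = true OR is_deferred = true)
    GROUP BY
        created_at,
        partner_id
    ) AS c
GROUP BY created_at;
```

Whether this is actually faster depends on the plan PostgreSQL picks for the inner GROUP BY, so it is worth checking with EXPLAIN ANALYZE before relying on it.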
