Django:使用 m2m 的查詢集的性能問題
我在這里問了這個問題,但沒有得到答案,所以我重新發布了一個更詳細的問題。
當我使用ORDER BY與Count匯總值,由于某些原因不使用索引和查詢需要很長的時間來執行。
該videos_video_tags列有大約 130 萬行。
以下將需要大約 500-800 毫秒。
SELECT "videos_tag"."id",
"videos_tag"."name",
COUNT("videos_video_tags"."video_id") AS "count"
FROM "videos_tag"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_tag"."id" = "videos_video_tags"."tag_id")
GROUP BY "videos_tag"."id"
ORDER BY "count" DESC
LIMIT 100;
ORDER BY "count" DESC從此 SQL 陳述句中洗掉只需要大約2-10ms.
如果使用 來查看執行計劃中的詳細資訊EXPLAIN,您會看到使用ORDER BY不使用索引的查詢沒有被使用。
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=35198.66..35198.91 rows=100 width=37) (actual time=770.355..770.376 rows=100 loops=1)
Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
Buffers: shared hit=6928 read=4311
-> Sort (cost=35198.66..35212.53 rows=5548 width=37) (actual time=770.354..770.366 rows=100 loops=1)
Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
Sort Key: (count(videos_video_tags.video_id)) DESC
Sort Method: top-N heapsort Memory: 37kB
Buffers: shared hit=6928 read=4311
-> HashAggregate (cost=34931.14..34986.62 rows=5548 width=37) (actual time=766.050..768.090 rows=5548 loops=1)
Output: videos_tag.id, videos_tag.name, count(videos_video_tags.video_id)
Group Key: videos_tag.id
Batches: 1 Memory Usage: 977kB
Buffers: shared hit=6928 read=4311
-> Hash Right Join (cost=221.83..28246.14 rows=1337000 width=45) (actual time=2.840..497.697 rows=1337000 loops=1)
Output: videos_tag.id, videos_tag.name, videos_video_tags.video_id
Inner Unique: true
Hash Cond: (videos_video_tags.tag_id = videos_tag.id)
Buffers: shared hit=6928 read=4311
-> Seq Scan on public.videos_video_tags (cost=0.00..24512.00 rows=1337000 width=32) (actual time=0.008..109.061 rows=1337000 loops=1)
Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
Buffers: shared hit=6831 read=4311
-> Hash (cost=152.48..152.48 rows=5548 width=29) (actual time=2.795..2.796 rows=5548 loops=1)
Output: videos_tag.id, videos_tag.name
Buckets: 8192 Batches: 1 Memory Usage: 399kB
Buffers: shared hit=97
-> Seq Scan on public.videos_tag (cost=0.00..152.48 rows=5548 width=29) (actual time=0.008..1.048 rows=5548 loops=1)
Output: videos_tag.id, videos_tag.name
Buffers: shared hit=97
Planning:
Buffers: shared hit=14
Planning Time: 0.497 ms
Execution Time: 770.812 ms
(32 rows)
Time: 772.336 ms
如果您沒有使用 ORDER BY,您將看到以下內容
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.71..1689.61 rows=100 width=37) (actual time=0.069..9.664 rows=100 loops=1)
Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
Buffers: shared hit=7761
-> GroupAggregate (cost=0.71..93700.72 rows=5548 width=37) (actual time=0.069..9.647 rows=100 loops=1)
Output: videos_tag.id, videos_tag.name, count(videos_video_tags.video_id)
Group Key: videos_tag.id
Buffers: shared hit=7761
-> Merge Left Join (cost=0.71..86960.24 rows=1337000 width=45) (actual time=0.060..8.222 rows=11375 loops=1)
Output: videos_tag.id, videos_tag.name, videos_video_tags.video_id
Merge Cond: (videos_tag.id = videos_video_tags.tag_id)
Buffers: shared hit=7761
-> Index Scan using videos_tag_pkey on public.videos_tag (cost=0.28..635.50 rows=5548 width=29) (actual time=0.011..0.066 rows=101 loops=1)
Output: videos_tag.id, videos_tag.name, videos_tag.is_actress, videos_tag.created_at
Buffers: shared hit=102
-> Index Scan using videos_video_tags_tag_id_2673cfc8 on public.videos_video_tags (cost=0.43..69598.37 rows=1337000 width=32) (actual time=0.012..5.928 rows=11375 loops=1)
Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
Buffers: shared hit=7659
Planning:
Buffers: shared hit=14
Planning Time: 0.364 ms
Execution Time: 9.734 ms
(21 rows)
Time: 10.639 ms
我認為 index 也沒有任何問題。
public | videos_tag_name_key | index | postgres | videos_tag
public | videos_tag_pkey | index | postgres | videos_tag
public | videos_video_tags_pkey | index | postgres | videos_video_tags
public | videos_video_tags_tag_id_2673cfc8 | index | postgres | videos_video_tags
public | videos_video_tags_video_id_8220dbb8 | index | postgres | videos_video_tags
public | videos_video_tags_video_id_tag_id_f8d6ba70_uniq | index | postgres | videos_video_tags
I have spent quite a bit of time on this and still have not been able to solve it. What do you think could be the cause?
uj5u.com熱心網友回復:
索引不直接用于選擇性。它們用于生成已按欄位排序的行,這對于連接和分組都很有用。一旦 100 個組以某種方便的(對系統而言)順序被擠出,它就可以停止,而且非常早。
但是對于 ORDER BY,除非您知道所有組的計數,否則您無法按所有組的計數進行排序。沒有提前停止的機會。由于這是使用索引的主要優勢,一旦機會消失,就沒有理由再使用索引了。無論如何,當它必須運行到完成時,散列連接可能更有效。
那么我該怎么做呢?由于我們正在實作分頁,因此需要按計數排序并獲得前100名左右
不要在資料庫中實作分頁。5548 不是很多,計算一次然后將它們全部發送到客戶端或應用程式服務器,讓它自己處理分頁。而且這不會很快改變,所以使用物化視圖來存盤摘要并每小時重新計算一次。
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/388209.html
標籤:sql postgresql
下一篇:如何在F運算式中使用Substr
