當我對計數值執行orderby時，查詢需要很長時間才能執行-有解無憂

Django：使用 m2m 的查詢集的性能問題

我在這里問了這個問題，但沒有得到答案，所以我重新發布了一個更詳細的問題。

當我使用ORDER BY與Count匯總值，由于某些原因不使用索引和查詢需要很長的時間來執行。

該videos_video_tags列有大約 130 萬行。

以下將需要大約 500-800 毫秒。

SELECT "videos_tag"."id",
       "videos_tag"."name",
       COUNT("videos_video_tags"."video_id") AS "count"
FROM "videos_tag"
LEFT OUTER JOIN "videos_video_tags" ON ("videos_tag"."id" = "videos_video_tags"."tag_id")
GROUP BY "videos_tag"."id"
ORDER BY "count" DESC
LIMIT 100;

ORDER BY "count" DESC從此 SQL 陳述句中洗掉只需要大約2-10ms.

如果使用來查看執行計劃中的詳細資訊EXPLAIN，您會看到使用ORDER BY不使用索引的查詢沒有被使用。

                                                                         QUERY PLAN                                     
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=35198.66..35198.91 rows=100 width=37) (actual time=770.355..770.376 rows=100 loops=1)
   Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
   Buffers: shared hit=6928 read=4311
   ->  Sort  (cost=35198.66..35212.53 rows=5548 width=37) (actual time=770.354..770.366 rows=100 loops=1)
         Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
         Sort Key: (count(videos_video_tags.video_id)) DESC
         Sort Method: top-N heapsort  Memory: 37kB
         Buffers: shared hit=6928 read=4311
         ->  HashAggregate  (cost=34931.14..34986.62 rows=5548 width=37) (actual time=766.050..768.090 rows=5548 loops=1)
               Output: videos_tag.id, videos_tag.name, count(videos_video_tags.video_id)
               Group Key: videos_tag.id
               Batches: 1  Memory Usage: 977kB
               Buffers: shared hit=6928 read=4311
               ->  Hash Right Join  (cost=221.83..28246.14 rows=1337000 width=45) (actual time=2.840..497.697 rows=1337000 loops=1)
                     Output: videos_tag.id, videos_tag.name, videos_video_tags.video_id
                     Inner Unique: true
                     Hash Cond: (videos_video_tags.tag_id = videos_tag.id)
                     Buffers: shared hit=6928 read=4311
                     ->  Seq Scan on public.videos_video_tags  (cost=0.00..24512.00 rows=1337000 width=32) (actual time=0.008..109.061 rows=1337000 loops=1)
                           Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
                           Buffers: shared hit=6831 read=4311
                     ->  Hash  (cost=152.48..152.48 rows=5548 width=29) (actual time=2.795..2.796 rows=5548 loops=1)
                           Output: videos_tag.id, videos_tag.name
                           Buckets: 8192  Batches: 1  Memory Usage: 399kB
                           Buffers: shared hit=97
                           ->  Seq Scan on public.videos_tag  (cost=0.00..152.48 rows=5548 width=29) (actual time=0.008..1.048 rows=5548 loops=1)
                                 Output: videos_tag.id, videos_tag.name
                                 Buffers: shared hit=97
 Planning:
   Buffers: shared hit=14
 Planning Time: 0.497 ms
 Execution Time: 770.812 ms
(32 rows)

Time: 772.336 ms

如果您沒有使用 ORDER BY，您將看到以下內容

                                                                                         QUERY PLAN                     
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.71..1689.61 rows=100 width=37) (actual time=0.069..9.664 rows=100 loops=1)
   Output: videos_tag.id, videos_tag.name, (count(videos_video_tags.video_id))
   Buffers: shared hit=7761
   ->  GroupAggregate  (cost=0.71..93700.72 rows=5548 width=37) (actual time=0.069..9.647 rows=100 loops=1)
         Output: videos_tag.id, videos_tag.name, count(videos_video_tags.video_id)
         Group Key: videos_tag.id
         Buffers: shared hit=7761
         ->  Merge Left Join  (cost=0.71..86960.24 rows=1337000 width=45) (actual time=0.060..8.222 rows=11375 loops=1)
               Output: videos_tag.id, videos_tag.name, videos_video_tags.video_id
               Merge Cond: (videos_tag.id = videos_video_tags.tag_id)
               Buffers: shared hit=7761
               ->  Index Scan using videos_tag_pkey on public.videos_tag  (cost=0.28..635.50 rows=5548 width=29) (actual time=0.011..0.066 rows=101 loops=1)
                     Output: videos_tag.id, videos_tag.name, videos_tag.is_actress, videos_tag.created_at
                     Buffers: shared hit=102
               ->  Index Scan using videos_video_tags_tag_id_2673cfc8 on public.videos_video_tags  (cost=0.43..69598.37 rows=1337000 width=32) (actual time=0.012..5.928 rows=11375 loops=1)
                     Output: videos_video_tags.id, videos_video_tags.video_id, videos_video_tags.tag_id
                     Buffers: shared hit=7659
 Planning:
   Buffers: shared hit=14
 Planning Time: 0.364 ms
 Execution Time: 9.734 ms
(21 rows)

Time: 10.639 ms

我認為 index 也沒有任何問題。

public | videos_tag_name_key                                          | index | postgres | videos_tag
public | videos_tag_pkey                                              | index | postgres | videos_tag
public | videos_video_tags_pkey                                       | index | postgres | videos_video_tags
public | videos_video_tags_tag_id_2673cfc8                            | index | postgres | videos_video_tags
public | videos_video_tags_video_id_8220dbb8                          | index | postgres | videos_video_tags
public | videos_video_tags_video_id_tag_id_f8d6ba70_uniq              | index | postgres | videos_video_tags

I have spent quite a bit of time on this and still have not been able to solve it. What do you think could be the cause?

uj5u.com熱心網友回復：

索引不直接用于選擇性。它們用于生成已按欄位排序的行，這對于連接和分組都很有用。一旦 100 個組以某種方便的（對系統而言）順序被擠出，它就可以停止，而且非常早。

但是對于 ORDER BY，除非您知道所有組的計數，否則您無法按所有組的計數進行排序。沒有提前停止的機會。由于這是使用索引的主要優勢，一旦機會消失，就沒有理由再使用索引了。無論如何，當它必須運行到完成時，散列連接可能更有效。

那么我該怎么做呢？由于我們正在實作分頁，因此需要按計數排序并獲得前100名左右

不要在資料庫中實作分頁。5548 不是很多，計算一次然后將它們全部發送到客戶端或應用程式服務器，讓它自己處理分頁。而且這不會很快改變，所以使用物化視圖來存盤摘要并每小時重新計算一次。

轉載請註明出處，本文鏈接：https://www.uj5u.com/yidong/388209.html

標籤：sql postgresql

上一篇：Grafana列在查詢構建器中沒有值建議

下一篇：如何在F運算式中使用Substr