Why are Postgres full table scans so slow?

I am testing the performance of Postgres full table scans (no indexes), and they are very slow.

The following was run on a fresh db.m5.8xlarge instance on AWS:

CREATE TABLE test100m AS SELECT * FROM GENERATE_SERIES(1, 100000000) AS id;

SET max_parallel_workers_per_gather = 6;

EXPLAIN ANALYZE SELECT max(id) FROM test100m;

Result:

                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=651812.03..651812.04 rows=1 width=4) (actual time=1817.850..1819.931 rows=1 loops=1)
   ->  Gather  (cost=651811.40..651812.01 rows=6 width=4) (actual time=1817.788..1819.921 rows=7 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         ->  Partial Aggregate  (cost=650811.40..650811.41 rows=1 width=4) (actual time=1814.193..1814.194 rows=1 loops=7)
               ->  Parallel Seq Scan on test100m  (cost=0.00..609144.72 rows=16666672 width=4) (actual time=0.003..902.986 rows=14285714 loops=7)
 Planning Time: 0.055 ms
 Execution Time: 1819.953 ms

So scanning 100M rows takes about 1,800 ms. On my laptop (limited to 6 cores, with performance similar to the db.m5.8xlarge), scanning 100M array entries takes 38 ms:

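package main // package name assumed; the original snippet omitted it

import (
    "fmt"
    "testing"
    "time"
)

func TestTiming(t *testing.T) {
    // Fill a slice with 100M integers; this part is not timed.
    data := make([]int, 100000000)
    for i := 0; i < len(data); i++ {
        data[i] = i
    }
    // Time a single pass that finds the maximum.
    start := time.Now()
    max := data[0]
    for i := 0; i < len(data); i++ {
        if max < data[i] {
            max = data[i]
        }
    }
    fmt.Printf("Timing: 100,000,000 %s\n", time.Since(start))
}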

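That is roughly a 50x difference. Granted, this is not an apples-to-apples comparison, but I still expected the gap to be much smaller. All of the data fits comfortably in memory.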

Can Postgres full table scan performance be improved significantly somehow (other than by increasing max_parallel_workers_per_gather)? What accounts for the 50x slowdown?

Update:

Here is a more detailed query plan:

> EXPLAIN (ANALYZE, BUFFERS, TIMING) SELECT max(id) FROM test100m;


                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=651812.03..651812.04 rows=1 width=4) (actual time=1953.561..1955.891 rows=1 loops=1)
   Buffers: shared hit=442478
   ->  Gather  (cost=651811.40..651812.01 rows=6 width=4) (actual time=1953.505..1955.885 rows=7 loops=1)
         Workers Planned: 6
         Workers Launched: 6
         Buffers: shared hit=442478
         ->  Partial Aggregate  (cost=650811.40..650811.41 rows=1 width=4) (actual time=1950.497..1950.497 rows=1 loops=7)
               Buffers: shared hit=442478
               ->  Parallel Seq Scan on test100m  (cost=0.00..609144.72 rows=16666672 width=4) (actual time=0.004..916.197 rows=14285714 loops=7)
                     Buffers: shared hit=442478
 Planning Time: 0.059 ms
 Execution Time: 1955.916 ms
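
As a rough sanity check on the buffer counts above, the sketch below converts "shared hit=442478" into bytes. It assumes the default 8 KiB block size and that the reported figure counts each table page once across all workers; neither is stated in the plan itself. Under those assumptions the table occupies about 3.4 GiB, or roughly 36 bytes per row, which is consistent with a 4-byte integer padded to 8 bytes plus a 24-byte tuple header and a 4-byte line pointer. The Go slice, by comparison, holds 8 bytes per element.

```go
package main

import "fmt"

// bytesPerRow estimates how many bytes of heap storage each row occupies,
// given the number of pages read, the page size, and the row count.
func bytesPerRow(pages, pageSize, rows int64) float64 {
	return float64(pages*pageSize) / float64(rows)
}

func main() {
	const (
		pages    = 442478    // "Buffers: shared hit" from the plan above
		pageSize = 8192      // default PostgreSQL block size (assumed here)
		rows     = 100000000 // rows in test100m
	)
	total := int64(pages) * pageSize
	fmt.Printf("table size:    %.2f GiB\n", float64(total)/(1<<30))
	fmt.Printf("bytes per row: %.1f\n", bytesPerRow(pages, pageSize, rows))
	// A Go []int on a 64-bit platform stores 8 bytes per element.
	fmt.Printf("Go slice size: %.2f GiB\n", float64(rows*8)/(1<<30))
}
```

So even before any per-row logic runs, the scan has to walk several times more memory than the Go loop does.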

Answer:

A 50x difference compared with simply scanning an in-memory array is not surprising. Consider what the database has to do:

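Check whether the required data block is in the cache (presumably using a fairly complex algorithm, some locking to avoid race conditions, and so on);
Parse each 8 KiB data block and convert it into an in-memory representation;
For every row in that representation, check whether it is visible to the current transaction (another complex algorithm, which also has to avoid problems caused by transaction ID wraparound);
Apply an internal representation of the query plan to each row, which is far more involved than comparing two integers;
Gather the results from all parallel workers, which requires a good deal of locking and synchronization to avoid corrupting memory;
Merge the results of the parallel query, using yet another complex locking algorithm, and so on.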
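
To make the page parsing and visibility steps a little more concrete, here is a toy model in Go. It is not PostgreSQL's actual on-disk layout or visibility logic: the 24-byte header with only xmin and xmax, the 32-byte tuple size, the use of xmin = 0 to mark an empty slot, and the names buildPages, visible, and scanMax are all assumptions made for illustration. It only shows that a page-oriented scan has to decode a header and run a visibility test for each row before it can compare the value.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	pageSize    = 8192            // PostgreSQL's default block size
	tupleHeader = 24              // toy header: xmin (4 bytes), xmax (4 bytes), padding
	tupleSize   = tupleHeader + 8 // header plus a 4-byte value padded to 8 bytes
	rowsPerPage = pageSize / tupleSize
)

// buildPages packs the values 1..n into fixed-size pages of toy tuples.
// Unused slots in the last page keep xmin = 0, which marks them as empty.
func buildPages(n int) [][]byte {
	var pages [][]byte
	for i := 0; i < n; i++ {
		slot := i % rowsPerPage
		if slot == 0 {
			pages = append(pages, make([]byte, pageSize))
		}
		t := pages[len(pages)-1][slot*tupleSize:]
		binary.LittleEndian.PutUint32(t[0:], 1) // xmin: inserting transaction
		binary.LittleEndian.PutUint32(t[4:], 0) // xmax: 0 means not deleted
		binary.LittleEndian.PutUint32(t[tupleHeader:], uint32(i+1))
	}
	return pages
}

// visible is a drastically simplified stand-in for a snapshot visibility check.
func visible(xmin, xmax, snapshot uint32) bool {
	return xmin != 0 && xmin <= snapshot && (xmax == 0 || xmax > snapshot)
}

// scanMax walks every page and every tuple slot, decodes the header,
// checks visibility, and only then compares the value.
func scanMax(pages [][]byte, snapshot uint32) (maxID uint32, seen int) {
	for _, page := range pages {
		for slot := 0; slot < rowsPerPage; slot++ {
			t := page[slot*tupleSize:]
			xmin := binary.LittleEndian.Uint32(t[0:])
			xmax := binary.LittleEndian.Uint32(t[4:])
			if !visible(xmin, xmax, snapshot) {
				continue
			}
			seen++
			if v := binary.LittleEndian.Uint32(t[tupleHeader:]); v > maxID {
				maxID = v
			}
		}
	}
	return maxID, seen
}

func main() {
	pages := buildPages(1000000)
	maxID, seen := scanMax(pages, 10)
	fmt.Println("pages:", len(pages), "visible rows:", seen, "max id:", maxID)
}
```

Even this toy version does several loads and branches per row where the plain slice loop does one comparison, and it leaves out buffer lookup, locking, the executor, and worker coordination entirely.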

And this is still a simplification.
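
It can help to express the numbers from the question as a per-row budget. The sketch below uses only figures copied from the first EXPLAIN ANALYZE output and the Go test; the helper name nsPerRow is made up for this example. Each of the 7 processes spends roughly 127 ns per row, while the Go loop spends about 0.38 ns per element, and the wall-clock ratio comes out at about 48x.

```go
package main

import "fmt"

// nsPerRow converts an elapsed time in milliseconds and a row count
// into nanoseconds spent per row.
func nsPerRow(ms float64, rows int64) float64 {
	return ms * 1e6 / float64(rows)
}

func main() {
	// Figures copied from the question's first plan and Go test.
	pgPerProcess := nsPerRow(1814.193, 14285714) // Partial Aggregate, one of 7 processes
	pgWallClock := nsPerRow(1819.953, 100000000) // total execution time
	goLoop := nsPerRow(38, 100000000)            // in-memory slice scan

	fmt.Printf("Postgres, per process: %.0f ns/row\n", pgPerProcess)
	fmt.Printf("Postgres, wall clock:  %.1f ns/row\n", pgWallClock)
	fmt.Printf("Go slice loop:         %.2f ns/row\n", goLoop)
	fmt.Printf("wall-clock ratio:      %.0fx\n", pgWallClock/goLoop)
}
```

If one assumes a clock speed of around 3 GHz (not stated in the question), 127 ns is a few hundred cycles per row, which is a plausible cost for the steps listed above.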
