帶有SELECTDISTINCT和連接的慢Postgres查詢-有解無憂

我們有一個從 Oracle 遷移的 Postgres 資料庫。有一個查詢在 Posgres 中很慢（8-11 秒），但在 Oracle 中卻很快（1-2 秒）。它涉及一個SELECT DISTINCT，其他幾個執行緒在 Postgres 中已將其標記為問題：1、2。我們可以做一些簡單的事情嗎，比如添加索引或其他東西，而無需進行重大重寫？

該查詢將表NED_PERSON_T與自身連接起來，以獲取UNIQUEIDENTIFIER其父值在子表中指定的所有行NIHSERVAO。如下SELECT DISTINCT所示。

select
    distinct nedperson0_.ID as ID1_21_,
    nedperson0_.BUILDINGNAME as BUILDING2_21_,
    nedperson0_.CREATED_DATE as CREATED_4_21_,
    nedperson0_.CURRENT_FLAG as CURRENT_5_21_,
    /* ...   30 other columns ... */
from
    NED_PERSON_T nedperson0_ 
inner join
    NED_PERSON_T nedperson1_ 
        on (
            nedperson1_.NIHSERVAO=nedperson0_.UNIQUEIDENTIFIER 
            and nedperson0_.CURRENT_FLAG='Y' 
            and nedperson1_.CURRENT_FLAG='Y'
        ) 
order by
    nedperson0_.NIHMIXCASESN,
    nedperson0_.MIXCASE_GIVENNAME

EXPLAIN ANALYZE給出了這個報告：

Unique  (cost=362155.58..390755.66 rows=119167 width=1197) (actual time=8722.383..11202.495 rows=838 loops=1)
        Sort Key: nedperson0_.nihmixcasesn, nedperson0_.mixcase_givenname, nedperson0_.id, nedperson0_.buildingname, nedperson0_.created_date, nedperson0_.description, ... ... )
        Sort Method: external merge  Disk: 78016kB
        ->  Gather  (cost=103675.72..227466.24 rows=119167 width=1197) (actual time=2485.391..3100.424 rows=97678 loops=1)
              Workers Planned: 2
              Workers Launched: 2
              ->  Parallel Hash Join  (cost=102675.72..214549.54 rows=49653 width=1197) (actual time=2477.318..2880.555 rows=32559 loops=3)
                    Hash Cond: (nedperson1_.nihservao = nedperson0_.uniqueidentifier)
                    ->  Parallel Index Scan using ned_person_t_current_flag_idx on ned_person_t nedperson1_  (cost=0.43..93430.46 rows=58867 width=11) (actual time=7.536..815.779 rows=46800 loops=3)
                          Index Cond: (current_flag = 'Y'::bpchar)
                    ->  Parallel Hash  (cost=93430.46..93430.46 rows=58867 width=1154) (actual time=1620.284..1620.285 rows=46800 loops=3)
                          Buckets: 4096  Batches: 64  Memory Usage: 1792kB
                          ->  Parallel Index Scan using ned_person_t_current_flag_idx on ned_person_t nedperson0_  (cost=0.43..93430.46 rows=58867 width=1154) (actual time=4.602..1226.402 rows=46800 loops=3)
                                Index Cond: (current_flag = 'Y'::bpchar)
Planning Time: 3.986 ms
Execution Time: 11218.403 ms

索引：

CREATE INDEX ned_person_t_ao_test1 ON ned.ned_person_t USING btree (nihmixcasesn, mixcase_givenname, uniqueidentifier)
CREATE INDEX ned_person_t_current_flag_idx ON ned.ned_person_t USING btree (current_flag)
CREATE INDEX ned_person_t_id_idx ON ned.ned_person_t USING btree (id)
CREATE INDEX ned_person_t_mdslink_idx ON ned.ned_person_t USING btree (nihmdslinktoadnih)
CREATE INDEX ned_person_t_nedid_idx ON ned.ned_person_t USING btree (uniqueidentifier)
CREATE INDEX ned_person_t_orgacronym_idx ON ned.ned_person_t USING btree (nihorgacronym)
CREATE INDEX ned_person_t_orgstat_idx ON ned.ned_person_t USING btree (organizationalstat)
CREATE UNIQUE INDEX ned_person_t_pk ON ned.ned_person_t USING btree (id)
CREATE INDEX ned_person_t_sac_idx ON ned.ned_person_t USING btree (nihsac)
CREATE INDEX ned_person_t_temp_idx ON ned.ned_person_t USING btree (nihservao, organizationalstat, nihsac)
CREATE INDEX ned_person_t_temp_idx2 ON ned.ned_person_t USING btree (lower(nihmdslinktoadnih), current_flag, nihservao, organizationalstat, nihsac)

NED_PERSON_T 上的統計資料：

select count(*) from ned_person_t

總計數 = 1243733

select count(*) from ned_person_t where current_flag = 'Y';

CURRENT_FLAG = Y：142540

select count(*) from ned_person_t where nihservao is not null;

with non-NULL NIHSERVAO: 644852

select count(distinct nihservao) from ned_person_t;

Distinct NIHSERVAO: 928

uj5u.com熱心網友回復：

使 EXISTS 查詢快速運行所需的是一個索引(current_flag, nihservao)（或者可能是在 current_flag 上過濾的 nihservao 上的部分索引），以便它可以使用僅索引掃描來獲取 nihservao 值，而無需跳轉到表的隨機部分對于每一行。這需要保持桌子充分吸塵才能有效。

如果您的許多其他查詢也只關注當前資料，那么按 current_flag 對資料進行磁區可能是有意義的，以使相關資料更密集地打包。不過，這比僅僅創建一個索引更加激烈。

uj5u.com熱心網友回復：

您的EXISTS查詢看起來不錯。

1243733 行中只有 142540 行具有CURRENT_FLAG='Y'.
并且只有 644852 有nihservao IS NOT NULL. （與第一行相結合很重要。）
沒有uniqueidentifier IS NULL.

如果您定期查詢該條件，則部分索引 應該非常有效。我建議這兩個：

CREATE INDEX ON ned.ned_person_t (nihservao)
WHERE current_flag = 'Y' AND nihservao IS NOT NULL;

由于您只需要回傳uniqueidentifier并且fullname（就像您稍后評論的那樣） - 如果fullname不是太大 -覆寫索引可以提供更多幫助。（需要對桌子進行足夠的吸塵。）

CREATE INDEX ON ned.ned_person_t (uniqueidentifier) INCLUDE (fullname)
WHERE current_flag = 'Y';

例子：

具有主鍵和外鍵的查詢是否比僅具有主鍵的查詢運行得更快？

同時檢查您的所有索引是否實際上（仍然）被使用。這個是完全多余的，洗掉它：

CREATE INDEX ned_person_t_id_idx ON ned.ned_person_t USING btree (id)

id已經被唯一索引覆寫ned_person_t_pk（它實際上可能是一個PRIMARY KEY而不是唯一索引）。

最后的排序操作不太重要，因為只剩844下行。

uj5u.com熱心網友回復：

作為更多資訊，根據上述建議，我 (1) 使用 EXISTS 重寫了查詢，(2) 將 MEM_SIZE 增加到 100 MB。但這仍然需要 7 秒。下面是解釋。如果我之后按順序重新運行此查詢，由于快取需要更少的時間，但第一次運行大約需要 7 秒，如下所示。

更新： Exists 重寫確實將時間縮短了一半，大約 4 秒。這次運行并沒有真正的代表性，我們嘗試了更多（斷開/重新連接后）并且 EXISTS 平均好一半左右。

詢問：

select
    nedperson0_.ID as ID1_21_,
    nedperson0_.BUILDINGNAME as BUILDING2_21_,
    nedperson0_.CREATED_DATE as CREATED_4_21_,
    nedperson0_.CURRENT_FLAG as CURRENT_5_21_,
    /* ...   30 other columns ... */
from
    NED_PERSON_T nedperson0_ 
    where
        exists (select uniqueidentifier from ned_person_t nedperson1_ 
                where nihservao = nedperson0_.uniqueidentifier and nedperson1_.current_flag = 'Y'
                )
        and 
        nedperson0_.current_flag = 'Y'
    order by
        nedperson0_.NIHMIXCASESN,
        nedperson0_.MIXCASE_GIVENNAME

解釋分析：

Sort  (cost=139290.39..139293.75 rows=1344 width=1195) (actual time=7627.351..7627.441 rows=844 loops=1)
  Sort Method: quicksort  Memory: 882kB
  ->  Nested Loop  (cost=97523.64..139220.55 rows=1344 width=1195) (actual time=5112.442..7621.895 rows=844 loops=1)
        ->  HashAggregate  (cost=97523.21..97531.04 rows=783 width=11) (actual time=5108.419..5109.041 rows=847 loops=1)
              Group Key: nedperson1_.nihservao
              ->  Index Scan using ned_person_t_current_flag_idx on ned_person_t nedperson1_  (cost=0.43..97172.64 rows=140228 width=11) (actual time=2.376..4999.905 rows=142540 loops=1)
                    Index Cond: (current_flag = 'Y'::bpchar)
        ->  Index Scan using ned_person_t_nedid_idx on ned_person_t nedperson0_  (cost=0.43..53.21 rows=2 width=1152) (actual time=2.964..2.964 rows=1 loops=847)
              Index Cond: (uniqueidentifier = nedperson1_.nihservao)
              Filter: (current_flag = 'Y'::bpchar)
              Rows Removed by Filter: 23
Planning Time: 10.259 ms
Execution Time: 7627.670 ms

轉載請註明出處，本文鏈接：https://www.uj5u.com/qiye/422649.html

標籤：

上一篇：如何使用條件和位置加快pandas搜索？

下一篇：具有最大表單大小滯后的全屏表單