Speed up a GROUP BY operation on large tables

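I have two large tables, tokens (100,000s of entries) and buy_orders (1,000,000s of entries), that I need to join and group efficiently.

Tokens are uniquely identified by a contract address (a 20-byte hex string) and an id (a 256-bit integer), as shown below: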

CREATE TABLE tokens (
  contract TEXT NOT NULL,
  token_id NUMERIC(78, 0) NOT NULL,
  top_bid NUMERIC(78, 0),

  PRIMARY KEY (contract, token_id)
);

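Users can place bids on various tokens. A bid has a validity period (represented as a time range) and a price (a 256-bit integer). A bid can only be one of the following two types:

Type 1: single contract, range of token_ids (e.g. contract, start_token_id, end_token_id)
Type 2: multiple contracts, multiple token_ids (e.g. [(contract1, token_id1), (contract2, token_id2), ...])

Below is the table that holds the bids. It is highly denormalized in order to accommodate the two possible types a bid can have.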

CREATE TABLE buy_orders (
  id INT NOT NULL PRIMARY KEY,
  contract TEXT,
  start_token_id NUMERIC(78, 0),
  end_token_id NUMERIC(78, 0),
  token_list_id INT REFERENCES token_lists(id),  -- token_lists must be created first
  price NUMERIC(78, 0) NOT NULL,
  valid_between TSTZRANGE NOT NULL,
  cancelled BOOLEAN NOT NULL,
  filled BOOLEAN NOT NULL
);

-- PostgreSQL declares secondary indexes outside CREATE TABLE
CREATE INDEX ON buy_orders (contract, start_token_id, end_token_id DESC);
CREATE INDEX ON buy_orders (token_list_id);
CREATE INDEX ON buy_orders (price);
CREATE INDEX ON buy_orders (cancelled, filled);
CREATE INDEX ON buy_orders USING gist (valid_between);
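
The rule that a bid is exactly one of the two types is only described in prose above. As a rough sketch, it could be enforced with a CHECK constraint. This assumes that type 2 bids leave contract, start_token_id and end_token_id NULL and that type 1 bids leave token_list_id NULL, which the question does not state explicitly; the constraint name is made up for illustration.

```sql
-- Assumption: type 1 rows fill contract/start/end and leave token_list_id NULL;
-- type 2 rows do the opposite. The constraint name is hypothetical.
ALTER TABLE buy_orders
  ADD CONSTRAINT buy_orders_exactly_one_type CHECK (
    (    contract IS NOT NULL
     AND start_token_id IS NOT NULL
     AND end_token_id IS NOT NULL
     AND start_token_id <= end_token_id   -- a range must not be inverted
     AND token_list_id IS NULL)
    OR
    (    contract IS NULL
     AND start_token_id IS NULL
     AND end_token_id IS NULL
     AND token_list_id IS NOT NULL)
  );
```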

Below are the corresponding tables that hold the tokens belonging to each list:

CREATE TABLE token_lists (
  id INT PRIMARY KEY
);

CREATE TABLE token_lists_tokens (
  token_list_id INT NOT NULL REFERENCES token_lists(id),
  contract TEXT NOT NULL,
  token_id NUMERIC(78, 0) NOT NULL,

  FOREIGN KEY (contract, token_id) REFERENCES tokens(contract, token_id)
);

CREATE INDEX ON token_lists_tokens (contract, token_id);
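
For illustration, the highest active type 2 bid per token of one contract could be looked up through these tables roughly as follows. This is only a sketch: the alias top_list_bid is made up, the filled column name follows the query later in the question, and range (type 1) bids are deliberately ignored here.

```sql
-- Sketch: top active bid per token, counting only token-list (type 2) bids.
-- An equality join like this can use the (contract, token_id) and
-- (token_list_id) indexes, unlike the range comparison needed for type 1.
SELECT lt.contract,
       lt.token_id,
       max(b.price) AS top_list_bid      -- hypothetical alias
FROM token_lists_tokens lt
JOIN buy_orders b
  ON b.token_list_id = lt.token_list_id
WHERE lt.contract = 'foo'
  AND NOT b.cancelled
  AND NOT b.filled
  AND b.valid_between @> now()           -- bid is valid right now
GROUP BY lt.contract, lt.token_id;
```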

As you can see, the tokens table keeps track of the top bid in order to make token data retrieval as efficient as possible (we'll have a paginated API for retrieving all tokens of an address, including their current top bid). As new bids come in, get cancelled or filled, or expire, I need an efficient way to update the top bid for the tokens the bids are on. This is not a problem for bids of type 2, since those will usually reference an insignificant number of tokens, but it is a problem for type 1 bids, because in that case I might need to recalculate the top bid for 100,000s of tokens (e.g. a type 1 bid could have a range of [1, 100,000]). Here's the query I'm using right now (I limited the results because otherwise it takes forever):

SELECT t.contract, t.token_id, max(b.price) FROM tokens t
JOIN buy_orders b ON t.contract = b.contract AND b.start_token_id <= t.token_id AND t.token_id <= b.end_token_id
WHERE t.contract = 'foo' AND NOT b.cancelled AND NOT b.filled AND b.valid_between @> now() 
GROUP BY t.contract, t.token_id
LIMIT 1000

And here is the execution plan for it:

 Limit  (cost=5016.77..506906.79 rows=1000 width=81) (actual time=378.231..19260.361 rows=1000 loops=1)
   ->  GroupAggregate  (cost=5016.77..37281894.72 rows=74273 width=81) (actual time=123.729..19005.567 rows=1000 loops=1)
         Group Key: t.contract, t.token_id
         ->  Nested Loop  (cost=5016.77..35589267.24 rows=225584633 width=54) (actual time=83.885..18953.853 rows=412253 loops=1)
               Join Filter: ((b.start_token_id <= t.token_id) AND (t.token_id <= b.end_token_id))
               Rows Removed by Join Filter: 140977658
               ->  Index Only Scan using tokens_pk on tokens t  (cost=0.55..8186.80 rows=99100 width=49) (actual time=0.030..5.394 rows=11450 loops=1)
                     Index Cond: (contract = 'foo'::text)
                     Heap Fetches: 0
               ->  Materialize  (cost=5016.21..51551.91 rows=20487 width=60) (actual time=0.001..0.432 rows=12348 loops=11450)
                     ->  Bitmap Heap Scan on buy_orders b  (cost=5016.21..51449.47 rows=20487 width=60) (actual time=15.245..116.099 rows=12349 loops=1)
                           Recheck Cond: (contract = 'foo'::text)
                           Filter: ((NOT cancelled) AND (NOT filled) AND (valid_between @> now()))
                           Rows Removed by Filter: 87771
                           Heap Blocks: exact=33525
                           ->  Bitmap Index Scan on buy_orders_contract_start_token_id_end_token_id_index  (cost=0.00..5011.09 rows=108072 width=0) (actual time=10.835..10.835 rows=100120 loops=1)
                                 Index Cond: (contract = 'foo'::text)
 Planning Time: 0.816 ms
 JIT:
   Functions: 15
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 3.922 ms, Inlining 106.877 ms, Optimization 99.947 ms, Emission 47.445 ms, Total 258.190 ms
 Execution Time: 19264.851 ms

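What I'm looking for is a way to make this particular query more efficient, if possible, or other suggestions for achieving the same result.

I'm using Postgres 13.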

Answer from a uj5u.com user:

A partial multicolumn index might help. For example:

CREATE INDEX ON buy_orders (contract, valid_between) -- Multiple fields
  INCLUDE (price) -- non-key column for index only scan
  WHERE -- represents partial index
    NOT cancelled AND
    NOT filled;

This would let the index scan on buy_orders eliminate more rows, so that you would not end up with

Rows Removed by Join Filter: 140977658

which is what makes your query expensive.
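
One further option, which is not part of the answer above and has not been tested against this data, is to index the token range itself, so that the range comparison can become an index condition instead of a join filter. The sketch below assumes the btree_gist extension can be installed. The index name buy_orders_active_range_idx is made up for illustration, and the column names follow the query in the question.

```sql
-- Assumption: btree_gist is available. It lets one GiST index combine the
-- text column "contract" with a range expression.
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Hypothetical index name. Partial: only bids neither cancelled nor filled.
-- numrange() raises an error if start_token_id > end_token_id on any row.
CREATE INDEX buy_orders_active_range_idx ON buy_orders
  USING gist (contract, numrange(start_token_id, end_token_id, '[]'))
  WHERE NOT cancelled AND NOT filled;

-- The query has to repeat the indexed expression verbatim for the planner
-- to consider the index; EXPLAIN shows whether it actually does.
EXPLAIN (ANALYZE, BUFFERS)
SELECT t.contract, t.token_id, max(b.price)
FROM tokens t
JOIN buy_orders b
  ON b.contract = t.contract
 AND numrange(b.start_token_id, b.end_token_id, '[]') @> t.token_id
WHERE t.contract = 'foo'
  AND NOT b.cancelled
  AND NOT b.filled
  AND b.valid_between @> now()
GROUP BY t.contract, t.token_id;
```

Whether the planner picks this index, and whether it beats the nested loop shown in the question, depends on how many active bids overlap each token, so the EXPLAIN output is the only reliable check.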


Tags: sql postgresql-performance postgresql-13
