Pymongo vs psycopg3: orders-of-magnitude difference in writes?

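I have an application that collects crypto prices from decentralized exchanges (nothing commercial; the goal is mainly to learn some database technologies with real data). I have it working with both MongoDB and PostgreSQL. However, I'm seeing massive differences when it comes to writes, in favor of MongoDB. I'm talking several orders of magnitude. I know MongoDB is a NoSQL database and is advertised as more efficient for this kind of thing, but I wonder whether I'm missing something in my postgres implementation. Below I describe how I implemented the logic, and I try to provide comparable metrics (as best I can).

To talk to the databases, I use psycopg3 for the Postgres implementation and pymongo for the MongoDB one.

Here is the data structure I'm writing to the databases: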

from __future__ import annotations

from typing import NamedTuple

import pendulum


class PricingInfo(NamedTuple):
    blockchain_name: str
    dex_name: str
    pair_address: str
    token0_symbol: str
    token1_symbol: str
    token0_address: str
    token1_address: str
    raw_reserve0: int
    raw_reserve1: int
    reserve0: float
    reserve1: float
    mid_price: float
    datetime: pendulum.DateTime
    inverted: bool = False

    @classmethod
    def from_dict(cls, doc: dict) -> PricingInfo:
        doc.pop("_id", None)
        return cls(**doc)

    def to_dict(self) -> dict:
        return self._asdict()

It's the same for both implementations. I have hundreds of these to write to the database every second. Here is what I do for postgres:

def register_prices(self, prices: list[PricingInfo]) -> None:
    query = """
    insert into prices (
        blockchain_name,
        dex_name,
        pair_address,
        token0_address,
        token1_address,
        raw_reserve0,
        raw_reserve1,
        reserve0,
        reserve1,
        mid_price,
        datetime
    )
    values (
        %(blockchain_name)s,
        %(dex_name)s,
        %(pair_address)s,
        %(token0_address)s,
        %(token1_address)s,
        %(raw_reserve0)s,
        %(raw_reserve1)s,
        %(reserve0)s,
        %(reserve1)s,
        %(mid_price)s,
        %(datetime)s
    )
    """

    keys_to_keep = {
        "blockchain_name",
        "dex_name",
        "pair_address",
        "token0_address",
        "token1_address",
        "raw_reserve0",
        "raw_reserve1",
        "reserve0",
        "reserve1",
        "mid_price",
        "datetime",
    }

    with psycopg.connect(self.db_uri) as conn:
        with conn.cursor() as cur:
            start = time.perf_counter()
            if len(prices) == 1:
                cur.execute(
                    query,
                    {
                        k: v
                        for k, v in prices[0].to_dict().items()
                        if k in keys_to_keep
                    },
                )
            elif len(prices) > 1:
                cur.executemany(
                    query,
                    [
                        {k: v for k, v in p.to_dict().items() if k in keys_to_keep}
                        for p in prices
                    ],
                )

        conn.commit()
    delta = time.perf_counter() - start

    if self.dex_name in {"pangolin", "trader_joe"}:
        logger.warning(f"Inserting {len(prices)}")
        logger.warning(f"Inserting prices took {delta} seconds")

Here is my table definition:

create table prices (
    id serial primary key,
    blockchain_name varchar(100) not null,
    dex_name varchar(100) not null,
    raw_reserve0 decimal not null,
    raw_reserve1 decimal not null,
    reserve0 decimal not null,
    reserve1 decimal not null,
    mid_price decimal not null,
    datetime timestamp with time zone not null,
    pair_address varchar(50) not null,
    token0_address varchar(50) not null,
    token1_address varchar(50) not null,
    foreign key (blockchain_name, dex_name, pair_address) references pairs (blockchain_name, dex_name, pair_address),
    foreign key (blockchain_name, dex_name, token0_address) references tokens (blockchain_name, dex_name, address),
    foreign key (blockchain_name, dex_name, token1_address) references tokens (blockchain_name, dex_name, address)
);

For MongoDB:

def register_prices(self, prices: list[PricingInfo]) -> None:

    start = time.perf_counter()

    prices_table = self._db["prices"]
    prices_table.insert_many(price.to_dict() for price in prices)

    delta = time.perf_counter() - start

    if self.dex_name in {"pangolin", "trader_joe"}:
        logger.warning(f"Inserting {len(prices)}")
        logger.warning(f"Inserting prices took {delta} seconds")

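The application runs exactly the same way with both databases. There is one small difference when writing to postgres: the data dictionaries have to be modified to fit the schema (I did some normalization there). Since I only have about 600 dictionaries to modify per write, I don't think that's the bottleneck. In both cases, I have 8 processes writing to the database concurrently.

For postgres, I get these metrics: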
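One way to check the assumption that reshaping about 600 dictionaries is not the bottleneck is to time that step in isolation. The following is a minimal sketch, not the asker's code: `Price` is a trimmed, hypothetical stand-in for `PricingInfo`, the sample values are made up, and `to_params` mirrors the dict-filtering comprehension used in `register_prices`.

```python
import time
from typing import NamedTuple


class Price(NamedTuple):
    """Trimmed, hypothetical stand-in for PricingInfo."""
    blockchain_name: str
    dex_name: str
    pair_address: str
    token0_symbol: str
    token1_symbol: str
    mid_price: float
    inverted: bool = False


# Columns that exist in the (normalized) table; the rest are dropped.
KEYS_TO_KEEP = {"blockchain_name", "dex_name", "pair_address", "mid_price"}


def to_params(prices):
    """Filter each record down to the columns the insert statement expects."""
    return [
        {k: v for k, v in p._asdict().items() if k in KEYS_TO_KEEP}
        for p in prices
    ]


# 600 made-up rows, roughly the batch size reported in the question.
sample = [
    Price("avalanche", "pangolin", f"0x{i:040x}", "AAA", "BBB", 1.0 + i)
    for i in range(600)
]

start = time.perf_counter()
params = to_params(sample)
elapsed = time.perf_counter() - start
print(f"prepared {len(params)} rows in {elapsed * 1000:.3f} ms")
```

If the printed time is far below the reported insert times, the dictionary reshaping can be ruled out and attention can go to the database round trips instead.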

Inserting 587
Inserting prices took 1.175270811014343 seconds
Inserting 611
Inserting prices took 0.3126116280036513 seconds

For Mongo:

Inserting 588
Inserting prices took 0.03671051503624767 seconds
Inserting 612
Inserting prices took 0.032324473024345934 seconds

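These timings are fairly stable: about 1 s and 300 ms for postgres, and about 30 ms for Mongo. It is very strange that postgres shows two different write times for roughly the same amount of data. Still, even in the best postgres case, it is about 10 times slower than mongo.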
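The figures above can be turned into a per-row cost and a ratio with a few lines of arithmetic. This sketch uses only the four log lines quoted in the question; the variable and function names are illustrative.

```python
# Timings copied from the log output quoted in the question: (rows, seconds).
postgres_runs = [(587, 1.175270811014343), (611, 0.3126116280036513)]
mongo_runs = [(588, 0.03671051503624767), (612, 0.032324473024345934)]


def ms_per_row(rows, seconds):
    """Average time spent per inserted row, in milliseconds."""
    return seconds / rows * 1000


for name, runs in (("postgres", postgres_runs), ("mongo", mongo_runs)):
    for rows, seconds in runs:
        print(f"{name}: {rows} rows, {ms_per_row(rows, seconds):.3f} ms per row")

# Compare the fastest runs with each other, and the slowest with each other.
best_ratio = min(s for _, s in postgres_runs) / min(s for _, s in mongo_runs)
worst_ratio = max(s for _, s in postgres_runs) / max(s for _, s in mongo_runs)
print(f"postgres/mongo ratio: {best_ratio:.1f}x to {worst_ratio:.1f}x")
```

On these four samples the gap works out to roughly 10x in the best case and roughly 32x in the worst, so between one and one and a half orders of magnitude.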

Additional notes:

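For postgres, I have some foreign key constraints. I tried removing them, but it did not significantly affect the timings.
I ran alter user postgres set synchronous_commit to off; against postgres, and it had no noticeable effect on the timings.
I run postgres:14 and mongo:4 through Docker.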

Am I doing something wrong with Postgres?

Answer from a uj5u.com user:

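For bulk-importing data into Postgres, the fastest method is usually the Postgres COPY command. psycopg3 exposes it through the procedure described in its COPY documentation (psycopg3 COPY). One caveat: COPY is all-or-nothing. Either all the data is imported, or an error means none of it is.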
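As a rough illustration of the COPY approach, here is a minimal sketch rather than a definitive implementation. It assumes `conn` is an open psycopg 3 connection and relies on psycopg 3's `cursor.copy()` context manager and its `write_row()` method. The helper names (`COLUMNS`, `build_copy_statement`, `to_row`, `copy_prices`) are hypothetical; the column list is taken from the insert statement in the question.

```python
from typing import Any, Iterable, Sequence

# Column order taken from the insert statement in the question.
COLUMNS = (
    "blockchain_name",
    "dex_name",
    "pair_address",
    "token0_address",
    "token1_address",
    "raw_reserve0",
    "raw_reserve1",
    "reserve0",
    "reserve1",
    "mid_price",
    "datetime",
)


def build_copy_statement(table: str, columns: Sequence[str]) -> str:
    """Build a COPY ... FROM STDIN statement for the given columns."""
    return f"COPY {table} ({', '.join(columns)}) FROM STDIN"


def to_row(price: dict[str, Any], columns: Sequence[str] = COLUMNS) -> tuple:
    """Order one record's values to match the COPY column list."""
    return tuple(price[c] for c in columns)


def copy_prices(conn, prices: Iterable[dict[str, Any]]) -> int:
    """Stream all rows through a single COPY and return how many were written.

    `conn` is assumed to be an open psycopg 3 connection (hypothetical setup).
    """
    count = 0
    with conn.cursor() as cur:
        with cur.copy(build_copy_statement("prices", COLUMNS)) as copy:
            for price in prices:
                copy.write_row(to_row(price))
                count += 1
    conn.commit()
    return count
```

A call might look like `copy_prices(conn, [p.to_dict() for p in prices])`. Because COPY is all-or-nothing, a single row that violates a constraint fails the whole batch, so callers may want to catch the error and fall back to row-by-row inserts for that batch.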

Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/430447.html

Tags: Python, mongodb, PostgreSQL, psycopg3