How to import data from a csv file into my table in Sqlite3 using Python

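I want to use Python (a Pandas DataFrame) to import all the data (about 200 000 rows) from a csv file (4 columns) into a table that already exists in a sqlite3 database (the table has 5 columns). The data types in the csv and in the table are compatible. The problem is that the table has one extra column, index_of (Primary Key).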

These are the first 3 rows of my csv file:

[Image: the first 3 rows of the csv file]

This is as far as I got. Even if it works, I expect it to take about 5-6 hours, because this code uses a for loop to read every row:

connection = _sqlite3.connect("db_name.sqlite")
cursor = connection.cursor()
with open('path_to_csv', 'r') as file:
    no_records = 0
    for row in file: 
        cursor.execute("INSERT INTO table_name (index_of, high, low, original, ship_date) VALUES (?,?,?,?,?)", row.split(","))
        connection.commit()
        no_records += 1
        
connection.close()

But it shows me an error: Exception has occurred: OperationalError 5 values for 4 columns

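Could you please help me with the following:

How can I import 200 000 rows quickly using Python?
How can I import all the columns of the csv file into specific columns of the table?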

A uj5u.com user replied:

You need to provide a default value for the 5th column.
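One possible reading of this advice, sketched under an assumption: if index_of is declared as INTEGER PRIMARY KEY (the question only says it is the primary key, so the exact declaration is a guess), the column can simply be left out of the INSERT and SQLite assigns the value itself. The in-memory database and the two sample rows below are made up for illustration.

```python
import sqlite3

# Assumption: an in-memory database stands in for the real db_name.sqlite.
connection = sqlite3.connect(":memory:")
connection.execute("""
    CREATE TABLE table_name (
        index_of INTEGER PRIMARY KEY,  -- assumed declaration
        high REAL,
        low REAL,
        original REAL,
        ship_date TEXT
    )
""")

# Made-up rows with the 4 csv columns only.
rows = [
    ("1.5", "1.0", "1.2", "2021-01-01"),
    ("2.5", "2.0", "2.2", "2021-01-02"),
]

# index_of is not in the column list, so SQLite fills it in.
connection.executemany(
    "INSERT INTO table_name (high, low, original, ship_date) VALUES (?, ?, ?, ?)",
    rows,
)
connection.commit()

assigned = [
    row[0]
    for row in connection.execute("SELECT index_of FROM table_name ORDER BY index_of")
]
print(assigned)
connection.close()
```

With this approach the statement has 4 placeholders for 4 csv values, so the column count and the value count match.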

You can also improve the script's performance by inserting the rows in blocks of 100-200 per SQL statement.
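A minimal sketch of what inserting in blocks could look like, assuming each row already carries all 5 values. The helper names blocks and insert_in_blocks, the block size, the in-memory database and the generated sample rows are illustrative assumptions, not part of the answer. A block of 100 rows uses 500 parameters, which stays below the 999-parameter default limit of older SQLite builds.

```python
import sqlite3
from itertools import islice

BLOCK_SIZE = 100  # rows per INSERT statement (100 rows x 5 values = 500 parameters)


def blocks(iterable, size):
    """Yield lists of up to `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        block = list(islice(iterator, size))
        if not block:
            return
        yield block


def insert_in_blocks(connection, rows, block_size=BLOCK_SIZE):
    """Insert 5-value rows using one multi-row INSERT per block."""
    for block in blocks(rows, block_size):
        placeholders = ", ".join(["(?, ?, ?, ?, ?)"] * len(block))
        sql = (
            "INSERT INTO table_name (index_of, high, low, original, ship_date) "
            f"VALUES {placeholders}"
        )
        # Flatten the block into a single parameter list.
        params = [value for row in block for value in row]
        connection.execute(sql, params)
    # One commit at the end instead of one commit per row.
    connection.commit()


# Assumption: in-memory database and generated rows, for demonstration only.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE table_name "
    "(index_of INTEGER, high REAL, low REAL, original REAL, ship_date TEXT)"
)
sample_rows = [(i, i + 0.5, i - 0.5, float(i), "2021-01-01") for i in range(250)]
insert_in_blocks(connection, sample_rows)
row_count = connection.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
print(row_count)
```

Committing once after the last block, rather than after every row as in the question's loop, is likely to account for much of the speed-up.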

A uj5u.com user replied:

user3380595 has already pointed out in their answer that you need to provide a value for the column index_of.

cursor.execute("""
    INSERT INTO Quotes (index_of, high, low, original, ship_date)
    VALUES (?, ?, ?, ?, ?)
    """, [index, *row])

I created 200,000 rows of test data, and it loads very quickly (under 2 seconds). See the first example, which uses csv and sqlite3.

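As user3380595 mentioned, if you are worried about memory and performance, you can load the data in chunks. In this scenario the load is actually slightly slower. See the second example, which uses pandas and sqlalchemy.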

Using csv and sqlite3

Setting up the test environment

import csv
import sqlite3
import contextlib

import pandas as pd

test_data = r"/home/thomas/Projects/Playground/stackoverflow/data/test.csv"
test_db = r"/home/thomas/Projects/Playground/stackoverflow/data/test.db"

with contextlib.closing(sqlite3.connect(test_db)) as connection:
    
    cursor = connection.cursor()

    cursor.execute("DROP TABLE IF EXISTS Quotes;")

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS Quotes (
            index_of INTEGER, -- PRIMARY KEY,
            high REAL,
            low REAL,
            original REAL,
            ship_date TEXT
        );
        """)

    connection.commit()

Loading the data

with contextlib.closing(sqlite3.connect(test_db)) as connection:
    
    cursor = connection.cursor()

    with open(test_data, "r", newline="") as file:
        
        for index, row in enumerate(csv.reader(file)):
            cursor.execute("""
                INSERT INTO Quotes (index_of, high, low, original, ship_date)
                VALUES (?, ?, ?, ?, ?)
                """, [index, *row])

    connection.commit()

Using pandas and sqlalchemy

Setting up the test environment

import pandas as pd

from sqlalchemy import create_engine, text

test_data = r"/home/thomas/Projects/Playground/stackoverflow/data/test.csv"
test_db = r"sqlite:////home/thomas/Projects/Playground/stackoverflow/data/test.db"

engine = create_engine(test_db, echo=True)

with engine.begin() as connection:

    # Run the statements on the transaction's connection;
    # Engine.execute() was removed in SQLAlchemy 2.0.
    connection.execute(text("DROP TABLE IF EXISTS Quotes;"))

    connection.execute(text("""
        CREATE TABLE IF NOT EXISTS Quotes (
            index_of INTEGER, -- PRIMARY KEY,
            high REAL,
            low REAL,
            original REAL,
            ship_date TEXT
        );
        """))

Loading the data (in chunks)

with engine.begin() as connection:

    reader = pd.read_csv(test_data, iterator=True, chunksize=50000)

    for chunk in reader:
        chunk["index_of"] = chunk.index
        chunk.to_sql("Quotes", con=connection, if_exists="append", index=False)

Alternatively, instead of using pandas, you can use sqlite3.Cursor.executemany and process chunks of rows.
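A rough sketch of the executemany variant, assuming the same Quotes table and a 4-column csv without a header row. The function name load_csv_in_chunks, the chunk size, the in-memory database and the three sample rows are illustrative assumptions; an io.StringIO object stands in for the real csv file.

```python
import csv
import io
import sqlite3
from itertools import islice

CHUNK_SIZE = 50_000  # assumption: same chunk size as the pandas example


def load_csv_in_chunks(connection, file, chunk_size=CHUNK_SIZE):
    """Read 4-column csv rows and insert them chunk by chunk with executemany."""
    # enumerate supplies index_of, as in the csv/sqlite3 example.
    numbered = ((index, *row) for index, row in enumerate(csv.reader(file)))
    cursor = connection.cursor()
    while True:
        chunk = list(islice(numbered, chunk_size))
        if not chunk:
            break
        cursor.executemany(
            """
            INSERT INTO Quotes (index_of, high, low, original, ship_date)
            VALUES (?, ?, ?, ?, ?)
            """,
            chunk,
        )
    connection.commit()


# Made-up sample data in place of the real test.csv.
sample_csv = io.StringIO(
    "1.5,1.0,1.2,2021-01-01\n"
    "2.5,2.0,2.2,2021-01-02\n"
    "3.5,3.0,3.2,2021-01-03\n"
)
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE Quotes "
    "(index_of INTEGER, high REAL, low REAL, original REAL, ship_date TEXT)"
)
load_csv_in_chunks(connection, sample_csv, chunk_size=2)
loaded = connection.execute(
    "SELECT index_of, high, ship_date FROM Quotes ORDER BY index_of"
).fetchall()
print(loaded)
```

Only one chunk of rows is held in memory at a time, and the whole load is committed once at the end.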

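Alternatively, you can use dask.dataframe.to_sql (the example assumes an existing database with a Quotes table) and write the data in parallel. However, I don't think your data needs it.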

import dask.dataframe as dd

test_data = r"/home/thomas/Projects/Playground/stackoverflow/data/test.csv"
test_db = r"sqlite:////home/thomas/Projects/Playground/stackoverflow/data/test.db"

df = dd.read_csv(test_data)  # , blocksize=2e6
df["index_of"] = df.index
df.to_sql("Quotes", uri=test_db, if_exists="append", index=False, parallel=True)

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/336550.html