處理具有不同標題的csv檔案的最佳方法-有解無憂

我只是在建立一個個人理財規劃師。我想做的是，以下。

下載不同銀行賬戶的 csv 檔案中的賬戶報表。
通過 Python 代碼在檔案夾中的每個欄位上回圈并讀取資料。
將每一列的資料寫入sql資料庫

我現在遇到的問題如下。每個 csv 檔案的 csv 結構略有不同（標題不同）。

我想知道現在處理這些資料的最佳方法是什么。

目前，我為每家銀行創建了不同的功能以創建相同的結構。這里的問題是，大部分代碼都是相同的，如果我想更改任何內容，我需要對所有功能進行更改。

有沒有辦法在 csv 中構建資料，以便我可以將它們全部發送到同一個函式進行處理。我在這里看到的問題是，我需要多次處理同一個 csv 檔案（首先構造它，然后讀取它）。

以下是一些標頭結構的示例：

銀行 1：['account_num', 'text', 'date', 'valutadate', 'amount', 'currency']

銀行 2：['account_num', 'text', 'valutadate', 'currency', 'amount_withdrawal', 'amount_deposit']

我將按以下順序需要它們：['account_num', 'text', 'valutadate', 'amount', 'currency', 'category']

這是我的兩個函式的代碼。他們只是處于非常早期的階段。但只要從 csv 中提取資料并為我提供正確的結構，兩者都可以作業。

# function for readout bank1 data
def read_bank(filename):
    # Create header for data in csv
    header = ['account_num', 'text', 'date', 'valutadate', 'amount', 'currency']

    # Create empty pandas dataframe for transporting data to the database
    head_row = ['account_num', 'text', 'valutadate', 'amount', 'currency', 'category']
    bank1_df = pd.DataFrame(columns=head_row)

    # Processes data from file
    with open(filename, 'r') as fhandle:
        csv_reader = csv.DictReader(fhandle, delimiter=';', fieldnames=header) # Adds a fieldname to the columns, so i can access them by name instead of index.
        # header = next(csv_reader) # Attention, if we use this line, we define the first row in the csv file as header and don′t read it.
        # print(header)

        # loop over data and write into dataframe
        for row_index, row in enumerate(csv_reader):

            # Safety check if all rows are equal in lenght, otherwise there could be some problem with the transformation of the data.
            if len(row) != 6:
                print(f"Row {row_index} is only {len(row)} elements long.")
                break
            
            # Definition of data elements
            account_num = row['account_num']
            text = row['text']
            date = datetime.strptime(row['date'], '%d.%m.%Y')
            valutadate = datetime.strptime(row['valutadate'], '%d.%m.%Y')
            amount = float(row['amount'].replace('.','').replace(',','.'))
            currency = row['currency']
            category = match_c.match_category(text)

            # Create hashvalue of each record, to add a uniqe identifier in the table.
            # hash_val = hash_func(account_num, text, valutadate, amount)

            # Write data elements into dataframe as new row (.loc[row_index])
            bank1_df.loc[row_index] = [account_num, text, valutadate, amount, currency, category]

            # print(f"hash: {hash_val.hexdigest()}; account_num: {account_num}; text: {text}; date: {date}; valutadate: {valutadate}; amount: {amount}; currency: {currency}")
      
        print(bank1_df)
        
        print(f"Total Rows: {row_index   1}")

# function for readout bank2 data
def read_bank2(filename):
    # Create header for data in csv
    header = ['account_num', 'text', 'valutadate', 'currency', 'amount_withdrawal', 'amount_deposit']

    # Create empty pandas dataframe for transporting data to the database
    head_row = ['account_num', 'text', 'valutadate', 'amount', 'currency', 'category']
    bank2_df = pd.DataFrame(columns=head_row)

    # Processes data from file
    with open(filename, 'r') as fhandle:
        csv_reader = csv.DictReader(fhandle, delimiter=';', fieldnames=header) # Adds a fieldname to the columns, so i can access them by name instead of index.
        header = next(csv_reader) # Attention, if we use this line, we define the first row in the csv file as header and don′t read it.
        # print(header)

        # loop over data and write into dataframe
        for row_index, row in enumerate(csv_reader):

            # Safety check if all rows are equal in lenght, otherwise there could be some problem with the transformation of the data.
            if len(row) != 6:
                print(f"Row {row_index} is only {len(row)} elements long.")
                break
            
            # Definition of data elements
            account_num = row['account_num']
            text = row['text']
            valutadate = datetime.strptime(row['valutadate'], '%d.%m.%Y')
            currency = row['currency']
            amount_withdrawal = float(row['amount_withdrawal'].replace('.','').replace(',','.'))
            amount_deposit = float(row['amount_deposit'].replace('.','').replace(',','.'))
            amount = amount_deposit - amount_withdrawal
            category = match_c.match_category(text)

            # Create hashvalue of each record, to add a uniqe identifier in the table.
            # hash_val = hash_func(account_num, text, valutadate, amount)

            # Write data elements into dataframe as new row (.loc[row_index])
            bank2_df.loc[row_index] = [account_num, text, valutadate, amount, currency, category]

            # print(f"hash: {hash_val.hexdigest()}; account_num: {account_num}; text: {text}; valutadate: {valutadate}; currency: {currency}; amount_withdrawal: {amount_withdrawal}; amount_deposit: {amount_deposit}")
        
        print(bank2_df)

        print(f"Total Rows: {row_index   1}")

感謝您的任何反饋。

uj5u.com熱心網友回復：

您可以創建一個字典，其中包含給定列的所有可能名稱：

column_names = {
    'date': ['date', 'valutadate'],
    'amount': ['amount', 'amount_withdrawal']
}

然后，您可以在此字典中查找給定的輸入 csv 列名稱，以在最終輸出 csv 中找到該列的“真實”名稱。

轉載請註明出處，本文鏈接：https://www.uj5u.com/houduan/520953.html

標籤：Python熊猫数据库CSV数据处理

上一篇：列印第一行時關于python的csv.DictReader()的問題

下一篇：按記錄中包含的分隔符拆分