我只是在建立一個個人理財規劃師。我想做的是,以下。
- 下載不同銀行賬戶的 csv 檔案中的賬戶報表。
- 通過 Python 代碼在檔案夾中的每個欄位上回圈并讀取資料。
- 將每一列的資料寫入sql資料庫
我現在遇到的問題如下。每個 csv 檔案的 csv 結構略有不同(標題不同)。
我想知道現在處理這些資料的最佳方法是什么。
目前,我為每家銀行創建了不同的功能以創建相同的結構。這里的問題是,大部分代碼都是相同的,如果我想更改任何內容,我需要對所有功能進行更改。
有沒有辦法在 csv 中構建資料,以便我可以將它們全部發送到同一個函式進行處理。我在這里看到的問題是,我需要多次處理同一個 csv 檔案(首先構造它,然后讀取它)。
以下是一些標頭結構的示例:
銀行 1:['account_num', 'text', 'date', 'valutadate', 'amount', 'currency']
銀行 2:['account_num', 'text', 'valutadate', 'currency', 'amount_withdrawal', 'amount_deposit']
我將按以下順序需要它們:['account_num', 'text', 'valutadate', 'amount', 'currency', 'category']
這是我的兩個函式的代碼。他們只是處于非常早期的階段。但只要從 csv 中提取資料并為我提供正確的結構,兩者都可以作業。
# function for readout bank1 data
def read_bank(filename):
# Create header for data in csv
header = ['account_num', 'text', 'date', 'valutadate', 'amount', 'currency']
# Create empty pandas dataframe for transporting data to the database
head_row = ['account_num', 'text', 'valutadate', 'amount', 'currency', 'category']
bank1_df = pd.DataFrame(columns=head_row)
# Processes data from file
with open(filename, 'r') as fhandle:
csv_reader = csv.DictReader(fhandle, delimiter=';', fieldnames=header) # Adds a fieldname to the columns, so i can access them by name instead of index.
# header = next(csv_reader) # Attention, if we use this line, we define the first row in the csv file as header and don′t read it.
# print(header)
# loop over data and write into dataframe
for row_index, row in enumerate(csv_reader):
# Safety check if all rows are equal in lenght, otherwise there could be some problem with the transformation of the data.
if len(row) != 6:
print(f"Row {row_index} is only {len(row)} elements long.")
break
# Definition of data elements
account_num = row['account_num']
text = row['text']
date = datetime.strptime(row['date'], '%d.%m.%Y')
valutadate = datetime.strptime(row['valutadate'], '%d.%m.%Y')
amount = float(row['amount'].replace('.','').replace(',','.'))
currency = row['currency']
category = match_c.match_category(text)
# Create hashvalue of each record, to add a uniqe identifier in the table.
# hash_val = hash_func(account_num, text, valutadate, amount)
# Write data elements into dataframe as new row (.loc[row_index])
bank1_df.loc[row_index] = [account_num, text, valutadate, amount, currency, category]
# print(f"hash: {hash_val.hexdigest()}; account_num: {account_num}; text: {text}; date: {date}; valutadate: {valutadate}; amount: {amount}; currency: {currency}")
print(bank1_df)
print(f"Total Rows: {row_index 1}")
# function for readout bank2 data
def read_bank2(filename):
# Create header for data in csv
header = ['account_num', 'text', 'valutadate', 'currency', 'amount_withdrawal', 'amount_deposit']
# Create empty pandas dataframe for transporting data to the database
head_row = ['account_num', 'text', 'valutadate', 'amount', 'currency', 'category']
bank2_df = pd.DataFrame(columns=head_row)
# Processes data from file
with open(filename, 'r') as fhandle:
csv_reader = csv.DictReader(fhandle, delimiter=';', fieldnames=header) # Adds a fieldname to the columns, so i can access them by name instead of index.
header = next(csv_reader) # Attention, if we use this line, we define the first row in the csv file as header and don′t read it.
# print(header)
# loop over data and write into dataframe
for row_index, row in enumerate(csv_reader):
# Safety check if all rows are equal in lenght, otherwise there could be some problem with the transformation of the data.
if len(row) != 6:
print(f"Row {row_index} is only {len(row)} elements long.")
break
# Definition of data elements
account_num = row['account_num']
text = row['text']
valutadate = datetime.strptime(row['valutadate'], '%d.%m.%Y')
currency = row['currency']
amount_withdrawal = float(row['amount_withdrawal'].replace('.','').replace(',','.'))
amount_deposit = float(row['amount_deposit'].replace('.','').replace(',','.'))
amount = amount_deposit - amount_withdrawal
category = match_c.match_category(text)
# Create hashvalue of each record, to add a uniqe identifier in the table.
# hash_val = hash_func(account_num, text, valutadate, amount)
# Write data elements into dataframe as new row (.loc[row_index])
bank2_df.loc[row_index] = [account_num, text, valutadate, amount, currency, category]
# print(f"hash: {hash_val.hexdigest()}; account_num: {account_num}; text: {text}; valutadate: {valutadate}; currency: {currency}; amount_withdrawal: {amount_withdrawal}; amount_deposit: {amount_deposit}")
print(bank2_df)
print(f"Total Rows: {row_index 1}")
感謝您的任何反饋。
uj5u.com熱心網友回復:
您可以創建一個字典,其中包含給定列的所有可能名稱:
column_names = {
'date': ['date', 'valutadate'],
'amount': ['amount', 'amount_withdrawal']
}
然后,您可以在此字典中查找給定的輸入 csv 列名稱,以在最終輸出 csv 中找到該列的“真實”名稱。
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/520953.html
上一篇:列印第一行時關于python的csv.DictReader()的問題
下一篇:按記錄中包含的分隔符拆分
