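I have working code that does what I need. However, when I run it on a large CSV file (about 2 GB), it takes roughly 15-20 minutes to finish. Is there a way to optimize the code below to reduce the execution time and improve performance?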
from csv import reader, writer
import pandas as pd

path = r"data.csv"
data = pd.read_csv(path, header=None)
last_column = data.iloc[:, -1]
# Indices of rows whose last column is 0 right after a row whose last column is 1
arr = [i + 1 for i in range(len(last_column) - 1) if (last_column[i] == 1 and last_column[i + 1] == 0)]
ch_0_6 = []
ch_7_14 = []
ch_16_22 = []
with open(path, 'r') as read_obj:
    csv_reader = reader(read_obj)
    rows = list(csv_reader)
    for j in arr:
        # Channel 1-7
        ch_0_6_init = [int(rows[j][k]) for k in range(1, 8)]
        bin_num = ''.join([str(x) for x in ch_0_6_init])
        dec_num = int(f'{bin_num}', 2)
        ch_0_6.append(dec_num)
        ch_0_6_init = []
        # Channel 8-15
        ch_7_14_init = [int(rows[j][k]) for k in range(8, 16)]
        bin_num = ''.join([str(x) for x in ch_7_14_init])
        dec_num = int(f'{bin_num}', 2)
        ch_7_14.append(dec_num)
        ch_7_14_init = []
        # Channel 16-22
        ch_16_22_init = [int(rows[j][k]) for k in range(16, 23)]
        bin_num = ''.join([str(x) for x in ch_16_22_init])
        dec_num = int(f'{bin_num}', 2)
        ch_16_22.append(dec_num)
        ch_16_22_init = []
Sample data:
0.0114,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1
0.0112,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,0
0.0115,0,1,0,1,1,1,0,1,0,0,1,0,0,0,1,1,1,0,1,0,0,0,1
0.0117,0,1,0,1,1,1,0,1,0,0,1,0,0,0,1,1,1,0,1,0,0,0,0
0.0118,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,1,1,0,1,0,0,0,1
Depending on the selected channels, the binary digits are joined to form a decimal number.
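To make the conversion concrete, here is a minimal sketch that decodes the three channel groups of the second sample row. The helper name `bits_to_int` is hypothetical and not part of the original code; the column ranges follow the `range(1, 8)`, `range(8, 16)` and `range(16, 23)` calls used in the question.

```python
# The second sample row, split into string fields the way csv.reader would return it.
row = "0.0112,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,0".split(",")


def bits_to_int(bits):
    """Join a sequence of '0'/'1' strings and parse the result as a base-2 number.

    Hypothetical helper, introduced only for this illustration.
    """
    return int("".join(bits), 2)


first = bits_to_int(row[1:8])     # columns 1-7   -> '0100000'
second = bits_to_int(row[8:16])   # columns 8-15  -> '00000001'
third = bits_to_int(row[16:23])   # columns 16-22 -> '1101000'

print(first, second, third)  # 32 1 104
```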
Answer from a uj5u.com community member:
Using only the csv module, you could try an approach along these lines:
from csv import reader, writer

ch_0_6 = []
ch_7_14 = []
ch_16_22 = []
with open('data.csv', 'r') as f_input:
    csv_input = reader(f_input)
    last_row = ['0']
    for row in csv_input:
        # Decode a row only when the last column falls from '1' to '0'
        if last_row[-1] == '1' and row[-1] == '0':
            ch_0_6.append(int(''.join(row[1:8]), 2))
            ch_7_14.append(int(''.join(row[8:16]), 2))
            ch_16_22.append(int(''.join(row[16:23]), 2))
        last_row = row
print(ch_0_6)
print(ch_7_14)
print(ch_16_22)
For your sample data, this prints:
[32, 46]
[1, 145]
[104, 104]
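As noted, your original approach reads the entire file into memory twice: once with pandas and once more as a list of csv rows. The first pass only determines which rows to parse. That can instead be done while reading, by keeping track of the previous row inside the loop. This alone should give a significant speedup.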
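The previous-row tracking can also be isolated into a small generator, which may make the single-pass idea easier to see and reuse. This is only a sketch: the name `falling_edge_rows` is hypothetical, and an in-memory `io.StringIO` buffer with the five sample rows stands in for the real `data.csv`.

```python
import csv
import io

# The five sample rows from the question, used here in place of the real file.
SAMPLE = "\n".join([
    "0.0114,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,1",
    "0.0112,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,1,0,0,0,0",
    "0.0115,0,1,0,1,1,1,0,1,0,0,1,0,0,0,1,1,1,0,1,0,0,0,1",
    "0.0117,0,1,0,1,1,1,0,1,0,0,1,0,0,0,1,1,1,0,1,0,0,0,0",
    "0.0118,0,1,0,0,1,1,0,0,0,1,0,1,0,0,1,1,1,0,1,0,0,0,1",
]) + "\n"


def falling_edge_rows(lines):
    """Yield each row whose last field is '0' right after a row ending in '1'.

    Hypothetical helper: only the previous row's flag is remembered,
    so the whole file never has to be held in memory.
    """
    previous_flag = "0"
    for row in csv.reader(lines):
        if previous_flag == "1" and row[-1] == "0":
            yield row
        previous_flag = row[-1]


selected = list(falling_edge_rows(io.StringIO(SAMPLE)))
decoded = [int("".join(row[1:8]), 2) for row in selected]
print(decoded)  # [32, 46]
```

With a real file, the same generator could be fed an open file object instead of the `StringIO` buffer.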
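The conversion from a list of binary digits to a decimal number is also somewhat more efficient, since the string fields are joined directly rather than converted to int and back to str first.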
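The two conversion routes can be compared side by side. The sketch below uses hypothetical function names: `via_int_roundtrip` mirrors the steps in the question and `via_direct_join` mirrors the answer. The timings depend on the machine, so no figures are quoted here.

```python
import timeit

# Columns 1-7 of the fourth sample row, as csv.reader would return them.
fields = ["0", "1", "0", "1", "1", "1", "0"]


def via_int_roundtrip(bits):
    """Question's route: str -> int -> str -> join -> int."""
    as_ints = [int(b) for b in bits]
    bin_num = "".join([str(x) for x in as_ints])
    return int(bin_num, 2)


def via_direct_join(bits):
    """Answer's route: join the string fields as-is, then parse once."""
    return int("".join(bits), 2)


# Both routes must agree before their speed is worth comparing.
result_roundtrip = via_int_roundtrip(fields)
result_direct = via_direct_join(fields)

for fn in (via_int_roundtrip, via_direct_join):
    seconds = timeit.timeit(lambda: fn(fields), number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s per 100,000 calls")
```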
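This approach also scales to larger files, because rows are processed one at a time rather than held in memory all at once.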
