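The analysis software I'm using writes several sets of results into a single CSV file, with 2 blank lines separating the sets. I want to split the results into groups so that I can analyze each one separately.
I'm sure there is a built-in function in Python (or in one of its libraries) that does this. I tried this code, which I found somewhere, but it doesn't seem to work.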
import csv
results = open('03_12_velocity_y.csv').read().split("\n\n")
# Feed first csv.reader
first_csv = csv.reader(results[0], delimiter=',')
# Feed second csv.reader
second_csv = csv.reader(results[1], delimiter=',')
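A likely reason the snippet above fails, offered as a sketch rather than a definitive diagnosis: `csv.reader` expects an iterable of lines, but `results[0]` is a single string, so the reader walks it character by character. Also, two blank lines between groups appear in the file as three consecutive newlines, which `split("\n\n")` does not split cleanly. The version below assumes the whole file fits in memory and treats any run of blank lines as a separator; `split_groups` is a hypothetical helper name, not something from the original post.

```python
import csv
import re


def split_groups(text):
    """Split CSV text into groups separated by one or more blank lines."""
    # "\n\s*\n" matches a newline, any whitespace (including further
    # newlines), and a closing newline, so 1, 2 or more blank lines all work.
    blocks = re.split(r"\n\s*\n", text)
    groups = []
    for block in blocks:
        if not block.strip():
            continue  # ignore empty leftovers at the start or end of the text
        # csv.reader needs an iterable of lines, not one big string
        groups.append(list(csv.reader(block.splitlines())))
    return groups
```

With the file from the question, this could be called as `split_groups(open('03_12_velocity_y.csv').read())`, giving one list of rows per group.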
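Answer from a uj5u.com user:
If the number of rows varies from group to group, you will need a small state machine that detects when you are between groups, plus a final step that handles the last group.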
#!/usr/bin/env python3
import csv


def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as out_f:
        csv.writer(out_f).writerows(group)


with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    group_i = 1
    group = []
    last_row = []
    for row in reader:
        # second blank line in a row: the current group is complete
        if row == [] and last_row == [] and group != []:
            write_group(group, group_i)
            group = []
            group_i += 1
            continue
        # first blank line: remember it and keep going
        if row == []:
            last_row = row
            continue
        group.append(row)
        last_row = row

    # flush remaining group
    if group != []:
        write_group(group, group_i)
I mocked up this sample CSV:
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3


g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3


g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
When I run the program above, I get three CSV files:
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
group_2.csv
g2r1c1,g2r1c2,g2r1c3
g2r2c1,g2r2c2,g2r2c3
group_3.csv
g3r1c1,g3r1c2,g3r1c3
g3r2c1,g3r2c2,g3r2c3
g3r3c1,g3r3c2,g3r3c3
g3r4c1,g3r4c2,g3r4c3
g3r5c1,g3r5c2,g3r5c3
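If the goal is to analyze the groups in memory rather than write them to files, the same idea can be sketched with `itertools.groupby`. This is an alternative technique, not the state machine above: it treats any run of one or more blank lines as a separator, which is an assumption about the input. The name `iter_groups` is hypothetical.

```python
import csv
from itertools import groupby


def iter_groups(lines):
    """Yield each run of non-empty CSV rows as a list of rows."""
    reader = csv.reader(lines)
    # csv.reader returns [] for a blank line, so bool(row) is False
    # exactly on the separator rows.
    for has_data, rows in groupby(reader, key=bool):
        if has_data:
            yield list(rows)


sample = [
    "g1r1c1,g1r1c2,g1r1c3",
    "g1r2c1,g1r2c2,g1r2c3",
    "",
    "",
    "g2r1c1,g2r1c2,g2r1c3",
]
for number, group in enumerate(iter_groups(sample), start=1):
    print(number, len(group))
```

For a real file, an open file object (opened with `newline=""`) can be passed in place of the `sample` list.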
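Answer from a uj5u.com user:
If the number of rows is the same in every group, you can do this with fairly plain Python or with the Pandas library.
Vanilla Python
- Define your group size and the size of the break between groups (both in "rows").
- Loop over all the rows, adding each row to a group accumulator.
- When the group accumulator reaches the predefined group size, process it, reset the accumulator, and then skip break-size rows.
Here, I write each group to its own numbered file: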
import csv

group_sz = 5
break_sz = 2


def write_group(group, i):
    with open(f"group_{i}.csv", "w", newline="") as f_out:
        csv.writer(f_out).writerows(group)


with open("input.csv", newline="") as f_in:
    reader = csv.reader(f_in)
    group_i = 1
    group = []
    for row in reader:
        group.append(row)
        if len(group) == group_sz:
            write_group(group, group_i)
            group_i += 1
            group = []
            # skip the blank rows between groups
            for _ in range(break_sz):
                try:
                    next(reader)
                except StopIteration:  # gracefully ignore an expected StopIteration (at the end of the file)
                    break
group_1.csv
g1r1c1,g1r1c2,g1r1c3
g1r2c1,g1r2c2,g1r2c3
g1r3c1,g1r3c2,g1r3c3
g1r4c1,g1r4c2,g1r4c3
g1r5c1,g1r5c2,g1r5c3
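The same fixed-size approach can also be sketched as a generator that hands back each group in memory, using `itertools.islice` to take `group_sz` rows and then discard `break_sz` separator rows. It assumes every group really has the same size; if the sizes drift, blank rows will end up inside groups. The name `iter_fixed_groups` is hypothetical.

```python
import csv
from itertools import islice


def iter_fixed_groups(lines, group_sz, break_sz):
    """Yield groups of group_sz rows, skipping break_sz rows after each."""
    reader = csv.reader(lines)
    while True:
        group = list(islice(reader, group_sz))
        if not group:
            return  # input exhausted
        yield group
        # consume (and discard) the separator rows, if any remain
        for _ in islice(reader, break_sz):
            pass


sample = ["a,b", "c,d", "", "", "e,f", "g,h"]
for group in iter_fixed_groups(sample, group_sz=2, break_sz=2):
    print(group)
```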
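With Pandas
I'm new to Pandas and learning as I go, but it looks like Pandas automatically trims blank lines/records from a chunk of data^1 (`read_csv` skips blank lines by default via `skip_blank_lines=True`).
With that in mind, all you need to do is specify the size of your group and tell Pandas to read your CSV file in "iterator mode", where you can ask for one chunk (your group size) of records at a time: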
import pandas as pd

group_sz = 5

with pd.read_csv("input.csv", header=None, iterator=True) as reader:
    i = 1
    while True:
        try:
            df = reader.get_chunk(group_sz)
        except StopIteration:
            break
        df.to_csv(f"group_{i}.csv")
        i += 1
Pandas adds an "ID" column and a default header when it writes the CSV:
group_1.csv
,0,1,2
0,g1r1c1,g1r1c2,g1r1c3
1,g1r2c1,g1r2c2,g1r2c3
2,g1r3c1,g1r3c2,g1r3c3
3,g1r4c1,g1r4c2,g1r4c3
4,g1r5c1,g1r5c2,g1r5c3
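If the index column and the default header are not wanted, `DataFrame.to_csv` accepts `index=False` and `header=False`. A minimal sketch on an in-memory sample instead of the real file; `chunk_to_csv_text` is a hypothetical helper used only for illustration.

```python
import io

import pandas as pd


def chunk_to_csv_text(df):
    """Serialize a DataFrame as CSV text without index column or header row."""
    # With no path argument, to_csv returns the CSV as a string.
    return df.to_csv(index=False, header=False)


sample = pd.read_csv(
    io.StringIO("g1r1c1,g1r1c2,g1r1c3\ng1r2c1,g1r2c2,g1r2c3\n"),
    header=None,
)
print(chunk_to_csv_text(sample))
```

In a chunked loop, the equivalent call would be `df.to_csv(f"group_{i}.csv", index=False, header=False)`.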
Answer from a uj5u.com user:
Try this with your output:
import pandas as pd

# csv file name to be read in
in_csv = 'input.csv'

# get the number of lines of the csv file to be read
number_lines = sum(1 for row in (open(in_csv)))

# size of rows of data to write to the csv,
# you can change the row size according to your need
rowsize = 500

# start looping through data writing it to a new file for each set
# (the range starts at 1, so the first line of the file is always skipped)
for i in range(1, number_lines, rowsize):
    df = pd.read_csv(in_csv,
                     header=None,
                     nrows=rowsize,  # number of rows to read at each loop
                     skiprows=i)  # skip rows that have been read

    # csv to write data to a new file with indexed name: input1.csv, input501.csv, etc.
    out_csv = 'input' + str(i) + '.csv'

    df.to_csv(out_csv,
              index=False,
              header=False,
              mode='a',  # append data to csv file
              )
