How can I merge two or more CSV files whose time ranges overlap? For example,
data 1 is
Time u v w
0.24001821 0 0.009301949 0
0.6400364 0 0.009311552 0
0.84005458 0 0.0093211568 0
0.94034343 0 0.0094739951 0
Data 2 is
Time u v w
0.74041502 0 0.0095119512 0
0.84043291 0 0.0095214359 0
0.94045075 0 0.0095309047 0
1.2404686 0 0.0095403752 0
What I want is:
Time u v w
0.24001821 0 0.009301949 0
0.6400364 0 0.009311552 0
0.74041502 0 0.0095119512 0
0.84043291 0 0.0095214359 0
0.94045075 0 0.0095309047 0
1.2404686 0 0.0095403752 0
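So the last few rows of the first CSV file are dropped and the second CSV file is appended, so that the time series keeps increasing.
How can this be done? Thanks.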
A uj5u.com user replied:
If both files are already sorted by time individually, a simple loop is enough:
# CSV cells are assumed to be separated by commas; change if required
# (the sample data above is space-separated)
delimiter = ','
# open files and read lines
with open('data1.csv', 'r') as f1:
    f1_lines = f1.readlines()
with open('data2.csv', 'r') as f2:
    f2_lines = f2.readlines()
# extract header
output_lines = [f1_lines[0]]
# start scanning from line 2 of both files (line 1 is the header)
f1_index = 1
f2_index = 1
while True:
    # all of data1 has been processed: append the remaining lines of data2
    if f1_index >= len(f1_lines):
        output_lines += f2_lines[f2_index:]
        break
    # all of data2 has been processed: append the remaining lines of data1
    if f2_index >= len(f2_lines):
        output_lines += f1_lines[f1_index:]
        break
    f1_line_time = float(f1_lines[f1_index].split(delimiter)[0])  # time cell of data1
    f2_line_time = float(f2_lines[f2_index].split(delimiter)[0])  # time cell of data2
    if f1_line_time < f2_line_time:
        output_lines.append(f1_lines[f1_index])
        f1_index += 1
    elif f1_line_time == f2_line_time:
        # equal times: keep only one of the two rows
        output_lines.append(f1_lines[f1_index])
        f1_index += 1
        f2_index += 1
    else:
        output_lines.append(f2_lines[f2_index])
        f2_index += 1
with open('out.csv', 'w') as f_output:
    f_output.write(''.join(output_lines))
A uj5u.com user replied:
Another option:
import csv

delimiter = " "
with open("data1.csv", "r") as fin1,\
     open("data2.csv", "r") as fin2,\
     open("data.csv", "w", newline="") as fout:
    reader1 = csv.reader(fin1, delimiter=delimiter)
    reader2 = csv.reader(fin2, delimiter=delimiter)
    writer = csv.writer(fout, delimiter=delimiter)
    next(reader2)  # skip the header of data2
    first_row = next(reader2)  # first data row of data2
    start2 = float(first_row[0])
    writer.writerow(next(reader1))  # copy the header from data1
    for row in reader1:
        if start2 <= float(row[0]):
            break  # data1 has reached the start of data2
        writer.writerow(row)
    writer.writerow(first_row)
    writer.writerows(reader2)
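Assuming the files are already sorted individually:
- First take the first data row of data2.csv and convert its first entry to a float, start2.
- With that value, write all rows of data1.csv whose time is less than start2 to the new file data.csv, and leave the loop as soon as the condition no longer holds.
- Then write the already extracted first data row of data2.csv to data.csv, followed by the rest of data2.csv.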
The result for
data1.csv
Time u v w
0.24001821 0 0.009301949 0
0.6400364 0 0.009311552 0
0.84005458 0 0.0093211568 0
0.94034343 0 0.0094739951 0
data2.csv
Time u v w
0.74041502 0 0.0095119512 0
0.84043291 0 0.0095214359 0
0.94045075 0 0.0095309047 0
1.2404686 0 0.0095403752 0
is
Time u v w
0.24001821 0 0.009301949 0
0.6400364 0 0.009311552 0
0.74041502 0 0.0095119512 0
0.84043291 0 0.0095214359 0
0.94045075 0 0.0095309047 0
1.2404686 0 0.0095403752 0
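The cut-off steps of this approach can also be sketched as a small function that works on rows already read into memory. This is only an illustration under the same assumption that each input is sorted by time; the name `cut_and_append` and the shortened sample rows are hypothetical and not part of the original answer.

```python
def cut_and_append(rows1, rows2):
    """Keep the rows of rows1 that come before rows2 starts, then append rows2.

    Both arguments are lists of rows (lists of strings) without the header,
    each assumed to be sorted by the time in column 0.
    """
    if not rows2:
        return list(rows1)
    start2 = float(rows2[0][0])  # first timestamp of the second data set
    kept = []
    for row in rows1:
        if start2 <= float(row[0]):
            break  # rows1 has reached the overlap, so drop the rest
        kept.append(row)
    return kept + list(rows2)


# hypothetical, shortened sample rows
rows1 = [["0.24", "0"], ["0.64", "0"], ["0.84", "0"], ["0.94", "0"]]
rows2 = [["0.74", "1"], ["0.84", "1"], ["1.24", "1"]]
merged = cut_and_append(rows1, rows2)
print([row[0] for row in merged])
```

Keeping the logic in a function like this makes it easy to test without touching any files.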
A more general solution (multiple files) might look like this:
import csv

delimiter = " "
files = ["data1.csv", "data2.csv", "data3.csv"]
# for every file after the first, record its first timestamp;
# it serves as the cut-off for the file before it
stops = []
for file in files[1:]:
    with open(file, "r") as fin:
        reader = csv.reader(fin, delimiter=delimiter)
        header = next(reader)
        stops.append(float(next(reader)[0]))
stops.append(float("inf"))  # the last file is written in full
with open("data.csv", "w", newline="") as fout:
    writer = csv.writer(fout, delimiter=delimiter)
    writer.writerow(header)
    for stop, file in zip(stops, files):
        with open(file, "r") as fin:
            next(fin)  # skip the header line
            reader = csv.reader(fin, delimiter=delimiter)
            for row in reader:
                if stop <= float(row[0]):
                    break
                writer.writerow(row)
This works for overlaps that look like
1. file: |------|
2. file: |--------|
3. file: |------|
but not for
1. file: |--------|
2. file: |-------|
3. file: |--------------|
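One situation the loop cannot cope with is a file list that is not in order of start time, because each cut-off is taken from the next file in the list. A possible workaround, offered here as an assumption rather than as part of the original answer, is to sort the files by their first timestamp before computing the cut-offs. The helper names `first_time` and `order_by_start` are hypothetical, and the sketch uses in-memory text through `io.StringIO` instead of real files so that it can be run as is.

```python
import csv
import io


def first_time(text, delimiter=" "):
    """Return the time in column 0 of the first data row of CSV text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader)  # skip the header
    return float(next(reader)[0])


def order_by_start(sources):
    """Return the names of a {name: csv_text} mapping sorted by first timestamp."""
    return sorted(sources, key=lambda name: first_time(sources[name]))


# hypothetical inputs, deliberately listed out of order
sources = {
    "b.csv": "Time u\n0.74 0\n1.24 0\n",
    "a.csv": "Time u\n0.24 0\n0.94 0\n",
    "c.csv": "Time u\n1.10 0\n2.00 0\n",
}
print(order_by_start(sources))
```

With real files, the same idea would mean sorting the `files` list by each file's first timestamp before building `stops`.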
A uj5u.com user replied:
Python has an excellent built-in library function to help with this, heapq.merge().
Assuming your data is space-delimited, you could use it as follows:
from heapq import merge
import csv

filenames = ['data1.csv', 'data2.csv']
input_files = []
merge_list = []
for filename in filenames:
    # the files must stay open, because merge() reads the rows lazily
    f_input = open(filename, newline='')
    input_files.append(f_input)
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    header = next(csv_input)
    merge_list.append(csv_input)
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output, delimiter=' ')
    csv_output.writerow(header)
    csv_output.writerows(merge(*merge_list, key=lambda x: float(x[0])))
# close the input files once everything has been written
for f_input in input_files:
    f_input.close()
This would produce CSV output in the following format:
Time u v w
0.24001821 0 0.009301949 0
0.6400364 0 0.009311552 0
0.74041502 0 0.0095119512 0
0.84005458 0 0.0093211568 0
0.84043291 0 0.0095214359 0
0.94034343 0 0.0094739951 0
0.94045075 0 0.0095309047 0
1.2404686 0 0.0095403752 0
This works for any number of input CSV files.
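Note that `heapq.merge()` interleaves its inputs and keeps every row, as the output above shows, so rows from the overlapping time range of both files are retained rather than dropped. A minimal sketch on in-memory rows (the sample values are made up for illustration) makes this behaviour visible; the `key` argument requires Python 3.5 or later.

```python
from heapq import merge

# hypothetical rows: column 0 is the time, column 1 marks the source file
rows1 = [["0.2", "a"], ["0.8", "a"], ["0.9", "a"]]
rows2 = [["0.7", "b"], ["0.85", "b"], ["1.2", "b"]]

# each input must already be sorted by the key for merge() to work correctly
merged = list(merge(rows1, rows2, key=lambda row: float(row[0])))
print([row[0] for row in merged])
```

If the overlapping rows of the earlier file should be discarded instead, a cut-off approach like the one in the previous answer is a closer fit.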
