Splitting a large CSV file into multiple files based on row values in Python - 有解無憂

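Background

I have a large CSV file in a specific format (NEM12) that is too big to work with. The file format is as follows:

The file always starts with a 100 row
A row starting with 200 marks the start of a new dataset
Rows starting with 300 or 400 hold the data for that dataset
The file always ends with a 900 row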
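The four rules above amount to a simple grouping pass: every 200 row opens a new group, and the 300/400 rows that follow belong to it. Below is a minimal sketch using Python's standard `csv` module. It assumes the file is comma-separated, as the code later in this thread does. `group_datasets` is a hypothetical helper name, and the sample rows are made up.

```python
import csv
from typing import Iterable, List

def group_datasets(lines: Iterable[str]) -> List[List[List[str]]]:
    """Group NEM12-style rows: each 200 row starts a dataset, 300/400 rows join it."""
    datasets: List[List[List[str]]] = []
    for row in csv.reader(lines):  # csv.reader accepts any iterable of strings
        if not row:
            continue  # skip blank lines
        kind = row[0].strip()
        if kind == '200':
            datasets.append([row])  # start a new dataset
        elif kind in ('300', '400') and datasets:
            datasets[-1].append(row)  # data belongs to the most recent 200 row
        # 100 (header) and 900 (end of file) rows are not part of any dataset
    return datasets

# Hypothetical sample rows, not real meter data
sample = [
    '100,NEM12\n',
    '200,NMI,INFO,INFO\n',
    '300,20211001,0,0,0,0\n',
    '400,20,20,F17\n',
    '200,NMI,INFO,INFO\n',
    '300,20211001,0,0,0,0\n',
    '900\n',
]
groups = group_datasets(sample)
print(len(groups), [len(g) for g in groups])  # 2 [3, 2]
```

Holding every dataset in memory like this is only practical for small inputs. For a file that is too big to work with, the same rule should be applied while streaming, writing each group out as soon as it ends.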

An example is shown below (displayed as columns; in the actual file the fields are comma-separated):

100 NEM12               
200 NMI INFO    INFO        
300 20211001    0   0   0   0
400 20  20  F17     
300 20211002    0   0   0   0
300 20211003    0   0   0   0
200 NMI INFO    INFO        
300 20211001    0   0   0   0
300 20211002    0   0   0   0
300 20211003    0   0   0   0
300 20211004    0   0   0   0
300 20211005    0   0   0   0
…                   
200 NMI INFO    INFO        
300 20211001    0   0   0   0
300 20211002    0   0   0   0
400 20  20  F17     
300 20211003    0   0   0   0
300 20211004    0   0   0   0
900

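What I'm trying to do

I'm trying to split the large file into hundreds of smaller files. Each smaller file should contain one 200 row together with its corresponding 300 and 400 rows.

What I've tried

I tried reading the file with pandas, but because of its irregular shape (rows have different numbers of fields) this didn't work.

I've managed to iterate over the rows with the code below, but it splits every value into its own column (i.e. instead of 200, I get 2, 0, 0).

Any help is appreciated.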

from csv import writer

test = 'input.csv'  # placeholder: path to the large NEM12 file

def left(s, n):
    # assumed helper (not shown in the question): first n characters of s
    return s[:n]

for line in open(test):
    if left(line, 3) == '200':
        try:
            with open(fname, 'a', newline='') as f_object:
                writer_object = writer(f_object)
                writer_object.writerow('900')  # a bare string is written one character per field
            f_object.close()
        except NameError:
            print('ignore')
        fname = str(line.replace(',', '').replace('\n', '')) + '.csv'
        with open(fname, 'w', newline='') as f_object:
            writer_object = writer(f_object)
            writer_object.writerow('100')
            writer_object.writerow(line)  # same issue: this is where 200 becomes 2, 0, 0
    if left(line, 3) == '300' or left(line, 3) == '400':
        with open(fname, 'a', newline='') as f_object:
            writer_object = writer(f_object)
            writer_object.writerow(line)
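The "2, 0, 0" symptom comes from `csv.writer`'s `writerow()`, which expects a sequence of fields. A plain string is itself a sequence of characters, so each character becomes its own field. The small demonstration below writes to an in-memory buffer (`io.StringIO`) instead of a real file. `write_row` is a hypothetical helper used only to show what `writerow()` emits.

```python
import csv
import io

def write_row(row) -> str:
    """Return the text csv.writer produces for a single writerow() call."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)  # default line terminator is '\r\n'
    return buf.getvalue()

line = '200,NMI,INFO,INFO\n'

print(repr(write_row('200')))    # '2,0,0\r\n'  (string split into characters)
print(repr(write_row(['200'])))  # '200\r\n'    (a one-element list is one field)
# Split the raw line into fields first, dropping the trailing newline
print(repr(write_row(line.rstrip('\n').split(','))))  # '200,NMI,INFO,INFO\r\n'
```

When the input is already CSV, another option is to read it with `csv.reader`. That yields each row as a list of fields, which can be passed straight to `writerow()`.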

Reply from a uj5u.com user:

Here is one way to do it.

fn = 'NEM12#000000000000001#CNRGYMDP#NEMMCO.csv'

cnt = 0
outfn = f'out_{cnt}.csv'

with open(fn, 'r') as f:
    for line in f:
        if line.startswith('100,'):  # header row: don't write
            continue
        elif line.startswith('900'):  # end-of-file row: don't write
            continue
        elif line.startswith('200,'):  # start of a new dataset
            cnt += 1
            outfn = f'out_{cnt}.csv'  # new filename

        if line.startswith(('200,', '300,', '400,')):
            # 'a' appends, so delete old out_*.csv files before re-running
            with open(outfn, 'a') as w:
                w.write(line)

The output will be out_1.csv, out_2.csv, etc.
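Reopening the output file in append mode for every line works, but it costs one open/close per row, which adds up on a large file. The sketch below applies the same idea but keeps only the current output file open. Following the self-answer further down, which also wraps each piece in a 100 header and a 900 trailer, it copies the original 100 row to the top of every piece and appends a 900 row at the end. This is an assumption about what each piece should look like. The names `split_nem12` and `record_type` and the `out_{n}.csv` pattern are illustrative only.

```python
import os
import tempfile
from typing import List, Optional, TextIO

def record_type(line: str) -> str:
    """Return the record indicator (first comma-separated field) of a line."""
    return line.split(',', 1)[0].strip()

def split_nem12(src: str, out_dir: str = '.') -> List[str]:
    """Write one file per 200 row (plus its 300/400 rows); return the paths written."""
    header = '100,NEM12\n'  # fallback if the input has no 100 row (assumption)
    out: Optional[TextIO] = None
    written: List[str] = []
    try:
        with open(src) as f:
            for line in f:
                if not line.endswith('\n'):
                    line += '\n'  # the last line may lack a newline
                kind = record_type(line)
                if kind == '100':
                    header = line  # reuse the original header row
                elif kind == '200':
                    if out is not None:
                        out.write('900\n')  # end the previous dataset
                        out.close()
                    path = os.path.join(out_dir, f'out_{len(written) + 1}.csv')
                    out = open(path, 'w')  # 'w' overwrites, so re-running is safe
                    written.append(path)
                    out.write(header)
                    out.write(line)
                elif kind in ('300', '400') and out is not None:
                    out.write(line)
                # 900 rows and anything else are not copied
        if out is not None:
            out.write('900\n')  # end the last dataset
    finally:
        if out is not None:
            out.close()
    return written

# Usage on a tiny made-up file in a temporary directory
demo_dir = tempfile.mkdtemp()
demo_src = os.path.join(demo_dir, 'in.csv')
with open(demo_src, 'w') as fh:
    fh.write('100,NEM12\n200,A,1\n300,20211001,0\n400,20,20,F17\n'
             '200,B,2\n300,20211001,0\n900\n')
demo_paths = split_nem12(demo_src, demo_dir)
print([os.path.basename(p) for p in demo_paths])  # ['out_1.csv', 'out_2.csv']
```

Comparing the record indicator (the first field) instead of a prefix such as `'300'` avoids accidentally matching a longer value that merely starts with those digits.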

Reply from a uj5u.com user:

Thanks to @Ferdy for the help.

Using the code you provided together with my original code, I was able to solve the problem:

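from csv import writer

test = 'input.csv'  # placeholder: path to the large NEM12 file

for line in open(test):
    line = line.rstrip('\n')  # keep the newline out of the last field and the filename
    if line.startswith('200'):
        try:
            # finish the previous dataset's file with a 900 row
            with open(fname, 'a', newline='') as f:
                w = writer(f)
                w.writerow(['900'])
        except NameError:
            print('ignore')  # first 200 row: no previous file yet
        # name the file after selected fields of the 200 row
        flist = [line.split(",")[x] for x in [1, 3, 6, 7, 8]]
        fname = '_'.join(flist) + '.csv'
        print(fname)
        with open(fname, 'w', newline='') as f:
            w = writer(f)
            w.writerow(['100', 'NEM12', 'DATECREATED', 'MDYMDP', 'NAME'])
            w.writerow(line.split(","))  # a list of fields, not a bare string
    if line.startswith(('300,', '400,')):
        with open(fname, 'a', newline='') as f:
            w = writer(f)
            w.writerow(line.split(","))

# the loop only writes 900 when the next 200 row appears, so close off the last file here
with open(fname, 'a', newline='') as f:
    writer(f).writerow(['900'])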
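One caveat with naming each file after fields of its 200 row: if two datasets produce the same fields, the second `open(fname, 'w')` truncates the first file. The sketch below shows a hypothetical helper, `make_filename`, that strips whitespace and replaces characters that are unsafe in filenames. On a collision it appends a counter. Indices past the end of the row are silently skipped, which is also an assumption. The field indices are simply the ones used above, and the sample row is made up.

```python
import re
from typing import Iterable, List, Set

def make_filename(row: List[str], indices: Iterable[int], seen: Set[str]) -> str:
    """Build '<f1>_<f2>_....csv' from selected fields, unique within `seen`."""
    parts = [row[i].strip() for i in indices if i < len(row)]
    base = '_'.join(parts) or 'dataset'
    base = re.sub(r'[^A-Za-z0-9._-]', '-', base)  # keep filename-safe characters only
    name = f'{base}.csv'
    n = 2
    while name in seen:  # same fields seen before: add a numeric suffix
        name = f'{base}_{n}.csv'
        n += 1
    seen.add(name)
    return name

seen: Set[str] = set()
row = '200,NMI1,E1,E1,N1,N1,METER1,kWh,30,'.split(',')  # hypothetical 200 row
print(make_filename(row, [1, 3, 6, 7, 8], seen))  # NMI1_E1_METER1_kWh_30.csv
print(make_filename(row, [1, 3, 6, 7, 8], seen))  # NMI1_E1_METER1_kWh_30_2.csv
```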

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/376565.html
