How do I read different parts of a CSV file when the first 5 lines sometimes have more than 1 column?

I receive a CSV file that sometimes has the following format, where the first 5 lines have multiple columns:

File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021,,,,,,
line two content,,,,,,
line 3 content.,,,,,,
line 4 content.,,,,,,
,,,,,,
1,TEAM,Bob Jones,Sar a  require transport,A,,18:34:04hrs on 17/10/21
2,TEAM,Peter Smith,Sar h,H,,20:43:49hrs on 17/10/21
3,TEAM,Neil Barnes,SAR H,H,,20:15:12hrs on 17/10/21

At other times, the first 4 lines have only 1 column and line 5 is empty:

File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021
line two content
line 3 content.
line 4 content.

1,TEAM,Bob Jones,Sar a  require transport,A,,18:34:04hrs on 17/10/21
2,TEAM,Peter Smith,Sar h,H,,20:43:49hrs on 17/10/21
3,TEAM,Neil Barnes,SAR H,H,,20:15:12hrs on 17/10/21

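I need to be able to handle both types of file. I want to run regular expressions on the data in the first 4 lines, and then build a list from the "normal" data that starts on line 6.

I am using csv_reader and tried testing whether the row has more than one column, but that only works for the second sample. I then tried testing whether the second column is empty: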

if row[1] == None:

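But I get an IndexError: list index out of range, which I understand.

I can't test whether row[0] is a number, because in the second sample line 5 has no columns at all. What I need to do is read the first x lines and use regular expressions to extract the name, time, and date, skip the blank line, and then read the next group of rows normally.

It is reading the first block in either format, and then the next block, that I am struggling with.

Currently I have: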

import csv

with open('file.csv', 'r', newline='') as csvfile:
    csv_reader = csv.reader(csvfile)
    for row in csv_reader:
        # Skip blank lines (empty list) and comma-only lines (empty first field)
        if len(row) > 0 and row[0] != "":
            print(row)

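This skips the empty line in both samples, but I still run into the list index out of range error when I try to test row[1].

I am sure there is a simple way to do this, but my Google searches have not turned anything up.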

Answer:

In both cases, I would generalize your CSV as follows:

Lines 1-4: special text "lines"
Line 5: junk (discard)
Lines 6-...: meaningful "rows"
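
To see why both formats fit this outline, here is a minimal sketch of what csv.reader yields for the two variants of lines 4 to 6. The shortened sample strings and the helper name parse_text are illustrative assumptions, not part of the original files.

```python
import csv
import io

# Shortened stand-ins for lines 4-6 of each file format (assumed sample data)
with_commas = "line 4 content.,,,,,,\n,,,,,,\n1,TEAM,Bob Jones\n"
without_commas = "line 4 content.\n\n1,TEAM,Bob Jones\n"

def parse_text(text):
    """Return every row csv.reader produces for the given text."""
    return list(csv.reader(io.StringIO(text, newline='')))

rows_a = parse_text(with_commas)
rows_b = parse_text(without_commas)

print(rows_a[1])  # comma-only line: a list of seven empty strings
print(rows_b[1])  # truly blank line: an empty list
```

In the second format the blank line becomes an empty list and the text lines become one-element lists, so row[1] raises IndexError on both. Fields are always strings, so comparing a field to None never matches either. In both formats row[0] of a text line holds the full text, and `any(row)` is false for the separator line.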

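Here is the general approach in code. The parse_special_csv function takes a file name as input and returns two lists:

The first is the list of "lines" (1-4); technically they are rows, but the name reflects how you treat them and what you do with them
The second is the list of rows (lines 6-...)

My thinking is that once you have split the data apart and the file is fully parsed, you will know what to do with lines and how to handle rows: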

import csv

def parse_special_csv(fname):
    lines = []
    rows = []
    with open(fname, 'r', newline='') as f:
        reader = csv.reader(f)

        # Treat lines 1-4 as just "lines"
        for i in range(4):
            row = next(reader)    # manually advance the reader
            lines.append(row[0])  # safe to index for first column, because *you know* these lines have column-like data
        
        # Discard line 5
        next(reader)

        # Treat the remaining lines as CSV rows
        for row in reader:
            rows.append(row)

    return lines, rows

lines, rows = parse_special_csv('sample1.csv')
print('sample1')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

lines, rows = parse_special_csv('sample2.csv')
print('sample2')
print('lines:')
print(lines)
print('rows:')
print(rows)
print()

With your samples, I get:

sample1
lines:
[
 'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
 'line two content',
 'line 3 content.',
 'line 4 content.'
]
rows:
[
 ['1', 'TEAM', 'Bob Jones', 'Sar a  require transport', 'A', '', '18:34:04hrs on 17/10/21'],
 ['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
 ['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]

sample2
lines:
[
 'File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021',
 'line two content',
 'line 3 content.',
 'line 4 content.'
]
rows:
[
 ['1', 'TEAM', 'Bob Jones', 'Sar a  require transport', 'A', '', '18:34:04hrs on 17/10/21'],
 ['2', 'TEAM', 'Peter Smith', 'Sar h', 'H', '', '20:43:49hrs on 17/10/21'],
 ['3', 'TEAM', 'Neil Barnes', 'SAR H', 'H', '', '20:15:12hrs on 17/10/21']
]
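
Once the first line is available as plain text, the name, time, and date the question asks about can be pulled out with a regular expression. This is only a sketch: the pattern assumes the header always reads "Downloaded By <name> At HH:MM:SS DD/MM/YYYY" as in the sample, and HEADER_RE and parse_header are hypothetical names.

```python
import re

# Assumed header layout, taken from the single sample line in the question
HEADER_RE = re.compile(
    r"Downloaded By (?P<name>.+?) At "
    r"(?P<time>\d{2}:\d{2}:\d{2}) (?P<date>\d{2}/\d{2}/\d{4})"
)

def parse_header(line):
    """Return a dict with name, time and date, or None if the line does not match."""
    match = HEADER_RE.search(line)
    return match.groupdict() if match else None

info = parse_header(
    "File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021"
)
print(info)
# {'name': 'Bob Mortimer', 'time': '17:22:36', 'date': '09/11/2021'}
```

Returning None for a non-matching line leaves it to the caller to decide whether an unexpected header is an error.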

Also, next(reader) may look a little unfamiliar, but it is the correct way to manually advance a CSV reader (and, in general, any iterator in Python).
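
One caveat: a bare next(reader) raises StopIteration if the file has fewer lines than expected. The built-in next accepts a default value as a second argument, which avoids that. The variant below is a sketch under assumptions, not the answer's original function: the name split_header_and_rows, the header_count parameter, and the choice to drop blank data rows are all illustrative.

```python
import csv
import io

def split_header_and_rows(f, header_count=4):
    """Split an open text file into header lines and data rows."""
    reader = csv.reader(f)
    lines = []
    for _ in range(header_count):
        row = next(reader, None)          # None instead of StopIteration at end of file
        if row is None:
            break
        lines.append(row[0] if row else '')  # a blank line parses to an empty list
    next(reader, None)                    # discard the separator line, if present
    # Assumption: blank or comma-only rows in the data section are not wanted
    rows = [row for row in reader if any(row)]
    return lines, rows

sample = (
    "File For EMS Team Downloaded By Bob Mortimer At 17:22:36 09/11/2021\n"
    "line two content\n"
    "line 3 content.\n"
    "line 4 content.\n"
    "\n"
    "1,TEAM,Bob Jones,Sar a  require transport,A,,18:34:04hrs on 17/10/21\n"
)
lines, rows = split_header_and_rows(io.StringIO(sample, newline=''))
short_lines, short_rows = split_header_and_rows(io.StringIO("only one line\n", newline=''))
```

With a real file, pass the object returned by open(fname, 'r', newline='') instead of the io.StringIO used here for demonstration.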
