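I am reading a compressed CSV file and want to extract specific columns without using pandas. My current code returns a list only for the first list comprehension; the later ones come back empty. How can I extract multiple columns while using a context manager?
Input file: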
col1,col2,col3
1,2,3
a,b,c
My code:
import gzip
import csv
import codecs

with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))
    col_2 = [row[1] for row in content]  # Returns [2, "b"]
    col_3 = [row[2] for row in content]  # Returns []
Expected output:
col_2: [2, "b"]
col_3: [3, "c"]
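Answer from a uj5u.com user:
The problem is not the context manager. The csv.reader object is an iterator, and an iterator can only be consumed once: the first list comprehension reads every row, so nothing is left for the second.
You can duplicate the iterator with itertools.tee, as follows: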
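The one-shot behavior can be reproduced without a gzip file at all. The following is a minimal sketch that assumes the same sample data as the question, held in an in-memory io.StringIO buffer (the names sample, first_pass and second_pass are illustrative, not from the original post):

```python
import csv
import io

# Hypothetical in-memory stand-in for the decompressed file contents.
sample = io.StringIO("col1,col2,col3\n1,2,3\na,b,c\n")
reader = csv.reader(sample)

first_pass = [row[1] for row in reader]   # consumes every row of the reader
second_pass = [row[2] for row in reader]  # the reader is already exhausted

print(first_pass)   # ['col2', '2', 'b']
print(second_pass)  # []
```

Note that the header row is included in the first result, and that every value is a string.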
import gzip
import csv
import codecs
from itertools import tee

with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))
    c1, c2 = tee(content)  # from now on, do not use content anymore
    col_2 = [row[1] for row in c1]
    col_3 = [row[2] for row in c2]
Output:
>>> col_2
['col2', '2', 'b']
>>> col_3
['col3', '3', 'c']
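One caveat with tee: when the first copy is fully consumed before the second one starts, as it is here, tee has to buffer every row in memory anyway, and the itertools documentation suggests list() for that case. A minimal sketch of that variant, again assuming the sample data in an in-memory buffer rather than the real myfile.csv.gz:

```python
import csv
import io

# Hypothetical in-memory stand-in for the decompressed file contents.
sample = io.StringIO("col1,col2,col3\n1,2,3\na,b,c\n")

# Read all rows once; unlike the reader, a list can be iterated repeatedly.
rows = list(csv.reader(sample))

col_2 = [row[1] for row in rows]
col_3 = [row[2] for row in rows]

print(col_2)  # ['col2', '2', 'b']
print(col_3)  # ['col3', '3', 'c']
```

This keeps the two list comprehensions from the question, at the cost of holding the whole file in memory.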
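Using a classic loop
A better approach, however, is a classic for loop. It fills both lists in a single pass, so the rows do not have to be iterated twice: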
import gzip
import csv
import codecs

with gzip.open(r"myfile.csv.gz", "r") as f:
    content = csv.reader(codecs.iterdecode(f, "utf-8"))
    col_2 = []
    col_3 = []
    for row in content:  # single pass over the rows
        col_2.append(row[1])
        col_3.append(row[2])
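The expected output in the question does not contain the header row, so the loop probably needs to skip it. The sketch below is one way to do that, under a few assumptions: it first writes a throwaway gzip file containing the sample data so that it can run on its own, it opens the file in text mode ("rt") instead of decoding with codecs, and it skips the header with next(). The temporary path is illustrative only.

```python
import csv
import gzip
import os
import tempfile

# Build a throwaway gzip file with the sample data so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), "myfile.csv.gz")
with gzip.open(path, "wt", encoding="utf-8", newline="") as f:
    f.write("col1,col2,col3\n1,2,3\na,b,c\n")

col_2, col_3 = [], []
# Text mode lets gzip decode the bytes, so csv.reader can read the file directly.
with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
    reader = csv.reader(f)
    next(reader, None)  # skip the header row (no error if the file is empty)
    for row in reader:
        col_2.append(row[1])
        col_3.append(row[2])

print(col_2)  # ['2', 'b']
print(col_3)  # ['3', 'c']
```

The values are still strings ('2', not 2); csv.reader does not convert types, so any conversion to int would have to be done explicitly.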