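As I wrote in the title, I want to read a CSV, group it by a column, apply sum, and then replace the old CSV with the new values, using as few libraries as possible (and avoiding pandas). This is how far I've gotten: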
from csv import reader, writer

index = {}
with open('event.csv') as f:
    cr = reader(f)
    for row in cr:
        index.setdefault(row[0], []).append(int(row[1]))
f.close()
with open('event.csv', 'w', newline='\n') as csv_file:
    writer = writer(csv_file)
    for key, value in index.items():
        writer.writerow([key, value[0]])
csv_file.close()
But this way I can get an average..., and I have to open the file twice, which doesn't seem smart to me. The file event.csv looks something like this:
work1,100
work2,200
work3,200
work1,50
work3,20
Desired output:
work1,150
work2,200
work3,220
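The question also mentions having to open the file twice. One way to touch event.csv with a single open() call is mode 'r+': read every row, seek back to the start, truncate, and write the totals. A minimal sketch, assuming the headerless two-column layout shown above; sum_csv_in_place is a hypothetical helper name, not something from the answers below. If the program dies right after truncate(), the original data is gone, so this trades safety for the single open.

```python
import csv


def sum_csv_in_place(path):
    """Hypothetical helper: sum column 2 grouped by column 1, rewriting `path`
    through a single open() call. Assumes a headerless two-column CSV."""
    totals = {}
    # 'r+' opens the existing file for both reading and writing.
    with open(path, 'r+', newline='') as f:
        for key, value in csv.reader(f):
            totals[key] = totals.get(key, 0) + int(value)
        f.seek(0)      # go back to the start of the file
        f.truncate()   # discard the old contents
        csv.writer(f).writerows(totals.items())  # one (key, total) row per group
    return totals
```

Calling sum_csv_in_place('event.csv') on the sample file leaves it holding the desired output, in first-seen key order.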
Answer:
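You're actually very close. Just sum the values you read when you rewrite the file. Note that when you open files in a with statement you don't have to close them explicitly; that is done for you automatically. Also note that CSV files should be opened with newline='' for both reading and writing, per the documentation.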
import csv

index = {}
with open('event.csv', newline='') as csv_file:
    cr = csv.reader(csv_file)
    for row in cr:
        index.setdefault(row[0], []).append(int(row[1]))

with open('event2.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for key, values in index.items():
        value = sum(values)  # total of all the values collected for this key
        writer.writerow([key, value])

print('-fini-')
The above can be written more concisely by eliminating some temporary variables and using a generator expression:
import csv

index = {}
with open('event.csv', newline='') as csv_file:
    for row in csv.reader(csv_file):
        index.setdefault(row[0], []).append(int(row[1]))

with open('event2.csv', 'w', newline='') as csv_file:
    csv.writer(csv_file).writerows([key, sum(values)] for key, values in index.items())

print('-fini-')
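Both versions above write to a separate event2.csv. If the goal is to overwrite event.csv itself, as the question asks, a common pattern is to write the result to a temporary file in the same directory and then swap it in with os.replace, so a crash halfway through never leaves a half-written event.csv. A minimal sketch, assuming the same headerless two-column layout; sum_and_replace is a hypothetical helper name.

```python
import csv
import os
import tempfile


def sum_and_replace(path):
    """Hypothetical helper: group rows by column 1, sum column 2, replace `path`."""
    totals = {}
    with open(path, newline='') as f:
        for key, value in csv.reader(f):
            totals[key] = totals.get(key, 0) + int(value)

    # Write to a temporary file next to the original, then swap it in.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.csv')
    try:
        with os.fdopen(fd, 'w', newline='') as tmp:
            csv.writer(tmp).writerows(totals.items())
        # Overwrites the original; the rename is atomic on POSIX.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)  # don't leave the temporary file behind on failure
        raise
    return totals
```

One side effect to be aware of: mkstemp creates the temporary file readable and writable only by its owner, so the replaced event.csv ends up with those permissions unless you copy the original mode across.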
Answer:
Another simplification of the solutions already shown, with no extra libraries:
import csv

index = {}
with open('event.csv', newline='') as f:
    cr = csv.reader(f)
    for item, value in cr:
        index[item] = index.get(item, 0) + int(value)  # sum as you go

with open('event2.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(index.items())  # write all the items in one shot

print('-fini-')
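To check the accumulate-as-you-go loop without touching the disk, the same code can run against io.StringIO buffers holding the question's sample data. This is only an illustration of the logic; the file-based version above is what you would actually run. It also shows why writerows(index.items()) works directly: each item is already a (key, total) pair.

```python
import csv
import io

# The question's sample rows, held in memory instead of event.csv.
source = io.StringIO('work1,100\nwork2,200\nwork3,200\nwork1,50\nwork3,20\n')

index = {}
for item, value in csv.reader(source):
    index[item] = index.get(item, 0) + int(value)  # sum as you go

# index == {'work1': 150, 'work2': 200, 'work3': 220}
target = io.StringIO()
csv.writer(target).writerows(index.items())  # each item is a (key, total) tuple

print(target.getvalue(), end='')
```

csv.writer terminates rows with '\r\n' by default, which is why the files are opened with newline='' rather than letting Python translate line endings a second time.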
Answer:
Using an extra library, convtools, which provides a lot of functionality so you don't have to write lots of code by hand every time:
from convtools import conversion as c
from convtools.contrib.tables import Table

rows = Table.from_csv("event.csv", header=False).into_iter_rows(list)

converter = (
    c.group_by(c.item(0))
    .aggregate(
        (
            c.item(0),
            c.ReduceFuncs.Sum(c.item(1).as_type(int)),
        )
    )
    .gen_converter()
)

processed_rows = converter(rows)

Table.from_rows(processed_rows, header=False).into_csv(
    "event2.csv", include_header=False
)
Answer:
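Here's another way to think about it.
Instead of storing a list of integers while reading and then "compressing" them into the value you want while writing, make it clear up front that you are summing something as you read: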
import csv
from collections import defaultdict

summed_work = defaultdict(int)

with open('event_input.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        work_id = row[0]
        work_value = int(row[1])
        summed_work[work_id] += work_value  # missing keys start at 0

with open('event_processed.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for work_id, summed_value in summed_work.items():
        writer.writerow([work_id, summed_value])
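This is functionally equivalent to what you're aiming for and to what martineau helped you with, but I think it shows you and your readers the intent more quickly and clearly.
Technically it uses one more import, defaultdict from collections, but that is part of the standard library, and I'm not sure how much weight you put on the number of libraries used.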
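Edit
Oh, I just remembered that collections also has the Counter class. It might be even clearer:
from collections import Counter

summed_work = Counter()
Everything else stays the same.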
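For completeness, here is the defaultdict answer with Counter swapped in, reading the question's sample rows from an in-memory buffer instead of event_input.csv so it runs on its own. A Counter returns 0 for a missing key, so += works on the first occurrence just as with defaultdict(int), and because Counter is a dict subclass the rows come out in first-seen order (Python 3.7+).

```python
import csv
import io
from collections import Counter

# The question's sample rows, held in memory instead of event_input.csv.
source = io.StringIO('work1,100\nwork2,200\nwork3,200\nwork1,50\nwork3,20\n')

summed_work = Counter()
for work_id, work_value in csv.reader(source):
    summed_work[work_id] += int(work_value)  # a missing key counts as 0

target = io.StringIO()
writer = csv.writer(target)
for work_id, summed_value in summed_work.items():
    writer.writerow([work_id, summed_value])

print(target.getvalue(), end='')
```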
