import csv

def readCSV(filename, begin_date="01/07/2020", end_date="30/09/2020"):
    file = open(filename)
    csvreader = csv.reader(file)
    header = []
    header = next(csvreader)
    print(header)  # assumed: the snippet as posted omits the call that printed the header

if __name__ == '__main__':
    raw_load_data = readCSV("Total_load_2020.csv")
    raw_forecast_data = readCSV("Total_load_forecast_2020.csv")
The data is a CSV file (downloaded online) that looks like this:
RowDate,RowTime,TotalLoadForecast
01/01/2020,00:00,8600.52
01/01/2020,00:15,8502.06
01/01/2020,00:30,8396.45
...
But the output contains some strange characters (which are not in the data):
['ï»¿RowDate', 'RowTime', 'TotalLoad']
['ï»¿RowDate', 'RowTime', 'TotalLoadForecast']
Of course, I can easily strip them out. But why does this happen in the first place?
Answer:
Yes, it is a BOM: the UTF-8 byte order mark (bytes EF BB BF), displayed the way it looks when decoded as CP1252.
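To see why the same three bytes look so different, here is a minimal sketch that decodes the UTF-8 BOM with three codecs. The variable names are only illustrative; `codecs.BOM_UTF8` and the codec names come from the Python standard library.

```python
import codecs

# The UTF-8 BOM is the three bytes EF BB BF.
bom_bytes = codecs.BOM_UTF8

# Decoded as CP1252, each byte becomes its own visible character.
as_cp1252 = bom_bytes.decode("cp1252")

# Decoded as UTF-8, the three bytes form the single code point U+FEFF.
as_utf8 = bom_bytes.decode("utf-8")

# The utf-8-sig codec consumes the BOM and yields an empty string.
as_utf8_sig = bom_bytes.decode("utf-8-sig")

# ascii() keeps the output printable on any console encoding.
print(ascii(as_cp1252))    # '\xef\xbb\xbf'
print(ascii(as_utf8))      # '\ufeff'
print(ascii(as_utf8_sig))  # ''
```

The three CP1252 characters are exactly the prefix in front of RowDate in the question's output, which suggests the file was opened with CP1252 as the effective encoding.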
I copied your sample CSV and ran it through GoCSV so that I knew I was adding a BOM:
% gocsv clean -add-bom sample.csv > tmp
% mv tmp sample.csv
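If GoCSV is not at hand, a BOM-prefixed file can also be produced from Python itself. The following is a rough sketch, not an equivalent of the GoCSV command: it writes a scratch file (the name `sample_with_bom.csv` and the temporary directory are assumptions made for illustration) and then reads it back as CP1252 to reproduce the header from the question.

```python
import codecs
import csv
import os
import tempfile

rows = [
    ["RowDate", "RowTime", "TotalLoadForecast"],
    ["01/01/2020", "00:00", "8600.52"],
]

# Hypothetical scratch file, so a real sample.csv is left untouched.
path = os.path.join(tempfile.mkdtemp(), "sample_with_bom.csv")

# Writing with utf-8-sig makes Python emit the BOM before the first row.
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f).writerows(rows)

# Confirm at the byte level that the file starts with EF BB BF.
with open(path, "rb") as f:
    starts_with_bom = f.read(3) == codecs.BOM_UTF8

# Reading it back as CP1252 turns the BOM into three stray characters.
with open(path, "r", newline="", encoding="cp1252") as f:
    header = next(csv.reader(f))

print(starts_with_bom, ascii(header[0]))
```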
import csv

# Read the file as CP1252 and skip the BOM if it is there
with open('sample.csv', 'r', newline='', encoding='cp1252') as f:
    # See if the first "char" is a BOM
    bom_chars = f.read(3)
    if (bom_chars != 'ï»¿'):
        f.seek(0)  # Not a BOM, reset stream to beginning of file
    else:
        pass  # skip BOM

    reader = csv.reader(f)
    for row in reader:
        print(row)
If you read the file as UTF-8 instead, the BOM check looks like this:
with open('sample.csv', 'r', newline='', encoding='utf-8') as f:  # set explicitly; the default encoding depends on the platform
    bom_char = f.read(1)
    if (bom_char != '\ufeff'):
        f.seek(0)  # Not a BOM, reset stream to beginning of file
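Or let Python handle the guesswork and strip the BOM when it is present, using the utf_8_sig decoder:
with open('sample.csv', 'r', newline='', encoding='utf_8_sig') as f:
    reader = csv.reader(f)  # the first header field is now 'RowDate', with or without a BOM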
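As a quick check of that claim, here is a small sketch showing that the same reader works whether or not the file carries a BOM. The helper name `read_header` and the two scratch file names are hypothetical and chosen only for this illustration.

```python
import csv
import os
import tempfile


def read_header(filename):
    """Return the first CSV row, ignoring a leading UTF-8 BOM if present."""
    with open(filename, "r", newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


# Hypothetical scratch files: the same content written with and without a BOM.
tmp_dir = tempfile.mkdtemp()
content = "RowDate,RowTime,TotalLoadForecast\r\n01/01/2020,00:00,8600.52\r\n"

headers = {}
for name, encoding in [("with_bom.csv", "utf-8-sig"), ("no_bom.csv", "utf-8")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w", newline="", encoding=encoding) as f:
        f.write(content)
    headers[name] = read_header(path)

print(headers)
```

Both entries come back with a clean RowDate field, so no manual seek or prefix comparison is needed when the input is known to be UTF-8.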
