I want to load the values in the "category" column into a pandas DataFrame. Here is my tsv file:
Tagname text category
j245qzx_8 hamburger toppings f
h833uio_7 side of fries f
d423jin_2 milkshake combo d
Here is my code:
import pandas as pd

with open(filename, 'r') as f:
    df = pd.read_csv(f, sep='\t')
    categoryColumn = df["category"]
    categoryList = []
    # Collect each category value into the list
    for line in categoryColumn:
        categoryList.append(line)
However, I get a UnicodeDecodeError on the line df = pd.read_csv(f, sep='\t'), and my code stops there:
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 678, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 440, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 787, in __init__
    self._make_engine(self.engine)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1014, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1708, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 539, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 737, in pandas._libs.parsers.TextReader._get_header
  File "pandas/_libs/parsers.pyx", line 932, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2101, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 898: invalid start byte
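Any ideas why this happens or how to fix it? My tsv doesn't appear to contain any special characters, so I'm not sure what is causing this or what to do about it.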
Answer from a uj5u.com user:
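The fix
Having just read this SO post, I think I understand what went wrong.
You are taking the file handle returned by Python's open() and passing it to pandas' read_csv(). Because the file is opened in text mode, it is open(), not read_csv(), that decides how the bytes are decoded. With no encoding argument it falls back to the platform default, which the error message shows was UTF-8 here.
So try setting the encoding in open(), like this: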
import pandas as pd

# Tell open() how to decode the file's bytes
with open(filename, 'r', encoding='windows-1252') as f:
    df = pd.read_csv(f, sep='\t')
    categoryColumn = df["category"]
    categoryList = []
    for line in categoryColumn:
        categoryList.append(line)
Or skip open() entirely and let pandas open the file:
import pandas as pd

# Pass the path directly so read_csv() applies the encoding itself
df = pd.read_csv(filename, sep='\t', encoding='windows-1252')
categoryColumn = df["category"]
categoryList = []
for line in categoryColumn:
    categoryList.append(line)
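The question mentions that the file does not seem to contain any special characters, so it may help to look at the offending byte before settling on an encoding. The following is only a minimal sketch using the standard library: `find_undecodable` and `sample` are hypothetical names introduced for illustration, and `sample` is a stand-in for the real file contents (the actual file would be read in binary mode with open(filename, 'rb')).

```python
# Hypothetical helper (not part of pandas): report the first byte sequence
# that cannot be decoded with the given encoding.
def find_undecodable(data: bytes, encoding: str = "utf-8", context: int = 20):
    """Return (offset, bad_bytes, surrounding_bytes), or None if data decodes cleanly."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        start = max(0, exc.start - context)
        return exc.start, data[exc.start:exc.end], data[start:exc.end + context]
    return None


# Assumed stand-in for the file contents. For a real file, use:
#     with open(filename, 'rb') as f:
#         data = f.read()
sample = b"Tagname\ttext\tcategory\nd423jin_2\tmilkshake combo\td\n\x89\n"

result = find_undecodable(sample)
print(result)

# The same bytes decode without error as Windows-1252,
# where 0x89 maps to the per mille sign (U+2030).
print(repr(sample.decode("windows-1252")[-2]))
```

The surrounding bytes in the result show which row contains the character, which is often easier than searching the file by eye.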
Some backstory
I echoed the byte \x89 onto the end of your sample and then ran Python's chardetect utility, which reported the file as Windows-1252:
% echo -e '\x89' >> sample.csv
% cat sample.csv
Tagname text category
j245qzx_8 hamburger toppings f
h833uio_7 side of fries f
d423jin_2 milkshake combo d
?
% which chardetect
/Library/Frameworks/Python.framework/Versions/3.9/bin/chardetect
% chardetect sample.csv
sample.csv: Windows-1252 with confidence 0.73
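If chardetect is not available, a much cruder check can be done with the standard library alone. This is a rough sketch rather than a replacement for chardetect: `first_decodable_encoding`, its candidate list, and `sample` are assumptions made for illustration. It only reports the first candidate that decodes without error, which does not prove the encoding is the right one (latin-1, for example, accepts any byte sequence).

```python
# Hypothetical helper: return the first candidate encoding that decodes
# the data without raising, or None if every candidate fails.
def first_decodable_encoding(data: bytes,
                             candidates=("utf-8", "windows-1252", "latin-1")):
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


# Assumed stand-in for the sample file with the extra 0x89 byte appended.
sample = b"Tagname\ttext\tcategory\nd423jin_2\tmilkshake combo\td\n\x89\n"

print(first_decodable_encoding(sample))
```

Putting utf-8 first matters: valid UTF-8 text would also decode under the single-byte encodings, just with the wrong characters, so the strictest candidate should be tried before the more permissive ones.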
