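I have CSV data that looks like the sample below, and I am trying to read it into a pandas DataFrame. Despite the ample documentation online, none of the combinations I have tried work. For example, I tried this: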
pd.read_csv("https://www.nwrfc.noaa.gov/natural/nat_norm_text.cgi?id=TDAO3.csv", delimiter=',', skiprows=0, low_memory=False)
I get this error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 989
Alternatively, I tried the following, but it returns an empty DataFrame:
pd.read_csv('https://www.nwrfc.noaa.gov/natural/nat_norm_text.cgi?id=TDAO3.csv', skiprows=2,
skipfooter=3,index_col=[0], header=None,
engine='python', # c engine doesn't have skipfooter
sep='delimiter')
Out[31]:
Empty DataFrame
Columns: []
Index: []
The first 10 lines of the CSV file look like this:
# Water Supply Monthly Volumes for COLUMBIA - THE DALLES DAM (TDAO3)
# Volumes are in KAF
ID,Calendar Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec
TDAO3,1948,,,,,,,,,,6866.8,4307.04,4379.38
TDAO3,1949,3546.71,4615.1,8513.31,15020.45,35251.67,21985.99,11226.06,6966.73,4727.37,4406.29,5266.74,5595.91
TDAO3,1950,4353.86,5540.21,9696.27,12854.81,23359.51,39246.78,23393.23,9676.77,5729.74,6990.31,8300.03,8779.57
TDAO3,1951,8032.32,10295.98,7948.59,16144.8,36000.88,28334.09,19735.49,9308.15,6546.95,8907.1,6461.14,6425.76
TDAO3,1952,4671,6222.25,6551.62,18678.3,34866.91,27120.65,15994.18,7907.55,4810.39,3954.32,3259.29,3231.49
TDAO3,1953,7839.72,7870.96,6527.74,9474.66,23384.47,32668.32,17422.63,8655.16,5220.04,5130.46,5183.5,5915.14
TDAO3,1954,5197.51,5967.07,6718.36,10813.69,29190.37,32673.26,29624.38,13456.13,9165.78,5440.92,5732.22,4973.53
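Thanks,
A uj5u.com user replied:
The URL is not a direct link to a CSV file. It points to a page that displays the data as HTML using tags such as <pre> and <br>, and that is what causes the problem.
However, you can use requests to download the page as text.
You can then use standard string functions to get the text between <pre> and </pre> and replace <br> with '\n', which gives you text containing valid CSV.
Finally, you can use io.StringIO to create a file-like object in memory and load it with pd.read_csv() without saving anything to disk.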
import pandas as pd
import requests
import io
url = "https://www.nwrfc.noaa.gov/natural/nat_norm_text.cgi?id=TDAO3.csv"
response = requests.get(url)
start = response.text.find('<pre>') + len('<pre>')
end = response.text.find('</pre>')
pre = response.text[start:end]
text = pre.replace('<br>', '\n')
buf = io.StringIO(text) # file-like object in memory
df = pd.read_csv(buf, skiprows=2, low_memory=False)
print(df.to_string())
Result:
ID Calendar Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
0 TDAO3 1948 NaN NaN NaN NaN NaN NaN NaN NaN NaN 6866.80 4307.04 4379.38
1 TDAO3 1949 3546.71 4615.10 8513.31 15020.45 35251.67 21985.99 11226.06 6966.73 4727.37 4406.29 5266.74 5595.91
2 TDAO3 1950 4353.86 5540.21 9696.27 12854.81 23359.51 39246.78 23393.23 9676.77 5729.74 6990.31 8300.03 8779.57
3 TDAO3 1951 8032.32 10295.98 7948.59 16144.80 36000.88 28334.09 19735.49 9308.15 6546.95 8907.10 6461.14 6425.76
4 TDAO3 1952 4671.00 6222.25 6551.62 18678.30 34866.91 27120.65 15994.18 7907.55 4810.39 3954.32 3259.29 3231.49
5 TDAO3 1953 7839.72 7870.96 6527.74 9474.66 23384.47 32668.32 17422.63 8655.16 5220.04 5130.46 5183.50 5915.14
6 TDAO3 1954 5197.51 5967.07 6718.36 10813.69 29190.37 32673.26 29624.38 13456.13 9165.78 5440.92 5732.22 4973.53
7 TDAO3 1955 4124.26 3570.41 3843.46 7993.82 18505.47 31619.54 20408.54 8922.94 4983.31 5842.70 6982.45 9076.44
8 TDAO3 1956 8079.70 5366.62 8818.69 19754.46 40600.06 40447.34 19846.89 9726.93 5503.69 5446.20 4988.98 6006.80
9 TDAO3 1957 3940.08 4411.33 9155.00 12271.77 40111.86 27864.70 11585.75 6795.70 4613.31 4767.38 4087.55 4789.04
10 TDAO3 1958 4838.12 8246.89 7303.03 13902.66 33958.88 26239.62 12516.52 6898.78 4968.03 5198.19 6662.24 7616.43
... rest ...
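
The slicing in the answer relies on str.find(), which returns -1 when a tag is missing, so a change in the page layout would silently produce the wrong text. As a minimal sketch, the extraction step could be wrapped in a helper that fails loudly instead. The function name extract_pre_csv and the shortened sample HTML are made up for illustration and are not part of the original answer; the real page may contain more markup than this sample assumes.

```python
def extract_pre_csv(html: str) -> str:
    """Return the text inside the first <pre>...</pre> block with <br> turned into newlines.

    Hypothetical helper: assumes the page wraps the CSV in a single plain <pre> tag.
    """
    start_tag, end_tag = "<pre>", "</pre>"
    start = html.find(start_tag)
    if start == -1:
        raise ValueError("no <pre> tag found")
    start += len(start_tag)
    end = html.find(end_tag, start)
    if end == -1:
        raise ValueError("no closing </pre> tag found")
    return html[start:end].replace("<br>", "\n")


# Shortened stand-in for the real response text, so this runs without network access.
sample_html = (
    "<html><body><pre># Title<br># Units<br>"
    "ID,Calendar Year,Jan<br>TDAO3,1949,3546.71<br></pre></body></html>"
)
csv_text = extract_pre_csv(sample_html)
print(csv_text)
```

The returned string can be wrapped in io.StringIO and passed to pd.read_csv(..., skiprows=2) in the same way as the answer's code does with its text variable.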
