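I'm trying to write a Python script that downloads data from https://data.cms.gov/provider-data/dataset/g6vv-u9sr and performs various operations on the dataset. I'm having trouble pulling the data automatically, and I'm not sure how to write a query that returns the entire dataset (ideally as a CSV loaded into pandas). Any pointers?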
Answer:
You can use the requests module to download the CSV data, for example:
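import pandas as pd
import requests
from io import StringIO

# Download the CSV file and parse the response text into a DataFrame
r = requests.get(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv"
)
df = pd.read_csv(StringIO(r.text))

print(df.dtypes)  # column names and their dtypes
print(len(df))    # number of rows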
Prints:
Federal Provider Number object
Provider Name object
Provider Address object
Provider City object
Provider State object
Provider Zip Code int64
Penalty Date object
Penalty Type object
Fine Amount float64
Payment Denial Start Date object
Payment Denial Length in Days float64
Location object
Processing Date object
dtype: object
27881
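If the download is going to run unattended, it may help to fail loudly on HTTP errors and to decode the bytes with an explicit encoding. The following is only a sketch of that idea: the helper names `fetch_csv` and `parse_csv_bytes`, the 30-second timeout, and the `latin1` default are assumptions for illustration, not part of the original answer.

```python
from io import BytesIO

import pandas as pd
import requests


def parse_csv_bytes(raw: bytes, encoding: str = "latin1") -> pd.DataFrame:
    """Parse raw CSV bytes into a DataFrame using an explicit encoding."""
    return pd.read_csv(BytesIO(raw), encoding=encoding)


def fetch_csv(url: str, timeout: float = 30.0) -> pd.DataFrame:
    """Download a CSV file (hypothetical helper) and return it as a DataFrame."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return parse_csv_bytes(r.content)
```

Calling `fetch_csv(...)` with the CSV URL from the answer above should then return the same DataFrame, assuming the file is still published at that address.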
Edit: As @Parfait pointed out, you can pass the URL directly to pd.read_csv. In this case, however, you need to set the encoding= argument explicitly ("latin1" / "iso_8859-1" both work):
df = pd.read_csv(
    "https://data.cms.gov/provider-data/sites/default/files/resources/72ed1971c684c81da254c00145da1b47_1647887934/NH_Penalties_Mar2022.csv",
    encoding="iso_8859-1",
)
print(len(df))
Prints:
27881
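To see why the encoding= argument matters, here is a small offline sketch. The sample row is made up (it is not taken from the CMS file) and assumes the real file contains at least one byte that is valid Latin-1 but not valid UTF-8, which is the usual cause of this kind of failure.

```python
from io import BytesIO

import pandas as pd

# Made-up sample data: "é" encoded as Latin-1 is the single byte 0xE9,
# which is not a valid UTF-8 sequence in this position.
raw = "Provider Name,Fine Amount\nCafé Care Center,650.0\n".encode("latin1")

try:
    pd.read_csv(BytesIO(raw))  # pandas decodes as UTF-8 by default
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Passing the encoding explicitly lets the same bytes parse cleanly.
df_latin1 = pd.read_csv(BytesIO(raw), encoding="latin1")

print(utf8_ok)
print(df_latin1)
```

The same reasoning applies when pd.read_csv is given a URL: pandas has to decode the downloaded bytes itself, so the encoding needs to be stated.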
