If the URL https://www.nseindia.com/companies-listing/corporate-filings-announcements is open in a browser tab, I can download the CSV file from another URL, https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true, in another tab of the same browser. Otherwise the download fails and it says "Resource not found". How can I implement this in Python using pandas?
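Answer from a uj5u.com user:
This page uses cookies to check whether the file is being opened after visiting the first page.
You have to use requests with a Session to get the first page and its cookies, then use the same Session (which sends the cookies from the previous request) to get the CSV file. Finally, you have to use pandas together with io, which simulates a file in memory, to load the data.
By the way: the server seems to send the file with a BOM (Byte Order Mark), so I read the bytes from r.content instead of the text from r.text, and pandas skips the BOM.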
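To make the BOM remark concrete, here is a small offline sketch that needs no network access. The sample bytes are hypothetical (only the "SYMBOL" and "COMPANY NAME" column names are taken from the real output), and the idea that r.text shows stray characters because the body was decoded with a charset other than UTF-8 is an assumption, not something verified against the server.

```python
import codecs

# Hypothetical sample: the start of a CSV body that begins with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b'"SYMBOL","COMPANY NAME"\n"RIIL","Example"\n'

# Decoding with the wrong charset turns the three BOM bytes into junk characters.
as_latin1 = raw.decode("latin-1")

# Plain UTF-8 keeps the BOM as an invisible U+FEFF character at the start.
as_utf8 = raw.decode("utf-8")

# The "utf-8-sig" codec drops the BOM while decoding.
as_utf8_sig = raw.decode("utf-8-sig")

print(repr(as_latin1[:4]))
print(repr(as_utf8[:2]))
print(repr(as_utf8_sig[:2]))
```

Text decoded with "utf-8-sig" could be wrapped in io.StringIO and passed to pd.read_csv; the script below takes the other route and hands the raw bytes to pandas through io.BytesIO.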
import requests
import pandas as pd
import io
# --- create Session with User-Agent from real browser ---
headers = {
'User-Agent': 'Mozilla/5.0'
}
s = requests.Session()
s.headers.update(headers)
# --- get first page to get cookies ---
url = 'https://www.nseindia.com/companies-listing/corporate-filings-announcements'
r = s.get(url)
# --- get file ---
url = 'https://www.nseindia.com/api/corporate-announcements?index=equities&from_date=14-01-2022&to_date=20-01-2022&csv=true'
r = s.get(url)
r.raise_for_status() # stop here if the server answered with an HTTP error status
print(r.text[:100]) # code `???` at the beginning means BOM
# so I will use `r.content` instead of `r.text`
# --- read file from memory ---
#df = pd.read_csv(io.StringIO(r.text)) # it doesn't remove BOM
df = pd.read_csv(io.BytesIO(r.content)) # it removes BOM
# --- show it ---
print(df.head())
Result:
???"SYMBOL","COMPANY NAME","SUBJECT","DETAILS","BROADCAST DATE/TIME","RECEIPT","DISSEMINATION","DIFF
SYMBOL ... DIFFERENCE
0 TATAELXSI ... 00:00:08
1 RIIL ... 00:00:10
2 ERIS ... 00:00:06
3 RIIL ... 00:00:09
4 INGERRAND ... 00:00:09
[5 rows x 8 columns]
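To fetch other date ranges without editing the URL by hand, the query string can be built from date objects. This is only a sketch: build_csv_url and API_URL are hypothetical names introduced here, the DD-MM-YYYY format is inferred from the dates in the question's URL, and no parameters other than the four in that URL are assumed to exist.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the question's URL.
API_URL = "https://www.nseindia.com/api/corporate-announcements"


def build_csv_url(from_date: date, to_date: date, index: str = "equities") -> str:
    """Return the CSV download URL for a date range (hypothetical helper)."""
    params = {
        "index": index,
        # Dates formatted as DD-MM-YYYY, matching the question's URL.
        "from_date": from_date.strftime("%d-%m-%Y"),
        "to_date": to_date.strftime("%d-%m-%Y"),
        "csv": "true",
    }
    return f"{API_URL}?{urlencode(params)}"


url = build_csv_url(date(2022, 1, 14), date(2022, 1, 20))
print(url)
```

The resulting url can be passed to s.get(url) in the script above, after the first page has been requested with the same Session so that the cookies are set.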
