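I had been successfully scraping a table from this website thanks to user hrbrmstr's answer to my question 5 years ago. Recently something on the site changed, and I can no longer fetch the data.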
URL <- "http://www.fiskistofa.is/veidar/aflaupplysingar/landanir-eftir-hofnum/"
library(httr)
library(rvest)
res <- POST(url = URL,
query = list(lang="is"),
body = list(magn = "Sundurlidun",
hofn = "87",
dagurFra = format(lubridate::today()-4, "%d.%m.%Y"),
dagurTil = format(lubridate::today(), "%d.%m.%Y"),
hnappur = "Sækja"),
encode = "form")
doc <- content(res, as="parsed")
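When a form scrape silently stops working, it can help to rebuild the exact form body and compare it with what the browser submits (visible in the browser's developer tools, Network tab). The Python sketch below mirrors the body fields of the R call in the same application/x-www-form-urlencoded format that `encode = "form"` uses; the `lang` query parameter travels in the URL and is left out. `landings_form` is a hypothetical helper, and the button value is assumed to be "Sækja" (Icelandic for "fetch"), since "S?kja" looks like a mis-encoded character.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def landings_form(port_id: str, end: date, days_back: int = 4) -> str:
    """Hypothetical helper: URL-encode the same body fields the R POST sends.

    Dates use the dd.mm.yyyy format from the question; the button value
    is assumed to be "Sækja" (the source shows a garbled "S?kja").
    """
    fields = {
        "magn": "Sundurlidun",
        "hofn": port_id,
        "dagurFra": (end - timedelta(days=days_back)).strftime("%d.%m.%Y"),
        "dagurTil": end.strftime("%d.%m.%Y"),
        "hnappur": "Sækja",
    }
    # urlencode percent-encodes non-ASCII characters as UTF-8 (æ -> %C3%A6)
    return urlencode(fields)


print(landings_form("87", date(2021, 11, 29)))
# magn=Sundurlidun&hofn=87&dagurFra=25.11.2021&dagurTil=29.11.2021&hnappur=S%C3%A6kja
```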
This is how I used to find and extract the table, but now the output is empty:
html_nodes(doc, xpath=".//table[contains(., 'Magn')]") %>%
html_table(header=TRUE)
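For readers following along in Python, the XPath filter above (keep only tables whose text contains 'Magn') can be approximated with the standard library's `html.parser`. This is a swapped-in technique, not XPath: a rough sketch that only looks at cell text and does not handle nested tables, `rowspan`/`colspan`, or cells without an explicit closing tag. `TableCollector` and `tables_containing` are hypothetical names. An empty result here corresponds to the empty output described above: the page that came back simply contained no table mentioning "Magn".

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table currently being read
        self._cell = None  # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def tables_containing(html, text):
    """Rough analogue of the XPath .//table[contains(., text)] (cell text only)."""
    parser = TableCollector()
    parser.feed(html)
    return [t for t in parser.tables
            if any(text in cell for row in t for cell in row)]


# A made-up snippet for illustration only
page = ("<table><tr><th>Skipnr.</th><th>Magn</th></tr>"
        "<tr><td>2999</td><td>4.870</td></tr></table>")
print(tables_containing(page, "Magn"))
```

If pandas (with lxml or BeautifulSoup) is installed, `pd.read_html(..., match="Magn")` performs the same "tables containing this text" filtering.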
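Nothing has changed in how the site looks, but they recently opened a Power BI report for this database (the table is on page 3), so they may have changed something in the meantime that I'm not aware of.
Any suggestions?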
Answer:
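Try changing http:// to https://, keeping the date format as '%d.%m.%Y' (single and double quotes are interchangeable in R, so the format string itself is the same as before):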
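# https:// instead of http://
URL <- "https://www.fiskistofa.is/veidar/aflaupplysingar/landanir-eftir-hofnum/"
library(httr)
library(rvest)
res <- POST(url = URL,
            query = list(lang = "is"),
            body = list(magn = "Sundurlidun",
                        hofn = "87",
                        # date window: the last 4 days, formatted dd.mm.yyyy
                        dagurFra = format(lubridate::today() - 4, '%d.%m.%Y'),
                        dagurTil = format(lubridate::today(), '%d.%m.%Y'),
                        hnappur = "Sækja"),
            encode = "form")
doc <- content(res, as = "parsed")
# the extraction from the question should now return the table again
html_nodes(doc, xpath = ".//table[contains(., 'Magn')]") %>%
  html_table(header = TRUE)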
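One plausible explanation for why the scheme matters, assuming the site now answers http:// requests with a 301 or 302 redirect to https://: when an HTTP client follows such a redirect for a POST, it conventionally re-issues the request as a GET and drops the form body. requests does this, and libcurl (which httr is built on) does the same by default, so the server would see a plain GET and return the page without the results table. The sketch below reproduces the effect against a throwaway local server; the paths /old and /new and the helper `post_through` are made up for the demo.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class RedirectHandler(BaseHTTPRequestHandler):
    """Local stand-in for a site that moved: /old redirects, /new echoes."""

    def _handle(self):
        length = int(self.headers.get("Content-Length") or 0)
        body = self.rfile.read(length).decode() if length else ""
        if self.path == "/old":
            # Permanent redirect, like an http:// -> https:// upgrade
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        # Report which method and body actually arrived
        reply = f"{self.command} body={body!r}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    do_GET = _handle
    do_POST = _handle

    def log_message(self, *args):  # keep the console quiet
        pass


def post_through(path):
    """POST a small form to the local server and return its reply text."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        session = requests.Session()
        session.trust_env = False  # ignore proxy settings for localhost
        url = f"http://127.0.0.1:{server.server_port}{path}"
        return session.post(url, data={"hofn": "87"}).text
    finally:
        server.shutdown()
        server.server_close()


print(post_through("/old"))  # the redirect turned the POST into a body-less GET
print(post_through("/new"))  # posting to the final URL keeps the form data
```

Posting straight to the final https:// URL, as both answers do, avoids the redirect and therefore keeps the form fields.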
In Python:
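from io import StringIO
import pandas as pd
import requests
from datetime import datetime, timedelta
url = "https://www.fiskistofa.is/veidar/aflaupplysingar/landanir-eftir-hofnum/"
today = datetime.now()
# Same form fields as the R version; dates are dd.mm.yyyy
payload = {
    'magn': "Sundurlidun",
    'hofn': "87",
    'dagurFra': (today - timedelta(days=4)).strftime("%d.%m.%Y"),
    'dagurTil': today.strftime("%d.%m.%Y"),
    'hnappur': "Sækja"}
# The landings table is the last <table> on the page. pandas 2.1+ deprecates
# passing a literal HTML string to read_html, hence the StringIO wrapper.
df = pd.read_html(StringIO(requests.post(url, data=payload).text))[-1]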
Output:
print(df)
0 1 ... 4 5
0 Löndun dags Skipnr. ... Vörutegund Magn
1 25.11.2021 2999 ... Steinbítur /sl?geur 5
2 25.11.2021 2999 ... YSA/óSL./VS (HAFRO) 690
3 25.11.2021 2999 ... Ysa /ósl?ge 415
4 25.11.2021 2999 ... TORSKUR/óSL./VS (HAFRO) 861
5 25.11.2021 2999 ... Torskur / ósl?geur 4.870
6 26.11.2021 2615 ... YSA/óSL./VS (HAFRO) 14
7 26.11.2021 2615 ... Ysa /ósl?ge 1.005
8 26.11.2021 2615 ... TORSKUR/óSL./VS (HAFRO) 164
9 26.11.2021 2615 ... Torskur / ósl?geur 1.507
10 27.11.2021 2842 ... TORSKUR/óSL./VS (HAFRO) 271
11 27.11.2021 2842 ... Torskur / ósl?geur 5.703
12 27.11.2021 2842 ... Torskur-undirmál/ósl 151
13 27.11.2021 2842 ... Hlyri /ósl?geur 13
14 27.11.2021 2842 ... Gullkarfi 27
15 27.11.2021 2842 ... Ufsi /ósl?geur 29
16 27.11.2021 2842 ... Keila /ósl?ge 11
17 27.11.2021 2842 ... Lysa /ósl?ge 2
18 27.11.2021 2842 ... Ysa /ósl?ge 3.072
19 27.11.2021 2842 ... Ysa-undirmál/ósl?ge 8
20 28.11.2021 2256 ... Ysa /ósl?ge 1.888
21 28.11.2021 2256 ... Torskur-undirmál/ósl 551
22 28.11.2021 2256 ... Torskur / ósl?geur 4.212
23 28.11.2021 2256 ... Steinbítur /sl?geur 4
24 28.11.2021 2256 ... YSA/óSL./VS (HAFRO) 243
25 28.11.2021 2615 ... Ysa /ósl?ge 829
26 28.11.2021 2615 ... Torskur / ósl?geur 2.659
27 28.11.2021 2615 ... Gullkarfi 34
28 28.11.2021 2842 ... Keila /ósl?ge 11
29 28.11.2021 2842 ... Gullkarfi 18
30 28.11.2021 2842 ... TORSKUR/óSL./VS (HAFRO) 95
31 28.11.2021 2842 ... Hlyri /ósl?geur 17
32 28.11.2021 2842 ... Torskur-undirmál/ósl 79
33 28.11.2021 2842 ... Langa /ósl?ge 18
34 29.11.2021 1136 ... Tindabikkja 599
35 29.11.2021 1136 ... Torsklifur 1.787
[36 rows x 6 columns]
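Two details of this output are worth handling before using the numbers. The header ended up in row 0 (the column labels are just 0 to 5), and the Magn column holds strings in Icelandic formatting, where "." groups thousands: assuming these are whole quantities, 4.870 means 4870, not 4.87. A minimal stdlib sketch of the conversion, where `parse_icelandic_number` is a hypothetical helper:

```python
def parse_icelandic_number(text: str) -> float:
    """Convert an Icelandic-formatted number such as '4.870' or '1.234,5'.

    Assumption: '.' is the thousands separator and ',' the decimal mark,
    the usual Icelandic convention.
    """
    return float(text.strip().replace(".", "").replace(",", "."))


# Magn values for vessel 2999 on 25.11.2021, copied from the output above
magn = ["5", "690", "415", "861", "4.870"]
print(sum(parse_icelandic_number(v) for v in magn))  # → 6841.0
```

Applied to the frame above, `df[5].iloc[1:].map(parse_icelandic_number)` skips the header row and converts the rest. Alternatively, `pd.read_html(..., header=0, thousands=".", decimal=",")` should let pandas use the first row as column names and parse the numbers itself; check the result, in particular that the date column (which also contains dots) is left untouched.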