I want to scrape table data from this website using R.
My code:
library(XML)
url <- "https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash"
doc <- htmlParse(url)
tableNodes = getNodeSet(doc,"//table")
tb = readHTMLTable(tableNodes[[1]])
But I get an error when I run it.
A uj5u.com user replied:
You can use the {rvest} package:
library(rvest)
url <- "https://www.westmetall.com/en/markdaten.php?action=show_table&field=LME_Cu_cash"
tables <- read_html(url) |>
  html_table()
The `tables` list contains every table found on the page, and you can inspect it with:
str(tables)
#> List of 5
#> $ : tibble [7 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ Official LME-Prices in US Dollar: chr [1:7] "in US Dollar per ton" "Copper" "Tin" "Lead" ...
#> ..$ 07. October 2022 : chr [1:7] "Settlement Kasse" "7,575.50" "20,000.00" "2,078.00" ...
#> ..$ : chr [1:7] "3 months" "7,554.00" "19,950.00" "2,050.00" ...
#> ..$ : chr [1:7] "Chart\nTable\nAverage" "" "" "" ...
#> $ : tibble [7 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ LME stocks : chr [1:7] "in tons" "Copper" "Tin" "Lead" ...
#> ..$ 07. October 2022: chr [1:7] "" "143,775" "4,690" "31,875" ...
#> ..$ Changes : chr [1:7] "" "3,575" "15" "0" ...
#> ..$ : chr [1:7] "Chart\nTable\nAverage" "" "" "" ...
#> $ : tibble [3 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ Exchange Rates : chr [1:3] "EUR/USD LME-FX-rate (MTLE)" "ECB-Fixing (14:15 Uhr)" "EUR/USD-Basis DEL-Notiz"
#> ..$ 07. October 2022: num [1:3] 0.979 0.98 0.979
#> ..$ 06. October 2022: num [1:3] 0.987 0.986 0.987
#> ..$ : logi [1:3] NA NA NA
#> $ : tibble [15 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ German Metal Prices: chr [1:15] "in Euro per 100 kg" "lower Copper WM-Notiz" "higher Copper WM-Notiz" "lower DEL-Notiz (until February 11, 2022)" ...
#> ..$ 07. October 2022 : chr [1:15] "" "786.54" "789.89" "-" ...
#> ..$ 06. October 2022 : chr [1:15] "" "797.52" "800.84" "-" ...
#> ..$ : chr [1:15] "Chart\nTable\nAverage" "" "" "" ...
#> $ : tibble [5 × 4] (S3: tbl_df/tbl/data.frame)
#> ..$ Precious metals : chr [1:5] "Gold London Fixing in USD/oz." "Gold in Euro/kg" "Gold, processed in Euro/kg" "Fine Silver in Euro/kg" ...
#> ..$ 07. October 2022: chr [1:5] "1,711.50" "55,190.00" "62,080.00" "666.90 / 733.80" ...
#> ..$ 06. October 2022: chr [1:5] "1,716.00" "54,840.00" "61,670.00" "658.90 / 725.10" ...
#> ..$ : logi [1:5] NA NA NA NA NA
Then you just need to select the table you want and convert it to the format you need:
tables[[2]]
#> # A tibble: 7 × 4
#> `LME stocks` `07. October 2022` Changes ``
#> <chr> <chr> <chr> <chr>
#> 1 in tons "" "" "Chart\nTable\nAverage"
#> 2 Copper "143,775" "3,575" ""
#> 3 Tin "4,690" "15" ""
#> 4 Lead "31,875" "0" ""
#> 5 Zinc "53,475" "150" ""
#> 6 Aluminium "327,625" "-1,225" ""
#> 7 Nickel "52,362" "942" ""
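The table above still stores its numbers as text with thousands separators, and it carries a units row and an empty link column. A minimal sketch of one way to tidy it in base R follows. It assumes the layout printed above; `stocks_raw` is a hand-copied stand-in for `tables[[2]]`, and the column names and the `to_number()` helper are hypothetical names chosen for illustration.

```r
# Stand-in for tables[[2]], with values copied from the printed output above.
# Column names here are illustrative, not the ones rvest returns.
stocks_raw <- data.frame(
  metal  = c("in tons", "Copper", "Tin", "Lead", "Zinc", "Aluminium", "Nickel"),
  stock  = c("", "143,775", "4,690", "31,875", "53,475", "327,625", "52,362"),
  change = c("", "3,575", "15", "0", "150", "-1,225", "942"),
  links  = c("Chart\nTable\nAverage", "", "", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: strip thousands separators, then convert to numeric
to_number <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Drop the units row (row 1) and the link column, then convert the numbers
stocks <- stocks_raw[-1, c("metal", "stock", "change")]
stocks$stock  <- to_number(stocks$stock)
stocks$change <- to_number(stocks$change)
rownames(stocks) <- NULL

stocks
```

With the real data, the same steps would apply to `tables[[2]]` after renaming its columns, since the scraped headers contain a date and an empty name.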
Created on 2022-10-09 with reprex v2.0.2
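As for the original error: the error message itself is not shown, so this is only a likely cause. `XML::htmlParse()` fetches URLs through libxml2, which does not handle HTTPS, so passing an `https://` URL directly tends to fail. If you prefer to stay with the {XML} package, one option is to download the page with another tool and hand the text to `htmlParse()` with `asText = TRUE`. The sketch below assumes that approach; `first_table_from_html()` and `demo_html` are hypothetical names, and the inline page is made up so the example runs offline.

```r
library(XML)

# Hypothetical helper: parse HTML that has already been downloaded as text
# and return its first <table> as a data frame.
first_table_from_html <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  tables <- readHTMLTable(doc, stringsAsFactors = FALSE)
  tables[[1]]
}

# Small made-up page so the sketch runs without network access
demo_html <- "<html><body><table>
  <tr><th>Metal</th><th>Stock</th></tr>
  <tr><td>Copper</td><td>143,775</td></tr>
  <tr><td>Tin</td><td>4,690</td></tr>
</table></body></html>"

demo <- first_table_from_html(demo_html)

# For the real page, download over HTTPS first (requires the httr package):
# resp <- httr::GET(url)
# httr::stop_for_status(resp)
# tb <- first_table_from_html(httr::content(resp, as = "text", encoding = "UTF-8"))
```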
