I am using the R programming language.
I am trying to scrape the names and addresses of pizza shops from https://www.yellowpages.ca/search/si/2/pizza/Canada and the pages that follow it (for example https://www.yellowpages.ca/search/si/3/pizza/Canada, https://www.yellowpages.ca/search/si/4/pizza/Canada, and so on).
I am trying to follow the answer given here: Scraping Yellowpages in R
library(rvest)
library(stringr)
library(tibble)  # provides tibble(), used at the end of testscrape()
url <- "https://www.yellowpages.com.au/search/listings?clue=plumbers&locationClue=Greater Sydney, NSW&lat=&lon=&selectedViewMode=list"
testscrape <- function(url){
  webpage <- read_html(url)
  # Listing names
  docname <- webpage %>%
    html_nodes(".left .listing-name") %>%
    html_text()
  # Phone numbers
  ph_no <- webpage %>%
    html_nodes(".contact-phone .contact-text") %>%
    html_text()
  # Email addresses, taken from the mailto: links
  email <- webpage %>%
    html_nodes(".contact-email") %>%
    html_attr("href") %>%
    as.character() %>%
    str_remove_all(".*:") %>%        # drop the "mailto:" prefix
    str_remove_all("\\?(.*)") %>%    # drop any query string
    str_replace_all("%40", "@")      # pattern was garbled in the post; "%40" is assumed
  # Index past the end of the shorter vectors so they are padded with NA
  n <- seq_len(max(length(docname), length(ph_no), length(email)))
  tibble(docname = docname[n], ph_no = ph_no[n], email = email[n])
}
testscrape(url)
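However, this code takes a very long time to run. I investigated by running the parts of the function one at a time, and I think I found the problem: the "read_html" call itself does not work. I tried replacing it with another call: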
library(httr)
webpage <- GET(url)
This works, but the result is now in a different format.
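The format differs because `GET()` returns an httr response object rather than a parsed HTML document. A minimal sketch of one way to bridge the two, assuming the response body is HTML: `content()` can return the body as text, and `read_html()` can parse that text. The helper name `fetch_page`, the user-agent string, and the timeout value are illustrative assumptions, and the post does not establish why `read_html(url)` stalls, so this may or may not avoid the delay.

```r
library(httr)
library(rvest)

# Hypothetical helper: download with httr, then parse the body with rvest.
fetch_page <- function(url, timeout_sec = 30) {
  # The user-agent string and timeout are assumptions, not values from the post.
  resp <- GET(url, user_agent("Mozilla/5.0"), timeout(timeout_sec))
  stop_for_status(resp)  # turn HTTP 4xx/5xx responses into R errors
  body <- content(resp, as = "text", encoding = "UTF-8")
  read_html(body)        # same document type that read_html(url) would return
}
```

With this in place, `webpage <- fetch_page(url)` could stand in for `webpage <- read_html(url)`, and the `html_nodes()` pipelines would stay unchanged.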
Can someone tell me how to do this?
In the end, I would like the output to look like this:
  id                 name                                       address
1  1   OJ's Steak & Pizza 9906B Franklin Ave, Fort McMurray, AB T9H 2K5
2  2    MJs Pizza & Grill 10012 Franklin Ave, Fort McMurray, AB T9H 2K6
3  3 Hu's Pizza & Donairs 10020 Franklin Ave, Fort McMurray, AB T9H 2K6
# sample results
sample_results = structure(list(id = c(1, 2, 3), name = c("OJ's Steak & Pizza",
"MJs Pizza & Grill", "Hu's Pizza & Donairs"), address = c("9906B Franklin Ave, Fort McMurray, AB T9H 2K5",
"10012 Franklin Ave, Fort McMurray, AB T9H 2K6", "10020 Franklin Ave, Fort McMurray, AB T9H 2K6"
)), class = "data.frame", row.names = c(NA, -3L))
Thanks!
Answer from a uj5u.com user:
Quick, but not robust. (I think the code will break if a name or an address is missing.)
library(tidyverse)
library(rvest)
scraper <- function(url) {
  page <- url %>%
    read_html()
  tibble(
    name = page %>%
      html_elements(".jsListingName") %>%
      html_text2(),
    address = page %>%
      html_elements(".listing__address--full") %>%
      html_text2()
  )
}
scraper("https://www.yellowpages.ca/search/si/2/pizza/Canada")
# A tibble: 35 x 2
   name                                  address
   <chr>                                 <chr>
 1 OJ's Steak & Pizza                    9906B Franklin Ave, Fort McMurray, AB T~
 2 MJs Pizza & Grill                     10012 Franklin Ave, Fort McMurray, AB T~
 3 Hu's Pizza & Donairs                  10020 Franklin Ave, Fort McMurray, AB T~
 4 Eagle Ridge Convenience Store & Pizza 117-375 Loutit Rd, Fort McMurray, AB T9~
 5 Cosmos Pizza                          9713 Hardin St, Fort McMurray, AB T9H 1~
 6 Boston Pizza                          10202 MacDonald Ave, Fort McMurray, AB ~
 7 Jomaa's Pizza & Chicken               Beacon Hill Shpg Plaza, Fort McMurray, ~
 8 Abasand PK's Pizza                    101-307 Athabasca Ave, Fort McMurray, A~
 9 Pizza 73                              1-289 Powder Dr, Ft McMurray, AB T9K 0M5
10 Boston Pizza                          110 Millennium Dr, Fort McMurray, AB T9~
# ... with 25 more rows
# i Use `print(n = ...)` to see more rows
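The caveat about missing names or addresses arises because the two selectors are collected independently, so the columns can end up with different lengths. A minimal sketch of one way around that, not the answerer's method: select each listing container first, then call `html_element()` (singular) inside it, which gives `NA` where a child is absent. The container selector `.listing` is an assumption that the post does not confirm, and `parse_listings`, `page_urls`, and `scrape_pages` are hypothetical names.

```r
library(rvest)
library(tibble)

# One row per listing; a missing field becomes NA instead of shortening
# the column. ".listing" is an ASSUMED container selector: check it
# against the live page before relying on it.
parse_listings <- function(page, container = ".listing") {
  cards <- html_elements(page, container)
  tibble(
    name    = html_text2(html_element(cards, ".jsListingName")),
    address = html_text2(html_element(cards, ".listing__address--full"))
  )
}

# Build the paginated search URLs described in the question.
page_urls <- function(pages, term = "pizza", where = "Canada") {
  sprintf("https://www.yellowpages.ca/search/si/%d/%s/%s", pages, term, where)
}

# Scrape several pages, pausing between requests, and add an id column.
scrape_pages <- function(pages, pause_sec = 1) {
  rows <- lapply(page_urls(pages), function(u) {
    Sys.sleep(pause_sec)  # be polite to the server
    parse_listings(read_html(u))
  })
  out <- do.call(rbind, rows)
  add_column(out, id = seq_len(nrow(out)), .before = 1)
}
```

Something like `scrape_pages(2:4)` would then be expected to return the `id`, `name`, and `address` columns requested in the question, provided the assumed container selector matches the site's markup.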