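I'm trying to extract data from this website (https://alliedoffsets.com/#/profile/2). It has many projects like this one, and I want to get the values of "Estimated Average Wholesale Price" and "Estimated Annual Emission Reduction". When I try to print the page with Beautiful Soup, those tags aren't there and I get empty values. I know this is probably something basic, but I'm stuck. The data may be populated on the site with JavaScript, but I can't find a way to get at it.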
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://alliedoffsets.com/#/profile/1'
r = requests.get(url)
html = r.content  # raw HTML as served, before any JavaScript runs
soup = BeautifulSoup(html, 'html.parser')
# the table header I expect to see on the rendered page
tab = soup.find("thead", {"class": "sr-only"})
print(tab)
Answer:
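The page is rendered with JavaScript, so the HTML elements you want are not in the response that BeautifulSoup parses. Selenium can be used to get the rendered HTML, and you can then search for elements by ID, class, XPath, etc.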
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import re
url = 'https://alliedoffsets.com/#/profile/1'
s = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s)
# web driver goes to page
driver.get(url)
# use WebDriverWait to wait until page is rendered
# find Estimated Average Wholesale Price
elt = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, 'direct-price-panel'))
)
# extract just the price (e.g. "$5.06") from the panel text
match = re.search(r'\$[\d.,]+', elt.text)
print(match.group(0) if match else elt.text)
# find Estimated Annual Emission Reduction
elt = driver.find_element(By.XPATH, "//*[strong[contains(., 'Estimated Annual Emission Reduction')]]")
print(elt.text.split(":")[1].strip())
# close the browser
driver.quit()
Output:
$5.06
11603 tCO2
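Part of why the plain requests call in the question cannot work: everything after the # in https://alliedoffsets.com/#/profile/1 is a URL fragment. The fragment is handled by the page's JavaScript in the browser and is not part of the HTTP request, so profile/1 and profile/2 ask the server for the same thing. A minimal check with the standard library's urllib.parse (request_target is a hypothetical helper written for this illustration):

```python
from urllib.parse import urlsplit


def request_target(url: str) -> str:
    """Return the path + query an HTTP client sends; the fragment is not included."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


for url in ("https://alliedoffsets.com/#/profile/1",
            "https://alliedoffsets.com/#/profile/2"):
    # both URLs resolve to the same request target; only the fragment differs
    print(request_target(url), "| fragment:", urlsplit(url).fragment)
```

Both lines print "/" as the request target, which is why the server hands back the same JavaScript shell for every profile and the project data has to come from either a real browser (Selenium) or the API the page calls.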
Answer:
The data you see is loaded by JavaScript from an external URL. To load the data with the requests/json modules, you can use this example:
import json
import requests
url = "https://carbon-registry.herokuapp.com/1.0/provider/1"
params = {
"embedded": '{"provider_capital_types":1,"provider_capital_types.capital_type":1,"provider_countries":1,"provider_countries.country":1,"contacts":1,"contacts.office":1,"provider_currencies":1,"provider_currencies.currency":1,"provider_languages":1,"provider_languages.language":1,"offices":1,"offices.country":1,"provider_sectors":1,"provider_sectors.sector":1,"provider_social_medias":1,"provider_social_medias.social_media":1,"provider_provider_types":1,"provider_provider_types.provider_type":1,"provider_stats":1,"provider_stats.stat":1,"provider_descriptions":1,"provider_descriptions.description":1,"relationships":1,"relationships.description":1,"provider_statuses":1,"provider_statuses.status":1}'
}
# bearer token that the site's own front end sends with this request (it may change or expire)
headers = {"Authorization": "Bearer 8hCH4MuPCa5t6ra8wtAz8xOQfJdjLvDVZk07ib60TZ"}
data = requests.get(url, headers=headers, params=params).json()
# uncomment to print all data:
# print(json.dumps(data, indent=4))
stats = {s["stat"]["name"]: s for s in data["provider_stats"]}
print(f"{stats['Estimated Direct Price']['value']=}")
print(f"{stats['Estimated Annual Emission Reduction']['value']=}")
Prints:
stats['Estimated Direct Price']['value']=5.0630778182036105
stats['Estimated Annual Emission Reduction']['value']=11603
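Since the question mentions many projects, the API approach above can be wrapped in a loop. A minimal sketch, using only the standard library so it has no extra dependencies. It assumes (not confirmed by the answer) that /1.0/provider/<id> works for other ids the same way it does for 1, and that an embedded value listing only provider_stats is accepted; if the API rejects the reduced value, pass the full embedded string from the answer instead. extract_stats, fetch_provider and collect are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://carbon-registry.herokuapp.com/1.0/provider/{provider_id}"
# Assumption: these two relations are enough to get the stats back.
STATS_EMBED = json.dumps({"provider_stats": 1, "provider_stats.stat": 1},
                         separators=(",", ":"))
WANTED = ("Estimated Direct Price", "Estimated Annual Emission Reduction")


def extract_stats(data: dict, names=WANTED) -> dict:
    """Map each wanted stat name to its value (None if the provider lacks it)."""
    by_name = {s["stat"]["name"]: s["value"] for s in data.get("provider_stats", [])}
    return {name: by_name.get(name) for name in names}


def fetch_provider(provider_id: int, token: str, embedded: str = STATS_EMBED) -> dict:
    """Download one provider record; assumes /provider/<id> works for any id."""
    query = urllib.parse.urlencode({"embedded": embedded})
    req = urllib.request.Request(
        API_URL.format(provider_id=provider_id) + "?" + query,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def collect(provider_ids, token: str) -> dict:
    """Fetch several providers and keep only the two stats for each id."""
    return {pid: extract_stats(fetch_provider(pid, token)) for pid in provider_ids}


# Usage (needs network access and the bearer token from the answer above):
# results = collect(range(1, 6), token="<bearer token from above>")
# for pid, stats in results.items():
#     print(pid, stats)
```

Keeping the parsing in extract_stats separate from the HTTP call means a provider with a missing stat yields None instead of a KeyError, and the parsing can be checked without touching the network.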
Answer:
The website is dynamic, so you can use Selenium together with bs4, as in the following example, to get the right data.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import time
url = 'https://alliedoffsets.com/#/profile/1'
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
driver.maximize_window()
# crude fixed wait for the JavaScript to render the panels
time.sleep(5)
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()
# drop the '/tCO2e' unit so only the dollar amount remains
price = soup.select_one('p#direct-price-panel').contents[1].strip().replace('/tCO2e', '')
# the last .panel holds the annual reduction; drop the 'tCO2' unit
reduction = soup.select('.panel')[-1].contents[1].strip().replace('tCO2', '').strip()
print('Estimated Average Wholesale Price: ' + str(price))
print('Estimated Annual Emission Reduction: ' + str(reduction))
Output:
Estimated Average Wholesale Price: $5.06
Estimated Annual Emission Reduction: 11603
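All three answers end with display strings such as "$5.06" or "11603 tCO2". If you want numbers instead, for example to put many projects into a pandas DataFrame as the question's imports suggest, a small regex helper can strip the currency sign and unit. A minimal sketch: parse_number and the column names are hypothetical, and it assumes the panel text keeps the format seen above ("$5.06/tCO2e" as implied by the replace calls, "11603 tCO2").

```python
import re

# first number in the string, allowing thousands separators and a decimal part
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def parse_number(text: str) -> float:
    """Return the first number in a display string, e.g. '$5.06/tCO2e' -> 5.06.

    Commas are treated as thousands separators; raises ValueError if no number is found.
    """
    match = _NUMBER.search(text)
    if match is None:
        raise ValueError(f"no number in {text!r}")
    return float(match.group(0).replace(",", ""))


row = {
    "price_usd_per_tco2e": parse_number("$5.06/tCO2e"),
    "annual_reduction_tco2": parse_number("11603 tCO2"),
}
print(row)  # {'price_usd_per_tco2e': 5.06, 'annual_reduction_tco2': 11603.0}
```

Because the search returns the first match, the "2" inside the "tCO2" unit is never picked up as long as the value precedes the unit, as it does in the outputs above.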
