I am trying to download the HTML content of https://coinmarketcap.com/historical/20210328/ with the following code:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
url = "https://coinmarketcap.com/historical/20210328/"
driver = webdriver.Firefox()
driver.get(url)
time.sleep(2)
driver.find_element_by_css_selector(".cmc-cookie-policy-banner__close").click()
time.sleep(2)
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
data = driver.page_source
print(data)
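I use click() to press the "Load More" button at the bottom of the page, because I need not just the first 200 items but at least 1000. However, when I print the page source, it only contains the first 200, as if it stopped at the HTML from the initial page load and ignored my actions on the page. How can I solve this?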
Reply from a uj5u.com user:
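This is not really an answer to your question, but some analysis of the page you are trying to scrape shows that it pulls its data directly from this endpoint: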
https://web-api.coinmarketcap.com/v1/cryptocurrency/listings/historical?convert=USD,USD,BTC&date=2021-03-28&limit=200&start=401
It returns JSON, which is much easier to load into Python.
# import requests module
import requests
# Making a get request
response = requests.get('https://web-api.coinmarketcap.com/v1/cryptocurrency/listings/historical?convert=USD,USD,BTC&date=2021-03-28&limit=200&start=401')
# print response
print(response)
# print json content
print(response.json())
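Since the question asks for at least 1000 entries, the same endpoint could be requested once per page of 200. The following is only a sketch: the helper names are made up for illustration, the 1-based `start` offsets (1, 201, 401, ...) are inferred from the `start=401` in the URL above, and it uses the standard library's urllib where `requests.get` would work just as well. Nothing here confirms that the endpoint still responds or what the JSON looks like, so the pages are returned unparsed.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the answer above; it may have changed or been retired.
BASE_URL = "https://web-api.coinmarketcap.com/v1/cryptocurrency/listings/historical"


def page_starts(total, page_size=200):
    """Return the start offsets needed to cover `total` rows (assumed 1-based)."""
    return list(range(1, total + 1, page_size))


def build_url(date, start, limit=200):
    """Build the URL for one page, using the same query string as the answer."""
    params = {"convert": "USD,USD,BTC", "date": date, "limit": limit, "start": start}
    # safe="," keeps the commas in the convert parameter unescaped
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def fetch_pages(date, total=1000, page_size=200):
    """Download every page and return the decoded JSON documents as a list."""
    pages = []
    for start in page_starts(total, page_size):
        with urlopen(build_url(date, start, page_size), timeout=30) as resp:
            pages.append(json.load(resp))
    return pages


if __name__ == "__main__":
    pages = fetch_pages("2021-03-28")
    print(len(pages), "pages downloaded")
```

With the defaults this issues five requests, one for each block of 200 entries.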
Reply from a uj5u.com user:
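After adding a delay between the clicks on "Load More", and another delay after the last click before reading page_source, I see that clicking "Load More" does change the content of data = driver.page_source.
The code below reports an initial page_source length of 396447 and a final page_source length of 946180.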
import time
from selenium import webdriver
url = "https://coinmarketcap.com/historical/20210328/"
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)
time.sleep(2)
driver.find_element_by_css_selector(".cmc-cookie-policy-banner__close").click()
time.sleep(2)
data = driver.page_source
print(len(data))
# print(data)
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
time.sleep(2)
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
time.sleep(2)
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
time.sleep(2)
driver.find_element_by_css_selector(".cmc-table-listing__loadmore > button:nth-child(1)").click()
time.sleep(2)
data = driver.page_source
print(len(data))
# print(data)
driver.quit()
This code should be improved to remove the redundant hard-coded sleeps, but basically it works.
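One possible way to drop the fixed sleeps between clicks is to poll until the page has actually grown after each click. This is a rough sketch, not a tested replacement: `wait_for_growth` and `click_and_wait` are hypothetical helpers, the length of page_source is used as the growth signal only because that is what the answer measured, and a hand-written polling loop stands in for Selenium's own explicit waits. The driver calls use `find_element(By.CSS_SELECTOR, ...)` because newer Selenium 4 releases no longer provide the `find_element_by_*` helpers.

```python
import time


def wait_for_growth(measure, previous, timeout=10.0, poll=0.25):
    """Poll measure() until it exceeds `previous`, then return the new value.

    Raises TimeoutError if nothing has grown within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        current = measure()
        if current > previous:
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not grow after the click")
        time.sleep(poll)


def click_and_wait(click, measure, clicks=4, timeout=10.0, poll=0.25):
    """Call click() `clicks` times, waiting for measure() to grow after each."""
    size = measure()
    for _ in range(clicks):
        click()
        size = wait_for_growth(measure, size, timeout, poll)
    return size


if __name__ == "__main__":
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.maximize_window()
    driver.get("https://coinmarketcap.com/historical/20210328/")
    time.sleep(2)  # kept from the answer: give the cookie banner time to appear
    driver.find_element(By.CSS_SELECTOR, ".cmc-cookie-policy-banner__close").click()

    load_more = ".cmc-table-listing__loadmore > button:nth-child(1)"
    final_size = click_and_wait(
        click=lambda: driver.find_element(By.CSS_SELECTOR, load_more).click(),
        measure=lambda: len(driver.page_source),  # assumption: length grows per load
    )
    print(final_size)
    data = driver.page_source
    driver.quit()
```

Four clicks match the original code: 200 initial entries plus four loads of 200 each. Page length can also change for unrelated reasons, so counting table rows would be a more robust signal if a reliable row selector is known.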
