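I am trying to scrape URLs from a web page; they sit inside a rankings table that takes a few seconds to load.
What I want to do is wait until the rankings table has finished loading, then grab it by its id and iterate over its elements.
Here is the code I use to fetch the page and wait: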
driver = webdriver.Chrome(cred_path)
driver.get(page)
wait(driver, 5).until(EC.presence_of_element_located((By.ID, 'sc-ljMRFG hgfcNB rankings-table')))
#soup = BeautifulSoup(driver.page_source, features='lxml')
#print(soup.prettify())
rankings = soup.find_all('div', {'class': "sc-ljMRFG hgfcNB rankings-table"})[0]
print(rankings)
As far as I can tell, the code does run (I can see the table loading in the browser window while it is open), but it raises a timeout error:
Traceback (most recent call last):
File "ethereum_scraper_dappRadarv2.py", line 377, in <module>
general_dapp_page()
File "ethereum_scraper_dappRadarv2.py", line 39, in general_dapp_page
_ = wait(driver, 5).until(EC.visibility_of_element_located((By.ID, 'sc-ljMRFG hgfcNB rankings-table')))
File "/Users/trentfowler/opt/anaconda3/lib/python3.8/site-packages/selenium/webdriver/support/wait.py", line 89, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
Stacktrace:
0 chromedriver 0x0000000104dd4269 __gxx_personality_v0 582729
1 chromedriver 0x0000000104d5fc33 __gxx_personality_v0 106003
2 chromedriver 0x000000010491ce28 chromedriver 171560
3 chromedriver 0x00000001049523d2 chromedriver 390098
4 chromedriver 0x0000000104952591 chromedriver 390545
5 chromedriver 0x00000001049846b4 chromedriver 595636
6 chromedriver 0x000000010496f9fd chromedriver 510461
7 chromedriver 0x0000000104982462 chromedriver 586850
8 chromedriver 0x000000010496fc23 chromedriver 511011
9 chromedriver 0x000000010494575e chromedriver 337758
10 chromedriver 0x0000000104946a95 chromedriver 342677
11 chromedriver 0x0000000104d908ab __gxx_personality_v0 305803
12 chromedriver 0x0000000104da7863 __gxx_personality_v0 399939
13 chromedriver 0x0000000104dacc7f __gxx_personality_v0 421471
14 chromedriver 0x0000000104da8bba __gxx_personality_v0 404890
15 chromedriver 0x0000000104d84e51 __gxx_personality_v0 258097
16 chromedriver 0x0000000104dc4158 __gxx_personality_v0 516920
17 chromedriver 0x0000000104dc42e1 __gxx_personality_v0 517313
18 chromedriver 0x0000000104ddb6f8 __gxx_personality_v0 612568
19 libsystem_pthread.dylib 0x00007fff205d18fc _pthread_start 224
20 libsystem_pthread.dylib 0x00007fff205cd443 thread_start 15
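(Note that, as far as I can tell, the subsequent rankings = and print statements are never executed.)
My current interpretation is that Selenium executes the wait command just fine, but times out because I have not given it any further instructions (i.e. I have not called click() on anything).
I have RTFM'd, but the Selenium documentation is very sparse. Is there really no concept of waiting for an element to load and then moving on to other processing tasks? Do I have to interact with the element somehow, and if so, given that all I really want is to iterate over its inner elements, what is the best way to interact with it?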
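Answer from a uj5u.com user:
Presumably you are using the wrong locator: sc-ljMRFG hgfcNB rankings-table cannot be the value of an id attribute (an id cannot contain spaces), but is probably the value of the class attribute. The wait therefore never finds a match and times out after 5 seconds; no interaction such as click() is required.
So, effectively, you need to change this line: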
wait(driver, 5).until(EC.presence_of_element_located((By.ID, 'sc-ljMRFG hgfcNB rankings-table')))
Instead, wait with WebDriverWait for visibility_of_element_located(), using either of the following locator strategies:
Using CLASS_NAME (which takes a single class name):
wait(driver, 5).until(EC.visibility_of_element_located((By.CLASS_NAME, 'rankings-table')))
Using CSS_SELECTOR:
wait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.sc-ljMRFG.hgfcNB.rankings-table')))
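To tie the wait to the original goal of iterating over the table's links, here is a minimal sketch rather than a definitive implementation. The names extract_links, TableLinkParser and scrape_rankings are hypothetical, and the sketch assumes the table is a div carrying the rankings-table class with its URLs in ordinary a href attributes, which the question does not confirm. It parses with the standard library's html.parser instead of BeautifulSoup only to avoid an extra dependency. webdriver.Chrome() without arguments assumes a recent Selenium 4 release or a chromedriver on PATH.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside a div with a given class."""

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.depth = 0   # div nesting depth inside the target div; 0 means outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Enter the target div, or track a div nested inside it.
            if self.depth or self.table_class in classes:
                self.depth += 1
        elif tag == "a" and self.depth and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def extract_links(html, base_url="", table_class="rankings-table"):
    """Return the links found inside the table div, resolved against base_url."""
    parser = TableLinkParser(table_class)
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def scrape_rankings(page, timeout=10):
    """Open the page, wait until the table is visible, and return its links."""
    # Imported here so extract_links() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(page)
        # Match on the one stable-looking class; By.CLASS_NAME takes a single name.
        table = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CLASS_NAME, "rankings-table"))
        )
        return extract_links(table.get_attribute("outerHTML"), base_url=page)
    finally:
        driver.quit()
```

Calling scrape_rankings(page) with the same page value as in the question should return a list of URLs once the table is visible. If the rows are filled in after the container appears, waiting for a row or link inside the table may be a better condition than waiting for the table itself.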
Tags: Python, Selenium, Selenium WebDriver, web scraping, WebDriverWait
