I want to extract all of the information by navigating through the pagination.
The source of the pagination section is:
<tr class="pagination" valign="middle" align="center">
<td colspan="9">
<table>
<tbody>
<tr>
<td><span>1</span></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$2')">2</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$3')">3</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$4')">4</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$5')">5</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$6')">6</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$7')">7</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$8')">8</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$9')">9</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$10')">10</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$11')">...</a></td>
<td><a href="javascript:__doPostBack('ctl00$cph1$grdRfqSearch','Page$Last')">Last</a></td>
</tr>
</tbody>
</table>
</td>
</tr>
My Selenium script is:
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from datetime import date,timedelta
import sqlite3
import os
today = date.today()
yesterday = today - timedelta(days=2)
d3 = yesterday.strftime("%m-%d-%Y")
URL = 'https://www.dibbs.bsm.dla.mil//rfq/rfqdates.aspx?category=recent'
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(ChromeDriverManager().install(),options=options)
driver.get(URL)
driver.find_element_by_id("butAgree").click()
driver.find_element_by_partial_link_text(d3).click()
col1 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[1]')
col2 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[2]')
col3 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[3]')
col4 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[4]')
col5 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[5]')
col6 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[6]')
col7 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[7]')
col8 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[8]')
col9 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[9]')
col1_data = [s.text for s in col1]
col2_data = [s.text for s in col2]
col3_data = [s.text for s in col3]
col4_data = [s.text for s in col4]
col5_data = [s.text for s in col5]
col6_data = [s.text for s in col6]
col7_data = [s.text for s in col7]
col8_data = [s.text for s in col8]
col9_data = [s.text for s in col9]
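This code extracts the data from a single page; I want to extract the data from every page listed in the pagination.
Answer:
You can wrap the code in a while loop driven by a counter variable, pagination_starting_point, which starts at 2 and is incremented on every iteration.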
Code:
# driver_path is the path to your chromedriver executable (not defined in this snippet)
driver = webdriver.Chrome(driver_path)
driver.maximize_window()
driver.implicitly_wait(30)
wait = WebDriverWait(driver, 30)
today = date.today()
yesterday = today - timedelta(days=2)
d3 = yesterday.strftime("%m-%d-%Y")
URL = 'https://www.dibbs.bsm.dla.mil//rfq/rfqdates.aspx?category=recent'
driver.get(URL)
driver.find_element_by_id("butAgree").click()
driver.find_element_by_partial_link_text(d3).click()
pagination_starting_point = 2
while True:
    # Read the nine columns of the results grid on the current page.
    col1 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[1]')
    col2 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[2]')
    col3 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[3]')
    col4 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[4]')
    col5 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[5]')
    col6 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[6]')
    col7 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[7]')
    col8 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[8]')
    col9 = driver.find_elements_by_xpath('//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[9]')
    col1_data = [s.text for s in col1]
    col2_data = [s.text for s in col2]
    col3_data = [s.text for s in col3]
    col4_data = [s.text for s in col4]
    col5_data = [s.text for s in col5]
    col6_data = [s.text for s in col6]
    col7_data = [s.text for s in col7]
    col8_data = [s.text for s in col8]
    col9_data = [s.text for s in col9]
    # Click the pager link whose postback argument is the next page number.
    wait.until(EC.element_to_be_clickable((By.XPATH, f"//a[contains(@href,'Page${pagination_starting_point}')]"))).click()
    print("Click on page", pagination_starting_point)
    pagination_starting_point = pagination_starting_point + 1
    # Stop once the link for page 44 has been clicked (hard-coded page limit).
    if pagination_starting_point == 45:
        break
Imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
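Updated to use the XPath //a[contains(@href,'Page${pagination_starting_point}')] instead of the page number.
Answer:
Don't structure your code this way. Avoid the repetition by using a function or a list comprehension: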
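Two details of the loop above may be worth tightening: the column lists are overwritten on every iteration, and the stop condition is a hard-coded page number. The following is only a sketch of one way to handle both, not the answerer's method. The names page_link_xpath, collect_all_pages, read_rows and go_to_page are hypothetical, and the Selenium wiring in the comments is an untested assumption about how the callbacks could be written.

```python
from typing import Callable, List, Sequence

Row = Sequence[str]


def page_link_xpath(page: int) -> str:
    """Build an XPath for the pager link that posts back to `page`.

    The quotes around the argument are part of the match, so looking for
    page 2 cannot also match the link for page 20.
    """
    return f"""//a[contains(@href, "'Page${page}'")]"""


def collect_all_pages(
    read_rows: Callable[[], List[Row]],
    go_to_page: Callable[[int], bool],
    max_pages: int = 1000,
) -> List[Row]:
    """Read the current page, then move forward one page at a time.

    `go_to_page` returns False when no link exists for the requested page,
    which ends the loop. `max_pages` is only a safety limit.
    """
    all_rows: List[Row] = list(read_rows())
    page = 2
    while page <= max_pages and go_to_page(page):
        all_rows.extend(read_rows())
        page += 1
    return all_rows


# Possible Selenium wiring (assumption, not run against the site):
#
#   def go_to_page(page):
#       links = driver.find_elements(By.XPATH, page_link_xpath(page))
#       if not links:
#           return False
#       links[0].click()
#       return True
#
#   def read_rows():
#       trs = driver.find_elements(
#           By.XPATH, '//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr')
#       return [[td.text for td in tr.find_elements(By.XPATH, './td')]
#               for tr in trs]
```

Note that with a long implicit wait, find_elements will block for the full timeout on the last page before returning an empty list, so a shorter implicit wait may be preferable with this approach.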
all_cols = [driver.find_elements_by_xpath(f'//table[@id="ctl00_cph1_grdRfqSearch"]/tbody/tr/td[{i}]') for i in range(1,10)]
all_cols_data = [[s.text for s in col] for col in all_cols]
Now you can access your data by index.
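As an illustration of index access, here is a small sketch that also transposes the per-column lists into per-row tuples, which is often the more convenient shape for writing to a database. The sample values are made up and only stand in for all_cols_data; columns_to_rows is a hypothetical helper name.

```python
from typing import List, Tuple


def columns_to_rows(columns: List[List[str]]) -> List[Tuple[str, ...]]:
    """Transpose per-column lists into per-row tuples.

    zip stops at the shortest column, so ragged columns are truncated.
    """
    return list(zip(*columns))


# Made-up sample values standing in for all_cols_data (3 columns, 2 rows).
sample_cols_data = [
    ["NSN-1", "NSN-2"],
    ["Item A", "Item B"],
    ["10", "25"],
]

first_column = sample_cols_data[0]   # every value in column 1
rows = columns_to_rows(sample_cols_data)
first_row = rows[0]                  # every value in the first table row
```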
Tags: python, python-3.x, selenium, selenium-webdriver, web-scraping
