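I am trying to scrape a list from EDGAR.
The information I need (for example, the "entity name") is in "td" cells with the class entity-name. However, the code I currently have does not return anything. Any help would be appreciated. Thanks in advance!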
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

s = Service('/PATH/chromedriver')
driver = webdriver.Chrome(service=s)
driver.get("https://www.sec.gov/edgar/search/#/q=%22cyber%20insurance%22&dateRange=custom&category=form-cat1&startdt=2011-01-01&enddt=2022-03-12&filter_forms=10-K")
try:
    # Wait up to 10 seconds for an element with class 'entity-name' to be present in the DOM
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, 'entity-name')))
except TimeoutException:
    print('Page timed out after 10 secs.')
# Parse whatever HTML the browser currently holds and print it
page = BeautifulSoup(driver.page_source, 'html.parser')
print(page)

Answer:
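To extract the text from the entity-name column, use visibility_of_all_elements_located() rather than presence_of_all_elements_located(): induce WebDriverWait until the elements are visible, and then use either of the following locator strategies: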
Using CSS_SELECTOR and the text attribute:
driver.get('https://www.sec.gov/edgar/search/#/q=%22cyber%20insurance%22&dateRange=custom&category=form-cat1&startdt=2011-01-01&enddt=2022-03-12&filter_forms=10-K')
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "td.entity-name")))])

Using XPATH and get_attribute("innerHTML"):
driver.get('https://www.sec.gov/edgar/search/#/q=%22cyber%20insurance%22&dateRange=custom&category=form-cat1&startdt=2011-01-01&enddt=2022-03-12&filter_forms=10-K')
print([my_elem.get_attribute("innerHTML") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//td[@class='entity-name']")))])

Console output:
['Excel Corp ', 'PROGRESSIVE CORP/OH/ (PGR) ', 'Electromed, Inc. (ELMD) ', 'HOOKER FURNITURE CORP (HOFT) ', 'HOOKER FURNITURE CORP (HOFT) ', 'SOUTHERN CO (SO, SOJA, SOJB, SOJC, SOJD, SOLN) <br> ALABAMA POWER CO (ALPVN, APRCP, APRDM, APRDN, APRDO, APRDP, ALP-PQ) <br> GEORGIA POWER CO (GPJA) <br> MISSISSIPPI POWER CO <br> SOUTHERN Co GAS <br> SOUTHERN POWER CO ', 'HOOKER FURNITURE CORP (HOFT) ', 'SOUTHERN CO (SO, SOJA, SOJB, SOJC, SOJD, SOLN) <br> ALABAMA POWER CO (ALPVN, APRCP, APRDM, APRDN, APRDO, APRDP, ALP-PQ) <br> GEORGIA POWER CO (GPJA) <br> MISSISSIPPI POWER CO <br> SOUTHERN Co GAS <br> SOUTHERN POWER CO ', 'BENCHMARK ELECTRONICS INC (BHE) ', 'MARRIOTT INTERNATIONAL INC /MD/ (MAR) ', 'Sprouts Farmers Market, Inc. (SFM) ', 'CF BANKSHARES INC. (CFBK) ', 'Repay Holdings Corp (RPAY) ', 'Sprouts Farmers Market, Inc. (SFM) ', 'MARRIOTT INTERNATIONAL INC /MD/ (MAR) ', 'Sprouts Farmers Market, Inc. (SFM) ', 'Albertsons Companies, Inc. (ACI) ', 'MARRIOTT INTERNATIONAL INC /MD/ (MAR) ', 'MARRIOTT INTERNATIONAL INC /MD/ (MAR) ', 'HENNESSY ADVISORS INC (HNNA) ', 'Repay Holdings Corp (RPAY, RPAYW) ', 'Repay Holdings Corp (RPAY, RPAYW, TBRGU) ', 'Arlo Technologies, Inc. (ARLO) ', 'Repay Holdings Corp (RPAY, RPAYW) ', 'NATIONAL HEALTH INVESTORS INC (NHI) ', 'MOTORCAR PARTS AMERICA INC (MPAA) ', 'RGC RESOURCES INC (RGCO) ', 'Arlo Technologies, Inc. (ARLO) ', 'CRYOLIFE INC (CRY) ', 'Mimecast Ltd (MIME) ', 'RGC RESOURCES INC (RGCO) ', 'MOTORCAR PARTS AMERICA INC (MPAA) ', 'NOODLES & Co (NDLS) ', 'PAPA JOHNS INTERNATIONAL INC (PZZA) ', 'MOTORCAR PARTS AMERICA INC (MPAA) ', 'MOTORCAR PARTS AMERICA INC (MPAA) ', 'PAPA JOHNS INTERNATIONAL INC (PZZA) ', 'MOTORCAR PARTS AMERICA INC (MPAA) ', 'Sprouts Farmers Market, Inc. (SFM) ', 'MOTORCAR PARTS AMERICA INC (MPAA) ', 'GARMIN LTD (GRMN) ', 'Sprouts Farmers Market, Inc. (SFM) ', 'nDivision Inc. (NDVN) ', 'nDivision Inc. (NDVN) ', 'nDivision Inc. (NDVN) ', 'WEYCO GROUP INC (WEYS) ', 'DiamondRock Hospitality Co (DRH) ', 'Pebblebrook Hotel Trust (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'Sprouts Farmers Market, Inc. (SFM) ', 'MYR GROUP INC. (MYRG) ', 'Chatham Lodging Trust (CLDT, CLDT-PA) ', 'WEYCO GROUP INC (WEYS) ', 'INFINITE GROUP INC (IMCI) ', 'DiamondRock Hospitality Co (DRH) ', 'Pebblebrook Hotel Trust (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'DiamondRock Hospitality Co (DRH, DRH-PA) ', 'Pebblebrook Hotel Trust (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'DLH Holdings Corp. (DLHC) ', 'Summit Hotel Properties, Inc. (INN) ', 'BOYD GAMING CORP (BYD) ', 'Summit Hotel Properties, Inc. (INN) ', 'DiamondRock Hospitality Co (DRH, DRH-PA) ', 'CINCINNATI FINANCIAL CORP (CINF) ', 'Summit Hotel Properties, Inc. (INN) ', 'Pebblebrook Hotel Trust (PEB, PEB-PC, PEB-PD, PEB-PE, PEB-PF) ', 'ARTIVION, INC. (AORT) ', 'STAR GROUP, L.P. (SGU) ', 'Pebblebrook Hotel Trust (PEB, PEB-PE, PEB-PF, PEB-PG, PEB-PH) ', 'RGC RESOURCES INC (RGCO) ', 'INFINITE GROUP INC (IMCI) ', 'LEGGETT & PLATT INC (LEG) ', 'RGC RESOURCES INC (RGCO) ', 'COSTCO WHOLESALE CORP /NEW (COST) ', 'DLH Holdings Corp. (DLHC) ', 'CANTERBURY PARK HOLDING CORP ', 'WEYCO GROUP INC (WEYS) ', 'DLH Holdings Corp. (DLHC) ', 'WEYCO GROUP INC (WEYS) ', 'Canterbury Park Holding Corp (CPHC) ', 'RGC RESOURCES INC (RGCO) ', 'IEC ELECTRONICS CORP (IEC) ', 'INFINITE GROUP INC (IMCI) ', 'Canterbury Park Holding Corp (CPHC) ', 'WEYCO GROUP INC (WEYS) ', 'Canterbury Park Holding Corp (CPHC) ', 'AMERICAN STATES WATER CO (AWR) <br> Golden State Water CO ', 'LEGGETT & PLATT INC (LEG) ', 'Vy Global Growth (VYGG, VYGG-UN, VYGG-WT) ', 'Summit Hotel Properties, Inc. (INN) ', 'Vy Global Growth (VYGG, VYGG-UN, VYGG-WT) ', 'Sunstone Hotel Investors, Inc. (SHO, SHO-PE, SHO-PF) ', 'CRYOLIFE INC (CRY) ', 'BOYD GAMING CORP (BYD) ', 'Sunstone Hotel Investors, Inc. (SHO, SHO-PE, SHO-PF) ', 'Summit Hotel Properties, Inc. (INN, INN-PE, INN-PF) ', 'Green Bancorp, Inc. (GNBC) ', 'TELKONET INC (TKOI) ', 'COHEN & STEERS INC (CNS) ', 'Sunstone Hotel Investors, Inc. (SHO, SHO-PE, SHO-PF) ', 'Green Bancorp, Inc. (GNBC) ']

Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
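
The innerHTML values in the console output above keep trailing spaces and join co-registrants with <br>. If one clean name per entity is needed, a small post-processing step along these lines may help. This is only a sketch: split_entities and BR_TAG are hypothetical names that do not appear in the original answer, and it assumes the names in a cell are separated only by <br> tags, as in the output shown.

```python
import html
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR_TAG = re.compile(r"<br\s*/?>", re.IGNORECASE)


def split_entities(inner_html):
    """Split one entity-name cell's innerHTML into a list of clean names."""
    parts = BR_TAG.split(inner_html)
    # Decode HTML entities such as &amp; and trim surrounding whitespace.
    names = [html.unescape(part).strip() for part in parts]
    # Drop empty pieces, e.g. from a trailing <br>.
    return [name for name in names if name]


# Example with one cell value taken from the console output above.
cell = "AMERICAN STATES WATER CO (AWR) <br> Golden State Water CO "
print(split_entities(cell))
```

Applied to the list returned by the XPATH variant, a flat list of names could be built with a comprehension such as [name for cell in cells for name in split_entities(cell)], where cells is assumed to hold the innerHTML strings.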
Tags: python, selenium, xpath, css-selectors, webdriverwait
