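I have a table built from "div" elements. Its content is dynamic, depending on a selection, and the data to display is generated with JavaScript. The HTML structure looks like this: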
<div class="container-jKD0Exn-">
<div class="shrinkShadowPosition-OFmmj-q_">
<div class="shrinkShadowWrap-OFmmj-q_">
<div class="shrinkShadow-OFmmj-q_">
</div></div></div>
<div class="titleWrap-jKD0Exn-" style="box-shadow:none">
<div class="offsetPadding-jKD0Exn-" style="width:0"></div>
<span class="title-jKD0Exn- apply-overflow-tooltip">Total common shares outstanding</span></div>
<div class="filling-jKD0Exn-"></div>
<div class="values-jKD0Exn- values-ZmRZjHnV">
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>22.32B</div>
</div></div>
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV">
<div>21.34B</div>
</div></div>
<div class="value-25PNPwRV">
<div class="wrap-25PNPwRV"><div>20.50B</div>
</div></div>
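With the Python code below, the result comes out as one concatenated string: Total common shares outstanding22.32B21.34B20.50B19.02B17.77B16.98B16.43B16.33B. Instead, I would like it in a list or DataFrame like this: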
['Total common shares outstanding', 22.32, 21.34, 20.50B, 19.02, 17.77, 16.98B, 16.43, 16.33]
The Python code I use to scrape the data is:
from selenium import webdriver
import pandas as pd
import requests, bs4
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
url ='https://www.tradingview.com/symbols/NASDAQ-AAPL/financials-statistics-and-ratios/'
driver = webdriver.Chrome('chromedriver',options=options)
driver.get(url)
html = driver.page_source
#print(html)
soup = bs4.BeautifulSoup(html, 'html.parser')
for title in soup.find_all("div", {"class": "container-jKD0Exn-"}):
    print(title.text + '\n')
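Is there a way to get a list like this with Selenium or BeautifulSoup?
Reply from a uj5u.com user:
As one approach, if there is no API available (which you should prefer when there is one), you can use BeautifulSoup and stripped_strings: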
data = []
for title in soup.find_all("div", {"class": "container-jKD0Exn-"}):
    data.append(list(title.stripped_strings))
pd.DataFrame(data)
Output DataFrame:
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Key data | | | | | | | | |
| Total common shares outstanding | 22.32B | 21.34B | 20.50B | 19.02B | 17.77B | 16.98B | 16.43B | 16.33B |
| Float shares outstanding | 22.29B | 21.32B | 20.48B | 18.99B | 17.75B | 16.96B | 16.41B | 16.32B |
| Number of employees | 110.00K | 116.00K | 123.00K | 132.00K | 137.00K | 147.00K | 154.00K | — |
| Number of shareholders | 23.50K | 23.50K | 23.50K | 23.50K | 23.50K | 23.50K | 23.50K | — |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
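The rows above still hold strings such as 22.32B. If numbers are wanted, as in the list the question asks for, a small post-processing step along these lines might help. This is only a sketch: `parse_value`, `parse_row` and the `SUFFIXES` table are hypothetical names, not part of BeautifulSoup or pandas, and the K/M/B/T multipliers are assumed to mean thousand, million, billion and trillion.

```python
# Hypothetical helpers (not part of BeautifulSoup or pandas): turn one row of
# stripped strings into [label, number, number, ...].

# Assumed meaning of the suffixes shown on the page.
SUFFIXES = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}

# Invisible Unicode directional marks that can wrap each value in the page text.
DIRECTIONAL_MARKS = "\u202a\u202c"

# Placeholder the table shows for a missing value (U+2014).
MISSING = "\u2014"


def parse_value(text):
    """Convert a string such as '22.32B' to a float; return None if it cannot be parsed."""
    cleaned = text.strip().strip(DIRECTIONAL_MARKS).strip()
    if not cleaned or cleaned == MISSING:
        return None
    multiplier = SUFFIXES.get(cleaned[-1].upper())
    number_part = cleaned[:-1] if multiplier else cleaned
    try:
        return float(number_part.replace(",", "")) * (multiplier or 1.0)
    except ValueError:
        return None


def parse_row(strings):
    """Keep the first string as the row label and parse the rest as numbers."""
    label, *values = strings
    return [label.strip()] + [parse_value(v) for v in values]


row = ["Total common shares outstanding", "\u202a22.32B\u202c", "21.34B", MISSING]
print(parse_row(row))
```

Each inner list in `data` could be passed through `parse_row` before building the DataFrame, for example `pd.DataFrame([parse_row(r) for r in data if r])`.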
Reply from a uj5u.com user:
To print the desired text with Selenium, you need to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following locator strategy:
Using XPath:
driver.get("https://www.tradingview.com/symbols/NASDAQ-AAPL/financials-statistics-and-ratios/")
# Click the 'Accept' button once it is clickable
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//span[text()='Accept']"))).click()
# Wait until every value cell in the row is visible
elements = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[text()='Total common shares outstanding']//following::div[2]//div[starts-with(@class, 'wrap')]/div")))
# Replace the invisible directional marks (U+202A, U+202C) around each value with spaces
df = pd.DataFrame([my_elem.text.replace('\u202a', ' ').replace('\u202c', ' ') for my_elem in elements], columns=['Total common shares outstanding'])
print(df)
driver.quit()
Console output:
  Total common shares outstanding
0                          22.32B
1                          21.34B
2                          20.50B
3                          19.02B
4                          17.77B
5                          16.98B
6                          16.43B
7                          16.33B
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
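The XPath in this answer is tied to a single row title. To fetch other rows the same way, the title could be made a parameter. The sketch below only builds the locator string. `row_values_xpath` is a hypothetical helper, and it assumes that every row on the page shares the layout of the "Total common shares outstanding" row, which the thread does not confirm.

```python
def row_values_xpath(title):
    """Build the XPath used in this answer for an arbitrary row title.

    Hypothetical helper: assumes the title contains no single quote and that
    the row is laid out like 'Total common shares outstanding'.
    """
    if "'" in title:
        raise ValueError("titles containing a single quote are not supported")
    return (
        f"//span[text()='{title}']"
        "//following::div[2]"
        "//div[starts-with(@class, 'wrap')]/div"
    )


print(row_values_xpath("Total common shares outstanding"))
```

The returned string can be used as the locator in the wait shown earlier, for example `EC.visibility_of_all_elements_located((By.XPATH, row_values_xpath(title)))`.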
