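I'm practicing some web scraping with Python, but I'm a bit confused by the following exercise. The goal is to scrape the tickers that come back when some filters are applied. The code is as follows: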
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup

url = ("http://finviz.com/quote.ashx?t=" + 'ttd')
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
html = soup(webpage, "html.parser")
tickers = []
counter = 1
while True:
    # r= is the 1-based index of the first row shown on the page
    url = ("https://finviz.com/screener.ashx?v=111&f=cap_large&r=" + str(counter))
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    webpage = urlopen(req).read()
    html = soup(webpage, "html.parser")
    rows = html.select('table[bgcolor="#d3d3d3"] tr')
    for i in rows[1:]:
        a1, a2, a3, a4 = (x.text for x in i.find_all('td')[1:5])
        i = a1
        tickers.append(i)
    counter += 20
    if tickers[-1]==tickers[-2]:
        break
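I'm not sure how to extract just one column, so I'm using code that unpacks several of them (a1, a2, a3, a4 = (x.text for x in i.find_all('td')[1:5])). Is there a way to get only the first column?
Is there a way to avoid hard-coding "20" in the script?
When I run the code, it creates a duplicate of the last ticker. Is there another way to make the code stop once it has gone through all the entries?
Thanks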
Reply from a uj5u.com user:
You can use an nth-child range to filter out the first row of the table, then use nth-child(2) to pick the ticker column from the remaining table rows:
tickers = [td.text for td in html.select('table[bgcolor="#d3d3d3"] tr:nth-child(n+2) td:nth-child(2)')]
To add to an existing list instead:
tickers.extend([td.text for td in html.select('table[bgcolor="#d3d3d3"] tr:nth-child(n+2) td:nth-child(2)')])
Read more about nth-child here:
http://nthmaster.com/
and
https://developer.mozilla.org/en-US/docs/Web/CSS/:nth-child
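To see what the two nth-child selectors pick out, here is a minimal offline sketch. The SAMPLE_HTML table is a made-up stand-in, not real finviz markup, and the name sample_tickers is only for illustration; it assumes bs4 is installed and that the rows are direct children of the table, as html.parser leaves them.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the screener table (not real finviz markup).
SAMPLE_HTML = """
<table bgcolor="#d3d3d3">
  <tr><td>No.</td><td>Ticker</td><td>Company</td></tr>
  <tr><td>1</td><td>AAPL</td><td>Apple Inc.</td></tr>
  <tr><td>2</td><td>MSFT</td><td>Microsoft Corporation</td></tr>
</table>
"""

html = BeautifulSoup(SAMPLE_HTML, "html.parser")

# tr:nth-child(n+2) matches every row except the first one (the header);
# td:nth-child(2) then picks the second cell of each remaining row.
selector = 'table[bgcolor="#d3d3d3"] tr:nth-child(n+2) td:nth-child(2)'
sample_tickers = [td.text for td in html.select(selector)]
print(sample_tickers)
```

Because the header row is excluded by the selector itself, there is no need to slice the result with rows[1:] or to unpack the other columns.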
Reply from a uj5u.com user:
Since you are only interested in the values of the ticker column, select it more specifically, based on the <a> it contains:
html.select('table[bgcolor="#d3d3d3"] a.screener-link-primary')
To avoid the hard-coded 20, just check whether there is a next-page element and use its href:
html.select_one('.tab-link:-soup-contains("next")')
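As a rough illustration of that check, the sketch below runs the same selector against two made-up pager snippets. The markup, the helper name next_page_url and the base URL argument are assumptions for the example; the :-soup-contains pseudo-class needs a reasonably recent soupsieve (2.1 or later, as far as I know).

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical pager markup, loosely modelled on the selector above.
PAGE_WITH_NEXT = '<a class="tab-link" href="screener.ashx?v=111&amp;r=21"><b>next</b></a>'
LAST_PAGE = '<a class="tab-link" href="screener.ashx?v=111&amp;r=1"><b>prev</b></a>'


def next_page_url(markup, base="https://finviz.com/"):
    """Return the absolute URL of the next page, or None on the last page."""
    html = BeautifulSoup(markup, "html.parser")
    link = html.select_one('.tab-link:-soup-contains("next")')
    if link is None:
        # No "next" link: this is the last page, so the caller can stop.
        return None
    # urljoin resolves the relative href against the site root.
    return urljoin(base, link["href"])


print(next_page_url(PAGE_WITH_NEXT))
print(next_page_url(LAST_PAGE))
```

A missing link comes back as None, which gives the loop a natural stopping condition and avoids requesting a page past the end.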
Example:
import time

import requests
from bs4 import BeautifulSoup

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
}
tickers = []
url = "https://finviz.com/screener.ashx?v=111&f=cap_large"
r = requests.get(url, headers=headers)
html = BeautifulSoup(r.text, "html.parser")
for a in html.select('table[bgcolor="#d3d3d3"] a.screener-link-primary'):
    tickers.append(a.text)

# keep going while the current page has a "next" link (:= needs Python 3.8+)
while (link := html.select_one('.tab-link:-soup-contains("next")')):
    url = "https://finviz.com/" + link['href']
    r = requests.get(url, headers=headers)
    html = BeautifulSoup(r.text, "html.parser")
    for a in html.select('table[bgcolor="#d3d3d3"] a.screener-link-primary'):
        tickers.append(a.text)
    # be kind and add some delay between your requests
    time.sleep(1)

print(tickers)
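The same loop can be tried without touching the network. The following is only a sketch under stated assumptions: FAKE_PAGES is an invented two-page "site", and collect_tickers with its fetch parameter is a hypothetical helper, not part of the answer above. It shows that following the "next" link visits each page once and stops on the last page, so no ticker is duplicated.

```python
from bs4 import BeautifulSoup

# Hypothetical in-memory "site": relative URL -> page markup.
FAKE_PAGES = {
    "page1": """
        <table bgcolor="#d3d3d3">
          <tr><td>No.</td><td>Ticker</td></tr>
          <tr><td>1</td><td><a class="screener-link-primary">AAPL</a></td></tr>
          <tr><td>2</td><td><a class="screener-link-primary">MSFT</a></td></tr>
        </table>
        <a class="tab-link" href="page2">next</a>
    """,
    "page2": """
        <table bgcolor="#d3d3d3">
          <tr><td>No.</td><td>Ticker</td></tr>
          <tr><td>3</td><td><a class="screener-link-primary">NVDA</a></td></tr>
        </table>
        <a class="tab-link" href="page1">prev</a>
    """,
}


def collect_tickers(start, fetch):
    """Follow "next" links from `start`, collecting ticker symbols."""
    tickers = []
    url = start
    while url is not None:
        html = BeautifulSoup(fetch(url), "html.parser")
        links = html.select('table[bgcolor="#d3d3d3"] a.screener-link-primary')
        tickers.extend(a.text for a in links)
        # Stop when the page has no "next" link instead of counting rows.
        link = html.select_one('.tab-link:-soup-contains("next")')
        url = link["href"] if link is not None else None
    return tickers


offline_tickers = collect_tickers("page1", FAKE_PAGES.__getitem__)
print(offline_tickers)
```

Passing the fetch function in as a parameter keeps the scraping logic in one place; for real use it could wrap requests.get plus a short delay between requests.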
