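I'm trying to scrape some data from a website that detects dropping odds in live football matches. When a specific change appears in a page's HTML, the script sends a notification to a Telegram bot I made. Here is my code: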
from distutils.command.clean import clean
import time
import requests
from bs4 import BeautifulSoup as bs
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
ids_list=[]
game_urls=[]
game_name=[]
gfix=[]
livecapper_url ="https://livecapper.ru/bet365/" #the website link
while(True):
    page=requests.get(livecapper_url,verify=False).text
    soup = bs(page , "html.parser")
    game_ids = soup.find_all(game_id=True) #getting the IDs of every football game
    for g in game_ids:
        x=g.get('game_id')
        ids_list.append(x) #putting the IDs on a list
    for id in ids_list:
        game_url = f"https://livecapper.ru/bet365/event.php?id={id}" #the URL of every single football game
        game_urls.append(game_url)
    for g in game_urls:
        response=requests.get(g).text
        soup = bs(response, "html.parser")
        for t in soup.find_all("td",class_=['red1','red2','red3'], limit=1): #detecting the change in HTML
            for g in soup.find_all("h1"):
                game_name.append(g.get_text()) if g.get_text() not in game_name else game_name
                for f in game_name:
                    game_url= 'https://api.telegram.org/botTOKEN/sendMessage?chat_id=-609XXXXXX&text=Fixed Alert : {}'.format(f) #sending notification to telegram bot
                    if game_url not in gfix:
                        gfix.append(game_url)
                        requests.get(game_url)
                    else:
                        pass
    ids_list.clear
    game_name.clear
    game_urls.clear
    time.sleep(1)
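As you can see, I'm using a while (True): loop to run the code 24/7, but the problem is that each iteration takes roughly twice as long as the previous one.
For example: 1st iteration = 10s | 2nd iteration = 20s | 3rd iteration = 40s | 4th iteration = 80s
What can I do to make every iteration run as fast as possible?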
Reply from a uj5u.com user:
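Change these:
ids_list.clear
game_name.clear
game_urls.clear
to:
ids_list.clear()
game_name.clear()
game_urls.clear()
Without the parentheses you are not calling the methods. You are only looking them up and then discarding them, which does nothing.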
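A quick way to see the difference is the minimal sketch below. It does no scraping at all: the function name simulate and the placeholder IDs are made up for illustration, and the list simply stands in for ids_list from the question.

```python
def simulate(iterations, call_clear):
    """Mimic the loop: add 3 IDs per pass, then 'clear' the list."""
    items = []
    sizes = []
    for _ in range(iterations):
        items.extend(["a", "b", "c"])  # stands in for the scraped game IDs
        sizes.append(len(items))       # work per pass scales with len(items)
        if call_clear:
            items.clear()              # actually empties the list
        else:
            items.clear                # only looks up the method; nothing is called
    return sizes


print(simulate(4, call_clear=False))  # [3, 6, 9, 12]
print(simulate(4, call_clear=True))   # [3, 3, 3, 3]
```

In the question's code the effect compounds, because every pass appends one URL to game_urls for each entry in the ever-growing ids_list, so the number of requests per pass rises faster than the number of IDs.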
Reply from a uj5u.com user:
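The code has quite a few problems, but the reason each pass takes longer is that you keep appending to the lists, so they grow after every iteration (duplicates included). There are a few things you can do:
- Create those initially empty lists inside your loop
- Remove duplicates from the lists, so the same thing isn't requested multiple times per iteration
- Use .clear() correctly
I only did the first of these, since it looks like what you want is to start each iteration with clean lists.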
import time
import requests
from bs4 import BeautifulSoup as bs
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
gfix=[]
livecapper_url ="https://livecapper.ru/bet365/" #the website link
while(True):
    # fresh lists on every pass, so nothing carries over from the previous one
    ids_list=[]
    game_urls=[]
    game_name=[]
    page=requests.get(livecapper_url,verify=False).text
    soup = bs(page , "html.parser")
    game_ids = soup.find_all(game_id=True) #getting the IDs of every football game
    for g in game_ids:
        x=g.get('game_id')
        ids_list.append(x) #putting the IDs on a list
    for id in ids_list:
        game_url = f"https://livecapper.ru/bet365/event.php?id={id}" #the URL of every single football game
        game_urls.append(game_url)
    for g in game_urls:
        response=requests.get(g).text
        soup = bs(response, "html.parser")
        for t in soup.find_all("td",class_=['red1','red2','red3'], limit=1): #detecting the change in HTML
            for g in soup.find_all("h1"):
                game_name.append(g.get_text()) if g.get_text() not in game_name else game_name
                for f in game_name:
                    game_url= 'https://api.telegram.org/botTOKEN/sendMessage?chat_id=-609XXXXXX&text=Fixed Alert : {}'.format(f) #sending notification to telegram bot
                    if game_url not in gfix:
                        gfix.append(game_url)
                        requests.get(game_url)
                    else:
                        pass
    time.sleep(1)
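The second suggestion in the list above (removing duplicates) could look roughly like the sketch below. It is an illustration under assumptions, not part of the answer's code: unique_in_order and new_alerts are hypothetical helper names, and the sample IDs and game names are invented.

```python
def unique_in_order(values):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(values))


def new_alerts(game_names, already_sent):
    """Return names not yet notified and record them in already_sent (a set)."""
    fresh = [name for name in unique_in_order(game_names)
             if name not in already_sent]
    already_sent.update(fresh)
    return fresh


sent = set()  # plays the role of gfix, but with fast membership checks
print(unique_in_order(["101", "102", "101", "103"]))  # ['101', '102', '103']
print(new_alerts(["A v B", "C v D", "A v B"], sent))   # ['A v B', 'C v D']
print(new_alerts(["A v B", "E v F"], sent))            # ['E v F']
```

Deduplicating the IDs before building the URLs means each game page is requested at most once per pass, and tracking sent alerts in a set keeps the "already notified" check cheap as the history grows.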
Tags: Python, loops, web scraping, Beautiful Soup, while loop
