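I want to write a web scraper using beautifulsoup4 and requests. It scrapes the data in specific columns of a specific table. It scrapes once, waits for a while, scrapes again, and then compares the two scrapes. If there is a difference it prints "something has changed"; if there is none, it prints "no changes".
Here is the whole code: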
import requests
import time
from bs4 import BeautifulSoup
URL = "https://website.com"
website = requests.get(URL)
soup = BeautifulSoup(website.content, "html.parser")
data = []
table = soup.find("table", class_="table table-bordered table-sm table-responsive")
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')[0]
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele]) # Get rid of empty values
    cols2 = row.find_all('td')[1]
    cols2 = [ele.text.strip() for ele in cols2]
    data.append([ele for ele in cols2 if ele]) # Get rid of empty values
    cols3 = row.find_all('td')[2]
    cols3 = [ele.text.strip() for ele in cols3]
    data.append([ele for ele in cols3 if ele]) # Get rid of empty values
    cols4 = row.find_all('td')[3]
    cols4 = [ele.text.strip() for ele in cols4]
    data.append([ele for ele in cols4 if ele])
    cols5 = row.find_all('td')[5]
    cols5 = [ele.text.strip() for ele in cols5]
    data.append([ele for ele in cols5 if ele])
    print(cols, cols2, cols3, cols4, cols5)
time.sleep(600)
for row in rows:
    cols11 = row.find_all('td')[0]
    cols11 = [ele.text.strip() for ele in cols11]
    data.append([ele for ele in cols11 if ele]) # Get rid of empty values
    cols22 = row.find_all('td')[1]
    cols22 = [ele.text.strip() for ele in cols22]
    data.append([ele for ele in cols22 if ele]) # Get rid of empty values
    cols33 = row.find_all('td')[2]
    cols33 = [ele.text.strip() for ele in cols33]
    data.append([ele for ele in cols33 if ele]) # Get rid of empty values
    cols44 = row.find_all('td')[3]
    cols44 = [ele.text.strip() for ele in cols44]
    data.append([ele for ele in cols44 if ele])
    cols55 = row.find_all('td')[5]
    cols55 = [ele.text.strip() for ele in cols55]
    data.append([ele for ele in cols55 if ele])
    print(cols11, cols22, cols33, cols44, cols55)
if(cols == cols11, cols2 == cols22, cols5 == cols55):
    print("no changes")
else:
    print("something has changed")
The problem is that it always says "no changes", even when I know something has changed. How can I fix this?
Reply from a uj5u.com user:
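Lists can be compared this way, but it is unclear how you concluded that a comma can stand in for the logical AND operator (the keyword and in Python) inside an if condition.
By wrapping the conditions in parentheses () and joining them with commas (apparently by accident), you are creating a tuple, and every non-empty tuple evaluates to True. As a result, your script always takes the branch that you expected to run only when nothing has changed between your data structures.
Instead, join the comparisons with Python's and keyword, as you intended, and do not turn the truth values into a tuple:
if cols == cols11 and cols2 == cols22 and cols5 == cols55:
    print("no changes")
else:
    print("something has changed")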
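To see why the comma-separated condition is always true, here is a small sketch you can run on its own. The list names and values are made up for illustration; they only stand in for the scraped columns.

```python
# Made-up stand-ins for two scrapes of two columns.
old_cols, new_cols = ["10", "20"], ["10", "99"]   # this column changed
old_cols2, new_cols2 = ["x"], ["x"]               # this column did not

# Commas inside parentheses build a tuple of the individual results.
condition = (old_cols == new_cols, old_cols2 == new_cols2)
print(type(condition).__name__, condition)  # tuple (False, True)

# A non-empty tuple is truthy, whatever it contains.
print(bool(condition))  # True

# Joining the comparisons with "and" gives the intended result.
print(old_cols == new_cols and old_cols2 == new_cols2)  # False
```

Even a tuple made up only of False values, such as (False, False), is truthy, because only the empty tuple () counts as false.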
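Tangential to the core of your question: your code would benefit from (a) more descriptive variable names and (b) a data type better suited to your use case, rather than a brand-new numbered variable for every index and a lot of needlessly duplicated code.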
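One possible way to act on that suggestion is sketched below. It is only an illustration: the helper name extract_columns, the WANTED_COLUMNS constant and the sample HTML are all hypothetical, and the column indexes (0, 1, 2, 3, 5) are simply the ones the question reads. Each row becomes one tuple, so a whole scrape is a single list that can be compared with ==.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
SAMPLE_HTML = """
<table class="table table-bordered">
  <tbody>
    <tr><td>a</td><td>b</td><td>c</td><td>d</td><td>e</td><td>f</td></tr>
    <tr><td>1</td><td>2</td><td>3</td><td>4</td><td>5</td><td>6</td></tr>
  </tbody>
</table>
"""

WANTED_COLUMNS = (0, 1, 2, 3, 5)  # the indexes used in the question


def extract_columns(html, wanted=WANTED_COLUMNS):
    """Return one tuple per table row with the text of the wanted cells."""
    soup = BeautifulSoup(html, "html.parser")
    snapshot = []
    for row in soup.find("table").find_all("tr"):
        cells = row.find_all("td")
        if len(cells) <= max(wanted):
            continue  # skip header rows or rows that are too short
        snapshot.append(tuple(cells[i].get_text(strip=True) for i in wanted))
    return snapshot


snapshot = extract_columns(SAMPLE_HTML)
print(snapshot)
```

With the data in this shape, the ten numbered variables collapse into two snapshots, and the comparison covers every row instead of only the last one processed by the loop.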
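Reply from a uj5u.com user:
In addition to what others have said, you must make another GET request to the URL after the pause in order to detect any change in the page's data.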
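What you are doing is:
- Making a GET request to the URL.
- Creating a soup object from the response.
- Extracting data from soup and storing it in variables.
- Pausing for a while with time.sleep(600).
- Extracting the same information again from the same soup (which will always be equal), without making any new GET request.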
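So you need to add the following code immediately after the time.sleep(600) statement, so that any modified data on the page (if there is any) is actually fetched.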
URL = "https://website.com"
website = requests.get(URL)
soup = BeautifulSoup(website.content, "html.parser")
table = soup.find("table", class_="table table-bordered table-sm table-responsive")
table_body = table.find('tbody')
rows = table_body.find_all('tr')
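Putting both answers together, the overall flow might look like the sketch below. The function detect_change and its fetch_snapshot parameter are hypothetical names: fetch_snapshot stands for whatever callable performs a fresh GET request and parsing, as in the code above. A fake fetcher is used here so the sketch runs without network access.

```python
import time


def detect_change(fetch_snapshot, wait_seconds, sleep=time.sleep):
    """Fetch twice with a pause in between; return True if the data differ."""
    first = fetch_snapshot()
    sleep(wait_seconds)
    second = fetch_snapshot()  # a fresh fetch, not a re-read of old data
    return first != second


# Fake fetcher that returns different data on its second call.
fake_pages = iter([[("a", "1")], [("a", "2")]])
changed = detect_change(lambda: next(fake_pages), wait_seconds=0)
print("something has changed" if changed else "no changes")
```

In the real script, fetch_snapshot would wrap the requests.get and BeautifulSoup calls, and wait_seconds would be 600.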
