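I am trying to use Selenium to scrape several elements at once (for personal learning only, non-profit). The scraped elements should be combined into rows suitable for a database. Until now I have only ever scraped a single element, never several together, so there are some problems in my code.
I want to create this row for every round of the championship (Round 1, Round 2, etc.): Round, Date, Team_Home, Team_Away, Result_Home, Result_Away. To give you a better idea, each round has 8 rows, and there are 26 rounds in total. I do not get any errors, but the output is just >>>, with no text and no error message.
PS: The request and the code are for personal learning only; they are not used for commercial or profit purposes.
For example, I would like to get this: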
#SWEDEN ALLSVENSKAN
#Round, Date, Team_Home, Team_Away, Result_Home, Result_Away
Round 1, 11/31/2021 20:45, AIK Stockholm, Malmo, 2, 1
Round 1, 11/31/2021 20:45, Elfsborg, Gothenburg, 2, 3
...and the rest of the other matches of the 1st round
Round 2, 06/12/2021 20:45, Gothenburg, AIK Stockholm, 0, 1
Round 2, 06/12/2021 20:45, Malmo, Elfsborg, 1, 1
...and the rest of the other matches of the 2nd round
Round 3, etc.
Python code for scraping:
Values_Allsvenskan = []
#SCRAPING
driver.get("link")
driver.implicitly_wait(12)
driver.minimize_window()
for Allsvenskan in multiple_scraping:
    try:
        wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
    except:
        pass
    multiple_scraping = round, date, team_home, team_away, score_home, score_away
    #row/record
    round = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__round event__round--static']")
    date = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__time']")
    team_home = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__participant event__participant--home']")
    team_away = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__participant event__participant--away']")
    score_home = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__score event__score--home']")
    score_away = driver.find_elements(By.CSS_SELECTOR, "a[href^='/squadra'][class^='event__score event__score--away']")
    Allsvenskan_text = round.text, date.text, team_home.text, team_away.text, score_home.text, score_away.text
    Values_Allsvenskan.append(tuple([Allsvenskan_text]))
    print(Allsvenskan_text)
driver.close
#INSERT IN DATABASE
con = sqlite3.connect('/database.db')
cursor = con.cursor()
sqlite_insert_query_Allsvenskan = 'INSERT INTO All_Score (round, date, team_home, team_away, score_home, score_away) VALUES (?, ?, ?, ?, ?, ?);'
cursor.executemany(sqlite_insert_query_Allsvenskan, Values_Allsvenskan)
con.commit()
Based on my Python code, can you tell me how to fix it? Thanks.
Update: inserting into the database
#INSERT IN DATABASE
con = sqlite3.connect('database.db')
cursor = con.cursor()
sqlite_insert_query_Allsvenskan = 'INSERT INTO All_Score(current_round, date, team_home, team_away, score_home, score_away) VALUES (?, ?, ?, ?, ?, ?);'
cursor.executemany(sqlite_insert_query_Allsvenskan, results = [])
con.commit()
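Final update of the code logic, after the final answer: I only added comments that explain the steps. If I missed a comment or something needs to be added, please go ahead. I want to make sure I understand the logic of the code.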
#I search for rows with event__round or event__match
all_rows = driver.find_elements(By.CSS_SELECTOR, "div[class^='event__round'],div[class^='event__match']")
#Initializing an empty list
results = []
#Value default of the round before the for loop
current_round = '?'
#Check which classes of event__round and event__match have lines. It is used to recognize the row with Round?????
for row in all_rows:
    classes = row.get_attribute('class')
    ## If round number and match both have rows, then I use find_element to get the rest of the other data to scrape
    if.........
    else.....
Answer:
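You use find_elements to get one list with all rounds, one with all dates, one with all team_home, one with all team_away, etc., so your values sit in separate lists. You should use zip() to group them into lists like [single round, single date, single team_home, ...].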
results = []
for row in zip(date, team_home, team_away, score_home, score_away):
    row = [item.text for item in row]
    print(row)
    results.append(row)
I skipped round because it creates more problems and needs completely different code.
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
#driver.minimize_window()
wait = WebDriverWait(driver, 10)
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
except Exception as ex:
    print('EX:', ex)
round = driver.find_elements(By.CSS_SELECTOR, "[class^='event__round event__round--static']")
date = driver.find_elements(By.CSS_SELECTOR, "[class^='event__time']") # date and time are a single element on diretta.it
team_home = driver.find_elements(By.CSS_SELECTOR, "[class^='event__participant event__participant--home']")
team_away = driver.find_elements(By.CSS_SELECTOR, "[class^='event__participant event__participant--away']")
score_home = driver.find_elements(By.CSS_SELECTOR, "[class^='event__score event__score--home']")
score_away = driver.find_elements(By.CSS_SELECTOR, "[class^='event__score event__score--away']")
results = []
for row in zip(date, team_home, team_away, score_home, score_away):
    row = [item.text for item in row]
    print(row)
    results.append(row)
Result:
['01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1']
['01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0']
['01.11. 19:00', 'Mjallby', 'Hammarby', '2', '0']
['31.10. 17:30', 'Örebro', 'Djurgarden', '0', '1']
['31.10. 15:00', 'Norrkoping', 'Elfsborg', '3', '2']
['30.10. 17:30', 'Hacken', 'Kalmar', '1', '4']
['30.10. 15:00', 'Sirius', 'Malmo FF', '2', '3']
['30.10. 15:00', 'Varbergs', 'Östersunds', '3', '0']
['28.10. 19:00', 'Degerfors', 'Elfsborg', '1', '2']
['28.10. 19:00', 'Göteborg', 'Djurgarden', '3', '0']
['28.10. 19:00', 'Halmstad', 'Örebro', '1', '1']
['28.10. 19:00', 'Norrkoping', 'Mjallby', '2', '2']
['27.10. 19:00', 'Kalmar', 'Varbergs', '2', '2']
['27.10. 19:00', 'Malmo FF', 'AIK Stockholm', '1', '0']
['27.10. 19:00', 'Östersunds', 'Hacken', '1', '1']
['27.10. 19:00', 'Sirius', 'Hammarby', '0', '1']
['25.10. 19:00', 'Örebro', 'Degerfors', '1', '2']
['24.10. 17:30', 'AIK Stockholm', 'Norrkoping', '1', '0']
...
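But this method can sometimes cause problems: if one row is missing a value, zip() pulls the value from the next row into the current row, and every row after it shifts as well. This way it can create wrong rows.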
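To make the shifting problem concrete, here is a minimal sketch with plain Python lists instead of Selenium elements. The team names and scores are made-up sample values, and the missing score for the second match is an assumption chosen only to trigger the effect.

```python
# Made-up sample data: the score of the second match (Halmstad) is missing,
# so both score lists are one element shorter than the team lists.
team_home = ["Degerfors", "Halmstad", "Mjallby"]
team_away = ["Goteborg", "AIK Stockholm", "Hammarby"]
score_home = ["0", "2"]  # "2" really belongs to Mjallby
score_away = ["1", "0"]  # "0" really belongs to Hammarby

# zip() pairs items by position only and stops at the shortest list.
rows = [list(row) for row in zip(team_home, team_away, score_home, score_away)]

for row in rows:
    print(row)
```

The second row now pairs Halmstad with the score that belongs to the third match, and the third match is dropped without any error, because zip() knows nothing about which values came from the same row on the page.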
It is better to find all rows first (div, or tr in a table), then use a for-loop to process every row separately, calling row.find_elements instead of driver.find_elements. This should also solve the problem with round, whose value has to be read once and then copied into the following rows.
I search for rows with event__round or event__match, and then I check which classes each row has. If it has event__round, I read the round from it. If it has event__match, I use find_element (without the s at the end) to get a single date, a single team_home, a single team_away, etc., because a single row holds only single values, and I combine them with current_round to create the row.
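The idea of remembering the last round seen can be tried without a browser. This is only a sketch under assumptions: the (class, text) tuples stand in for Selenium elements, the class string for match rows and all texts are invented for illustration, and attach_rounds is a hypothetical helper name.

```python
# Stand-ins for the page rows as (class attribute, text) pairs.
# The class names and texts are illustrative, not copied from the site.
page_rows = [
    ("event__round event__round--static", "Giornata 26"),
    ("event__match event__match--static", "Degerfors - Goteborg"),
    ("event__match event__match--static", "Halmstad - AIK Stockholm"),
    ("event__round event__round--static", "Giornata 25"),
    ("event__match event__match--static", "Degerfors - Elfsborg"),
]


def attach_rounds(rows):
    """Copy the most recent round number into every following match row."""
    results = []
    current_round = "?"  # default until the first round header is seen
    for classes, text in rows:
        if "event__round" in classes:
            # Header row: remember only the number, e.g. "26"
            current_round = text.split(" ")[-1]
        else:
            # Match row: combine it with the remembered round
            results.append([current_round, text])
    return results


results = attach_rounds(page_rows)
for row in results:
    print(row)
```

Each match row ends up with the number of the header that precedes it, which is the same pattern the Selenium version applies to the real elements.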
import selenium
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://www.diretta.it/calcio/svezia/allsvenskan/risultati/")
driver.implicitly_wait(12)
#driver.minimize_window()
wait = WebDriverWait(driver, 10)
try:
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button[id='event__more event__more--static']"))).click()
except Exception as ex:
    print('EX:', ex)
all_rows = driver.find_elements(By.CSS_SELECTOR, "div[class^='event__round'],div[class^='event__match']")
results = []
current_round = '?'
for row in all_rows:
    classes = row.get_attribute('class')
    #print(classes)
    if 'event__round' in classes:
        #round = row.find_elements(By.CSS_SELECTOR, "[class^='event__round event__round--static']")
        #current_round = row.text # full text `Round 20`
        current_round = row.text.split(" ")[-1] # only `20` without `Round`
    else:
        datetime = row.find_element(By.CSS_SELECTOR, "[class^='event__time']")
        date, time = datetime.text.split(" ")
        date = date.rstrip('.') # right-strip to remove `.` at the end of date
        team_home = row.find_element(By.CSS_SELECTOR, "[class^='event__participant event__participant--home']")
        team_away = row.find_element(By.CSS_SELECTOR, "[class^='event__participant event__participant--away']")
        score_home = row.find_element(By.CSS_SELECTOR, "[class^='event__score event__score--home']")
        score_away = row.find_element(By.CSS_SELECTOR, "[class^='event__score event__score--away']")
        # old version
        #row = [current_round, datetime.text, team_home.text, team_away.text, score_home.text, score_away.text]
        row = [current_round, date, time, team_home.text, team_away.text, score_home.text, score_away.text]
        results.append(row)
        print(row)
# --- database ---
import sqlite3
con = sqlite3.connect('database.db')
cursor = con.cursor()
query = 'DROP TABLE IF EXISTS All_Score;'
cursor.execute(query)
# old version - with only `date`
#query = 'CREATE TABLE IF NOT EXISTS All_Score(current_round, date, team_home, team_away, score_home, score_away);'
# new version - with `date` and `time`
query = 'CREATE TABLE IF NOT EXISTS All_Score(current_round, date, time, team_home, team_away, score_home, score_away);'
cursor.execute(query)
# old version - with only `date`
#query = 'INSERT INTO All_Score(current_round, date, team_home, team_away, score_home, score_away) VALUES (?, ?, ?, ?, ?, ?);'
# new version - with `date` and `time`
query = 'INSERT INTO All_Score(current_round, date, time, team_home, team_away, score_home, score_away) VALUES (?, ?, ?, ?, ?, ?, ?);'
cursor.executemany(query, results)
con.commit()
Result (from the old version, with the full round text and date and time in one field):
['Giornata 26', '01.11. 19:00', 'Degerfors', 'Göteborg', '0', '1']
['Giornata 26', '01.11. 19:00', 'Halmstad', 'AIK Stockholm', '1', '0']
['Giornata 26', '01.11. 19:00', 'Mjallby', 'Hammarby', '2', '0']
['Giornata 26', '31.10. 17:30', 'Örebro', 'Djurgarden', '0', '1']
['Giornata 26', '31.10. 15:00', 'Norrkoping', 'Elfsborg', '3', '2']
['Giornata 26', '30.10. 17:30', 'Hacken', 'Kalmar', '1', '4']
['Giornata 26', '30.10. 15:00', 'Sirius', 'Malmo FF', '2', '3']
['Giornata 26', '30.10. 15:00', 'Varbergs', 'Östersunds', '3', '0']
['Giornata 25', '28.10. 19:00', 'Degerfors', 'Elfsborg', '1', '2']
['Giornata 25', '28.10. 19:00', 'Göteborg', 'Djurgarden', '3', '0']
['Giornata 25', '28.10. 19:00', 'Halmstad', 'Örebro', '1', '1']
# ...
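To check that rows of this shape really land in the table, the database part can be tried on its own. This is a sketch under assumptions: it uses an in-memory SQLite database (":memory:") instead of database.db so nothing is written to disk, and the three rows are sample values in the seven-column layout (round, date, time, teams, scores), not scraped output.

```python
import sqlite3

# Sample rows in the layout [current_round, date, time, team_home,
# team_away, score_home, score_away]; values are illustrative only.
results = [
    ["26", "01.11", "19:00", "Degerfors", "Goteborg", "0", "1"],
    ["26", "01.11", "19:00", "Halmstad", "AIK Stockholm", "1", "0"],
    ["25", "28.10", "19:00", "Degerfors", "Elfsborg", "1", "2"],
]

# In-memory database: an assumption for this sketch, not the file used above.
con = sqlite3.connect(":memory:")
cursor = con.cursor()

cursor.execute(
    "CREATE TABLE IF NOT EXISTS All_Score("
    "current_round, date, time, team_home, team_away, score_home, score_away);"
)

# executemany expects a sequence of rows, one value per `?` placeholder.
cursor.executemany(
    "INSERT INTO All_Score(current_round, date, time, team_home, team_away, "
    "score_home, score_away) VALUES (?, ?, ?, ?, ?, ?, ?);",
    results,
)
con.commit()

# Read back the matches of one round to confirm the insert worked.
stored = cursor.execute(
    "SELECT current_round, team_home, team_away FROM All_Score "
    "WHERE current_round = ?;",
    ("26",),
).fetchall()
print(stored)

con.close()
```

Note that executemany takes the list as a plain positional argument; every inner list must have exactly as many items as there are placeholders in the query.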
