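I want to scrape exactly the same information, nested in the same way, from several pages. I put the URLs in a list and wrote a for loop to iterate over them. The scraper works fine on the first URL, but unfortunately it gets stuck on the second one and I get a MaxRetryError.
My idea with Selenium is to open a page, get the information I need, put it in a dataframe, and close the page. Then open another page, get the same kind of information, append it to the dataframe, close the page, and so on, and finally save the dataframe as a .csv file.
Here is the code: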
options = webdriver.ChromeOptions()
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
PATH = "C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(PATH, options=options)
driver.maximize_window()
driver.implicitly_wait(30)
time.sleep(10)
wait = WebDriverWait(driver,30)
# Create the csv at the good place
csv_file = open('\path_to_folder.csv', 'w', newline='')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['titre', 'contrat', 'localisation', 'description'])
# A list of two URL's
listurl = ['https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1','https://candidat.pole-emploi.fr/offres/emploi/ouvrier-agricole/s1m2']
# Loop through the list
for i in listurl:
    driver.get(i)
    # Click cookies popup
    wait.until(EC.element_to_be_clickable((By.LINK_TEXT,"Continuer sans accepter"))).click()
    time.sleep(3)
    # Get the elements
    try:
        zone = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "zone-resultats"))
        )
        offres = zone.find_elements_by_css_selector("div.media-body")
        offres2 = zone.find_elements_by_css_selector("div.media-right.media-middle.hidden-xs")
        for offre in offres:
            titre = (offre.find_element_by_css_selector("h2.t4.media-heading")).text
            print(titre)
            localisation = (offre.find_element_by_css_selector("span")).text
            print(localisation)
            description =(offre.find_element_by_class_name("description")).text
            print(description)
            for offre2 in offres2:
                contrat = (offre2.find_element_by_class_name("contrat")).text
                print(contrat)
                csv_writer.writerow([titre, contrat, localisation, description])
    except Exception as ex:
        print(ex)
    finally:
        csv_file.close()
        driver.quit()
This is the error message:
MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=52938): Max retries exceeded with url: /session/cab64f2c3688431768dfcdba1c4ca98f/url (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000021FBC3B6730>: Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it'))
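For readers unfamiliar with this error: a refused connection to 127.0.0.1 on a /session/... URL usually means the local driver process is no longer running when the next command is sent. The sketch below imitates that situation without a browser. FakeDriver and visit_with_quit_inside_loop are hypothetical names invented for illustration, not Selenium APIs, and the stub raises a plain ConnectionError where real Selenium would surface a MaxRetryError.

```python
class FakeDriver:
    """Stand-in for a WebDriver session (hypothetical, not part of Selenium)."""

    def __init__(self):
        self.alive = True
        self.visited = []

    def get(self, url):
        if not self.alive:
            # With a real driver this is where the refused connection shows up.
            raise ConnectionError("driver session already closed")
        self.visited.append(url)

    def quit(self):
        self.alive = False


def visit_with_quit_inside_loop(urls):
    """Call quit() on every iteration, as a finally block inside the loop would."""
    driver = FakeDriver()
    errors = []
    for url in urls:
        try:
            driver.get(url)
        except ConnectionError as ex:
            # Recorded here so the sketch can report it; the question's code
            # calls get() outside its try block, so there the error propagates.
            errors.append((url, str(ex)))
        finally:
            driver.quit()
    return driver.visited, errors


if __name__ == "__main__":
    visited, errors = visit_with_quit_inside_loop(["page-1", "page-2"])
    print("visited:", visited)
    print("errors:", errors)
```

Only the first URL is visited; every later call hits a session that has already been shut down.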
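Answer from a uj5u.com user:
The likely cause is that driver.quit() sits in the finally block inside the loop, so the browser session is already closed when driver.get() runs for the second URL; the refused connection to 127.0.0.1 in the MaxRetryError is the driver process no longer answering. The code below starts a new driver for each URL, collects the results in a dictionary, and writes the CSV with pandas at the end. It should work for you: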
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
options = webdriver.ChromeOptions()
# options.add_argument("--incognito")
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# # Create the csv at the good place
# csv_file = open('\path_to_folder.csv', 'w', newline='')
# csv_writer = csv.writer(csv_file)
# csv_writer.writerow(['titre', 'contrat', 'localisation', 'description'])
data = {
    "titre": [],
    "contrat": [],
    "localisation": [],
    "description": []
}
# A list of two URL's
listurl = ['https://candidat.pole-emploi.fr/offres/emploi/horticulteur/s1m1',
'https://candidat.pole-emploi.fr/offres/emploi/ouvrier-agricole/s1m2']
# Loop through the list
for i in listurl:
    # Start a fresh browser session for each URL
    driver = webdriver.Chrome(r"D:\chromedriver\94\chromedriver.exe", options=options)
    driver.maximize_window()
    driver.get(i)
    # Click cookies popup
    WebDriverWait(driver,30).until(EC.element_to_be_clickable((By.LINK_TEXT,"Continuer sans accepter"))).click()
    # Get the elements
    try:
        zone = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CLASS_NAME, "zone-resultats"))
        )
        offres = zone.find_elements_by_css_selector("div.media-body")
        offres2 = zone.find_elements_by_css_selector("div.media-right.media-middle.hidden-xs")
        for offre in offres:
            titre = (offre.find_element_by_css_selector("h2.t4.media-heading")).text
            print(titre)
            localisation = (offre.find_element_by_css_selector("span")).text
            print(localisation)
            description =(offre.find_element_by_class_name("description")).text
            print(description)
            for offre2 in offres2:
                contrat = (offre2.find_element_by_class_name("contrat")).text
                print(contrat)
                data["titre"].append(titre)
                data["contrat"].append(contrat)
                data["localisation"].append(localisation)
                data["description"].append(description)
    except Exception as ex:
        print(ex)
    # Close this session before moving on to the next URL
    driver.quit()
df = pd.DataFrame.from_dict(data)
print(df)
df.to_csv("data.csv")
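Starting a new browser for every URL works, but it is slow. An alternative is to keep one driver for the whole run and quit it once, after the loop. Here is a minimal sketch of that control flow. The names scrape_all, make_driver and scrape_page are hypothetical and introduced only for illustration; the driver is passed in as a factory so the structure can be tried without a browser.

```python
def scrape_all(urls, make_driver, scrape_page):
    """Visit every URL with a single driver and quit it exactly once.

    make_driver: hypothetical zero-argument callable returning a driver object
                 that has get() and quit() methods.
    scrape_page: hypothetical callable that takes the driver and returns a
                 list of rows for the page currently loaded.
    """
    rows = []
    driver = make_driver()
    try:
        for url in urls:
            driver.get(url)
            try:
                rows.extend(scrape_page(driver))
            except Exception as ex:
                # A failure on one page should not stop the remaining pages.
                print(f"{url}: {ex}")
    finally:
        # Runs once, after the loop, even if driver.get() raises.
        driver.quit()
    return rows
```

With Selenium, make_driver could be something like lambda: webdriver.Chrome(r"D:\chromedriver\94\chromedriver.exe", options=options), and scrape_page would hold the element lookups from the loop body above. One caveat, stated as an assumption rather than something tested against the site: when the session is reused, the cookie popup may appear only on the first page, so the click on "Continuer sans accepter" may need to tolerate a timeout on later pages.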
