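I'm currently writing a script to scrape job postings from Indeed. It captures the job title, company, location, and job description. Right now the script loops through the first five pages and prints a dataframe for each page. However, my page 2 dataframe only contains 3 of the 15 job postings. I think this may be caused by a popup that asks for your email address. To get around this, I tried adding a .click to close the popup. Unfortunately, this results in a TimeoutException. I added element = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CLASS_NAME, "popover-x-button-close icl-CloseButton"))) hoping it would fix the problem, but no luck so far.

In addition, when I export to CSV, the only results page that ends up in the CSV is page 5.

I've included my code below. Sorry if these are very simple questions; I only started learning Python three days ago for some job research. Thanks in advance!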
import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("window-size=1400,1400")
PATH = "C://Program Files (x86)//chromedriver.exe"
driver = webdriver.Chrome(PATH)
for i in range(0,50,10):
    driver.get('https://www.indeed.com/jobs?q=chemical engineer&l=united states&start=' + str(i))
    driver.implicitly_wait(5)
    jobtitles = []
    companies = []
    locations = []
    descriptions = []
    jobs = driver.find_elements_by_class_name("slider_container")
    for job in jobs:
        jobtitle = job.find_element_by_class_name('jobTitle').text.replace("new", "").strip()
        jobtitles.append(jobtitle)
        company = job.find_element_by_class_name('companyName').text.replace("new", "").strip()
        companies.append(company)
        location = job.find_element_by_class_name('companyLocation').text.replace("new", "").strip()
        locations.append(location)
        description = job.find_element_by_class_name('job-snippet').text.replace("new", "").strip()
        descriptions.append(description)
    element = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CLASS_NAME, "popover-x-button-close icl-CloseButton")))
    close_popup = driver.find_element_by_class_name("popover-x-button-close icl-CloseButton")
    close_popup.click()
    df_da=pd.DataFrame()
    df_da['JobTitle']=jobtitles
    df_da['Company']=companies
    df_da['Location']=locations
    df_da['Description']=descriptions
    print(df_da)
df_da.to_csv('C:/Users/Dan/Desktop/AZNext/file_name1.csv')
Answer:
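There are several problems here:
- The popup appears only once, on the second page, but you wait for it on every iteration of the loop. On every page where it does not show up, WebDriverWait raises a TimeoutException after 5 seconds. You should check whether the element appears and click it only if it does; otherwise just pass.
- This element has several class names. You should therefore locate it with a CSS selector or XPath instead of by_class_name, because that method accepts a single class name, not a space-separated sequence of class names.
- You can call the click() method directly on the element returned by WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button.popover-x-button-close.icl-CloseButton"))); there is no need to look the element up again with driver.find_element.
- The lists and the DataFrame are recreated on every page, so only the last page (page 5) ends up in the CSV. Create the lists once before the loop and build the DataFrame after it.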
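For the second point: in CSS, an element that carries several classes is matched by chaining the class names with dots, so class="popover-x-button-close icl-CloseButton" becomes .popover-x-button-close.icl-CloseButton. Below is a minimal pure-Python sketch of that conversion. class_attr_to_css is a hypothetical helper name, not part of Selenium, and it assumes the class names contain no characters that would need CSS escaping. The resulting string is what you pass together with By.CSS_SELECTOR.

```python
def class_attr_to_css(class_attr: str, tag: str = "") -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper: "a b" means the element has BOTH classes,
    which CSS writes as ".a.b" (no space between them).
    """
    classes = class_attr.split()  # split() also ignores repeated/leading/trailing whitespace
    return tag + "".join("." + name for name in classes)


# Usage: the class attribute of the popup's close button, restricted to <button> elements
selector = class_attr_to_css("popover-x-button-close icl-CloseButton", tag="button")
print(selector)  # button.popover-x-button-close.icl-CloseButton
```

A space between class names (".a .b") would instead mean "an element with class b inside an element with class a", which is why the space-separated value does not work as a single class-name locator.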
I suggest the following:
import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("window-size=1400,1400")
PATH = "C://Program Files (x86)//chromedriver.exe"
# Pass the options so the window size is actually applied
# (Selenium 3 style; on Selenium 4 pass service=Service(PATH) from selenium.webdriver.chrome.service instead)
driver = webdriver.Chrome(PATH, options=options)

# Create the lists once, outside the page loop, so results from all five pages accumulate
jobtitles = []
companies = []
locations = []
descriptions = []

for i in range(0, 50, 10):
    driver.get('https://www.indeed.com/jobs?q=chemical engineer&l=united states&start=' + str(i))
    driver.implicitly_wait(5)
    jobs = driver.find_elements(By.CLASS_NAME, "slider_container")
    for job in jobs:
        jobtitle = job.find_element(By.CLASS_NAME, 'jobTitle').text.replace("new", "").strip()
        jobtitles.append(jobtitle)
        company = job.find_element(By.CLASS_NAME, 'companyName').text.replace("new", "").strip()
        companies.append(company)
        location = job.find_element(By.CLASS_NAME, 'companyLocation').text.replace("new", "").strip()
        locations.append(location)
        description = job.find_element(By.CLASS_NAME, 'job-snippet').text.replace("new", "").strip()
        descriptions.append(description)
    # The popup does not appear on every page: click it if it becomes visible within 5 s, otherwise carry on
    try:
        WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button.popover-x-button-close.icl-CloseButton"))).click()
    except TimeoutException:
        pass

driver.quit()

# Build the DataFrame and write the CSV once, after all pages have been scraped
df_da = pd.DataFrame()
df_da['JobTitle'] = jobtitles
df_da['Company'] = companies
df_da['Location'] = locations
df_da['Description'] = descriptions
print(df_da)
df_da.to_csv('C:/Users/Dan/Desktop/AZNext/file_name1.csv')
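One more thing worth checking in the code above: .replace("new", "") removes the substring "new" wherever it occurs, not just the "new" badge, so a title such as "Renewable Energy Engineer" would come out as "Reable Energy Engineer". Below is a minimal sketch that only drops a leading "new" word. It assumes (not verified here) that Indeed renders the badge as a separate word or line at the start of the card text; strip_new_badge is a hypothetical helper name.

```python
import re

# Matches the lowercase word "new" only at the very start of the text,
# followed by at least one whitespace character (space or newline).
_NEW_BADGE = re.compile(r"^new\s+")


def strip_new_badge(text: str) -> str:
    """Remove a leading 'new' badge; words that merely contain 'new' are kept."""
    return _NEW_BADGE.sub("", text.strip(), count=1)


print(strip_new_badge("new\nChemical Engineer"))       # Chemical Engineer
print(strip_new_badge("Renewable Energy Engineer"))    # Renewable Energy Engineer
print("Renewable Energy Engineer".replace("new", ""))  # Reable Energy Engineer
```

Inside the loop it would be used as, for example, jobtitle = strip_new_badge(job.find_element(By.CLASS_NAME, 'jobTitle').text).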