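I want to grab the first image post and add its URL to a blacklist, so that the next search skips URLs that have already been used and finds the next image post. I tried this to find the first image, but it does not work.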
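import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://9gag.com/funny')
time.sleep(2)
# Dismiss the cookie consent dialog
driver.find_element(By.XPATH, value='//*[@id="qc-cmp2-ui"]/div[2]/div/button[1]/span').click()
time.sleep(2)
# Take the first image inside an image post and read its URL and title
gagpost = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
gagpostsurl = gagpost.get_attribute('src')
gagposttitle = gagpost.get_attribute('alt')
print(gagpostsurl)
print(gagposttitle)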
Error:
Traceback (most recent call last):
  File "C:\Users\klaus\PycharmProjects\testTEST\main.py", line 37, in <module>
    gagposttitle = gagpost.find_element(By,value='img').get_attribute('alt')
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 763, in find_element
    return self._execute(Command.FIND_CHILD_ELEMENT,
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webelement.py", line 740, in _execute
    return self._parent.execute(command, params)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 428, in execute
    response = self.command_executor.execute(driver_command, params)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\remote_connection.py", line 345, in execute
    data = utils.dump_json(params)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\site-packages\selenium\webdriver\remote\utils.py", line 23, in dump_json
    return json.dumps(json_struct)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\json\__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\json\encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\json\encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "C:\Users\klaus\AppData\Local\Programs\Python\Python310\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type type is not JSON serializable

Process finished with exit code 1
I also tried this. Sometimes it works and sometimes it does not.
driver = webdriver.Chrome()
driver.get('https://9gag.com/funny')
time.sleep(2)
driver.find_element(By.XPATH, value='//*[@id="qc-cmp2-ui"]/div[2]/div/button[1]/span').click()
time.sleep(2)
gagpost = driver.find_element(By.CSS_SELECTOR, value=".image-post img")
gagpostsurl = gagpost.get_attribute('src')
gagposttitle = gagpost.get_attribute('alt')
print(gagpostsurl)
print(gagposttitle)
I would appreciate any help.
A uj5u.com user replied:
You can implement it like this:
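A note on the traceback first: it ends inside json.dumps, which suggests that the locator passed to find_element could not be serialized. The failing line calls find_element(By, value='img'), passing the class By itself where a strategy constant such as By.TAG_NAME was probably intended. The following is a minimal sketch of that failure mode. It uses a hypothetical stand-in By class (not the real selenium import) and a simplified payload, so it runs without a browser and only approximates what Selenium sends.

```python
import json


class By:
    """Hypothetical stand-in for selenium's By; only one constant is modelled."""
    TAG_NAME = "tag name"


def try_serialize(strategy):
    """Return the JSON payload, or the error text if serialization fails."""
    try:
        # Simplified, assumed shape of the locator payload
        return json.dumps({"using": strategy, "value": "img"})
    except TypeError as exc:
        return f"TypeError: {exc}"


bad = try_serialize(By)            # the class itself: cannot be serialized
good = try_serialize(By.TAG_NAME)  # a plain string: serializes fine
print(bad)
print(good)
```

With the real library, the corresponding call would be gagpost.find_element(By.TAG_NAME, 'img').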
from selenium.common.exceptions import NoSuchElementException
...
# Get the feed element
feed = driver.find_element(By.CSS_SELECTOR, "div.main-wrap section#list-view-2")
# Get the streams from the feed
streams = feed.find_elements(By.CLASS_NAME, "list-stream")
# Debug number of streams
print(f"Streams: {len(streams)}")
# Iterate over each stream
for stream in streams:
    # Find articles within the stream; these are the 'posts'
    articles = stream.find_elements(By.TAG_NAME, "article")
    # Debug number of articles
    print(f"Articles: {len(articles)}")
    # Iterate over each article
    for article in articles:
        # Try/except here because some articles are adverts, these are skipped
        try:
            # Find the article title
            title = article.find_element(By.CSS_SELECTOR, "header > a")
        except NoSuchElementException:
            continue
        # Print the article title
        print(f"Title: {title.text}")
This prints:
Streams: 1
Articles: 3
Title: Hahahahaha Git Gud
Title: How to impress your guests
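This does not print every post on the page, because the posts are lazy-loaded: they are fetched from the server as you scroll. To load them, you need to add scrolling to the code above. Fortunately, the Python Selenium documentation has an example for exactly this case. You can also refer to my previous answer to see what an implementation looks like.
I only added enough code to get the titles. You can extract the rest of the information you need from the article variable in the inner loop.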
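The scrolling step could look roughly like the sketch below. It is not the documentation's example verbatim. The function name scroll_until_stable and the parameters pause and max_rounds are assumptions made for illustration. The helper only relies on the driver's execute_script method, and max_rounds bounds the loop because an endless feed may never stop growing.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=10):
    """Scroll to the bottom until the page height stops growing.

    Gives up after max_rounds scrolls. Returns the last observed height.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the bottom so the page requests the next batch of posts
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the lazily loaded content time to arrive
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded
        last_height = new_height
    return last_height
```

Calling scroll_until_stable(driver) before collecting the article elements should make more posts available, at the cost of a longer run.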
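For the blacklist part of the question, one possible approach is to keep the URLs already used in a set and return the first image whose src is not in it. The sketch below is an assumption-laden illustration: the helper names (load_seen, save_seen, first_unseen_image) and the idea of storing the blacklist in a text file are hypothetical, and the helper only expects objects with a get_attribute method, as Selenium elements have.

```python
from pathlib import Path


def load_seen(path):
    """Read previously used URLs (one per line); empty set if no file yet."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def save_seen(path, seen_urls):
    """Write the blacklist back to disk, one URL per line."""
    Path(path).write_text("\n".join(sorted(seen_urls)) + "\n", encoding="utf-8")


def first_unseen_image(images, seen_urls):
    """Return (url, title) of the first image not yet blacklisted, else None.

    The returned URL is added to seen_urls so the next call skips it.
    """
    for img in images:
        url = img.get_attribute("src")
        if not url or url in seen_urls:
            continue  # no usable URL, or already used
        seen_urls.add(url)
        return url, img.get_attribute("alt")
    return None
```

With a real driver, images could come from driver.find_elements(By.CSS_SELECTOR, ".image-post img"), using the selector from the question. Whether that selector still matches the site's markup is not verified here.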
Tags: Python python-3.x selenium selenium-webdriver web
