So far I have written this code:
from urllib import request
from bs4 import BeautifulSoup
import requests
import csv
import re
serch_term = input('What News are you looking for today? ')
url = f'https://edition.cnn.com/search?q={serch_term}'
page = requests.get(url).text
doc = BeautifulSoup(page, "html.parser")
page_text = doc.find_all('<h3 >')
print(page_text)
But when I print(page_text), all I get is an empty list []. Can someone help me?
Reply from a uj5u.com user:
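There are a few issues:
- The content is rendered dynamically by JavaScript, so you won't get it with requests, which only downloads the initial HTML.
- We don't know your search term; there may simply be no results for it.
- BeautifulSoup's find_all() expects a tag name such as 'h3', not markup like '<h3 >', so that call matches nothing.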
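To illustrate the last point, here is a minimal sketch run against a small hypothetical HTML string (not CNN's real page), assuming bs4 is installed. It shows that passing markup to find_all() matches nothing, while a bare tag name or a CSS selector via select() works.

```python
from bs4 import BeautifulSoup

# Hypothetical static snippet standing in for already-rendered HTML
html = """
<div>
  <h3 class="cnn-search__result-headline"><a href="//www.cnn.com/a">First</a></h3>
  <h3 class="other"><a href="//www.cnn.com/b">Second</a></h3>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() treats its argument as a tag *name*, so the string '<h3 >' matches no tag
wrong = soup.find_all('<h3 >')
# The bare tag name matches every <h3> element
by_name = soup.find_all('h3')
# select() takes a CSS selector, so tag and class can be combined
by_css = soup.select('h3.cnn-search__result-headline')

print(len(wrong), len(by_name), len(by_css))
```

On the live CNN page the first two problems still apply: even a correct selector returns nothing if the headlines are not in the HTML that requests downloaded.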
How to fix it? Use selenium: it drives a real browser, so it renders the JavaScript and its page_source gives you the content you expect.
Example
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Service at your local chromedriver binary
service = Service(executable_path='YOUR PATH TO CHROMEDRIVER')
driver = webdriver.Chrome(service=service)
driver.get('https://edition.cnn.com/search?q=python')
# page_source holds the DOM after the browser has run the page's JavaScript
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.select('h3.cnn-search__result-headline'))
driver.quit()
Output
[<h3 class="cnn-search__result-headline">
<a href="//www.cnn.com/travel/article/airasia-malaysia-snake-plane-rerouted-intl-hnk/index.html">AirAsia flight in Malaysia rerouted after snake found on board plane</a>
</h3>,
<h3 class="cnn-search__result-headline">
<a href="//www.cnn.com/2021/11/19/cnn-underscored/athleta-gift-shop-holiday/index.html">With gift options under $50 plus splurge-worthy seasonal staples, Athleta's Gift Shop is a holiday shopping haven</a></h3>,...]
To get the headlines, iterate over the ResultSet and call .text on each element; to get the href value of the <a> it contains, use ['href'].
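A minimal sketch of that loop, run offline against the two headlines from the output above pasted into a string (in practice you would use the soup built from driver.page_source). The hrefs are protocol-relative ("//www.cnn.com/..."), so this sketch assumes you want absolute https links and resolves them with urllib.parse.urljoin; that step is an addition, not part of the original answer.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Two headlines copied from the sample output, used as static input
html = """
<h3 class="cnn-search__result-headline">
<a href="//www.cnn.com/travel/article/airasia-malaysia-snake-plane-rerouted-intl-hnk/index.html">AirAsia flight in Malaysia rerouted after snake found on board plane</a>
</h3>
<h3 class="cnn-search__result-headline">
<a href="//www.cnn.com/2021/11/19/cnn-underscored/athleta-gift-shop-holiday/index.html">With gift options under $50 plus splurge-worthy seasonal staples, Athleta's Gift Shop is a holiday shopping haven</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

results = []
for h3 in soup.select('h3.cnn-search__result-headline'):
    link = h3.find('a')
    # .text includes the surrounding newlines, so strip them
    title = h3.text.strip()
    # Assumption: resolve the protocol-relative href against an https base URL
    href = urljoin('https://edition.cnn.com/', link['href'])
    results.append((title, href))

for title, href in results:
    print(title, '->', href)
```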
Tags: python python-3.x web-scraping beautifulsoup python-requests
