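I'm a beginner trying to get BeautifulSoup working. I'm having trouble picking out the right HTML classes and tags. I'm close, but I'm getting something wrong: I'm using the wrong tags and classes to scrape the title, time, link and text of the news items.
I want to scrape all of the headlines in the vertical list, and then scrape the date, title, link and content of each one.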

Can you help me find the correct HTML classes and tags?
I don't get any errors, but the Python console stays empty:
>>>
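An empty console with no traceback usually means the loop body never ran: `find_all` returns an empty list when no element matches, and iterating over an empty list prints nothing. A minimal sketch, using a made-up HTML snippet in place of the real page (an assumption for illustration), that makes this visible by checking the match count before looping:

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page (assumption for illustration).
html = '<div class="tcc-list-news"><div><span class="hh serif">11:36</span></div></div>'
soup = BeautifulSoup(html, "html.parser")

wrong = soup.find_all("div", {"class": "$00"})           # class not present in the markup
right = soup.find_all("div", {"class": "tcc-list-news"})

print(len(wrong))  # 0: a for-loop over this list prints nothing
print(len(right))  # 1

if not wrong:
    print("No elements matched; check the tag and class names in the page source.")
```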
Code:
import requests
from bs4 import BeautifulSoup

site = requests.get('url')
beautify = BeautifulSoup(site.content, 'html5lib')
# attrs is a dict mapping attribute name to value
news = beautify.find_all('div', {'class': '$00'})
arti = []
for each in news:
    time = each.find('span', {'class': 'hh serif'}).text
    title = each.find('span', {'class': 'title'}).text
    link = each.a.get('href')
    # fetch the article page behind each headline link
    r = requests.get(link)
    soup = BeautifulSoup(r.text, 'html5lib')
    content = soup.find('div', class_="read__content").text.strip()
    print(" ")
    print(time)
    print(title)
    print(link)
    print(" ")
    print(content)
    print(" ")
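One detail in lookups like `{'class': 'hh serif'}` is easy to miss. `find`/`find_all` accept either an attrs dict or the `class_=` keyword, and the two are equivalent. Because `class` is a multi-valued attribute, a string containing a space such as `'hh serif'` is compared against the attribute's exact text, so the order of the class names matters; a CSS selector passed to `select()` matches elements carrying both classes in any order. A small sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Made-up markup: the same two classes, written in different orders.
html = """
<span class="hh serif">11:36</span>
<span class="serif hh">11:24</span>
"""
soup = BeautifulSoup(html, "html.parser")

# The attrs dict and the class_ keyword are equivalent.
by_dict = soup.find_all("span", {"class": "hh serif"})
by_kw = soup.find_all("span", class_="hh serif")

# Exact-string match: only the span whose class attribute reads "hh serif".
exact = [s.text for s in by_kw]

# CSS selector: every span that has both classes, in any order.
both = [s.text for s in soup.select("span.hh.serif")]

print(exact)  # ['11:36']
print(both)   # ['11:36', '11:24']
```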
Answer:
Here is a solution you can try:
import requests
from bs4 import BeautifulSoup

# mock a browser request by sending a browser-like User-Agent
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'
}
site = requests.get('https://www.tuttomercatoweb.com/atalanta/', headers=headers)
soup = BeautifulSoup(site.content, 'html.parser')
# each news list is a div with class "tcc-list-news"; every item is a child div
news = soup.find_all('div', attrs={"class": "tcc-list-news"})
for each in news:
    for div in each.find_all("div"):
        print("-- Time ", div.find('span', attrs={'class': 'hh serif'}).text)
        print("-- Href ", div.find("a")['href'])
        # the headline text is split across the spans inside the link
        print("-- Text ", " ".join([span.text for span in div.select("a > span")]))
        print("-" * 30)

Output:
-- Time 11:36
-- Href https://www.tuttomercatoweb.com/atalanta/?action=read&idtmw=1661241
-- Text focus Serie A, punti nel 2022: Juve prima, ma un solo punto in più rispetto a Milan e Napoli
------------------------------
-- Time 11:24
-- Href https://www.tuttomercatoweb.com/atalanta/?action=read&idtmw=1661233
-- Text focus Serie A, chi più in forma? Le ultime 5 gare: Sassuolo e Juve in vetta, crisi Venezia
------------------------------
-- Time 11:15
-- Href https://www.tuttomercatoweb.com/atalanta/?action=read&idtmw=1661229
-- Text Le pagelle di Cissé: come nelle migliori favole. Dalla seconda categoria al gol in serie A
------------------------------
...
...
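The answer's loop can be extended to the second step the question asks for: following each link and reading the article body. The sketch below stays offline by taking a `fetch` callable, so pages can be supplied from a dict. The listing and article markup are made up to mirror the class names used in this thread (`tcc-list-news`, `hh serif`, `read__content`); the real site's structure may differ. The helper names `parse_listing`, `article_text` and `scrape` are hypothetical. It also resolves relative links with `urljoin` and skips child divs that lack a time span or a link, so a missing element does not raise `AttributeError`.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE = "https://www.tuttomercatoweb.com/atalanta/"

def parse_listing(html, base=BASE):
    """Return (time, absolute_href, text) tuples from tcc-list-news blocks."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for block in soup.find_all("div", class_="tcc-list-news"):
        for div in block.find_all("div"):
            time_tag = div.find("span", class_="hh serif")
            link_tag = div.find("a", href=True)
            if time_tag is None or link_tag is None:
                continue  # skip child divs without the expected elements
            text = " ".join(s.get_text(strip=True) for s in div.select("a > span"))
            href = urljoin(base, link_tag["href"])  # make relative links absolute
            items.append((time_tag.get_text(strip=True), href, text))
    return items

def article_text(html):
    """Return the stripped text of the read__content div, or '' if absent."""
    body = BeautifulSoup(html, "html.parser").find("div", class_="read__content")
    return body.get_text(strip=True) if body else ""

def scrape(listing_html, fetch):
    """fetch(url) -> html string; injected so the sketch runs without a network."""
    return [(t, href, text, article_text(fetch(href)))
            for t, href, text in parse_listing(listing_html)]

# Made-up pages for illustration only.
LISTING = """
<div class="tcc-list-news">
  <div><span class="hh serif">11:36</span>
    <a href="?action=read&amp;idtmw=1"><span>focus</span><span>Title one</span></a></div>
  <div><span class="hh serif">11:24</span>
    <a href="?action=read&amp;idtmw=2"><span>Title two</span></a></div>
</div>
"""
PAGES = {
    BASE + "?action=read&idtmw=1": '<div class="read__content"> Body one </div>',
    BASE + "?action=read&idtmw=2": '<div class="read__content">Body two</div>',
}

rows = scrape(LISTING, PAGES.get)
for row in rows:
    print(row)
```

With the live site, `fetch` could be something like `lambda u: requests.get(u, headers=headers).text`, reusing the `headers` dict from the answer; that variant is not exercised here.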
Edit:
Why are the headers needed here? See the related question: How to use Python requests to fake a browser visit and generate a User-Agent?
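By default a `requests.Session` sends a `User-Agent` of the form `python-requests/<version>`, which some sites block or answer with different content; passing a browser-like `User-Agent` through `headers=` overrides it. The sketch below builds both requests with `Session.prepare_request` without sending them, so the outgoing header can be inspected offline. The URL and User-Agent string come from the answer; whether this particular site actually requires the header is not confirmed here.

```python
import requests

URL = "https://www.tuttomercatoweb.com/atalanta/"
BROWSER_UA = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36")

with requests.Session() as session:
    # Session defaults only: the User-Agent identifies python-requests.
    default_req = session.prepare_request(requests.Request("GET", URL))

    # Same request with the browser-like header; request headers override session defaults.
    browser_req = session.prepare_request(
        requests.Request("GET", URL, headers={"User-Agent": BROWSER_UA}))

# Nothing has been sent; only the prepared headers are printed.
print(default_req.headers["User-Agent"])
print(browser_req.headers["User-Agent"])
```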
Source: https://www.uj5u.com/caozuo/448279.html
Tags: Python, html, python-3.x, web-scraping, beautifulsoup
