How should I write this to meet the task requirements? - 有解無憂

QUESTION

Before you get started, explore the website https://www.abc.net.au/news/justin

We are interested in the titles of and hyperlinks to the news items in the Just In section. More specifically, that's the section that is highlighted in the following image:

The content on that page has obviously been updated since this exercise was set up. Still, even though the exact content changed, the principle stays the same.

What you need to do is:

Scrape (1) the titles, (2) the underlying hyperlinks, and (3) the descriptions of all the news items in the highlighted section of the Just In page. In the historic example in the screenshot, that would be 'Serious grounds to be concerned about...', 'Tourism sandbox: Phuket...', etc. for the titles; the URLs you are directed to when you click those titles; and the descriptions 'Allies of jailed...', 'Thailand's resort island...', etc. that belong to the titles.
Save the information into a csv file named 'abcnews.csv' that contains three variables: 'title', 'url', and 'descriptions'. Write one row for each article, combining the title, hyperlink, and description for that article.
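
The target file layout described in the task can be sketched with the standard-library csv module. This is only an illustration of the three-column structure, not part of the original exercise: the helper name rows_to_csv is hypothetical, the title and description are the ones quoted from the screenshot example, and the URL is a placeholder.

```python
import csv
import io

# Column names taken from the task description.
FIELDNAMES = ['title', 'url', 'descriptions']


def rows_to_csv(rows):
    """Return CSV text with a header row and one line per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row: the title and description are quoted from the task text,
# the URL is a placeholder and not a real article link.
example_rows = [
    {
        'title': 'Serious grounds to be concerned about...',
        'url': 'https://example.org/placeholder-article',
        'descriptions': 'Allies of jailed...',
    },
]

csv_text = rows_to_csv(example_rows)
print(csv_text)
```

To produce the file itself, the same text could be written with open('abcnews.csv', 'w', newline='', encoding='utf-8').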

This is what I have written so far, but I cannot get it to work no matter what I try:

from urllib.request import Request, urlopen
import ssl
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://www.abc.net.au/news/justin'

#################################################
#################################################
###

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
req = Request(url, headers=headers)
context = ssl._create_unverified_context()

uClient= urlopen(req, context=context)
html = uClient.read()
uClient.close()

#################################################
#################################################

soup = BeautifulSoup(html, 'html.parser')
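# find_all() returns a ResultSet (a list of tags), so loop over it instead of
# calling it. These generated class names may differ on the live page.
divofinterest = soup.find_all('div', class_='_3OXQ1 _26IxR _3bGVu')

dataset = []

for item in divofinterest:
    # Assumption: each matched div is one article card containing a title
    # link and a description paragraph; adjust the tags to the actual markup.
    link = item.find('a', href=True)
    description = item.find('p')
    if link is None:
        continue
    title = link.get_text(strip=True)
    href = link['href']  # separate name so the page url is not overwritten
    if href.startswith('/'):
        href = 'https://www.abc.net.au' + href  # make relative links absolute
    text = description.get_text(strip=True) if description else ''
    print(title)
    print(href)
    print(text)
    print()

    dataset.append({'title': title, 'url': href, 'descriptions': text})

dataset = pd.DataFrame(dataset, columns=['title', 'url', 'descriptions'])
# The task asks for a csv file, so use the default comma separator.
dataset.to_csv('abcnews.csv', index=False)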
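
Because the live page changes, it can help to check the extraction logic offline first. The following is a minimal sketch under stated assumptions: the sample markup, the class name "card", and the function name extract_articles are all hypothetical and do not reflect the real ABC News structure. It relies on the fact that find_all() returns a list-like ResultSet, which has to be iterated rather than called.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class name "card" and the tag layout are
# assumptions for illustration, not the real ABC News structure.
SAMPLE_HTML = """
<div class="card">
  <h3><a href="/news/example-one">First title</a></h3>
  <p>First description</p>
</div>
<div class="card">
  <h3><a href="https://example.org/two">Second title</a></h3>
</div>
"""

BASE_URL = 'https://www.abc.net.au'


def extract_articles(html, card_class='card'):
    """Return one dict per card with title, url, and descriptions."""
    soup = BeautifulSoup(html, 'html.parser')
    articles = []
    # Iterate over the ResultSet: one container div per article.
    for card in soup.find_all('div', class_=card_class):
        link = card.find('a', href=True)
        if link is None:
            continue  # no title link, so not an article card
        href = link['href']
        if href.startswith('/'):
            href = BASE_URL + href  # make relative links absolute
        paragraph = card.find('p')
        articles.append({
            'title': link.get_text(strip=True),
            'url': href,
            # A card without a description gets an empty string.
            'descriptions': paragraph.get_text(strip=True) if paragraph else '',
        })
    return articles


articles = extract_articles(SAMPLE_HTML)
for article in articles:
    print(article)
```

Once the selectors match the real page, the resulting list of dicts can be passed to pd.DataFrame and saved with to_csv as in the script above.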

