I am trying to scrape from ...
My attempt:
Here, first, I try to extract the blog links, but some unwanted links come through as well.
import csv
import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()
links = []
url = 'https://www.forexcrunch.com/category/forex-weekly-outlook/gbp-usd-outlook/page/24/'
status, response = http.request(url)

# Parse only the <a> tags and collect each unique href that mentions 'gbp'
for link in BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a')):
    if link.has_attr('href') and 'gbp' in link['href'] and link['href'] not in links:
        print(link['href'])
        links.append(link['href'])
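The unwanted links most likely come through because the check only looks for 'gbp' anywhere in the href, and the category and pagination URLs contain 'gbp-usd-outlook' themselves. A minimal sketch of a stricter check follows. It assumes that forecast posts use slugs shaped like gbpusd-forecast-may-22-26; the pattern FORECAST_SLUG and the helper names are hypothetical and would need adjusting if the site uses other slug shapes.

```python
import re
from urllib.parse import urlparse

# Assumed slug shape, modelled on links such as
# https://www.forexcrunch.com/gbpusd-forecast-may-22-26/
FORECAST_SLUG = re.compile(r"^/gbp-?usd-forecast-[a-z0-9-]+/?$")


def is_weekly_forecast(href):
    """Return True if the URL path looks like a single weekly forecast post."""
    path = urlparse(href).path.lower()
    return bool(FORECAST_SLUG.match(path))


def filter_forecast_links(hrefs):
    """Keep forecast links only, dropping duplicates but preserving order."""
    seen = set()
    kept = []
    for href in hrefs:
        if is_weekly_forecast(href) and href not in seen:
            seen.add(href)
            kept.append(href)
    return kept
```

Calling filter_forecast_links(links) on the list collected above would then leave only the URLs whose path matches the assumed slug pattern.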
Among these links, the blog posts containing the weekly forecast need to be filtered out. Then the date and title of each linked article should be retrieved and stored:
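header = ['Year', 'Date', 'Title', 'Link']
# newline='' keeps the csv module from writing blank rows on Windows
with open('summary.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    for link in links:
        # Placeholders: this is the part I need help with,
        # i.e. getting the year, date and title for each link
        year, date, title = None, None, None
        writer.writerow([year, date, title, link])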
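Reply from a uj5u.com user:
You need to select all the parent elements and iterate over each of them.
There are two types of articles on the page: one large one and 10 smaller ones.
Here is some sample code: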
from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.forexcrunch.com/category/forex-weekly-outlook/gbp-usd-outlook/page/24/")
soup = BeautifulSoup(page.content, 'html.parser')

# The single large article at the top of the page
big_article = soup.find("div", class_='col-sm-12 col-md-6')
title = big_article.find("div", class_="post-detail").find("h5")
print("Title: " + title.text)
print("Link: " + title.find("a")["href"])
author_year = big_article.find("div", class_="post-author")
print("Author: " + author_year.find("a").text)
print("Date: " + author_year.find_all("li")[-1].text)
print("---------------------")

# The smaller articles: the first link is the title, the second is the author
all_articles = soup.find_all("div", class_='col-sm-12 col-md-3')
for article in all_articles:
    title_author_link = article.find("div", class_="post-detail").find_all("a")
    print("Title: " + title_author_link[0].text)
    print("Link: " + title_author_link[0]["href"])
    print("Author: " + title_author_link[1].text)
    print("Date: " + article.find_all("li")[-1].text)
    print("---------------------")
Output:
Title: GBP/USD Forecast May 22-26
Link: https://www.forexcrunch.com/gbpusd-forecast-may-22-26/
Author: Kenny Fisher
Date: 5 years
---------------------
Title: GBP/USD Forecast May 15-19
Link: https://www.forexcrunch.com/gbpusd-forecast-may-15-19/
Author: Kenny Fisher
Date: 5 years
---------------------
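To get from these printed fields to the summary.csv file the question asks for, the values could be collected into rows instead of printed. The sketch below is one possible way to do that and is not part of the original answer. It assumes titles shaped like "GBP/USD Forecast May 22-26" and takes the date range from the title text; the helper names to_row and write_summary are hypothetical. Since the listing only shows a relative age ("5 years"), the year is left as a parameter to be filled in from another source, such as the article page itself.

```python
import csv
import re

# Assumed title shape, modelled on "GBP/USD Forecast May 22-26".
TITLE_DATES = re.compile(r"Forecast\s+(?P<dates>.+)$", re.IGNORECASE)


def to_row(title, link, year=""):
    """Build one CSV row; the date range is taken from the title text."""
    title = title.strip()
    match = TITLE_DATES.search(title)
    dates = match.group("dates") if match else ""
    return {"Year": year, "Date": dates, "Title": title, "Link": link}


def write_summary(rows, fileobj):
    """Write rows to an open text file object using the question's header."""
    writer = csv.DictWriter(fileobj, fieldnames=["Year", "Date", "Title", "Link"])
    writer.writeheader()
    writer.writerows(rows)
```

In the loops above, each group of print calls could be replaced by appending to_row(title_text, href) to a list, which is then passed to write_summary together with a file opened via open('summary.csv', 'w', newline='').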
I hope I was able to help you.
