Simple web scraping

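I am trying to scrape data from a simple web page.

My attempt:

Here, I first try to extract the blog links, but some unwanted links come back as well.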

import csv

import httplib2
from bs4 import BeautifulSoup, SoupStrainer

http = httplib2.Http()

links = []

# Fetch one page of the GBP/USD weekly outlook category
status, response = http.request('https://www.forexcrunch.com/category/forex-weekly-outlook/gbp-usd-outlook/page/24/')

# Parse only the <a> tags, then keep each unique href that mentions "gbp"
soup = BeautifulSoup(response, 'html.parser', parse_only=SoupStrainer('a'))
for link in soup.find_all('a', href=True):
    href = link['href']
    if 'gbp' in href and href not in links:
        print(href)
        links.append(href)
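One way to drop the unwanted links is to match on the URL slug instead of the bare substring 'gbp'. The following is only a minimal sketch: it assumes that weekly forecast posts use a slug of the form gbpusd-forecast-<month>-<days>, which is an assumption about this site's URLs, and the names FORECAST_SLUG and filter_forecast_links are hypothetical.

```python
import re

# Assumed slug shape for weekly forecast posts, e.g. "gbpusd-forecast-may-22-26".
# This pattern is a guess about the site's URLs, not a documented rule.
FORECAST_SLUG = re.compile(r"/gbp-?usd-forecast-[a-z0-9-]+/?$", re.IGNORECASE)


def filter_forecast_links(links):
    """Return the unique links that look like forecast posts, in original order."""
    seen = set()
    result = []
    for href in links:
        if FORECAST_SLUG.search(href) and href not in seen:
            seen.add(href)
            result.append(href)
    return result


sample = [
    "https://www.forexcrunch.com/category/forex-weekly-outlook/gbp-usd-outlook/",
    "https://www.forexcrunch.com/gbpusd-forecast-may-22-26/",
    "https://www.forexcrunch.com/gbpusd-forecast-may-22-26/",
    "https://www.forexcrunch.com/gbpusd-forecast-may-15-19/",
]
print(filter_forecast_links(sample))
```

The category link is rejected because it contains "gbp" but not the forecast slug, and the duplicate post link is kept only once.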

From these links, only the blog posts containing a weekly forecast should be kept. Then the date and title of each linked article should be retrieved and stored:

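header = ['Year', 'Date', 'Title', 'Link']

# newline='' stops the csv module from writing blank rows on Windows
with open('summary.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)

    for link in links:
        # TODO: fetch the article behind `link` and extract these values
        year, date, title = None, None, None  # placeholders, not implemented yet
        writer.writerow([year, date, title, link])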

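Answer from a uj5u.com user:

You need to select all the parent elements and iterate over each of them.
There are two types of articles on the page: 1 large one and 10 smaller ones.
Here is some sample code: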

from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.forexcrunch.com/category/forex-weekly-outlook/gbp-usd-outlook/page/24/")
soup = BeautifulSoup(page.content, 'html.parser')

# The single large article at the top of the listing
big_article = soup.find("div", class_='col-sm-12 col-md-6')
title = big_article.find("div", class_="post-detail").find("h5")
print("Title: " + title.text)
print("Link: " + title.find("a")["href"])
author_year = big_article.find("div", class_="post-author")
print("Author: " + author_year.find("a").text)
print("Date: " + author_year.find_all("li")[-1].text)
print("---------------------")

# The smaller articles: first <a> is the title link, second is the author
all_articles = soup.find_all("div", class_='col-sm-12 col-md-3')
for article in all_articles:
    title_author_link = article.find("div", class_="post-detail").find_all("a")
    print("Title: " + title_author_link[0].text)
    print("Link: " + title_author_link[0]["href"])
    print("Author: " + title_author_link[1].text)
    print("Date: " + article.find_all("li")[-1].text)
    print("---------------------")

Output:

Title: GBP/USD Forecast May 22-26
Link: https://www.forexcrunch.com/gbpusd-forecast-may-22-26/
Author: Kenny Fisher
Date: 5 years
---------------------
Title: GBP/USD Forecast May 15-19
Link: https://www.forexcrunch.com/gbpusd-forecast-may-15-19/
Author: Kenny Fisher
Date: 5 years
---------------------
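To get from printed values like these to the summary.csv file asked for in the question, the title and link pairs could be turned into rows. This is a rough sketch under some assumptions: the date range is read from the title, assuming titles look like "GBP/USD Forecast May 22-26", and the Year column is left empty because the listing only shows a relative age such as "5 years". The names TITLE_DATE, to_row and write_summary are hypothetical helpers.

```python
import csv
import io
import re

# Assumed title shape: "GBP/USD Forecast May 22-26" (month name, then a day range).
# The optional second month covers ranges such as "May 29 - Jun 2".
TITLE_DATE = re.compile(
    r"Forecast\s+(?P<date>[A-Za-z]+\.?\s+\d{1,2}\s*-\s*(?:[A-Za-z]+\.?\s+)?\d{1,2})"
)


def to_row(title, link, year=""):
    """Build one CSV row; the date range comes from the title when it matches."""
    match = TITLE_DATE.search(title)
    date = match.group("date") if match else ""
    return {"Year": year, "Date": date, "Title": title, "Link": link}


def write_summary(articles, out):
    """Write (title, link) pairs to a file-like object as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=["Year", "Date", "Title", "Link"])
    writer.writeheader()
    for title, link in articles:
        writer.writerow(to_row(title, link))


articles = [
    ("GBP/USD Forecast May 22-26", "https://www.forexcrunch.com/gbpusd-forecast-may-22-26/"),
    ("GBP/USD Forecast May 15-19", "https://www.forexcrunch.com/gbpusd-forecast-may-15-19/"),
]
buffer = io.StringIO()
write_summary(articles, buffer)
print(buffer.getvalue())
```

To write a real file, pass open('summary.csv', 'w', newline='') instead of the in-memory buffer. If the year is needed, it would probably have to come from each article page, since the listing does not show it.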

I hope this helps.

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/492874.html

Tags: Python python-3.x web-scraping beautifulsoup
