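I need to implement a script that scrapes URLs from a blog, checks whether the links inside each post contain certain keywords, and then writes a CSV file listing which blog post URLs contain the matching keyword links.
Since the blog is paginated, with more than 35 pages and around 300 blog posts, I'm not sure how to go about this. The URLs I'm looking for are located inside each individual blog post.
So far, I have managed to follow some tutorials on how to get each blog post URL from the main blog page.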
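Answer from a uj5u.com user:
It works much the same way: define an empty list to store the special URLs you find, then iterate over your initial result list of post URLs: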
data = []
for url in result:
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    # Placeholder: append the special URLs found in `soup` here
    data.append('specialUrl')
To avoid duplicate, unnecessary requests, iterate over set(result) instead:
data = []
for url in set(result):
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    # Placeholder: find the special URLs in `soup` and append them here
    data.append('FINDSPECIALURL')
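One caveat with set() is that it does not keep the URLs in the order they were collected. If that order matters (for example, to keep the CSV rows in blog order), dict.fromkeys removes duplicates while preserving first-seen order. A minimal sketch; the URLs below are made-up placeholders, not real posts:

```python
# Hypothetical stand-in for the `result` list built while paging through the blog.
result = [
    "https://example.com/blog/post-a/",
    "https://example.com/blog/post-b/",
    "https://example.com/blog/post-a/",  # duplicate entry
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats without reordering the list.
unique_urls = list(dict.fromkeys(result))

print(unique_urls)
```

Iterating over unique_urls instead of set(result) sends the same number of requests, just in a predictable order.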
Just in case, you can also use break to leave the while loop early.
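The break-based exit can be sketched separately from the scraping details. In the snippet below, collect_post_urls, fetch_page, and max_pages are hypothetical names introduced for illustration; fetch_page is assumed to return the post URLs of one listing page plus a flag saying whether a next page exists. An optional page cap is one way to get the "stop early" behavior without editing the loop body:

```python
def collect_post_urls(fetch_page, max_pages=None):
    """Collect post URLs page by page until there is no next page.

    fetch_page(page) is a hypothetical callable returning
    (list_of_post_urls, has_next_page) for the given page number.
    """
    result = []
    page = 1
    while True:
        urls, has_next = fetch_page(page)
        result.extend(urls)
        # Leave the loop when the site reports no further page...
        if not has_next:
            break
        # ...or when the optional page cap has been reached.
        if max_pages is not None and page >= max_pages:
            break
        page += 1
    return result


# Fake three-page blog used only to exercise the loop.
fake_blog = {1: ["/a", "/b"], 2: ["/c"], 3: ["/d"]}


def fake_fetch(page):
    return fake_blog[page], page < len(fake_blog)


all_urls = collect_post_urls(fake_fetch)
first_page_only = collect_post_urls(fake_fetch, max_pages=1)
```

In a real scraper, fetch_page would wrap the requests and BeautifulSoup calls for one listing page.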
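Example
Note: as written, this only scrapes the links from the first blog page into your result. Remove the final break in the while loop to scrape all blog pages.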
import requests
from bs4 import BeautifulSoup
import pandas as pd

page = 1
result = []
# Step 1: walk the paginated blog index and collect every post URL
while True:
    r = requests.get(f"https://www.snapfish.co.uk/blog/page/{page}/").text
    soup = BeautifulSoup(r, "lxml")
    product = soup.find_all("article", {'class': 'post_list'})
    for data in product:
        result.append(data.find('a').get('href'))
    # Stop when there is no "next" pagination link
    if soup.find("a", class_='next page-numbers') is None:
        break
    page += 1
    break  # remove this break to scrape all the blog pages

# Step 2: visit each post and collect links whose href contains the keyword
data = []
for url in result:
    r = requests.get(url).text
    soup = BeautifulSoup(r, "lxml")
    for a in soup.select('a[href*="design-detail"]'):
        data.append({
            'urlFrom': url,
            'urlTo': a['href']
        })

# Step 3: drop duplicate rows and write the result to a CSV file
pd.DataFrame(data).drop_duplicates().to_csv('result.csv', index=False)
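The example matches a single keyword, "design-detail". If several keywords need to be checked, one option is to build a comma-separated selector list, which matches an element when any of its parts match. A minimal sketch; build_href_selector is a hypothetical helper and "photo-books" is a made-up second keyword used only for illustration:

```python
def build_href_selector(keywords):
    """Return a CSS selector matching <a> tags whose href contains any keyword."""
    # Each keyword becomes an attribute-substring selector; joining them with
    # commas forms a selector list that matches if any one part matches.
    return ", ".join(f'a[href*="{kw}"]' for kw in keywords)


# "photo-books" is an assumed keyword, not one taken from the question.
selector = build_href_selector(["design-detail", "photo-books"])
print(selector)
```

The resulting string can be passed to soup.select(selector) in place of the hard-coded selector. This sketch assumes the keywords contain no double quotes, which would break the selector syntax.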
Output
| urlFrom | urlTo |
|---|---|
| https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ | https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=35d18daa85f844b78c9a7ed0550ca0cf&designId=2b2dbb6233084675828e48e238e2eb9b&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=Anniversary Gold Heart&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview |
| https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ | https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=008cec6cdece48c6bf25f13c425f9e4a&designId=acb3720df6a1480ea99dd2f18eec7807&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=Heart Wreath Anniversary&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview |
| https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ | https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=b2132bd5de1849479182735dba8857d3&designId=60d4a98f824e48d6badfe4fb443b591f&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=XOXO Bold&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview |
| https://www.snapfish.co.uk/blog/what-loving-message-sentiment-to-write-in-your-anniversary-card/ | https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=8261f8e29d8e4178b526ba80012d05f3&designId=c4ac847f6aef4c87a8588ab83d7a7065&sku=CommerceProduct_355343&ptype=cards&pcat=greeting_cards_1989_snapfish_uk&scat=anniversary_cards_10905_snapfish_uk&filters=subCategories~anniversary_cards_10905_snapfish_uk&searchPhrase=&designName=I Found You&withSku=N&qty=1&dgCatId=anniversary_cards_10905_snapfish_uk&pcatName=Greeting Cards&eoption=CommerceOption_281506#/dgview |
| https://www.snapfish.co.uk/blog/what-to-write-in-a-custom-snapfish-18th-birthday-card/ | https://www.snapfish.co.uk/design-detail?category=StoreCat_29641&dgId=2c8420a9f582492c9801dd8a2fb89ba3&designId=765f31622df648fb908b28d73fbf8b40&sku=CommerceProduct_355343&ptype=cards&pcat=birthday_cards_1989_snapfish_uk&scat=for_her_10993_1561482027_snapfish_uk&filters=subCategories~for_friends_10993_1561482050_snapfish_uk|for_her_10993_1561482027_snapfish_uk&searchPhrase=&designName=Make A Wish&withSku=N&qty=1&dgCatId=for_friends_10993_1561482050_snapfish_uk&pcatName=Birthday Cards&eoption=CommerceOption_281506#/dgview |
