我試圖在搜索詞下從 Elle.com 抓取資料。我注意到當我單擊按鈕時,它會發送一個更新 url 中 &page=2 的請求。但是,下面的代碼只會讓我得到很多重復的條目。我需要幫助找到一種方法來為回圈的每次迭代設定起點(我認為)。有任何想法嗎?
import requests,nltk,pandas as pd
from bs4 import BeautifulSoup as bs
def get_hits(url):
r = requests.get(url)
soup = bs(r.content, 'html')
body = []
for p in soup.find_all('p',{'class':'body-text'}):
sentences = nltk.sent_tokenize(p.text)
result1 = [s for s in sentences if 'kim' in s]
body.append(result1)
result2 = [s for s in sentences if 'kanye' in s]
body.append(result2)
body = [a for a in body if a!=[]]
if body == []:
body.append("no hits")
return body
titles =[]
key_hits = []
urls = []
counter = 1
for i in range(1,10):
url = f'https://www.elle.com/search/?page={i}&q=kanye'
r = requests.get(url)
soup = bs(r.content, 'html')
groups = soup.find_all('div',{'class':'simple-item grid-simple-item'})
for j in range(len(groups)):
urls.append('https://www.elle.com' groups[j].find('a')['href'])
titles.append(groups[j].find('div',{'class':'simple-item-title item-title'}).text)
key_hits.append(get_hits('https://www.elle.com' groups[j].find('a')['href']))
if (counter == 100):
break
counter =1
data = pd.DataFrame({
'Title':titles,
'Body':key_hits,
'Links':urls
})
data.head()
如果有什么我不明白但我可能應該理解的地方,請告訴我。只是一名試圖在這里學習強大工具的營銷研究人員。
uj5u.com熱心網友回復:
要在 sige 上進行分頁,您可以使用其無限滾動 API URL(此示例將列印9*42標題):
import requests
from bs4 import BeautifulSoup
api_url = "https://www.elle.com/ajax/infiniteload/"
params = {
"id": "search",
"class": "CoreModels\\search\\TagQueryModel",
"viewset": "search",
"trackingId": "search-results",
"trackingLabel": "kanye",
"params": '{"input":"kanye","page_size":"42"}',
"page": "1",
"cachebuster": "undefined",
}
all_titles = set()
for page in range(1, 10):
params["page"] = page
soup = BeautifulSoup(
requests.get(api_url, params=params).content, "html.parser"
)
for title in soup.select(".item-title"):
print(title.text)
all_titles.add(title.text)
print()
print("Unique titles:", len(all_titles)) # <-- 9 * 42 = 378
印刷:
...
Kim Kardashian and Kanye West Respond to Those Divorce Rumors
People Are Noticing Something Fishy About Taylor Swift's Response to Kim Kardashian
Kim Kardashian Just Went on an Intense Twitter Rant Defending Kanye West
Trump Is Finally Able to Secure a Meeting With a Kim
Kim Kardashian West is Modeling Yeezy on the Street Again
Aziz Ansari's Willing to Model Kanye's Clothes
Unique titles: 378
uj5u.com熱心網友回復:
實際上,加載更多分頁是從 api 呼叫純 html 回應生成的,每個頁面鏈接/url 都是相對 url,并使用 urljoin 方法將其轉換為絕對 url,我在 api_urls 中進行分頁。
代碼:
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
api_urls = ["https://www.elle.com/ajax/infiniteload/?id=search&class=CoreModels\search\TagQueryModel&viewset=search&trackingId=search-results&trackingLabel=kanye¶ms={"input":"kanye","page_size":"42"}&page=" str(x) "&cachebuster=undefined" for x in range(1,4)]
Base_url = "https://store.steampowered.com"
for url in api_urls:
req = requests.get(url)
soup = BeautifulSoup(req.content,"lxml")
cards = soup.select("div.simple-item.grid-simple-item")
for card in cards:
title = card.select_one("div.simple-item-title.item-title")
p = card.select_one("a")
l=p['href']
abs_link=urljoin(Base_url,l)
print("Title:" title.text " Links: " abs_link)
print("-" * 80)
輸出:
Title:Inside Kim Kardashian and Kanye West’s Current Relationship Amid Dinner Sighting Links: https://store.steampowered.com/culture/celebrities/a37833256/kim-kardashian-kanye-west-reconciled/
Title:Kim Kardashian And Ex Kanye West Left For SNL Together Amid Reports of Reconciliation Efforts Links: https://store.steampowered.com/culture/celebrities/a37919434/kim-kardashian-kanye-west-leave-for-snl-together-reconciliation/
Title:Kim Kardashian Wore a Purple Catsuit for Dinner With Kanye West Amid Reports She's Open to Reconciling Links: https://store.steampowered.com/culture/celebrities/a37822625/kim-kardashian-kanye-west-nobu-dinner-september-2021/
Title:How Kim Kardashian Really Feels About Kanye West Saying He ‘Wants Her Back’ Now Links:
https://store.steampowered.com/culture/celebrities/a37463258/kim-kardashian-kanye-west-reconciliation-feelings-september-2021/
Title:Why Irina Shayk and Kanye West Called Off Their Two-Month Romance Links: https://store.steampowered.com/culture/celebrities/a37366860/why-irina-shayk-kanye-west-broke-up-august-2021/
Title:Kim Kardashian and Kanye West Reportedly Are ‘Working on Rebuilding’ Relationship and May Call Off Divorce Links: https://store.steampowered.com/culture/celebrities/a37421190/kim-kardashian-kanye-west-repairing-relationship-divorce-august-2021/
Title:What Kim Kardashian and Kanye West's ‘Donda’ Wedding Moment Really Means for Their Relationship Links: https://store.steampowered.com/culture/celebrities/a37415557/kim-kardashian-kanye-west-donda-wedding-moment-explained/
Title:What Kim Kardashian and Kanye West's Relationship Is Like Now: ‘The Tension Has Subsided’ Links: https://store.steampowered.com/culture/celebrities/a37383301/kim-kardashian-kanye-west-relationship-details-august-2021/
Title:How Kim Kardashian and Kanye West’s Relationship as Co-Parents Has Evolved Links: https://store.steampowered.com/culture/celebrities/a37250155/kim-kardashian-kanye-west-co-parents/Title:Kim Kardashian Went Out in a Giant Shaggy Coat and a Black Wrap Top for Dinner in NYC Links: https://store.steampowered.com/culture/celebrities/a37882897/kim-kardashian-shaggy-coat-black-outfit-nyc-dinner/
Title:Kim Kardashian Wore Two Insane, Winter-Ready Outfits in One Warm NYC Day Links: https://store.steampowered.com/culture/celebrities/a37906750/kim-kardashian-overdressed-fall-outfits-october-2021/
Title:Kim Kardashian Dressed Like a Superhero for Justin Bieber's 2021 Met Gala After Party Links: https://store.steampowered.com/culture/celebrities/a37593656/kim-kardashian-superhero-outfit-met-gala-after-party-2021/
Title:Kim Kardashian Killed It In Her Debut as a Saturday Night Live Host Links: https://store.steampowered.com/culture/celebrities/a37918950/kim-kardashian-saturday-night-live-best-sketches/
Title:Kim Kardashian Has Been Working ‘20 Hours a Day’ For Her Appearance On SNL Links: https://store.steampowered.com/culture/celebrities/a37915962/kim-kardashian-saturday-night-live-preperation/
Title:Why Taylor Swift and Joe Alwyn Skipped the 2021 Met Gala Links: https://store.steampowered.com/culture/celebrities/a37446411/why-taylor-swift-joe-alwyn-skipped-met-gala-2021/
Title:Kim Kardashian Says North West Still Wants to Be an Only Child Five Years Into Having Siblings Links: https://store.steampowered.com/culture/celebrities/a37620539/kim-kardashian-north-west-only-child-comment-september-2021/
Title:How Kim Kardashian's Incognito 2021 Met Gala Glam Came Together Links: https://store.s
teampowered.com/beauty/makeup-skin-care/a37584576/kim-kardashians-incognito-2021-met-gala-beauty-breakdown/
Title:Kim Kardashian Completely Covered Her Face and Everything in a Black Balenciaga Look at the 2021 Met Gala Links: https://store.steampowered.com/culture/celebrities/a37578520/kim-kardashian-faceless-outfit-met-gala-2021/
Title:How Kim Kardashian Feels About Kanye West Singing About Their Divorce and ‘Losing My Family’ on Donda Album Links: https://store.steampowered.com/culture/celebrities/a37113130/kim-kardashian-kanye-west-divorce-song-donda-album-feelings/
Title:Kanye West Teases New Song In Beats By Dre Commercial Starring Sha'Carri Richardson Links: https://store.steampowered.com/culture/celebrities/a37090223/kanye-west-teases-new-song-in-beats-by-dre-commercial-starring-shacarri-richardson/
Title:Inside Kim Kardashian and Kanye West's Relationship Amid His Irina Shayk Romance Links: https://store.steampowered.com/culture/celebrities/a37077662/kim-kardashian-kanye-west-relationship-irina-shayk-romance-july-2021/
等等
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/317044.html
