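I take the URL as input: `url = "https://www.amazon.in/s?k=headphones&page=1"`. This works, but it stops at page 19. Instead of stopping there, I want the next input to be `"https://www.amazon.in/s?k="` followed by:
- "speakers&page=1"
- "earbuds&page=1", and so on, with the loop continuing through each one.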
```python
from bs4 import BeautifulSoup as soup
import pandas as pd
import urllib.request

url = "https://www.amazon.in/s?k=headphones&page=1"
data = []

def getdata(url):
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
    req = urllib.request.Request(url, headers=header)
    amazon_html = urllib.request.urlopen(req).read()
    a_soup = soup(amazon_html, 'html.parser')
    # Collect the title of every search-result card on the page
    for e in a_soup.select('div[data-component-type="s-search-result"]'):
        try:
            title = e.find('h2').text
        except AttributeError:  # card has no <h2>
            title = None
        data.append({
            'title': title
        })
    return a_soup

def getnextpage(a_soup):
    page = a_soup.find('a', attrs={"class": 's-pagination-item s-pagination-next s-pagination-button s-pagination-separator'})
    if page is None:  # no "Next" link on the last results page
        return None
    url = 'http://www.amazon.in' + str(page['href'])
    return url

while True:
    geturl = getdata(url)
    url = getnextpage(geturl)
    if not url:
        break
    print(url)

output = pd.DataFrame(data)
output
```
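This code returns the correct results, but instead of supplying a new URL by hand each time, I want it to take a list of items that are appended to the end of the URL, so that all the results end up in the same DataFrame. Note: the search results stop at page 19.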
Answer:
Make a list of your keywords, iterate over it, and run the `while` loop inside each iteration.
```python
keywords = ['speakers', 'earbuds']
for k in keywords:
    url = 'https://www.amazon.in/s?k=' + k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)
        if not url:
            break
        print(url)
```
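Plain concatenation such as `'https://www.amazon.in/s?k=' + k` is fine for single-word keywords like `speakers`. A keyword containing spaces or `&` has to be percent-encoded first, otherwise it breaks the query string. A minimal sketch using the standard library's `urllib.parse.urlencode`; the helper `build_start_url` and the extra keywords are hypothetical and not part of the answer above:

```python
from urllib.parse import urlencode

# Base search URL taken from the question; the helper below is hypothetical.
BASE_SEARCH_URL = "https://www.amazon.in/s"

def build_start_url(keyword: str, page: int = 1) -> str:
    """Return the search URL for a keyword, percent-encoding the query string."""
    return f"{BASE_SEARCH_URL}?{urlencode({'k': keyword, 'page': page})}"

# Hypothetical keyword list, including multi-word and '&' examples
keywords = ["speakers", "earbuds", "wireless earbuds", "tv & audio"]
start_urls = [build_start_url(k) for k in keywords]
for u in start_urls:
    print(u)
```

Each entry of `start_urls` can then replace `'https://www.amazon.in/s?k=' + k` as the starting `url` of the inner `while` loop.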
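Note that Amazon does not like this kind of automated access to its pages and can recognize the access pattern fairly quickly. To lower the request rate a little, you should at least add some delay with `time.sleep()`. Using the official API would of course be better.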
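The delay suggested above can be built into the pagination loop itself. A minimal sketch, assuming a randomized pause of 2 to 5 seconds and a cap of 20 pages (both arbitrary choices, not values from the answer); `crawl_with_delay` and its `fetch_next` callback are hypothetical names, and `sleep` is injectable so the loop can be tested without waiting:

```python
import random
import time
from typing import Callable, List, Optional

def crawl_with_delay(
    start_url: str,
    fetch_next: Callable[[str], Optional[str]],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    max_pages: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Follow next-page links, pausing a random interval before each request after the first.

    fetch_next (hypothetical) downloads and parses one page and returns
    the URL of the next page, or None when there is no next page.
    """
    visited: List[str] = []
    url: Optional[str] = start_url
    while url is not None and len(visited) < max_pages:
        if visited:
            # Space out requests instead of firing them back to back
            sleep(random.uniform(min_delay, max_delay))
        visited.append(url)
        url = fetch_next(url)
    return visited
```

With the answer's functions, a call might look like `crawl_with_delay(url, lambda u: getnextpage(getdata(u)))`, where `getdata` still appends each page's rows to `data`.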
Example
```python
from bs4 import BeautifulSoup as soup
import pandas as pd
import urllib.request

data = []

def getdata(url):
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}
    req = urllib.request.Request(url, headers=header)
    amazon_html = urllib.request.urlopen(req).read()
    a_soup = soup(amazon_html, 'html.parser')
    # One row per search-result card: product title and link
    for e in a_soup.select('div[data-component-type="s-search-result"]'):
        try:
            title = e.find('h2').text
            link = 'http://www.amazon.in' + e.h2.a['href']
        except (AttributeError, TypeError, KeyError):  # missing <h2>, <a> or href
            title, link = None, None
        data.append({
            'title': title,
            'url': link
        })
    return a_soup

def getnextpage(a_soup):
    # Absolute URL of the "Next" button, or None on the last page
    try:
        page = a_soup.find('a', attrs={"class": 's-pagination-item s-pagination-next s-pagination-button s-pagination-separator'})['href']
        url = 'http://www.amazon.in' + str(page)
    except (TypeError, KeyError):  # find() returned None, or the link has no href
        url = None
    return url

keywords = ['speakers', 'earbuds']
for k in keywords:
    url = 'https://www.amazon.in/s?k=' + k
    while True:
        geturl = getdata(url)
        url = getnextpage(geturl)
        if not url:
            break
        print(url)

output = pd.DataFrame(data)
```
Output (printed):

```
http://www.amazon.in/s?k=speakers&page=2&qid=1649420352&ref=sr_pg_1
...
http://www.amazon.in/s?k=speakers&page=20&qid=1649420373&ref=sr_pg_19
http://www.amazon.in/s?k=earbuds&page=2&qid=1649420375&ref=sr_pg_1
...
http://www.amazon.in/s?k=earbuds&page=20&qid=1649420394&ref=sr_pg_19
```
Output (`pd.DataFrame(data)`):

| | title | url |
|---|---|---|
| 0 | Echo Dot (3rd Gen) - #1 smart speaker brand in India with Alexa (Black) | http://www.amazon.in/gp/bestsellers/electronics/15765862031/ref=sr_bs_0_15765862031_1 |
| 1 | TimbreSonic Rhythm Speaker Wired Karaoke Ultimate Sound Party Portable Speaker | http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_atf_aps_sr_pg1_1?ie=UTF8&adId=A01688993VZM1IH2U6JB5&url=/TimbreSonic-Speaker-Karaoke-Ultimate-Portable/dp/B096M2T346/ref=sr_1_2_sspa?keywords=speakers&qid=1649421227&sr=8-2-spons&psc=1&smid=AK0P65LCJ5QQN&qualifier=1649421227&id=2899208110237385&widgetName=sp_atf |
| 2 | boAt Stone 180 5W Bluetooth Speaker with Upto 10 Hours Playback, 1.75" Driver, IPX7 and TWS Feature(Black) | http://www.amazon.in/boAt-Stone-Bluetooth-Speaker-Black/dp/B08JMC1988/ref=ice_ac_b_dpb?keywords=speakers&qid=1649421227&sr=8-3 |
| 3 | Speaker | http://www.amazon.in/Generic-Speaker/dp/B09X5M77MZ/ref=sr_1_omk_4?keywords=speakers&qid=1649421227&sr=8-4 |
| 4 | Zebronics Zeb-Warrior 2.0 Multimedia Speaker with Aux Connectivity,USB Powered and Volume Control | http://www.amazon.in/gp/bestsellers/computers/1375442031/ref=sr_bs_4_1375442031_1 |
| ... | ... | ... |
| 847 | Zebronics Zeb-Sound Bomb 5 TWS Earbuds with Bluetooth v5.0, up to 22H Backup, Flash Connect, Splash Proof, Voice Assistant, Touch Control, 10mm Driver, Built in Microphone and Type C(Black) | http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_mtf_aps_sr_pg20_1?ie=UTF8&adId=A09061362IHFGLF39FZ4K&url=/Zebronics-Zeb-Sound-Bluetooth-Assistant-Microphone/dp/B09NNNLBVD/ref=sr_1_308_sspa?keywords=earbuds&qid=1649420939&sr=8-308-spons&psc=1&qualifier=1649420939&id=2014190349292195&widgetName=sp_mtf |
| 848 | boAt Airdopes 141 True Wireless Earbuds with 42H Playtime, Beast Mode(Low Latency Upto 80ms) for Gaming, ENx Tech, ASAP Charge, IWP, IPX4 Water Resistance, Smooth Touch Controls(Bold Black) | http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_mtf_aps_sr_pg20_1?ie=UTF8&adId=A08646093S9SKZXE3VDX4&url=/boAt-Airdopes-141-Wireless-Resistance/dp/B09N3ZNHTY/ref=sr_1_309_sspa?keywords=earbuds&qid=1649420939&sr=8-309-spons&psc=1&qualifier=1649420939&id=2014190349292195&widgetName=sp_mtf |
| 849 | Skyfly Xbot GE100 Wired in Ear Earphones with Mic (Black) | http://www.amazon.in/Skyfly-Xbot-Gaming-Earphones-Detachable/dp/B07ZYR78B3/ref=sr_1_310?keywords=earbuds&qid=1649420939&sr=8-310 |
| 850 | JBL C115 TWS, True Wireless Earbuds with Mic, Jumbo 21 Hours Playtime with Quick Charge, True Bass, Dual Connect, Bluetooth 5.0, Type C & Voice Assistant Support for Mobile Phones (Black) | http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg20_1?ie=UTF8&adId=A0791293Y8WP49FN4EZU&url=/JBL-Wireless-Bluetooth-Assistance-Integration/dp/B08L5ZC8R3/ref=sr_1_311_sspa?keywords=earbuds&qid=1649420939&smid=A14CZOWI0VEHLG&sr=8-311-spons&psc=1&qualifier=1649420939&id=2014190349292195&widgetName=sp_btf |
| 851 | Crossbeats Airpop Bluetooth Truly Wireless In Ear Earbuds With Mic, with 30Hrs Playtime Ultralight Bluetooth Earphone with Mic & Voice Assistant, Passive Noise Cancelling Headset, Type-C Fasting Charging - Blue | http://www.amazon.in/gp/slredirect/picassoRedirect.html/ref=pa_sp_btf_aps_sr_pg20_1?ie=UTF8&adId=A10368023R9B7RAUU82SP&url=/Crossbeats-Bluetooth-Ultralight-Assistant-Cancelling/dp/B09PDSVQTW/ref=sr_1_312_sspa?keywords=earbuds&qid=1649420939&sr=8-312-spons&psc=1&qualifier=1649420939&id=2014190349292195&widgetName=sp_btf |
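The `url` column above mixes direct product links (`.../dp/...`), sponsored redirect links (`gp/slredirect/picassoRedirect.html?...&url=...`) and, for some rows, best-seller category links (`gp/bestsellers/...`). If one short link per product is wanted, one option is to pull out the 10-character product ID that follows `/dp/`. A minimal sketch, assuming every product link contains `/dp/<ID>` (possibly percent-encoded inside the redirect); `canonical_product_url` and the `/dp/<ID>` output form are hypothetical choices, not something the answer does:

```python
import re
from typing import Optional
from urllib.parse import unquote

# Assumption: product IDs are 10 upper-case letters/digits right after "/dp/"
PRODUCT_ID_RE = re.compile(r"/dp/([A-Z0-9]{10})")

def canonical_product_url(href: str, base: str = "https://www.amazon.in") -> Optional[str]:
    """Return base + '/dp/<ID>' if href contains a product ID, else None."""
    # Redirect links may carry the target path percent-encoded, so decode first
    match = PRODUCT_ID_RE.search(unquote(href))
    if match is None:
        return None  # e.g. best-seller category links
    return f"{base}/dp/{match.group(1)}"
```

It could then be applied to the whole column, for example `output['url'].map(canonical_product_url)`; rows without a product ID come back as `None`.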
