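I am trying to learn python/beautifulsoup and Django by building a small project. For this project I am scraping a website for recipes and then showing a randomly selected one on a page. The code I wrote works perfectly when I only fetch the first page, which has 35 recipes. However, I also want to get the recipes from pages 2 and 3. I figured I should write a loop for this, but I can't get it to work. The loop scrapes the site fine, but only the results of the last iteration end up in the lists I created for the recipe items. How can I get this code to add the information to the lists instead of overwriting it? The code works for the first 35 items in the lists (there are 35 recipes on a page), but not for higher indexes.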
from django.shortcuts import render
import requests
import re
from bs4 import BeautifulSoup
import random

# Create your views here.
def recipe(request):
    # Create soup
    for page in range(0, 2):
        webpage_response = requests.get(f"https://www.ah.nl/allerhande/recepten-zoeken?page={page}")
        webpage = webpage_response.content
        soup = BeautifulSoup(webpage, "html.parser")
        recipe_links = soup.find_all('a', attrs={'class': re.compile('^display-card_root__.*')})
        recipe_pictures = soup.find_all('img', attrs={'class': re.compile('^card-image-set_imageSet__.*')})
        recipe_prep_time = [ul.find('li').text
                            for ul in soup.find_all('ul',
                                                    attrs={'class': re.compile('^recipe-card-properties_root')})]
    # Set up lists
    links = []
    titles = []
    pictures = []
    # create prefix for link
    prefix = "https://ah.nl"
    # scrape page for recipe
    for link in recipe_links:
        links.append(prefix + link.get('href'))
    for title in recipe_links:
        titles.append(title.get('aria-label'))
    for img in recipe_pictures:
        pictures.append(img.get('data-srcset'))
    # create random int to select a recipe
    nummer = random.randint(0, 105)
    # select correct link for image
    pic_url = pictures[nummer].split(' ')
    # create context
    context = {
        "titles": titles[nummer],
        "pictures": pic_url[16],
        "preptime": recipe_prep_time[nummer],
        "link": links[nummer]
    }
    # render page
    return render(request, "randomRecipe/recipe.html", context)
Answer from a uj5u.com user:
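Nice idea. I can never make up my mind myself when faced with such a large and tempting selection.
As @Barmar already mentioned, it would be leaner to store the scraped information in a more structured way, for example in a list named data that holds dicts with the same structure as your context.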
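To make the append-versus-overwrite point concrete, here is a minimal sketch. It assumes a hypothetical `scrape_page` helper that stands in for the requests and BeautifulSoup calls and returns made-up recipes with example.com links; none of these names come from the original code. The point is only that `data` is created once, before the loop, and extended inside it.

```python
def scrape_page(page):
    """Hypothetical stand-in for the real scraping step (returns made-up data)."""
    return [
        {"title": f"Recipe {page}-{i}", "link": f"https://example.com/recipe/{page}/{i}"}
        for i in range(2)
    ]


def collect_recipes(pages):
    data = []  # created once, before the loop
    for page in pages:
        # extend the existing list instead of assigning a new one
        data.extend(scrape_page(page))
    return data


recipes = collect_recipes(range(3))
print(len(recipes))           # number of recipes gathered over all pages
print(recipes[-1]["title"])   # last recipe of the last page
```

Moving `data = []` inside the `for` loop would reproduce the behaviour described in the question: each iteration would start from an empty list, so only the last page would survive.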
You can also select the elements more specifically:
data = []
for e in soup.select('a[data-testhook="recipe-card"]'):
    data.append({
        'title': e.span.text,
        'picture': e.img.get('data-srcset').split()[1],
        'preptime': e.li.text,
        'link': prefix + e['href']
    })
Example
from django.shortcuts import render
import requests
import re
from bs4 import BeautifulSoup
import random

# Create your views here.
def recipe(request):
    # create prefix for link
    prefix = "https://ah.nl"
    # one list that collects the recipes of all pages
    data = []
    for page in range(0, 2):
        webpage_response = requests.get(f"https://www.ah.nl/allerhande/recepten-zoeken?page={page}")
        webpage = webpage_response.content
        # Create soup
        soup = BeautifulSoup(webpage, "html.parser")
        for e in soup.select('a[data-testhook="recipe-card"]'):
            data.append({
                'title': e.span.text,
                'picture': e.img.get('data-srcset').split()[1],
                'preptime': e.li.text,
                'link': prefix + e['href']
            })
    # create random int to select a recipe
    # (randint includes both endpoints, so the upper bound is len(data) - 1)
    nummer = random.randint(0, len(data) - 1)
    context = data[nummer]
    # render page
    return render(request, "randomRecipe/recipe.html", context)
Context
{'title': 'Noedels met sticky sriracha-aubergine, cashewnoten en garnalen',
 'picture': 'https://static.ah.nl/static/recepten/img_RAM_PRD159203_220x162_JPG.jpg',
 'preptime': '45 min',
 'link': 'https://ah.nl/allerhande/recept/R-R1196327/noedels-met-sticky-sriracha-aubergine-cashewnoten-en-garnalen'}
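One detail worth checking when picking the random entry: `random.randint(a, b)` includes both endpoints, so `random.randint(0, len(data))` can return `len(data)` and raise an `IndexError`. Below is a small sketch of a safer pick. The `pick_random` helper and the sample data are made up for illustration and are not part of the original view.

```python
import random


def pick_random(items):
    """Return a random element, or None when the list is empty (hypothetical helper)."""
    if not items:
        return None  # e.g. the scrape returned nothing
    return random.choice(items)  # always a valid element, no index arithmetic


data = [{"title": "A"}, {"title": "B"}, {"title": "C"}]  # made-up sample data
context = pick_random(data)
print(context["title"])

# Index-based alternative: the upper bound must be len(data) - 1
nummer = random.randint(0, len(data) - 1)
print(data[nummer]["title"])
```

Returning `None` for an empty list is just one option; in a Django view you might prefer to render an error page when the scrape comes back empty.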
