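Basically, I am scraping data from a website that exposes it in a script tag, but I cannot get the data into the right layout. Here is the raw data from the script tag: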
{
  "@context": "https://schema.org/",
  "@type": "Product",
  "name": "I Got Toddler Problems Tee",
  "url": "https://www.inspireuplift.com/I-Got-Toddler-Problems-Tee/iu/3136",
  "sku": "BMRSUQNGGS",
  "image": [
    "https://cdn.inspireuplift.com/uploads/images/seller_product_variant_images/i-got-toddler-problems-tee-3136/1629196991_Toddlerproblemsmauv.png",
    "https://cdn.inspireuplift.com/uploads/images/seller_product_variant_images/i-got-toddler-problems-tee-3136/1629196991_Toddlerproblemsltgray.png",
    "https://cdn.inspireuplift.com/uploads/images/seller_product_variant_images/i-got-toddler-problems-tee-3136/1629196991_Toddlerproblemsblk.png",
    "https://cdn.inspireuplift.com/uploads/images/seller_product_variant_images/i-got-toddler-problems-tee-3136/1629196991_Toddlerproblemspk.png"
  ],
  "description": "BMRSUQNGGS",
  "brand": {"@type": "Thing", "name": "InspireUplift"},
  "aggregateRating": {"@type": "AggregateRating", "ratingValue": 0, "reviewCount": 0},
  "offers": {
    "@type": "AggregateOffer",
    "highPrice": 32.97,
    "lowPrice": 29.97,
    "offerCount": 24,
    "priceCurrency": "USD",
    "offers": [
      {
        "@type": "Offer",
        "url": "https://www.inspireuplift.com/I-Got-Toddler-Problems-Tee/iu/3136?variant=37621",
        "priceCurrency": "USD",
        "sku": "BMRSUQNGGS-1",
        "alternateName": "I Got Toddler Problems Tee - Mauve/S",
        "price": 29.97,
        "priceValidUntil": "2022-01-10",
        "availability": "https://schema.org/InStock",
        "seller": {"@type": "Organization", "name": "InspireUplift"}
      },
      {
        "@type": "Offer",
        "url": "https://www.inspireuplift.com/I-Got-Toddler-Problems-Tee/iu/3136?variant=37622",
        "priceCurrency": "USD",
        "sku": "BMRSUQNGGS-2",
        "alternateName": "I Got Toddler Problems Tee - Mauve/M",
        "price": 29.97,
        "priceValidUntil": "2022-01-10",
        "availability": "https://schema.org/InStock",
        "seller": {"@type": "Organization", "name": "InspireUplift"}
      },
      {
        "@type": "Offer",
        "url": "https://www.inspireuplift.com/I-Got-Toddler-Problems-Tee/iu/3136?variant=37623",
        "priceCurrency": "USD",
        "sku": "BMRSUQNGGS-3",
        "alternateName": "I Got Toddler Problems Tee - Mauve/L",
        "price": 29.97,
        "priceValidUntil": "2022-01-10",
        "availability": "https://schema.org/InStock",
        "seller": {"@type": "Organization", "name": "InspireUplift"}
      }
    ],
    "shippingDetails": {
      "@type": "OfferShippingDetails",
      "shippingRate": {
        "@type": "MonetaryAmount",
        "value": "0",
        "currency": "USD"
      }
    }
  }
}
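I want to go through the variant URLs and extract every variant's name, image URL, size, and color, and return the data in this layout. Could anyone please help? I am learning Python. Here is my code: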
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

newlist = []
r = requests.get(link, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
scripts = soup.find('script', type='application/ld+json').string
data = json.loads(scripts)
image = data["image"]
try:
    altname = data["offers"]["offers"]
except KeyError:
    altname = []
    print("not found")
for item in altname:
    area = item["alternateName"]
    # image is the full list of image URLs, so every row receives all of them
    detail = {"image": image, "name": area}
    print(detail)
    newlist.append(detail)
print("saving")
df = pd.DataFrame(newlist)
df.to_csv("first_list.csv")
This is what I get back: in the variant color cell, instead of a single image URL, I get all of the image URLs.
Answer from a uj5u.com user:
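This solution is based on the single JSON document (one product) that was provided. The two uploaded screenshots are identical. It is better to use data.get('key') than data['key'], because .get() returns None for a missing key instead of raising KeyError.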
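To illustrate the difference, here is a minimal sketch. The `data` dict is a trimmed-down stand-in for the parsed JSON-LD with the "offers" key deliberately left out; that omission is an assumption made for the demonstration, not something seen on the real page.

```python
# Trimmed-down stand-in for the parsed JSON-LD; "offers" is deliberately absent.
data = {"name": "I Got Toddler Problems Tee"}

# Bracket access raises KeyError when a key is missing, so it needs try/except.
try:
    offers = data["offers"]["offers"]
except KeyError:
    offers = []

# .get() returns None (or a supplied default) instead of raising.
# The "or {}" guard keeps the chained call safe when "offers" is missing.
safe_offers = (data.get("offers") or {}).get("offers", [])

print(offers, safe_offers, data.get("sku"))
```

Both approaches end up with an empty list here; the .get() form simply does it without exception handling.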
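[data.get("name")] + [""] * (len(offer) - 1) pads the product-name column to the same length as the other columns. Without the padding, creating the DataFrame raises an error, because the product name appears only in the first cell.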
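The padding idea can be checked on its own, without pandas. The sketch below hard-codes the colors and sizes from the three sample offers; the variable names are illustrative only.

```python
name = "I Got Toddler Problems Tee"
colors = ["Mauve", "Mauve", "Mauve"]
sizes = ["S", "M", "L"]

# Product name in the first row only, empty strings in the remaining rows.
product_name = [name] + [""] * (len(colors) - 1)

columns = {
    "product_name": product_name,
    "variant_color_name": colors,
    "variant_size": sizes,
}

# Every column must have the same number of entries before the dict
# is handed to pd.DataFrame.
lengths = {key: len(values) for key, values in columns.items()}
print(lengths)
```

Note that the expression assumes at least one offer: with an empty offer list it still yields a one-element list, which would no longer match the other (empty) columns.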
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

# link and headers are defined elsewhere, as in the question
r = requests.get(link, headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
scripts = soup.find('script', type='application/ld+json').string
# parse the JSON-LD text into a dict
data = json.loads(scripts)

size, color, url = [], [], []
offer = data.get("offers", {}).get("offers", [])
for item in offer:
    # "I Got Toddler Problems Tee - Mauve/S" -> ["Mauve", "S"]
    size_color_list = item["alternateName"].split(" - ")[1].split("/")
    url.append(item["url"])
    color.append(size_color_list[0])
    size.append(size_color_list[1])

# product name in the first row only, padded so all columns have equal length
product_name = [data.get("name")] + [""] * (len(offer) - 1) if offer else []
detail = {
    "product_name": product_name,
    "variant_color_name": color,
    "variant_size": size,
    "variant_image": url,
}
df = pd.DataFrame(detail)
df.index += 1  # start the row numbering at 1
# df.to_csv('first_list.csv')
df.to_excel("first_list.xlsx")  # needs an Excel writer such as openpyxl
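As an alternative to padding columns by hand, the rows can be built one dict per variant. This is only a sketch: `variant_rows` is a hypothetical helper, and `raw` is a shortened copy of the sample data that keeps just the fields used here.

```python
import json

raw = """
{
  "name": "I Got Toddler Problems Tee",
  "offers": {
    "offers": [
      {"url": "https://www.inspireuplift.com/I-Got-Toddler-Problems-Tee/iu/3136?variant=37621",
       "alternateName": "I Got Toddler Problems Tee - Mauve/S"},
      {"url": "https://www.inspireuplift.com/I-Got-Toddler-Problems-Tee/iu/3136?variant=37622",
       "alternateName": "I Got Toddler Problems Tee - Mauve/M"}
    ]
  }
}
"""


def variant_rows(data):
    """Hypothetical helper: return one dict per variant offer."""
    offers = (data.get("offers") or {}).get("offers") or []
    rows = []
    for index, item in enumerate(offers):
        # Split on the last " - " so a product name containing " - " still works.
        _, _, variant = item.get("alternateName", "").rpartition(" - ")
        color, _, size = variant.partition("/")
        rows.append({
            # Product name only in the first row, as in the answer above.
            "product_name": data.get("name", "") if index == 0 else "",
            "variant_color_name": color,
            "variant_size": size,
            "variant_url": item.get("url", ""),
        })
    return rows


rows = variant_rows(json.loads(raw))
for row in rows:
    print(row)
```

A list of dicts like `rows` can be passed straight to pd.DataFrame(rows), so the column lengths always agree. If an alternateName has no " - " separator, the whole string is treated as the variant part, which may need extra handling on other products.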
