我正在嘗試從那里抓取網路,我正在獲取所有資料,并且我正在嘗試不同的格式來表示資料。這是代碼:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://99petshops.com.au/Search?brandName=Ziwi Peak&animalCode=DOG&storeId=89/&page=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
all_info = []
for item in soup.select(".pd-info"):
title = item.h2.get_text(strip=True)
price = item.select_one('span:-soup-contains("Price")').span.text
try:
store_name = item.select_one('span.hilighted').find_next('img')['alt']
shipping = item.select_one('span.shipping').text.strip()
price_per_100_g = item.select_one('p.unit-price').text.strip()
except:
store_name = ''
shipping = ''
price_per_100_g = ''
d = {"Title": title, "Price": price, "Store": store_name, "Shipping": shipping, "Price_Per_100_g": price_per_100_g}
for i, p in enumerate(item.select(".sp-price"), 1):
try:
store_name = p.find_next("img")["alt"]
except:
store_name = ''
d[f"store_{i:>02}"] = store_name
for i, p in enumerate(item.select(".sp-price"), 1):
d[f"price_{i:>02}"] = p.get_text(strip=True)
for i, p in enumerate(item.select(".shipping"), 1):
d[f"shipping_{i:>02}"] = p.get_text(strip=True)
all_info.append(d)
df = pd.DataFrame(all_info).fillna("")
print(df.head())
df.to_csv("data_2.csv", index=False)
我有一個來自上述刮板的 csv 資料集,由 30 列和 31 行組成這是一個小例子:
Title,Price,Store,Shipping,Price_Per_100_g,store_01,store_02,store_03,store_04,store_05,store_06,store_07,store_08,store_09,store_10,store_11,store_12,store_13,store_14,store_15,store_16,store_17,store_18,store_19,store_20,store_21,store_22,store_23,store_24,store_25,price_01,price_02,price_03,price_04,price_05,price_06,price_07,price_08,price_09,price_10,price_11,price_12,price_13,price_14,price_15,price_16,price_17,price_18,price_19,price_20,price_21,price_22,price_23,price_24,price_25,shipping_01,shipping_02,shipping_03,shipping_04,shipping_05,shipping_06,shipping_07,shipping_08,shipping_09,shipping_10,shipping_11,shipping_12,shipping_13,shipping_14,shipping_15,shipping_16,shipping_17,shipping_18,shipping_19,shipping_20,shipping_21,shipping_22,shipping_23,shipping_24,shipping_25,shipping_26,store_26,price_26,shipping_27,store_27,store_28,price_27,price_28,shipping_28,shipping_29,store_29,price_29,shipping_30
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$57.75,Woofers World, $7.95 shipping,$5.78 per 100g,VetShopAustralia,Vet Products Direct,Pet Shop Direct,Petso,PetPost,Pet Chemist,Your PetPA,Pet Circle,Petbarn,World for Pets,Lucky Pet,Stefmar,Budget Pet Products,Best Friends Pets,Kellyville Pets,iPetStore,My Pet Warehouse,Pet City,Pets Unleashed,PetO,PETstock,Pet House,Habitat Pets,Pet Culture,Peticular,$64.60,$64.60,$64.95,$64.95,$64.99,$66.29,$67.32,$69.69,$69.69,$69.95,$70.19,$71.99,$72.99,$73.39,$75.15,$75.95,$76.99,$77.99,$77.99,$79.99,$81.54,$81.99,$83.49,$83.49,$83.50, $9.95 shipping, $7.95 shipping, $7.95 shipping, $4.95 to $9.95 shipping, $10.00 to $12.70 shipping,free shipping, $9.33 to $14.81 shipping, $4.00 to $11.16 shipping,free to $7.95 shipping,free shipping, $9.00 shipping, $6.95 to $10.51 shipping, $7.99 to $13.87 shipping,free to $14.35 shipping, $9.99 shipping,free to $15.95 shipping,free to $6.95 shipping,free shipping,free to $7.28 shipping, $8.95 shipping,free to $11.00 shipping,free shipping,free shipping,free to $9.99 shipping,free to $20.00 shipping, $7.95 shipping,,,,,,,,,,,,
我喜歡將資料框更改為在行而不是列中具有相同產品的不同商店?產品名稱將被商店名稱重復,然后是不同的名稱,然后是價格和運費。這是螢屏截圖和預期輸出:
Title,Price,Store,Shipping,Price_Per_100_g
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$57.75,Woofers World, $7.95 shipping,$5.78 per 100g
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.60,VetShopAustralia, $9.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.60,Vet Products Direct, $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.95,Pet Shop Direct, $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.95,Petso, $4.95 to $9.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.99,PetPost, $10.00 to $12.70 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$66.29,Pet Chemist,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$67.32,Your PetPA, $9.33 to $14.81 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.69,Pet Circle, $4.00 to $11.16 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.69,Petbarn,free to $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.95,World for Pets,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$70.19,Lucky Pet, $9.00 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$71.99,Stefmar, $6.95 to $10.51 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$72.99,Budget Pet Products, $7.99 to $13.87 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$73.39,Best Friends Pets,free to $14.35 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$75.15,Kellyville Pets, $9.99 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$75.95,iPetStore,free to $15.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$76.99,My Pet Warehouse,free to $6.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$77.99,Pet City,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$77.99,Pets Unleashed,free to $7.28 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$79.99,PetO, $8.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$81.54,PETstock,free to $11.00 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$81.99,Pet House,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.49,Habitat Pets,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.49,Pet Culture,free to $9.99 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.50,Peticular,free to $20.00 shipping,
誰能幫我找出達到預期輸出的最佳方法?謝謝!
uj5u.com熱心網友回復:
嘗試:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://99petshops.com.au/Search?brandName=Ziwi Peak&animalCode=DOG&storeId=89/&page=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
all_info = []
for item in soup.select(".pd-info"):
title = item.h2.get_text(strip=True)
price = item.select_one('span:-soup-contains("Price")').span.text
shipping = item.select_one(".shipping")
shipping = shipping.get_text(strip=True) if shipping else ""
store_name = item.select_one(".pd-name img")
store_name = store_name["alt"] if store_name else ""
price_per_100_g = item.select_one(".unit-price")
price_per_100_g = price_per_100_g.text if price_per_100_g else ""
d = {
"Title": title,
"Price": price,
"Store": store_name,
"Shipping": shipping,
"Price_Per_100_g": price_per_100_g,
}
stores = []
for p in item.select(".sp-price"):
price = p.get_text(strip=True)
name = p.find_next("img")
name = name["alt"] if name else "N/A"
shipping = p.find_next(class_="shipping")
shipping = shipping.get_text(strip=True) if shipping else "N/A"
stores.append((name, price, shipping))
d["Stores"] = stores
all_info.append(d)
df = pd.DataFrame(all_info).fillna("").explode("Stores")
df[["Store Name", "Store Price", "Store Shipping"]] = df.pop("Stores").apply(
pd.Series
)
print(df.head())
df.to_csv("data.csv", index=False)
印刷:
Title Price Store Shipping Price_Per_100_g Store Name Store Price Store Shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g Vet Products Direct $64.60 $7.95 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g VetShopAustralia $64.60 $7.95 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g Petso $64.95 $10.00 to $12.70 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g Pet Shop Direct $64.95 $4.95 to $9.95 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g PetPost $64.99 free shipping
并保存data.csv(來自 LibreOffice 的螢屏截圖):

uj5u.com熱心網友回復:
一種選擇是使用pyjanitor的pivot_longer:
# pip install pyjanitor
import pandas as pd
import janitor
df = pd.read_excel('data_2.xlsx')
df.columns = df.columns.str.lower()
df.pivot_longer(
index = 'title',
names_to = '.value',
names_pattern = '(price|store|shipping).*'
)
title price store shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Rec... $57.75 Woofers World $7.95 shipping
1 Ziwi Peak Dog Air-Dried Free Range Chicken Rec... $127.80 Woofers World $7.95 to $15.62 shipping
2 Ziwi Peak Dog Air-Dried Lamb Recipe 2.5Kg $127.80 Woofers World $7.95 to $15.62 shipping
3 Ziwi Peak Dog Air-Dried Beef Recipe 454g $29.25 Woofers World $7.95 to $11.82 shipping
4 Ziwi Peak Dog Air-Dried Free Range Chicken Rec... $29.25 Woofers World $7.95 to $8.68 shipping
.. ... ... ... ...
925 Ziwi Peak Dog Canned Food Venison 390g NaN NaN NaN
926 Ziwi Peak Dog Venison Shank Bone Oral Chews Half NaN NaN NaN
927 Ziwi Peak Good Dog Reward Venison 85g NaN NaN NaN
928 Ziwi Peak Dog Venison Lung & Kidney 60g NaN NaN NaN
929 Ziwi Peak Dog Wet Venison Recipe 170g X 12 NaN NaN NaN
另一種選擇是pd.wide_to_long,但它提出了一個ValueError:
In [48]: pd.wide_to_long(df, stubnames = ['price', 'store', 'shipping'], i='title', j='end', suffix='.*')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [48], line 1
----> 1 pd.wide_to_long(df, stubnames = ['price', 'store', 'shipping'], i='title', j='end', suffix='.*')
File ~/miniforge3/envs/playground/lib/python3.10/site-packages/pandas/core/reshape/melt.py:527, in wide_to_long(df, stubnames, i, j, sep, suffix)
524 stubnames = list(stubnames)
526 if any(col in stubnames for col in df.columns):
--> 527 raise ValueError("stubname can't be identical to a column name")
529 if not is_list_like(i):
530 i = [i]
ValueError: stubname can't be identical to a column name
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/523237.html
