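I'm scraping a product website that has many pages. Each page has a table with similar columns but different values. Here is one example: https://www.benchmade.com/317-1-weekender.html and here is another: https://www.benchmade.com/15600or-raghorn.html. There are about 144 links like these.
What I want is a single table in which the shared fields become the column headers and each product's values form one row.
Something like this, which could then be written out as a CSV: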
| Blade Length | Blade Thickness | Open Length | ... etc.
|--------------|-----------------|-------------|
| 2.97/1.97"   | 4.34/12.54      | 1.23/5.65   |
| 4.24/2.23"   | 2.34/5.63       | 5.43/2.90   |
| 3.54/2.65    | 2.57/6.54       | 6.90/4.20   |
| 7.65/5/43    | 4.65/3.56       | 3.32/4.54   |
This is what I have so far:
import pandas as pd
import requests
from bs4 import BeautifulSoup as soup

# Assumed placeholder: the original post does not show how HEADER is defined.
HEADER = {"User-Agent": "Mozilla/5.0"}

product_links = []
for x in range(1, 4):
    # Pass the headers by keyword; the second positional argument of requests.get is params.
    HTML = requests.get(f'https://www.benchmade.com/all-products.html?blade_edge=521,531,2231&p={x}&price=75-2400&product_list_limit=48', headers=HEADER)
    # HTML.status_code
    Booti = soup(HTML.content, "lxml")
    knife_items = Booti.find_all('li', class_="item product product-item")
    for items in knife_items:
        for links in items.find_all('a', class_="product photo product-item-photo", href=True):
            product_links.append(links['href'])

for links_2 in product_links:
    # testlinks = "https://www.benchmade.com/4010-211-collectors-edition-station-knife.html"
    Specifications_data = pd.read_html(links_2)[0]
Any help would be greatly appreciated! Thank you so much!

Answer:
This is easy to do with pandas.
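import pandas as pd

urls = ['https://www.benchmade.com/317-1-weekender.html',
        'https://www.benchmade.com/15600or-raghorn.html']

frames = []
for url in urls:
    # The spec table has two columns (name, value); transpose it into a single row.
    frames.append(pd.read_html(url)[0].set_index(0).T)

# DataFrame.append was removed in pandas 2.0, so the rows are combined with pd.concat.
final_df = pd.concat(frames, sort=False).reset_index(drop=True)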
Output:
print(final_df)
0 Blade Length: Blade Thickness: ... Weight: Sheath Weight:
0 2.97/1.97" | 7.16/5.00cm 0.090" | 2.286mm ... 2.28oz | 64.64g NaN
1 4.64" | 11.78 cm 0.09" | 2.286mm ... COMING SOON 21.26g
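To see why `set_index(0).T` followed by a concatenation lines the columns up, here is a minimal offline sketch. The two small tables are hypothetical stand-ins for what `pd.read_html(url)[0]` returns on these pages (assumed to be a two-column frame with integer column labels 0 and 1), so nothing is downloaded.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_html(url)[0]:
# column 0 holds the spec name, column 1 holds its value.
page_a = pd.DataFrame([["Blade Length:", '2.97"'], ["Weight:", "2.28oz"]])
page_b = pd.DataFrame([["Blade Length:", '4.64"'], ["Sheath Weight:", "21.26g"]])

# set_index(0) makes the spec names the index; .T turns them into columns,
# so each page collapses into one row.
rows = [page.set_index(0).T for page in (page_a, page_b)]

# concat aligns rows by column name; specs missing from a page become NaN.
combined = pd.concat(rows, sort=False).reset_index(drop=True)
print(combined)
```

Because alignment is by column name, pages that list different specifications can still be stacked; the cost is a NaN wherever a page lacks a given spec.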

Answer:
Let's first modify your code so that the results are saved in a list, res:
product_links = []
res = []
for x in range(1, 4):
    ...  # continue your code
for links_2 in product_links:
    Specifications_data = pd.read_html(links_2)[0]
    res.append(Specifications_data)
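Now we put the tables collected in res into a single DataFrame. There are many ways to do this; one is shown here. The URLs from product_links are used as the index, so you know which row belongs to which knife.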
res_dict = {k:dict(zip(v[0],v[1])) for k,v in zip(product_links, res)}
df = pd.DataFrame.from_dict(res_dict, orient='index')
You get one large df; df.head() now looks like this:
Blade Length: Blade Thickness: Open Length: Handle Thickness: Weight: Sheath Weight: Closed Length: Blade Edge Blade Finish/Color Blade Steel Blade Style/Shape Clip Type Clip Position Handle Material Lanyard Hole MOLLE Compatible Use Blade Style Product Box: Designer: Mechanism: Action: Blade Steel: Overall Length: Drop-point Blade Style with Valox Handle Tanto Blade Style with Valox Handle Drop-point Blade Style with G10/Aluminum Handle Drop-point Blade Style with G10 Handle Valox Handle G10/Aluminum Handle Drop-point Blade Style with G10 Drop-point Blade Style Tanto Blade Style Green and red contoured G10 handle Sand contoured G10 handle Handle Length: Opposing Bevel Blade Style Sheepsfoot Blade Style Aluminum Handles Carbon Fiber Handles G10 Handles Glass Breaker Sheath Type
------------------------------------------------------------------------ --------------- ------------------ ---------------- ------------------- ---------------- ---------------- ---------------- ------------ -------------------- ------------- ------------------- ----------- --------------- ----------------- -------------- ------------------ ----- ------------- -------------- ----------- ------------ --------- -------------- ----------------- ------------------------------------------ ------------------------------------- ------------------------------------------------- ---------------------------------------- -------------- --------------------- --------------------------------- ------------------------ ------------------- ------------------------------------ --------------------------- ---------------- ---------------------------- ------------------------ ------------------ ---------------------- ------------- --------------- -------------
https://www.benchmade.com/4010-211-collectors-edition-station-knife.html 5.97" | 15.16cm 0.114" | 2.896mm 10.88" | 27.64cm 0.61" | 15.44mm 6.92oz | 196.18g 1.27oz | 36.00g nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
https://www.benchmade.com/4000-211-collectors-edition-3-piece-set.html 8.04" | 20.42cm 0.114" | 2.896mm 13.02" | 33.07cm 0.61" | 15.44mm 7.37oz | 208.94g nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
https://www.benchmade.com/602-211-tengu-tool.html 1.14" | 2.90cm 0.124" | 3.150mm 3.27" | 8.31cmm 0.40" | 10.16mm 1.04oz | 29.48g 0.28oz | 7.94g 2.14" | 5.44cm nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
https://www.benchmade.com/9070bk-1-claymore.html 3.60" | 8.64cm 0.114" | 2.896mm 8.60" | 19.81cm 0.60" | 14.99mm 3.50oz | 97.24g nan 5.00" | 11.18cm nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
https://www.benchmade.com/9070bk-claymore.html 3.60" | 8.64cm 0.114" | 2.896mm 8.60" | 19.81cm 0.60" | 14.99mm 3.50oz | 97.24g nan 5.00" | 11.18cm nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
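The dictionary-based construction can be tried without any network access. In the sketch below, the URLs and the tables in `res` are made up for illustration; they only mimic the assumed two-column shape of `pd.read_html(url)[0]`.

```python
import pandas as pd

# Hypothetical URLs and tables, shaped like the scraped spec tables.
product_links = [
    "https://example.com/knife-a.html",
    "https://example.com/knife-b.html",
]
res = [
    pd.DataFrame([["Blade Length:", '5.97"'], ["Sheath Weight:", "1.27oz"]]),
    pd.DataFrame([["Blade Length:", '3.60"'], ["Closed Length:", '5.00"']]),
]

# One inner dict per page: {spec name: value}, keyed by the page URL.
res_dict = {url: dict(zip(table[0], table[1]))
            for url, table in zip(product_links, res)}

# orient='index' makes each URL a row and each spec name a column.
df = pd.DataFrame.from_dict(res_dict, orient="index")
print(df)
```

One thing to watch: `zip(product_links, res)` stops at the shorter sequence, so if a page fails to parse and is skipped, the URLs and tables can drift out of step. Appending the URL and its table together inside the same loop iteration is one way to avoid that.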
You can further massage the DataFrame to keep only the columns you actually need, for example:
df[['Blade Length:', 'Blade Thickness:', 'Open Length:',
'Handle Thickness:', 'Weight:']]
and so on.
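Since the question asks for CSV output, the last step might look like the sketch below. The one-row frame is a hypothetical stand-in for the combined result, the `url` index label is an arbitrary choice, and stripping the trailing colons from the headers is optional tidying rather than something either answer does. An in-memory buffer is used so the example writes no file; passing a file path to `to_csv` works the same way.

```python
import io
import pandas as pd

# Hypothetical stand-in for the combined DataFrame, indexed by product URL.
df = pd.DataFrame(
    {"Blade Length:": ['5.97" | 15.16cm'], "Weight:": ["6.92oz | 196.18g"]},
    index=["https://example.com/knife-a.html"],
)

# Optional: drop the trailing colons that the site puts on spec names.
df.columns = df.columns.str.rstrip(":")

# Write to an in-memory buffer; replace it with a path such as "knives.csv" to save a file.
buffer = io.StringIO()
df.to_csv(buffer, index_label="url")
csv_text = buffer.getvalue()
print(csv_text)
```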
