import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://twillmkt.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
# Collect the product links from the collection page.
r = requests.get('https://twillmkt.com/collections/denim')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='ProductItem__Wrapper')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = baseurl + link['href']
        productlinks.append(comp)
temp = []
# Visit each product page and print the thumbnail image sources.
for link in productlinks:
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    up = soup.find('div', class_='Product__SlideshowNavScroller')
    for pro in up:
        t = pro.find('img').get('src')
        print(t)
The code runs fine and gives me the image links, but I want to label them image1, image2, and so on, to get the output shown in the picture.

Answer from a uj5u.com user:
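Note: The main issue is that the number of images is not the same on every page, and that you request some product pages multiple times because your list of links contains duplicates. The latter can be avoided by wrapping the list in set().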
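As a rough sketch of the deduplication mentioned in the note: set() removes duplicates but does not promise any particular order, so if the order of the product pages matters, dict.fromkeys() is one alternative. The links below are made-up placeholders, not real product URLs.

```python
# Hypothetical product links with one duplicate, standing in for productlinks.
productlinks = [
    "https://twillmkt.com/products/a",
    "https://twillmkt.com/products/b",
    "https://twillmkt.com/products/a",
]

# set() drops duplicates; iteration order is not guaranteed.
unique_unordered = set(productlinks)

# dict.fromkeys() drops duplicates and keeps first-seen order (Python 3.7+).
unique_ordered = list(dict.fromkeys(productlinks))
```

Either form can be iterated in place of productlinks, so each product page is requested only once.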
One approach is to append your data to a list of dictionaries and create a DataFrame from it.
data.append({'id': t.split('=')[-1], 'image': 'Image ' + str(e) + ' UI', 'link': t})
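To make the record-building step easier to try without any network access, here is a minimal sketch that applies the same idea to a plain list of URLs. The helper name build_records and its start parameter are assumptions for illustration only; the two URLs are taken from the output shown further down.

```python
def build_records(image_urls, start=0):
    """Turn the image URLs of one product into a list of row dictionaries."""
    records = []
    for e, t in enumerate(image_urls, start=start):
        records.append({
            "id": t.split("=")[-1],              # text after the last '=' (the ?v=... value)
            "image": "Image " + str(e) + " UI",  # column label for this image position
            "link": t,
        })
    return records


urls = [
    "//cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim_160x.jpg?v=1631812617",
    "//cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-2_160x.jpg?v=1631812617",
]
# start=1 gives labels beginning at "Image 1 UI", closer to the image1, image2 naming in the question.
records = build_records(urls, start=1)
```

Passing start=0 instead reproduces the zero-based labels used in the answer.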
To get the layout you want, use pivot() to reshape the data and fillna() to produce empty cells where a product has no image for a given position.
df.pivot(index='id', columns='image', values='link').reset_index().fillna('')
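The effect of pivot() followed by fillna() can be seen on a tiny made-up data set. The ids and file names below are placeholders chosen for illustration; product "B" deliberately has fewer images than product "A".

```python
import pandas as pd

# Made-up long-format rows: product "A" has two images, product "B" has one.
data = [
    {"id": "A", "image": "Image 0 UI", "link": "a0.jpg"},
    {"id": "A", "image": "Image 1 UI", "link": "a1.jpg"},
    {"id": "B", "image": "Image 0 UI", "link": "b0.jpg"},
]
df = pd.DataFrame(data)

# One row per id, one column per image label. Missing combinations become NaN,
# which fillna('') then replaces with empty strings.
wide = df.pivot(index="id", columns="image", values="link").reset_index().fillna("")
```

Note that pivot() raises a ValueError when the same id and image label occur more than once, which is one more reason to remove duplicate product links before building the data.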
Example
import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://twillmkt.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
# Collect the product links from the collection page.
r = requests.get('https://twillmkt.com/collections/denim')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='ProductItem__Wrapper')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = baseurl + link['href']
        productlinks.append(comp)
data = []
# set() removes duplicate links so each product page is requested once.
for link in set(productlinks):
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    up = soup.find('div', class_='Product__SlideshowNavScroller')
    # e is the image position, used to build the column label.
    for e, pro in enumerate(up):
        t = pro.find('img').get('src')
        data.append({'id': t.split('=')[-1], 'image': 'Image ' + str(e) + ' UI', 'link': t})
df = pd.DataFrame(data)
# An ordered categorical keeps the image columns in order of first appearance.
df.image = pd.Categorical(df.image, categories=df.image.unique(), ordered=True)
# Reshape to one row per id and one column per image; fill gaps with ''.
df = df.pivot(index='id', columns='image', values='link').reset_index().fillna('')
Output
| id | Image 0 UI | Image 1 UI | Image 2 UI | ... |
|---|---|---|---|---|
| 1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-2_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-3_160x.jpg?v=1631812617 | |
| 1631826938 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Light-Blue-Patch-Work-Stacked-Straight-Leg-Denim_160x.jpg?v=1631826938 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Light-Blue-Patch-Work-Stacked-Straight-Leg-Denim-2_160x.jpg?v=1631826938 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Light-Blue-Patch-Work-Stacked-Straight-Leg-Denim-3_160x.jpg?v=1631826938 | |
| 1631829399 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Khaki-Patch-Work-Stacked-Straight-Leg-Denim_160x.jpg?v=1631829399 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Khaki-Patch-Work-Stacked-Straight-Leg-Denim-2_160x.jpg?v=1631829399 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Khaki-Patch-Work-Stacked-Straight-Leg-Denim-3_160x.jpg?v=1631829399 | |
| ... |
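The links in the output are protocol-relative (they start with //). If absolute URLs are needed, for example to download the images, one option is urljoin from the standard library, which takes the scheme from a base URL. This is a side sketch and not part of the original answer; the path /products/x is a hypothetical example.

```python
from urllib.parse import urljoin

base = "https://twillmkt.com"

# A protocol-relative image link, as it appears in the output table.
src = "//cdn.shopify.com/s/files/1/0089/7912/0206/products/Khaki-Patch-Work-Stacked-Straight-Leg-Denim_160x.jpg?v=1631829399"
full = urljoin(base, src)  # keeps the CDN host and adds the https scheme

# The same call also resolves site-relative hrefs (hypothetical path).
product = urljoin(base, "/products/x")
```

The same function could stand in for the baseurl + link['href'] concatenation, since it handles both relative and already absolute hrefs.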
