我嘗試決議站點并解決了下一個問題。我確定每件商品的最大圖片數量是 7。每個圖片鏈接都會寫入串列。然后保存在excel中。因此,每個鏈接都有類似于檔案 1.xlsx 中的列。但是有些商品有 3 或 5 個影像。因此,如果影像數量少于 7,我想用空字串填充另一個欄位。但我得到的結果類似于檔案 2.xlsx。請幫我解決這個問題。
from datetime import datetime, timedelta
from time import sleep
import time, csv
from csv import reader
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
import requests, json
def get_html(url):
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
r = requests.get(url, headers=headers).content
return r
goods_link = ['https://www.johnlewis.com/a-a-k-s-hana-raffia-cross-body-bag-navy-multi/p5559710']
Images1 = []
Images2 = []
Images3 = []
Images4 = []
Images5 = []
Images6 = []
Images7 = []
Img = []
for i in goods_link:
soup = BeautifulSoup(get_html(i), 'html.parser')
imgContainer = soup.find('div', {'class':'ProductImages_productImagesContainer__1v2kP'})
imgAll = imgContainer.find_all('div', {'class':'ImageMagnifier_zoomable-image-container__db7jH'})
for j in imgAll:
imgSrc = j.find('img').get('src').split('?$rsp')[0]
Img.append(imgSrc)
[x.append(y) for x,y in zip([Images1, Images2, Images3, Images4, Images5, Images6, Images7], Img)]
info = {}
for ii in Images1:
info.setdefault('Images1',[])
info['Images1'].append(ii)
for ii in Images2:
info.setdefault('Images2',[])
info['Images2'].append(ii)
for ii in Images3:
info.setdefault('Images3',[])
info['Images3'].append(ii)
for ii in Images4:
info.setdefault('Images4',[])
info['Images4'].append(ii)
for ii in Images5:
info.setdefault('Images5',[])
info['Images5'].append(ii)
for ii in Images6:
info.setdefault('Images6',[])
info['Images6'].append(ii)
for ii in Images7:
info.setdefault('Images7',[])
info['Images7'].append(ii)
df = pd.DataFrame.from_dict(info)
df.to_excel('./output.xlsx')
print('Finish')
uj5u.com熱心網友回復:
IIUC 您希望為每行填充所有 7 列,即使該行的影像少于 7 個。
創建字典的步驟是多余的。您可以在串列中列出您附加到串列串列中的所有影像,并從中創建您的 DataFrame。您可以使用以下命令指定標題columns=:
def get_html(url):
headers={
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
}
r = requests.get(url, headers=headers).content
return r
goods_link = ['https://www.johnlewis.com/a-a-k-s-hana-raffia-cross-body-bag-navy-multi/p5559710']
headers = ["Images1", "Images2", "Images3", "Images4", "Images5", "Images6", "Images7"]
img_table = []
for link in goods_link:
img_row = [None]*7
soup = BeautifulSoup(get_html(link), 'html.parser')
imgContainer = soup.find('div', {'class':'ProductImages_productImagesContainer__1v2kP'})
imgAll = imgContainer.find_all('div', {'class':'ImageMagnifier_zoomable-image-container__db7jH'})
for j, div_obj in enumerate(imgAll):
imgSrc = div_obj.find('img').get('src').split('?$rsp')[0]
img_row[j]=imgSrc
img_table.append(img_row)
df = pd.DataFrame(img_table, columns=headers)
df.to_excel('./output.xlsx')
print('Finish')
缺少的是創建一個None長度為 7的串列,然后使用相應的鏈接enumerate替換 index 處的元素j。
請嘗試以一種使代碼下次更容易理解的方式命名您的變數。
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/357902.html
上一篇:用分割值填充串列
下一篇:如何避免鏈接串列中的物件?
