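I am trying to scrape the jpg images for each product; the product URLs are stored in a CSV file. The image links are available in the page's JSON data, so I am trying to access them by JSON key. When I run the code, it returns the whole set of keys and values instead of just the image URLs. Second, my code only scrapes the last product URL, even though all of the URLs are saved in the CSV.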
{'name': {'b': {'src': {'xs': 'https://ctl.s6img.com/society6/img/xVx1vleu7iLcR79ZkRZKqQiSzZE/w_125/artwork/~artwork/s6-0041/a/18613683_5971445', 'lg': 'https://ctl.s6img.com/society6/img/W-ESMqUtC_oOEUjx-1E_SyIdueI/w_550/artwork/~artwork/s6-0041/a/18613683_5971445', 'xl': 'https://ctl.s6img.com/society6/img/z90VlaYwd8cxCqbrZ1ttAxINpaY/w_700/artwork/~artwork/s6-0041/a/18613683_5971445', 'xxl': None}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}, 'c': {'src': {'xs': 'https://ctl.s6img.com/society6/img/KQJbb4jG0gBHcqQiOCivLUbKMxI/w_125/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'lg': 'https://ctl.s6img.com/society6/img/ztGrxSpA7FC1LfzM3UldiQkEi7g/w_550/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xl': 'https://ctl.s6img.com/society6/img/PHjp9jDic2NGUrpq8k0aaxsYZr4/w_700/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xxl': 'https://ctl.s6img.com/society6/img/m-1HhSM5CIGl6DY9ukCVxSmVDIw/w_1500/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg'}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}, 'd': {'src': {'xs': 'https://ctl.s6img.com/society6/img/G9TikRnVvy1w0kwKCAmgWsWy42Q/w_125/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'lg': 
'https://ctl.s6img.com/society6/img/uVOYOxbHmhrNhmGQAi6QeydrFdY/w_550/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xl': 'https://ctl.s6img.com/society6/img/-WIIUx9oB6jQKJdkSkq2ofhjLzc/w_700/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xxl': 'https://ctl.s6img.com/society6/img/HlSFppIm7Wk6aVxO17fI4b5s0ts/w_1500/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg'}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}}}
This is the JSON data. I only want to scrape the jpg image links. Here is my code:
import json
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

contents = []
with open('test.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents

newlist = []
for url in contents:
    try:
        page = urlopen(url[0]).read()
        soup = BeautifulSoup(page, 'html.parser')
        scripts = soup.find_all('script')[7].text.strip()[24:]
        data = json.loads(scripts)
        link = data['product']['response']['product']['data']['attributes']['media_map']
    except:
        link = 'no data'

detail = {
    'name': link
}
print(detail)
newlist.append(detail)

df = pd.DataFrame(detail)
df.to_csv('s1.csv')
I am trying to scrape all of the jpg image links and save them to a CSV file for each product URL, so I want to open the CSV file and loop over each URL.
Answer from a uj5u.com user:
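A few things:
- df = pd.DataFrame(detail) should be df = pd.DataFrame(newlist).
- Your loop indentation is off. In fact, why are you looping over the URLs twice? You read the URLs from test.csv (for which you could just use pandas anyway), put them into the contents list, and then iterate over that list.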
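The "only the last product" symptom is consistent with the record being built after the loop has finished rather than inside it. The following is a minimal sketch of that effect, not the asker's exact code: it assumes the detail dictionary sat outside the loop body, and it uses a hypothetical fake_pages dictionary with example.com URLs in place of real network requests.

```python
# Hypothetical stand-in for the scraped pages: URL -> extracted value.
fake_pages = {
    'https://example.com/p1': 'media for p1',
    'https://example.com/p2': 'media for p2',
    'https://example.com/p3': 'media for p3',
}

# Buggy shape: the loop only reassigns `link`; the record is built once,
# after the loop, so it sees just the value from the final iteration.
for url in fake_pages:
    link = fake_pages[url]
buggy_rows = [{'name': link}]

# Fixed shape: build and append one record per iteration.
fixed_rows = []
for url in fake_pages:
    link = fake_pages[url]
    fixed_rows.append({'name': link})

print(len(buggy_rows), len(fixed_rows))
```

Moving the record construction and the append into the loop body is the same change the corrected code makes.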
Try this:
import json
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd

contents = []
with open('test.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        try:
            page = urlopen(url[0]).read()
            soup = BeautifulSoup(page, 'html.parser')
            # The 8th <script> tag holds the JSON; skip its 24-character prefix
            scripts = soup.find_all('script')[7].text.strip()[24:]
            data = json.loads(scripts)
            link = data['product']['response']['product']['data']['attributes']['media_map']
        except Exception:
            # Any fetch or parse failure is recorded instead of stopping the run
            link = 'no data'
        # Build one record per URL, inside the loop
        detail = {
            'name': link
        }
        print(detail)
        contents.append(detail)

# Build the DataFrame from the full list of records, not the last one
df = pd.DataFrame(contents)
df.to_csv('s1.csv')
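The code above still stores the whole media_map for each product, while the question asks for the jpg links only. One possible way to narrow it down, sketched here under assumptions, is to walk the nested structure and keep every string that ends in .jpg. The helper name collect_jpg_urls and the example.com URLs in sample_media_map are hypothetical; the sample only mirrors the shape of the JSON shown in the question.

```python
def collect_jpg_urls(node):
    """Recursively walk nested dicts/lists and return every string ending in .jpg."""
    found = []
    if isinstance(node, dict):
        for value in node.values():
            found.extend(collect_jpg_urls(value))
    elif isinstance(node, (list, tuple)):
        for value in node:
            found.extend(collect_jpg_urls(value))
    elif isinstance(node, str) and node.lower().endswith('.jpg'):
        found.append(node)
    # None values and other types fall through and contribute nothing
    return found


# Trimmed-down, hypothetical stand-in for media_map
sample_media_map = {
    'b': {'src': {'xs': 'https://example.com/artwork/123', 'xxl': None},
          'type': 'image'},
    'c': {'src': {'xs': 'https://example.com/w_125/board.jpg',
                  'lg': 'https://example.com/w_550/board.jpg'},
          'type': 'image'},
}

jpg_urls = collect_jpg_urls(sample_media_map)
print(jpg_urls)
```

Inside the loop, link could then be set to collect_jpg_urls(...) applied to the media_map value. Note that in the data shown in the question the URLs under the 'b' key have no .jpg extension, so a suffix check like this one would skip them.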
