Unable to scrape jpg image links from JSON

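I am trying to scrape the jpg images for each product; the product URLs are stored in a CSV file. The image links are available in the page's JSON data, so I am trying to reach them through the JSON keys. When I run the code, it returns every key and value instead of just the image URLs. Second, my code only scrapes the last product URL, even though all the URLs are stored in the CSV.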

{'name': {'b': {'src': {'xs': 'https://ctl.s6img.com/society6/img/xVx1vleu7iLcR79ZkRZKqQiSzZE/w_125/artwork/~artwork/s6-0041/a/18613683_5971445', 'lg': 'https://ctl.s6img.com/society6/img/W-ESMqUtC_oOEUjx-1E_SyIdueI/w_550/artwork/~artwork/s6-0041/a/18613683_5971445', 'xl': 'https://ctl.s6img.com/society6/img/z90VlaYwd8cxCqbrZ1ttAxINpaY/w_700/artwork/~artwork/s6-0041/a/18613683_5971445', 'xxl': None}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}, 'c': {'src': {'xs': 'https://ctl.s6img.com/society6/img/KQJbb4jG0gBHcqQiOCivLUbKMxI/w_125/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'lg': 'https://ctl.s6img.com/society6/img/ztGrxSpA7FC1LfzM3UldiQkEi7g/w_550/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xl': 'https://ctl.s6img.com/society6/img/PHjp9jDic2NGUrpq8k0aaxsYZr4/w_700/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xxl': 'https://ctl.s6img.com/society6/img/m-1HhSM5CIGl6DY9ukCVxSmVDIw/w_1500/cutting-board/rectangle/lifestyle/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg'}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}, 'd': {'src': {'xs': 'https://ctl.s6img.com/society6/img/G9TikRnVvy1w0kwKCAmgWsWy42Q/w_125/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'lg': 
'https://ctl.s6img.com/society6/img/uVOYOxbHmhrNhmGQAi6QeydrFdY/w_550/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xl': 'https://ctl.s6img.com/society6/img/-WIIUx9oB6jQKJdkSkq2ofhjLzc/w_700/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg', 'xxl': 'https://ctl.s6img.com/society6/img/HlSFppIm7Wk6aVxO17fI4b5s0ts/w_1500/cutting-board/rectangle/front/~artwork,fw_1572,fh_2500,fx_93,fy_746,iw_1386,ih_2142/s6-0041/a/18613725_13086827/~~/im-not-always-a-bitch-red-cutting-board.jpg'}, 'type': 'image', 'alt': "I'M NOT ALWAYS A BITCH (Red) Cutting Board", 'meta': None}}}

This is the JSON data. I only want to scrape the jpg image links. Here is my code:

import json
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd


contents = []
with open('test.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        contents.append(url) # Add each url to list contents
        newlist = []
        for url in contents:
            try:
                page = urlopen(url[0]).read()
                soup = BeautifulSoup(page, 'html.parser')
                scripts = soup.find_all('script')[7].text.strip()[24:]
                data = json.loads(scripts)
                link = data['product']['response']['product']['data']['attributes']['media_map']
            except:
                link = 'no data'
            detail = {
                'name': link
                }
            print(detail)
            newlist.append(detail)
df = pd.DataFrame(detail)
df.to_csv('s1.csv')

I am trying to scrape all the jpg image links and save them to a CSV file together with each product URL, so I want to open the CSV file and loop over each URL.

Answer from a uj5u.com user:

A few things:

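df = pd.DataFrame(detail) should be df = pd.DataFrame(newlist); detail holds only the last product, while newlist holds all of them.
Your loop indentation is off. In fact, why are you looping over the URLs twice? You read the URLs from test.csv (for which you could just use pandas anyway), put them into the contents list, and then iterate over that list again inside the first loop.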
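
To make the effect of the nested loop concrete, here is a minimal sketch that swaps the network request for a counter. The fake_fetch helper and the example.com URLs are hypothetical stand-ins, not part of the original code; only the loop structure mirrors the question.

```python
# Hypothetical stand-ins for the rows csv.reader would yield from test.csv.
rows = [
    ["https://example.com/p1"],
    ["https://example.com/p2"],
    ["https://example.com/p3"],
]

fetch_log = []


def fake_fetch(url):
    """Record the request instead of hitting the network (illustration only)."""
    fetch_log.append(url)
    return {"name": url}


# Structure from the question: the inner loop sits inside the outer one.
contents = []
for row in rows:
    contents.append(row)
    newlist = []                  # reset on every outer iteration
    for row in contents:          # revisits every URL seen so far
        detail = fake_fetch(row[0])
        newlist.append(detail)

nested_fetches = len(fetch_log)   # 1 + 2 + 3 requests for 3 URLs
last_detail = detail              # what pd.DataFrame(detail) would receive

# Single loop: one request per URL, every result kept.
fetch_log.clear()
results = [fake_fetch(row[0]) for row in rows]
flat_fetches = len(fetch_log)

print(nested_fetches, flat_fetches, last_detail)
```

With three URLs the nested version issues six requests, and the only dict left in detail is the one for the last URL, which is consistent with seeing just the final product in the output file.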

Try this:

import json
import csv
from urllib.request import urlopen
from bs4 import BeautifulSoup
import pandas as pd


contents = []
with open('test.csv','r') as csvf: # Open file in read mode
    urls = csv.reader(csvf)
    for url in urls:
        try:
            page = urlopen(url[0]).read()
            soup = BeautifulSoup(page, 'html.parser')
            scripts = soup.find_all('script')[7].text.strip()[24:]
            data = json.loads(scripts)
            link = data['product']['response']['product']['data']['attributes']['media_map']
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            link = 'no data'
        detail = {
            'name': link
            }
        print(detail)
        contents.append(detail)
df = pd.DataFrame(contents)
df.to_csv('s1.csv')
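
The code above still stores the whole media_map for each product. To keep only the jpg links, one option is to walk the nested dictionary and collect the string values that end in .jpg. The sketch below assumes the structure shown in the question (entries with a src dict keyed by size); the collect_jpg_links name and the shortened example.com URLs are hypothetical and only stand in for the real data.

```python
def collect_jpg_links(node):
    """Recursively gather every string value ending in '.jpg' from nested dicts/lists."""
    links = []
    if isinstance(node, dict):
        for value in node.values():
            links.extend(collect_jpg_links(value))
    elif isinstance(node, list):
        for item in node:
            links.extend(collect_jpg_links(item))
    elif isinstance(node, str) and node.lower().endswith(".jpg"):
        links.append(node)
    return links


# Shortened, made-up sample with the same shape as the media_map in the question.
media_map = {
    "b": {
        "src": {"xs": "https://example.com/w_125/artwork/18613683", "xxl": None},
        "type": "image",
        "alt": "Cutting Board",
        "meta": None,
    },
    "c": {
        "src": {
            "xs": "https://example.com/w_125/lifestyle/board.jpg",
            "lg": "https://example.com/w_550/lifestyle/board.jpg",
        },
        "type": "image",
        "alt": "Cutting Board",
        "meta": None,
    },
}

jpg_links = collect_jpg_links(media_map)
print(jpg_links)
```

Inside the loop, link could then be passed through this helper before being stored. Note that entries like 'b' in the question's data have no .jpg extension, so a suffix check drops them; whether that is the desired behavior depends on which images are wanted.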

