Appending data to a DataFrame

My code raises `ValueError: All arrays must be of the same length`. How can I resolve this error? I have tried many approaches but cannot get rid of it. The lists I am building do not have the same length.

import enum
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd 

url="https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")
raw_json = ""
for table_index,table in enumerate( soup.find_all("script")):
    if('CCRZ.detailData.jsonProductData = {"' in str(table)):
        x=str(table).split('CCRZ.detailData.jsonProductData = {"')
        raw_json = "{\"" + str(x[-1]).split('};')[0] + "}"
        break
           
req_json = json.loads(raw_json)
# with open("text_json.json","w")as file:
#     x=json.dump(req_json,file,indent=4)

temp = req_json

name=[]
specs=[]


title=temp['product']['prodBean']['name']
name.append(title)


item=temp['specifications']['MARKETING']
for i in item:
    try:
        get=i['value']
    except:
        pass

    specs.append(get)


temp={'title':name,'Specification':specs}
df=pd.DataFrame(temp)
print(df)

Answer from a uj5u.com user:

While the error is clear, the question and the expected result are not.

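The way you build the DataFrame requires every column to have the same number of rows. Here `name` holds a single title while `specs` holds several values, so pandas would have to deal with missing rows, and that is why the error is raised. To work around this, you could create the DataFrame from a dict with `orient='index'`: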

pd.DataFrame.from_dict(temp, orient='index')
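
To make the cause visible, here is a minimal sketch using made-up stand-in values (not the real scraped data). It first reproduces the error with lists of unequal length, then shows that `orient='index'` turns each key into a row and pads the shorter list with missing values:

import pandas as pd

# Made-up stand-in values, not the real page data
temp = {
    "title": ["Example hose"],
    "Specification": ["Spiral wrapped", "Includes hang collar", "One bundle"],
}

# Columns of unequal length raise the ValueError from the question
try:
    pd.DataFrame(temp)
except ValueError as err:
    error_message = str(err)
    print(error_message)

# With orient='index' each key becomes a row; shorter rows are padded with missing values
df = pd.DataFrame.from_dict(temp, orient="index")
print(df)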

But this looks ugly and is awkward to process later on, so an alternative would be:

data = [{
    'title':temp['product']['prodBean']['name'],
    'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]
pd.DataFrame(data)
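
One caveat with the join: `s.get('value')` returns `None` when an entry has no `value` key, and `str.join` raises a `TypeError` on `None`. A small sketch of a guarded version follows. The `temp` dict is a hypothetical stand-in for the parsed JSON, and the entry without a value is invented for illustration:

import pandas as pd

# Hypothetical stand-in for the parsed JSON; the second entry has no 'value' key
temp = {
    "product": {"prodBean": {"name": "Example hose"}},
    "specifications": {
        "MARKETING": [
            {"value": "Spiral wrapped"},
            {"label": "no value here"},
            {"value": "Includes hang collar"},
        ]
    },
}

# Keep only entries that actually carry a value, so join never sees None
values = [s["value"] for s in temp["specifications"]["MARKETING"] if s.get("value")]

data = [{"title": temp["product"]["prodBean"]["name"], "specs": ", ".join(values)}]
df = pd.DataFrame(data)
print(df)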

Or, if you would like each specification on its own row, use:

data = {
    'title':temp['product']['prodBean']['name'],
    'specs':[s.get('value') for s in temp['specifications']['MARKETING']]
}

pd.DataFrame.from_dict(data)
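
This works because pandas repeats a scalar value to match the length of the list-like column next to it. A minimal sketch with made-up values (the title and specs below are placeholders, not the scraped data):

import pandas as pd

# Placeholder values for illustration only
title = "Example hose"
specs = ["Spiral wrapped", "Includes hang collar", "One bundle"]

# The scalar title is repeated for every entry in specs
df = pd.DataFrame({"title": title, "specs": specs})
print(df)

Note that at least one column has to be list-like here. If every value were a scalar, pandas would ask for an explicit index instead.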

Example

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = "https://www.fleetpride.com/parts/otr-coiled-air-hose-otr6818"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.3"
}
# Send the browser-like headers defined above with the request
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, "html5lib")

# Find the <script> that assigns the product JSON and cut the object literal out of it
raw_json = ""
for script in soup.find_all("script"):
    if 'CCRZ.detailData.jsonProductData = {"' in str(script):
        x = str(script).split('CCRZ.detailData.jsonProductData = {"')
        # Re-attach the '{"' and '}' that the two splits removed
        raw_json = '{"' + str(x[-1]).split('};')[0] + "}"
        break

temp = json.loads(raw_json)

data = [{
    'title':temp['product']['prodBean']['name'],
    'specs':','.join([s.get('value') for s in temp['specifications']['MARKETING']])
}]

pd.DataFrame(data)

Output

	title	specs
0	OTR trailer air hose and cable assembly, 15'	Spiral wrapped,Includes hang collar,One bundle for easy management
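
As a side note, cutting the JSON out with `split('};')` can break if that character sequence ever appears inside a string value. A possible alternative, sketched below, is to let `json.JSONDecoder().raw_decode` parse one JSON value starting right after the assignment and ignore whatever follows. The helper name `extract_product_json` and the sample script text are hypothetical, and the sketch assumes the assigned value is strict JSON:

import json

MARKER = "CCRZ.detailData.jsonProductData = "

def extract_product_json(script_text, marker=MARKER):
    """Return the object assigned after `marker`, or None if the marker is absent."""
    start = script_text.find(marker)
    if start == -1:
        return None
    idx = start + len(marker)
    # raw_decode does not skip leading whitespace, so do it by hand
    while idx < len(script_text) and script_text[idx].isspace():
        idx += 1
    # Parses exactly one JSON value and ignores the rest of the script
    obj, _end = json.JSONDecoder().raw_decode(script_text, idx)
    return obj

# Invented sample; the name deliberately contains '};' to show the split-based cut would fail
sample = (
    'var a = 1; CCRZ.detailData.jsonProductData = '
    '{"product": {"prodBean": {"name": "Example; with };"}}}; var b = 2;'
)
parsed = extract_product_json(sample)
print(parsed["product"]["prodBean"]["name"])

In the scraping loop, this could be called on `str(script)` for each script tag until it returns something other than `None`.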


Tags: json, pandas, dataframe, web-scraping, beautifulsoup
