I'm new to Python and am looking for a solution to the following problem.
I have a file.xml that looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>green</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Legs</FNAME>
            <FVALUE>14</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>blue dog w no tail</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>blue dog w no tail and unlimited zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE>
            <FNAME>Colour</FNAME>
            <FVALUE>blue</FVALUE>
        </FEATURE>
        <FEATURE>
            <FNAME>Happiness Levels</FNAME>
            <FVALUE>11/10</FVALUE>
        </FEATURE>
    </PRODUCT_FEATURES>
</HEADER>
Here is my code:
from lxml import etree as et
import pandas as pd

xml_data = et.parse('file2.xml')
products = xml_data.xpath('//HEADER')
headers = [elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*')]
headers.extend(xml_data.xpath('//HEADER[1]//FNAME/text()'))
rows = []
for product in products:
    row = [product.xpath(f'.//{headers[0]}/text()')[0], product.xpath(f'.//{headers[1]}/text()')[0]]
    f_values = product.xpath('.//FVALUE/text()')
    row.extend(f_values)
    rows.append(row)
df = pd.DataFrame(rows, columns=headers)
df
# df.to_csv("File2_Export_V1.csv", index=False)
This is the output I want:
   DESCRIPTION_SHORT       DESCRIPTION_LONG                              Colour  Legs  Happiness Levels
0  green cat w short hair  green cat w short hair and unlimited zoomies  green   14
1  blue dog w no tail      blue dog w no tail and unlimited zoomies      blue          11/10
My attempt at solving this was to extend one line like this:
headers=[elem.tag for elem in xml_data.xpath('//HEADER[1]//PRODUCT_DETAILS//*'),('//HEADER[2]//PRODUCT_DETAILS//*')]
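Sadly, this gives me a syntax error, and I have not found a solution.
How can I adjust my code so that it reflects the XML structure?
Thanks in advance! ~C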
Answer:
This is probably not the best solution, but I think it is quite simple and clear.
import xml.etree.ElementTree as ET
import pandas as pd

# Get xml object
tree = ET.parse('file2.xml')
root = tree.getroot()

# Create final DataFrame
out = pd.DataFrame()

# Loop over all products (Product = (DETAILS, FEATURES))
for i in range(0, len(root), 2):
    # Get all descriptions as (tag, text) pairs
    descriptions = [(child.tag, child.text) for child in root[i]]
    # Get all features as (FNAME text, FVALUE text) pairs
    features = [(child[0].text, child[1].text) for child in root[i + 1]]
    # Create a one-row DataFrame, where columns are the tags, and values are, well, values
    pairs = descriptions + features
    temp_df = pd.DataFrame([[value for _, value in pairs]], columns=[name for name, _ in pairs])
    # Append to final DataFrame; ignore_index renumbers the rows 0, 1, ...
    out = pd.concat([out, temp_df], ignore_index=True)
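As a possible variation on the loop above, here is a minimal sketch that pairs each PRODUCT_DETAILS element with the PRODUCT_FEATURES element that follows it by checking tag names, rather than relying on the two alternating at even and odd positions. It collects one dict per product. The function name products_to_rows and the variable XML_TEXT are hypothetical names chosen for illustration, and the XML from the question is inlined (without the XML declaration) only so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# The XML from the question, inlined so this sketch is self-contained.
# (Assumption: in practice you would call ET.parse('file2.xml').getroot() instead.)
XML_TEXT = """
<HEADER>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>green cat w short hair</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>green cat w short hair and unlimited zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE><FNAME>Colour</FNAME><FVALUE>green</FVALUE></FEATURE>
        <FEATURE><FNAME>Legs</FNAME><FVALUE>14</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
    <PRODUCT_DETAILS>
        <DESCRIPTION_SHORT>blue dog w no tail</DESCRIPTION_SHORT>
        <DESCRIPTION_LONG>blue dog w no tail and unlimited zoomies</DESCRIPTION_LONG>
    </PRODUCT_DETAILS>
    <PRODUCT_FEATURES>
        <FEATURE><FNAME>Colour</FNAME><FVALUE>blue</FVALUE></FEATURE>
        <FEATURE><FNAME>Happiness Levels</FNAME><FVALUE>11/10</FVALUE></FEATURE>
    </PRODUCT_FEATURES>
</HEADER>
"""


def products_to_rows(root):
    """Return one dict per product (hypothetical helper, not part of any library)."""
    rows = []
    for child in root:
        if child.tag == "PRODUCT_DETAILS":
            # Start a new row: one key per description tag.
            rows.append({field.tag: field.text for field in child})
        elif child.tag == "PRODUCT_FEATURES" and rows:
            # Attach each feature to the most recently started product.
            for feature in child.findall("FEATURE"):
                rows[-1][feature.findtext("FNAME")] = feature.findtext("FVALUE")
    return rows


root = ET.fromstring(XML_TEXT.strip())
rows = products_to_rows(root)
for row in rows:
    print(row)
```

Passing the result to pd.DataFrame(rows) should then give one row per product, with a column for every key seen and missing values (such as Legs for the dog) left as NaN. Building the frame once from a list of dicts also avoids calling pd.concat inside the loop.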
