Getting the text of parents, children, and their children

<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
  <fournisseur>
    <nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
    <adjudicataire>1</adjudicataire>
    <montantsoumis>0.000000</montantsoumis>
    <montantssoumisunite>0</montantssoumisunite>
    <montantcontrat>89732.240000</montantcontrat>
    <montanttotalcontrat>0.000000</montanttotalcontrat>
  </fournisseur>
</fournisseurs>
</avis>

So there is avis, avis contains fournisseurs, and fournisseurs contains further nodes. How can I get these values into a DataFrame?

I am using the code below:

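import xml.etree.ElementTree as ET

element_tree = ET.parse('D:\\python_script\\temp2\\AvisRevisions_20200201_20200229.xml')
root = element_tree.getroot()
for child in root.findall('.//avis/*/*/*'):
    print(child.tag, child.text)  # placeholder body; the question did not show the loop body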

or

for child in root.findall('.//avis/*'):
    print(child.tag, child.text)  # placeholder body; the question did not show the loop body

But this only gives me either the parent nodes or the child nodes, not all of them.
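For reference, each "*" step in an ElementTree path matches exactly one level, which is why the two patterns above return different slices of the tree. The following is a minimal sketch of that behaviour, assuming the avis elements sit under some wrapper root; the <export> tag used here is hypothetical, since the question does not show the real root element, and the sample is shortened.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper <export>; the real file's root tag is not shown in the question.
xml_text = """<export><avis>
<numeroseao>1331795</numeroseao>
<fournisseurs>
  <fournisseur>
    <nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
    <adjudicataire>1</adjudicataire>
  </fournisseur>
</fournisseurs>
</avis></export>"""
root = ET.fromstring(xml_text)

# One "*" step: only the direct children of <avis>.
children = [e.tag for e in root.findall('.//avis/*')]

# Three "*" steps: only the elements exactly three levels below <avis>.
three_levels_down = [e.tag for e in root.findall('.//avis/*/*/*')]

# iter() walks every descendant regardless of depth; keep only leaf elements.
avis = root.find('.//avis')
leaves = {e.tag: e.text for e in avis.iter() if len(e) == 0}

print(children)
print(three_levels_down)
print(leaves)
```

Filtering on len(e) == 0 keeps only elements without children, so container nodes such as fournisseurs and fournisseur are skipped while their text-bearing descendants are kept.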

A uj5u.com user replied:

Because your data is not flat, importing the XML directly into pandas may cause problems. In that case, a library such as pandas_read_xml may be useful:

import pandas_read_xml as pdx

df = pdx.read_xml(xml)
df = pdx.fully_flatten(df)  # this should get you the structure you want

In the lines above, the xml variable is your "AvisRevisions_20200201_20200229.xml" file.

For flatter structures, you can use pandas directly:

import pandas as pd

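# read_xml builds columns from the immediate children of the matched nodes,
# so select the <fournisseur> rows rather than the <fournisseurs> container
df = pd.read_xml(xml, xpath="//fournisseur")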

If you want the whole "avis" section, replace the xpath argument as follows:

df = pd.read_xml(xml, xpath="//avis")

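From this, pandas should create a DataFrame with the appropriate columns. Note that read_xml is designed for shallow documents and reads only the immediate children and attributes of the matched nodes, so nested elements are not expanded into their own columns. See the pandas documentation for read_xml.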

A uj5u.com user replied:

Try the following:

import xml.etree.ElementTree as ET
import pandas as pd

xml = '''<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
  <fournisseur>
    <nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
    <adjudicataire>1</adjudicataire>
    <montantsoumis>0.000000</montantsoumis>
    <montantssoumisunite>0</montantssoumisunite>
    <montantcontrat>89732.240000</montantcontrat>
    <montanttotalcontrat>0.000000</montanttotalcontrat>
  </fournisseur>
</fournisseurs>
</avis>'''
root = ET.fromstring(xml)

data = []
# Build one row per <fournisseur>, mapping each child tag to its text
for fournisseur in root.findall('.//fournisseur'):
    data.append({e.tag: e.text for e in fournisseur})
df = pd.DataFrame(data)
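If the avis-level values (numeroseao, numero, organisme) are also needed on each row, one possible approach is to copy them into every fournisseur row. The following is a minimal sketch under that assumption; the helper name avis_to_rows is hypothetical and not part of any library, and it uses only the standard library to build the rows.

```python
import xml.etree.ElementTree as ET

def avis_to_rows(root):
    """Return one dict per <fournisseur>, including its parent <avis> fields."""
    # Handle both cases: the root is itself <avis>, or <avis> elements are nested below it.
    avis_elements = [root] if root.tag == 'avis' else root.findall('.//avis')
    rows = []
    for avis in avis_elements:
        # Leaf children of <avis>, such as numeroseao, numero, organisme.
        parent_fields = {e.tag: e.text for e in avis if len(e) == 0}
        for fournisseur in avis.findall('./fournisseurs/fournisseur'):
            row = dict(parent_fields)
            row.update({e.tag: e.text for e in fournisseur})
            rows.append(row)
    return rows

sample = """<avis>
<numeroseao>1331795</numeroseao>
<numero>61628-3435560</numero>
<organisme>Ville de Québec</organisme>
<fournisseurs>
  <fournisseur>
    <nomorganisation>APEL ASSOCIATION POUT DU LA MARAISNORD</nomorganisation>
    <adjudicataire>1</adjudicataire>
    <montantcontrat>89732.240000</montantcontrat>
  </fournisseur>
</fournisseurs>
</avis>"""

rows = avis_to_rows(ET.fromstring(sample))
print(rows)
```

Passing rows to pd.DataFrame(rows) should then give one row per supplier with both the parent and the supplier columns. All values are strings at this point, so numeric columns such as montantcontrat may need a conversion like pd.to_numeric.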
