Parsing multiple XML files in Python and appending the data to a pandas DataFrame

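I am trying to create a DataFrame from multiple nested XML files and append the data to a single DataFrame. I know the structure of the DataFrame and have defined it.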

import boto3
import xml.etree.ElementTree as ET
import pandas as pd

s3 = boto3.resource('s3')
tree_list = []
details = ['FirstName','LastName','City','Country']


for file in bucket_list:
    obj = s3.Object(s3_bucket_name,file)
    data = (obj.get()['Body'].read())
    tree_list.append(ET.ElementTree(ET.fromstring(data)))

def parse_XML(list_of_trees, df_cols): 
    
    for tree in tree_list:
        xroot = tree.getroot()
        rows = []
    for node in xroot: 
        res = []
        for el in df_cols[0:]: 
            if node is not None and node.find(el) is not None:
                res.append(node.find(el).text)
            else: 
                res.append(None)
        rows.append({df_cols[i-1]: res[i-1] 
                     for i, _ in enumerate(df_cols)})
    
    out_df = pd.DataFrame(rows, columns=df_cols)
        
    return out_df

parse_XML(tree_list,details)

In my output DataFrame I get only the information from the last file read, plus a few blank rows, as shown below:

    FirstName  LastName  City        Country
    Ted        Mosbey    Washington  USA
    None       None      None        None
    None       None      None        None

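The "only the last file" part of this symptom follows from the loop structure above: `xroot` is reassigned and `rows` is reset on every pass of the file loop, while the `for node in xroot` loop sits outside it and therefore runs only once, on the last tree. A stripped-down sketch reproduces this with two hypothetical in-memory documents in place of the S3 objects (`docs` and `buggy_parse` are illustrative names, not part of the original code):

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents standing in for the S3 objects
docs = [
    "<PD><Clt><FirstName>Ann</FirstName></Clt></PD>",
    "<PD><Clt><FirstName>Ted</FirstName></Clt></PD>",
]
trees = [ET.ElementTree(ET.fromstring(d)) for d in docs]


def buggy_parse(list_of_trees):
    """Same loop shape as the question's parse_XML (illustrative only)."""
    for tree in list_of_trees:
        xroot = tree.getroot()  # reassigned on every pass
        rows = []               # reset on every pass

    # This loop runs once, after the file loop, so it only sees the last tree
    for node in xroot:
        rows.append(node.find("FirstName").text)
    return rows


print(buggy_parse(trees))  # ['Ted']
```

Creating `rows` once before the file loop and nesting the node loop inside it lets every file contribute rows.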
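What should I change in the code to read all the files, append them to the DataFrame, and remove the unnecessary rows? Any suggestions for processing the files efficiently are appreciated.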

XML sample:

<PD>
  <Clt>
    <PType>xxxx</PType>
    <PNumber>xxxxx</PNumber>
    <UID>xxxx</UID>
    <TEfd>xxxxx</TEfd>
    <TExd>xxxxxx</TExd>
    <DID>xxxxx</DID>
    <CType>xxxxx</CType>
    <FirstName>Ted</FirstName>
    <MiddleName></MiddleName>
    <LastName>Mosbey</LastName>
    <MailingAddrLocation>Home</MailingAddrLocation>
    <AddressLine1>3435</AddressLine1>
    <AddressLine2>Columbia RD</AddressLine2>
    <AddressLine3></AddressLine3>
    <City>Washington</City>
    <State>DC</State>
    <ZipCode>20009</ZipCode>
    <Country>USA</Country>
    <Pr>
      <PrType>xxxxx</PrType>
      <PrName>xxxxxx</PrName>
      <PrID>xxxxxx</PrID>
    </Pr>
  </Clt>
</PD>
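A quick check against a trimmed copy of this sample (only a few fields kept, parsed with the standard library's `xml.etree.ElementTree`) shows how the tree is laid out: iterating over the root `<PD>` yields each `<Clt>` record; the requested fields are direct children of `<Clt>`, so a plain `find` reaches them; a nested field such as `PrName` needs a descendant path like `.//PrName`; and an empty element such as `<MiddleName></MiddleName>` has `text` equal to `None`:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample above
sample = """<PD>
  <Clt>
    <FirstName>Ted</FirstName>
    <MiddleName></MiddleName>
    <LastName>Mosbey</LastName>
    <City>Washington</City>
    <Country>USA</Country>
    <Pr>
      <PrName>xxxxxx</PrName>
    </Pr>
  </Clt>
</PD>"""

root = ET.fromstring(sample)

for clt in root:  # each child element of <PD> is one <Clt> record
    print(clt.tag)                      # Clt
    print(clt.find("FirstName").text)   # Ted
    print(clt.find("PrName"))           # None: PrName is not a direct child
    print(clt.find(".//PrName").text)   # xxxxxx: found among descendants
    print(clt.find("MiddleName").text)  # None: empty element has no text
```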

Answer:

Now that I have your data sample, I tested this, and it works for me the way I think you want:

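import xml.etree.ElementTree as ET
import pandas as pd

def parse_XML(list_of_trees, df_cols):

    def get_el(el_list):
        # No match: keep the column, but as None
        if not el_list:
            return None
        # Several matches: keep all of their texts as a list
        if len(el_list) > 1:
            return [el_text.text for el_text in el_list]
        # Exactly one match: return its text
        return el_list[0].text

    # rows lives outside the file loop, so every file adds to it
    rows = []
    for tree in list_of_trees:
        xroot = tree.getroot()

        for node in xroot:
            res = []
            for el in df_cols:
                # ".//el" searches all descendants of node, not only direct children
                res.append(get_el(node.findall(f".//{el}")))
            rows.append(res)

    # Build the DataFrame once, after all files have been read
    out_df = pd.DataFrame(rows, columns=df_cols)

    return out_df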
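The same approach can be exercised end to end without S3. This sketch assumes each file's contents have already been read into a string or bytes (as `obj.get()['Body'].read()` returns in the question); `records_to_df` and `files` are hypothetical names. It collects one row per child of each root from every document, builds the DataFrame once (generally cheaper than growing a DataFrame inside the loop), and then uses `dropna(how="all")` to discard rows in which every requested field is missing. An all-`None` row like the ones in the question comes from a root child that contains none of the requested fields. Unlike the answer's `get_el`, this sketch keeps only the first match for each field:

```python
import xml.etree.ElementTree as ET

import pandas as pd


def records_to_df(xml_docs, df_cols):
    """Hypothetical helper: one DataFrame from a list of XML strings/bytes."""
    rows = []
    for doc in xml_docs:
        root = ET.fromstring(doc)
        for node in root:
            # findtext gives None if nothing matches ('' for an empty element);
            # only the first match per field is kept
            rows.append([node.findtext(f".//{col}") for col in df_cols])
    # Build the DataFrame once instead of appending inside the loop
    df = pd.DataFrame(rows, columns=df_cols)
    # Drop rows in which every requested field is missing
    return df.dropna(how="all").reset_index(drop=True)


# Hypothetical file contents; <Meta/> contains none of the fields
files = [
    "<PD><Clt><FirstName>Ann</FirstName><LastName>Lee</LastName>"
    "<City>Boston</City><Country>USA</Country></Clt></PD>",
    "<PD><Clt><FirstName>Ted</FirstName><LastName>Mosbey</LastName>"
    "<City>Washington</City><Country>USA</Country></Clt><Meta/></PD>",
]
details = ['FirstName', 'LastName', 'City', 'Country']

df = records_to_df(files, details)
print(df)
```

With the question's S3 loop, `files` would be the list of downloaded bodies rather than a list of `ElementTree` objects.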
