Python - Constructing a DataFrame from an XML API response

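I receive the following XML response from an API call and want to iterate through the "Results" blocks and store all of their data points in a pandas DataFrame.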

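By chaining the .find() method as shown below, I managed to get the data points I'm interested in, but chained .find() calls only ever return the first Results block. Given the structure of the XML response, I don't know how to iterate through all the Results blocks in the body.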

I'm using Python 3.7 in Jupyter on Windows.

What I tried:

import pandas as pd
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET


soup = BeautifulSoup(soap_response.text, "xml")
# print(soup.prettify())

objectid_field = soup.find('Results').find('ObjectID').text
customerkey_field = soup.find('Results').find('CustomerKey').text
name_field = soup.find('Results').find('Name').text
issendable_field = soup.find('Results').find('IsSendable').text
sendablesubscribe_field = soup.find('Results').find('SendableSubscriberField').text

# for de in soup:
#     de_name = soup.find('Results').find('Name').text
#     print(de_name)


# test_df = pd.read_xml(soup,
#                       xpath="//Results",
#                       namespaces={""})

Sample XML data structure:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2003/soap-envelope"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema"
    xmlns:xsd="http://www.w3.org/XMLSchema"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
    xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-201-wss-wssecurity-secext-1.0.xsd"
    xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-201-wss-security-1.0.xsd">
    <env:Header
        xmlns:env="http://www.w3.org/2003/05/soap-envelope">
        <wsa:Action>RetrieveResponse</wsa:Action>
        <wsa:MessageID>urn:uuid:1234</wsa:MessageID>
        <wsa:RelatesTo>urn:uuid:1234</wsa:RelatesTo>
        <wsa:To>http://schemas.xmlsoap.org/ws/2004/08/dressing/role/anonymous</wsa:To>
        <wsse:Security>
            <wsu:Timestamp wsu:Id="Timestamp-1234">
                <wsu:Created>2021-11-07T13:10:54Z</wsu:Created>
                <wsu:Expires>2021-11-07T13:15:54Z</wsu:Expires>
            </wsu:Timestamp>
        </wsse:Security>
    </env:Header>
    <soap:Body>
        <RetrieveResponseMsg
            xmlns="http://partnerAPI">
            <OverallStatus>OK</OverallStatus>
            <RequestID>f9876</RequestID>
            <Results xsi:type="Data">
                <PartnerKey xsi:nil="true" />
                <ObjectID>Object1</ObjectID>
                <CustomerKey>Customer1</CustomerKey>
                <Name>Test1</Name>
                <IsSendable>true</IsSendable>
                <SendableSubscriberField>
                    <Name>_Something1</Name>
                </SendableSubscriberField>
            </Results>
            <Results xsi:type="Data">
                <PartnerKey xsi:nil="true" />
                <ObjectID>Object2</ObjectID>
                <CustomerKey>Customer2</CustomerKey>
                <Name>Name2</Name>
                <IsSendable>true</IsSendable>
                <SendableSubscriberField>
                    <Name>_Something2</Name>
                </SendableSubscriberField>
            </Results>
            <Results xsi:type="Data">
                <PartnerKey xsi:nil="true" />
                <ObjectID>Object3</ObjectID>
                <CustomerKey>AnotherKey</CustomerKey>
                <Name>Something3</Name>
                <IsSendable>false</IsSendable>
            </Results>
        </RetrieveResponseMsg>
    </soap:Body>
</soap:Envelope>

Answer:

You're very close. You need to find all the Results tags, iterate over them, and then grab the elements you want from each one:

for el in soup.find_all('Results'):
    objectid_field = el.find('ObjectID').text
    customerkey_field = el.find('CustomerKey').text
    name_field = el.find('Name').text
    issendable_field = el.find('IsSendable').text
    sendablesubscribe_field = el.find('SendableSubscriberField').text
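For comparison, the same find-all-then-loop idea can be written with the standard library's xml.etree.ElementTree (which the question already imports) instead of BeautifulSoup. This is a minimal sketch, assuming a trimmed copy of the sample response embedded as a string; ElementTree needs the default namespace http://partnerAPI spelled out, so an arbitrary prefix (rrm here) is mapped to it. It also shows why chaining find() only gave you the first block.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question (SOAP header omitted).
SAMPLE_XML = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/soap-envelope"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <RetrieveResponseMsg xmlns="http://partnerAPI">
      <OverallStatus>OK</OverallStatus>
      <RequestID>f9876</RequestID>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object1</ObjectID>
        <CustomerKey>Customer1</CustomerKey>
        <Name>Test1</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField><Name>_Something1</Name></SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object2</ObjectID>
        <CustomerKey>Customer2</CustomerKey>
        <Name>Name2</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField><Name>_Something2</Name></SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object3</ObjectID>
        <CustomerKey>AnotherKey</CustomerKey>
        <Name>Something3</Name>
        <IsSendable>false</IsSendable>
      </Results>
    </RetrieveResponseMsg>
  </soap:Body>
</soap:Envelope>
"""

# Map an arbitrary prefix to the default namespace of RetrieveResponseMsg.
NS = {"rrm": "http://partnerAPI"}
root = ET.fromstring(SAMPLE_XML)

# find() stops at the first match, which is why chained .find() calls
# only ever returned the first <Results> block.
first = root.find(".//rrm:Results", NS)
print(first.findtext("rrm:ObjectID", namespaces=NS))  # Object1

# findall() returns every matching element, so it can drive the loop.
object_ids = []
names = []
for el in root.findall(".//rrm:Results", NS):
    object_ids.append(el.findtext("rrm:ObjectID", namespaces=NS))
    # "rrm:Name" matches only the direct child, not SendableSubscriberField/Name
    names.append(el.findtext("rrm:Name", namespaces=NS))

print(object_ids)  # ['Object1', 'Object2', 'Object3']
print(names)       # ['Test1', 'Name2', 'Something3']
```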

However, SendableSubscriberField isn't always present (the third Results block has none), so you may want to check whether the record is sendable first:

for el in soup.find_all('Results'):
    objectid_field = el.find('ObjectID').text
    customerkey_field = el.find('CustomerKey').text
    name_field = el.find('Name').text
    issendable_field = el.find('IsSendable').text

    # skip if not sendable
    if issendable_field == 'false':
        sendablesubscribe_field = None
        continue

    sendablesubscribe_field = el.find('SendableSubscriberField').find('Name').text
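Checking IsSendable works for this sample because the nested field is missing exactly when IsSendable is false. If you'd rather not rely on that correlation, a variation is to test whether the element itself exists. Below is a minimal sketch of that presence check, again using xml.etree.ElementTree on a trimmed copy of the sample; the helper name subscriber_field_name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question (SOAP header omitted).
SAMPLE_XML = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/soap-envelope"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <RetrieveResponseMsg xmlns="http://partnerAPI">
      <Results xsi:type="Data">
        <ObjectID>Object1</ObjectID>
        <Name>Test1</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField><Name>_Something1</Name></SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <ObjectID>Object2</ObjectID>
        <Name>Name2</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField><Name>_Something2</Name></SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <ObjectID>Object3</ObjectID>
        <Name>Something3</Name>
        <IsSendable>false</IsSendable>
      </Results>
    </RetrieveResponseMsg>
  </soap:Body>
</soap:Envelope>
"""

NS = {"rrm": "http://partnerAPI"}


def subscriber_field_name(results_el):
    """Hypothetical helper: nested SendableSubscriberField/Name text, or None if absent."""
    field = results_el.find("rrm:SendableSubscriberField", NS)
    # Compare with None explicitly: an Element's truth value depends on
    # whether it has children, not on whether it was found.
    if field is None:
        return None
    return field.findtext("rrm:Name", namespaces=NS)


root = ET.fromstring(SAMPLE_XML)
subscriber_names = {
    el.findtext("rrm:ObjectID", namespaces=NS): subscriber_field_name(el)
    for el in root.findall(".//rrm:Results", NS)
}
print(subscriber_names)
# {'Object1': '_Something1', 'Object2': '_Something2', 'Object3': None}
```

With BeautifulSoup the equivalent check is whether el.find('SendableSubscriberField') returns None before calling .find('Name') on it.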

Edit: building the DataFrame

To build a DataFrame from this, I would collect everything into a list of dictionaries:

import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(...)

data = []

for el in soup.find_all('Results'):
    record = {}

    record['ObjectID'] = el.find('ObjectID').text
    record['CustomerKey'] = el.find('CustomerKey').text
    record['Name'] = el.find('Name').text
    record['IsSendable'] = el.find('IsSendable').text

    # SendableSubscriberField is absent when the record is not sendable
    if record['IsSendable'] == 'false':
        record['SendableSubscriberField'] = None
    else:
        record['SendableSubscriberField'] = el.find('SendableSubscriberField').find('Name').text

    # append every record, including the non-sendable ones
    data.append(record)


df = pd.DataFrame(data)
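If the number of fields grows, hard-coding one find() per column gets tedious. A minimal sketch of a generic alternative, assuming the same sample response and swapping in xml.etree.ElementTree for BeautifulSoup: it flattens every child of each Results element into a dictionary, naming nested values parent_child. The helper names local_name and flatten are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response from the question (SOAP header omitted).
SAMPLE_XML = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/soap-envelope"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <RetrieveResponseMsg xmlns="http://partnerAPI">
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object1</ObjectID>
        <CustomerKey>Customer1</CustomerKey>
        <Name>Test1</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField><Name>_Something1</Name></SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object3</ObjectID>
        <CustomerKey>AnotherKey</CustomerKey>
        <Name>Something3</Name>
        <IsSendable>false</IsSendable>
      </Results>
    </RetrieveResponseMsg>
  </soap:Body>
</soap:Envelope>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def flatten(el, prefix=""):
    """Flatten an element's children into {tag or parent_tag: text}."""
    record = {}
    for child in el:
        key = prefix + local_name(child.tag)
        if len(child):  # has sub-elements: recurse with a prefixed key
            record.update(flatten(child, key + "_"))
        else:
            text = (child.text or "").strip()
            record[key] = text or None  # empty elements become None
    return record


NS = {"rrm": "http://partnerAPI"}
root = ET.fromstring(SAMPLE_XML)
data = [flatten(el) for el in root.findall(".//rrm:Results", NS)]

for row in data:
    print(row)
```

Passing data to pd.DataFrame(data) gives one column per key; a key missing from a record (SendableSubscriberField_Name for Object3) becomes NaN in that row. Attributes such as xsi:type are not captured by this sketch, and all values stay strings.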

Answer:

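Give pandas.read_xml (available since pandas 1.3.0) another try, this time mapping a prefix to the default namespace (http://partnerAPI). Because you also need a value nested one level lower (SendableSubscriberField/Name), run read_xml twice and join the results. The names argument gives the nested column a distinct name; without it both frames would have a Name column, and join refuses overlapping columns unless suffixes are given. Note that all attributes and element values are returned, even when they are missing (they show up as NaN). Also note that join aligns the two frames by their default integer index, i.e. by row position, which lines up correctly here only because the one record without SendableSubscriberField is the last one.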

soap_df = (
    pd.read_xml(
        soap_response.text, 
        xpath = ".//rrm:RetrieveResponseMsg/rrm:Results",
        namespaces = {"rrm": "http://partnerAPI"}
    ).join(
        pd.read_xml(
            soap_response.text, 
            xpath = ".//rrm:RetrieveResponseMsg/rrm:Results/rrm:SendableSubscriberField",
            namespaces = {"rrm": "http://partnerAPI"},
            names = ["SendableSubscriberField_Name", ""]
        ),
    )
)
    
print(soap_df)
#    type  PartnerKey ObjectID CustomerKey        Name  IsSendable  SendableSubscriberField SendableSubscriberField_Name
# 0  Data         NaN  Object1   Customer1       Test1        True                      NaN                  _Something1
# 1  Data         NaN  Object2   Customer2       Name2        True                      NaN                  _Something2
# 2  Data         NaN  Object3  AnotherKey  Something3       False                      NaN                          NaN
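The positional-join caveat is easy to see without pandas. A minimal sketch using xml.etree.ElementTree and the same two XPath expressions as the read_xml calls, assuming a hypothetical variant of the sample in which the middle record lacks SendableSubscriberField: the first list mimics the Results frame, the second mimics the nested frame, and zipping them by position mimics DataFrame.join on a default RangeIndex.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample: the *middle* record lacks the nested field.
VARIANT_XML = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/soap-envelope">
  <soap:Body>
    <RetrieveResponseMsg xmlns="http://partnerAPI">
      <Results>
        <ObjectID>Object1</ObjectID>
        <SendableSubscriberField><Name>_Something1</Name></SendableSubscriberField>
      </Results>
      <Results>
        <ObjectID>Object2</ObjectID>
      </Results>
      <Results>
        <ObjectID>Object3</ObjectID>
        <SendableSubscriberField><Name>_Something3</Name></SendableSubscriberField>
      </Results>
    </RetrieveResponseMsg>
  </soap:Body>
</soap:Envelope>
"""

NS = {"rrm": "http://partnerAPI"}
root = ET.fromstring(VARIANT_XML)

# Rows of the first read_xml call: one per <Results>.
parents = [
    el.findtext("rrm:ObjectID", namespaces=NS)
    for el in root.findall(".//rrm:RetrieveResponseMsg/rrm:Results", NS)
]

# Rows of the second call: one per <SendableSubscriberField>, with no link to its parent.
nested = [
    el.findtext("rrm:Name", namespaces=NS)
    for el in root.findall(".//rrm:RetrieveResponseMsg/rrm:Results/rrm:SendableSubscriberField", NS)
]

# Left join by position: row i pairs with row i; parents past the end get None (NaN in pandas).
positional = dict(zip(parents, nested + [None] * (len(parents) - len(nested))))

# Reading the nested value from each parent keeps it attached to the right record.
keyed = {
    el.findtext("rrm:ObjectID", namespaces=NS): el.findtext(
        "rrm:SendableSubscriberField/rrm:Name", namespaces=NS
    )
    for el in root.findall(".//rrm:RetrieveResponseMsg/rrm:Results", NS)
}

print(positional)  # {'Object1': '_Something1', 'Object2': '_Something3', 'Object3': None}
print(keyed)       # {'Object1': '_Something1', 'Object2': None, 'Object3': '_Something3'}
```

So if your real responses can omit the field from a record in the middle, collecting the nested value per Results element (as in the loop-based answer) is the safer choice.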

Please credit the source when reposting. Article link: https://www.uj5u.com/qianduan/529355.html

Tags: Python, python-3.x, pandas, xml
