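I receive the following XML response from an API call and want to iterate over the Results blocks and store all the data points in a pandas DataFrame.
By chaining .find() calls as shown below I managed to get the data points I'm interested in, but given the structure of the XML response, I don't know how to loop over all the Results blocks in the body.
I'm using Python 3.7 in Jupyter on Windows.
What I tried: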
import pandas as pd
from bs4 import BeautifulSoup
import xml.etree.ElementTree as ET

soup = BeautifulSoup(soap_response.text, "xml")
# print(soup.prettify())

objectid_field = soup.find('Results').find('ObjectID').text
customerkey_field = soup.find('Results').find('CustomerKey').text
name_field = soup.find('Results').find('Name').text
issendable_field = soup.find('Results').find('IsSendable').text
sendablesubscribe_field = soup.find('Results').find('SendableSubscriberField').text

# for de in soup:
#     de_name = soup.find('Results').find('Name').text
#     print(de_name)

# test_df = pd.read_xml(soup,
#                       xpath="//Results",
#                       namespaces={""})
Sample XML data structure:
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://www.w3.org/2003/soap-envelope"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema"
    xmlns:xsd="http://www.w3.org/XMLSchema"
    xmlns:wsa="http://schemas.xmlsoap.org/ws/2004/08/addressing"
    xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-201-wss-wssecurity-secext-1.0.xsd"
    xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-201-wss-security-1.0.xsd">
  <env:Header
      xmlns:env="http://www.w3.org/2003/05/soap-envelope">
    <wsa:Action>RetrieveResponse</wsa:Action>
    <wsa:MessageID>urn:uuid:1234</wsa:MessageID>
    <wsa:RelatesTo>urn:uuid:1234</wsa:RelatesTo>
    <wsa:To>http://schemas.xmlsoap.org/ws/2004/08/dressing/role/anonymous</wsa:To>
    <wsse:Security>
      <wsu:Timestamp wsu:Id="Timestamp-1234">
        <wsu:Created>2021-11-07T13:10:54Z</wsu:Created>
        <wsu:Expires>2021-11-07T13:15:54Z</wsu:Expires>
      </wsu:Timestamp>
    </wsse:Security>
  </env:Header>
  <soap:Body>
    <RetrieveResponseMsg
        xmlns="http://partnerAPI">
      <OverallStatus>OK</OverallStatus>
      <RequestID>f9876</RequestID>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object1</ObjectID>
        <CustomerKey>Customer1</CustomerKey>
        <Name>Test1</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField>
          <Name>_Something1</Name>
        </SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object2</ObjectID>
        <CustomerKey>Customer2</CustomerKey>
        <Name>Name2</Name>
        <IsSendable>true</IsSendable>
        <SendableSubscriberField>
          <Name>_Something2</Name>
        </SendableSubscriberField>
      </Results>
      <Results xsi:type="Data">
        <PartnerKey xsi:nil="true" />
        <ObjectID>Object3</ObjectID>
        <CustomerKey>AnotherKey</CustomerKey>
        <Name>Something3</Name>
        <IsSendable>false</IsSendable>
      </Results>
    </RetrieveResponseMsg>
  </soap:Body>
</soap:Envelope>
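The xmlns="http://partnerAPI" declaration on RetrieveResponseMsg places Results and everything inside it in a default namespace, so namespace-aware tools will not match the bare name Results. A minimal sketch with the standard library's xml.etree.ElementTree (imported in the code above but never used); the trimmed-down XML string and the prefix p are illustrative choices, not part of the real response:

```python
import xml.etree.ElementTree as ET

# Trimmed-down response (illustrative): only the namespace declarations matter here.
XML = """<soap:Envelope xmlns:soap="http://www.w3.org/2003/soap-envelope">
  <soap:Body>
    <RetrieveResponseMsg xmlns="http://partnerAPI">
      <Results><ObjectID>Object1</ObjectID></Results>
      <Results><ObjectID>Object2</ObjectID></Results>
    </RetrieveResponseMsg>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(XML)

# A bare tag name only matches elements in no namespace, so this finds nothing.
bare = root.findall(".//Results")

# Map an arbitrary prefix ("p") to the default namespace URI and use it in the path.
ns = {"p": "http://partnerAPI"}
ids = [r.findtext("p:ObjectID", namespaces=ns) for r in root.findall(".//p:Results", ns)]

print(bare)  # []
print(ids)   # ['Object1', 'Object2']
```

The prefix name is arbitrary; what matters is that it maps to the same URI the document declares.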
Answer:
You're very close: you need to find all the Results tags with find_all(), loop over them, and then grab the elements you want from each one:
for el in soup.find_all('Results'):
    objectid_field = el.find('ObjectID').text
    customerkey_field = el.find('CustomerKey').text
    name_field = el.find('Name').text
    issendable_field = el.find('IsSendable').text
    sendablesubscribe_field = el.find('SendableSubscriberField').text
但是,SendableSubscriberField并不總是存在,因此您可能需要先檢查是否sendable為 True:
for el in soup.find_all('Results'):
    objectid_field = el.find('ObjectID').text
    customerkey_field = el.find('CustomerKey').text
    name_field = el.find('Name').text
    issendable_field = el.find('IsSendable').text
    # skip if not sendable
    if issendable_field == 'false':
        sendablesubscribe_field = None
        continue
    sendablesubscribe_field = el.find('SendableSubscriberField').find('Name').text
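Keying the check on IsSendable assumes the nested field is missing exactly when IsSendable is false, which holds in this sample but is not enforced by the XML itself. A sketch that tests for the element directly instead, written with xml.etree.ElementTree rather than BeautifulSoup so it needs only the standard library (a swapped-in parser, not the approach above). findtext returns None when any step of its path is absent; the two-record XML string is illustrative:

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://partnerAPI"}

# Illustrative data: one record with the nested field, one without.
XML = """<RetrieveResponseMsg xmlns="http://partnerAPI">
  <Results>
    <ObjectID>Object1</ObjectID>
    <IsSendable>true</IsSendable>
    <SendableSubscriberField><Name>_Something1</Name></SendableSubscriberField>
  </Results>
  <Results>
    <ObjectID>Object3</ObjectID>
    <IsSendable>false</IsSendable>
  </Results>
</RetrieveResponseMsg>"""

root = ET.fromstring(XML)
pairs = []
for el in root.findall("p:Results", NS):
    # None when SendableSubscriberField (or its Name child) is missing,
    # regardless of what IsSendable says.
    subscriber_name = el.findtext("p:SendableSubscriberField/p:Name", namespaces=NS)
    pairs.append((el.findtext("p:ObjectID", namespaces=NS), subscriber_name))

print(pairs)  # [('Object1', '_Something1'), ('Object3', None)]
```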
Edit: building the DataFrame
To build a DataFrame from this, I would collect everything into a list of dictionaries:
import pandas as pd
from bs4 import BeautifulSoup

soup = BeautifulSoup(...)

data = []
for el in soup.find_all('Results'):
    record = {}
    record['ObjectID'] = el.find('ObjectID').text
    record['CustomerKey'] = el.find('CustomerKey').text
    record['Name'] = el.find('Name').text
    record['IsSendable'] = el.find('IsSendable').text
    # SendableSubscriberField is absent when the record is not sendable
    if record['IsSendable'] == 'false':
        record['SendableSubscriberField'] = None
    else:
        record['SendableSubscriberField'] = el.find('SendableSubscriberField').find('Name').text
    # append every record (a `continue` here would silently drop non-sendable rows)
    data.append(record)

df = pd.DataFrame(data)
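If the set of fields varies between records, a generic variant avoids listing each tag by hand: turn every direct child of a Results element into a key, and flatten one level of nesting into Parent_Child keys (similar in spirit to the SendableSubscriberField_Name column in the read_xml answer). A minimal sketch with xml.etree.ElementTree, again a swapped-in parser; the helper names local_name and flatten are hypothetical, and it uses the Clark notation {http://partnerAPI}Results instead of a prefix map:

```python
import xml.etree.ElementTree as ET

# Illustrative data; the second record has no PartnerKey and no nested field.
XML = """<RetrieveResponseMsg xmlns="http://partnerAPI">
  <Results>
    <PartnerKey/>
    <ObjectID>Object1</ObjectID>
    <Name>Test1</Name>
    <SendableSubscriberField><Name>_Something1</Name></SendableSubscriberField>
  </Results>
  <Results>
    <ObjectID>Object3</ObjectID>
    <Name>Something3</Name>
  </Results>
</RetrieveResponseMsg>"""

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]

def flatten(results_el):
    """One dict per <Results>; nested children become 'Parent_Child' keys."""
    record = {}
    for child in results_el:  # direct children only, so the two Name tags never collide
        key = local_name(child.tag)
        if len(child):  # has sub-elements, e.g. SendableSubscriberField/Name
            for sub in child:
                record[f"{key}_{local_name(sub.tag)}"] = sub.text
        else:
            record[key] = child.text  # None for empty elements such as <PartnerKey/>
    return record

root = ET.fromstring(XML)
records = [flatten(el) for el in root.findall("{http://partnerAPI}Results")]
print(records)
```

Passing records to pd.DataFrame gives one row per Results block; keys missing from a record become NaN in that row.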
Answer:
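Consider pandas.read_xml again (available since pandas 1.3), this time mapping a prefix to the default namespace (http://partnerAPI). Also, because you need a value nested one level lower, run read_xml twice and join the results. Note that every attribute and element comes back as a column, with NaN where a value is missing.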
soap_df = (
    pd.read_xml(
        soap_response.text,
        xpath=".//rrm:RetrieveResponseMsg/rrm:Results",
        namespaces={"rrm": "http://partnerAPI"}
    ).join(
        pd.read_xml(
            soap_response.text,
            xpath=".//rrm:RetrieveResponseMsg/rrm:Results/rrm:SendableSubscriberField",
            namespaces={"rrm": "http://partnerAPI"},
            # rename the nested Name so it does not clash with Results/Name in the join
            names=["SendableSubscriberField_Name", ""]
        )
    )
)
print(soap_df)
#    type  PartnerKey  ObjectID  CustomerKey        Name  IsSendable  SendableSubscriberField  SendableSubscriberField_Name
# 0  Data         NaN   Object1    Customer1       Test1        True                      NaN                   _Something1
# 1  Data         NaN   Object2    Customer2       Name2        True                      NaN                   _Something2
# 2  Data         NaN   Object3   AnotherKey  Something3       False                      NaN                           NaN
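The join above aligns the two frames on their default 0-based row index, so the n-th SendableSubscriberField is paired with the n-th Results row. That gives the right answer here only because the single record without the field is the last one. A small standard-library sketch on made-up data where the middle record lacks the field, contrasting that positional pairing with a per-record lookup:

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://partnerAPI"}

# Hypothetical data: the *middle* record has no SendableSubscriberField.
XML = """<RetrieveResponseMsg xmlns="http://partnerAPI">
  <Results><ObjectID>A</ObjectID>
    <SendableSubscriberField><Name>_a</Name></SendableSubscriberField></Results>
  <Results><ObjectID>B</ObjectID></Results>
  <Results><ObjectID>C</ObjectID>
    <SendableSubscriberField><Name>_c</Name></SendableSubscriberField></Results>
</RetrieveResponseMsg>"""

root = ET.fromstring(XML)
results = root.findall("p:Results", NS)
ids = [r.findtext("p:ObjectID", namespaces=NS) for r in results]

# Positional pairing, which is what an index-based join of two separate queries does:
nested = [n.text for n in root.findall("p:Results/p:SendableSubscriberField/p:Name", NS)]
positional = dict(zip(ids, nested + [None] * (len(ids) - len(nested))))

# Per-record lookup keeps each nested value with its own parent.
per_record = {
    r.findtext("p:ObjectID", namespaces=NS):
        r.findtext("p:SendableSubscriberField/p:Name", namespaces=NS)
    for r in results
}

print(positional)  # {'A': '_a', 'B': '_c', 'C': None}  <- B and C are wrong
print(per_record)  # {'A': '_a', 'B': None, 'C': '_c'}
```

Because the SendableSubscriberField rows carry no ObjectID, there is no key to merge the two read_xml frames on; collecting values per Results element, as in the BeautifulSoup answer, sidesteps the problem.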
