I have a fairly large XML file that contains many different elements, similar to the one below:
<adrmsg:ADRMessage xmlns:adrmsg="http://www.eurocontrol.int/cfmu/b2b/ADRMessage"
xmlns:gml="http://www.opengis.net/gml/3.2" gml:id="ID_197112_1650420171084_1"
xmlns:adrext="http://www.aixm.aero/schema/5.1.1/extensions/EUR/ADR"
xmlns:aixm="http://www.aixm.aero/schema/5.1.1"
xmlns:xlink="http://www.w3.org/1999/xlink">
<adrmsg:hasMember>
<aixm:Airspace gml:id="ID_197112_1650420171084_93332">
<gml:identifier codeSpace="urn:uuid:">3271922d-6b7a-4953-a6ff-599b17ab785e</gml:identifier>
<aixm:timeSlice>
<aixm:AirspaceTimeSlice gml:id="ID_197112_1650420171084_93333">
<gml:validTime>
<gml:TimePeriod gml:id="ID_197112_1650420171084_93334">
<gml:beginPosition>2021-10-07T00:00:00</gml:beginPosition>
<gml:endPosition indeterminatePosition="unknown"/>
</gml:TimePeriod>
</gml:validTime>
<aixm:interpretation>BASELINE</aixm:interpretation>
<aixm:featureLifetime>
<gml:TimePeriod gml:id="ID_197112_1650420171084_93335">
<gml:beginPosition>2021-10-07T00:00:00</gml:beginPosition>
<gml:endPosition indeterminatePosition="unknown"/>
</gml:TimePeriod>
</aixm:featureLifetime>
<aixm:type>RAS</aixm:type>
<aixm:designator>EDGGNFRA</aixm:designator>
<aixm:name>EDGG NON FRA</aixm:name>
<aixm:designatorICAO>NO</aixm:designatorICAO>
<aixm:geometryComponent>
<aixm:AirspaceGeometryComponent gml:id="ID_197112_1650420171084_93336">
<aixm:operation>BASE</aixm:operation>
<aixm:theAirspaceVolume>
<aixm:AirspaceVolume gml:id="ID_197112_1650420171084_93337">
<aixm:upperLimit uom="FL">265</aixm:upperLimit>
<aixm:upperLimitReference>STD</aixm:upperLimitReference>
<aixm:lowerLimit uom="FL">245</aixm:lowerLimit>
<aixm:lowerLimitReference>STD</aixm:lowerLimitReference>
<aixm:contributorAirspace>
<aixm:AirspaceVolumeDependency gml:id="ID_197112_1650420171084_93338">
<aixm:dependency>HORZ_PROJECTION</aixm:dependency>
<aixm:theAirspace xlink:href="urn:uuid:5831b5a2-4861-4bf5-ae99-d31413234cdb"/>
</aixm:AirspaceVolumeDependency>
</aixm:contributorAirspace>
</aixm:AirspaceVolume>
</aixm:theAirspaceVolume>
</aixm:AirspaceGeometryComponent>
</aixm:geometryComponent>
<aixm:geometryComponent>
<aixm:AirspaceGeometryComponent gml:id="ID_197112_1650420171084_93339">
<aixm:operation>UNION</aixm:operation>
<aixm:theAirspaceVolume>
<aixm:AirspaceVolume gml:id="ID_197112_1650420171084_93340">
<aixm:upperLimit uom="FL">255</aixm:upperLimit>
<aixm:upperLimitReference>STD</aixm:upperLimitReference>
<aixm:lowerLimit uom="FL">245</aixm:lowerLimit>
<aixm:lowerLimitReference>STD</aixm:lowerLimitReference>
<aixm:contributorAirspace>
<aixm:AirspaceVolumeDependency gml:id="ID_197112_1650420171084_93341">
<aixm:dependency>HORZ_PROJECTION</aixm:dependency>
<aixm:theAirspace xlink:href="urn:uuid:dcd8301c-de12-4e6c-992f-fd8de781ab58"/>
</aixm:AirspaceVolumeDependency>
</aixm:contributorAirspace>
</aixm:AirspaceVolume>
</aixm:theAirspaceVolume>
</aixm:AirspaceGeometryComponent>
</aixm:geometryComponent>
<aixm:extension>
<adrext:AirspaceExtension gml:id="ID_197112_1650420171084_93342">
<adrext:usage>OPERATIONAL</adrext:usage>
</adrext:AirspaceExtension>
</aixm:extension>
</aixm:AirspaceTimeSlice>
</aixm:timeSlice>
</aixm:Airspace>
</adrmsg:hasMember>
.... many other <adrmsg:hasMember>
</adrmsg:ADRMessage>
I have included only one of the elements, together with the namespace declarations.
My attempt so far:
import xml.etree.ElementTree as ET
import pandas as pd

ab = {"adrmsg": "http://www.eurocontrol.int/cfmu/b2b/ADRMessage",
      "gml": "http://www.opengis.net/gml/3.2",
      "adrext": "http://www.aixm.aero/schema/5.1.1/extensions/EUR/ADR",
      "aixm": "http://www.aixm.aero/schema/5.1.1",
      "xlink": "http://www.w3.org/1999/xlink",
      "id": "http://www.opengis.net/gml/3.2",
      "href": "http://www.w3.org/1999/xlink"
      }
root_node = ET.parse('Airspace.xml').getroot()
pipare = []
verate = []
for tag in root_node.findall(".//aixm:Airspace", ab):
    value = tag.find("gml:identifier", ab)
    for char in tag.findall(".//aixm:AirspaceTimeSlice", ab):
        for per in char.findall(".//aixm:type", ab):
            for ir in char.findall(".//aixm:name", ab):
                for epa in char.findall(".//aixm:designator", ab):
                    for op in char.findall(".//aixm:theAirspace[@xlink:href]", ab):
                        pipare = [value.text, char.attrib, per.text, ir.text, epa.text, op.attrib]
                        verate.append(pipare)
xml_todf = pd.DataFrame(verate, columns=['uuid', 'id', 'type', 'name', 'designator', 'contributorAirspace'])
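As you can probably see, I am parsing the XML in a very "rough" way: I extract the elements I am interested in and then put them into a pandas DataFrame.
When I capture .text, the extracted data is exactly what I want. When I capture attributes, however, the result contains not only the value but also the namespace, and I don't know how to get rid of it. This is how the pandas DataFrame shows the data: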
| uuid | id | type | name | designator | contributorAirspace |
|---|---|---|---|---|---|
| 3271922d-6b7a-4953-a6ff-599b17ab785e | {'{http://www.opengis.net/gml/3.2}id':'ID_197112_1650420171084_93333'} | RAS | EDGG NON FRA | EDGGNFRA | {'{http://www.w3.org/1999/xlink}href':'urn:uuid:5831b5a2-4861-4bf5-ae99-d31413234cdb'} |
| 3271922d-6b7a-4953-a6ff-599b17ab785e | {'{http://www.opengis.net/gml/3.2}id':'ID_197112_1650420171084_93333'} | RAS | EDGG NON FRA | EDGGNFRA | {'{http://www.w3.org/1999/xlink}href':'urn:uuid:dcd8301c-de12-4e6c-992f-fd8de781ab58'} |
Ideally, I would like to end up with something like this:
| uuid | id | type | name | designator | contributorAirspace |
|---|---|---|---|---|---|
| 3271922d-6b7a-4953-a6ff-599b17ab785e | ID_197112_1650420171084_93333 | RAS | EDGG NON FRA | EDGGNFRA | 5831b5a2-4861-4bf5-ae99-d31413234cdb,dcd8301c-de12-4e6c-992f-fd8de781ab58 |
But I would already be very grateful if someone could help me get to this point:
| uuid | id | type | name | designator | contributorAirspace |
|---|---|---|---|---|---|
| 3271922d-6b7a-4953-a6ff-599b17ab785e | ID_197112_1650420171084_93333 | RAS | EDGG NON FRA | EDGGNFRA | 5831b5a2-4861-4bf5-ae99-d31413234cdb |
| 3271922d-6b7a-4953-a6ff-599b17ab785e | ID_197112_1650420171084_93333 | RAS | EDGG NON FRA | EDGGNFRA | dcd8301c-de12-4e6c-992f-fd8de781ab58 |
Thanks for your help.
Answer from a uj5u.com user:
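Python's ElementTree requires a namespaced attribute to be addressed by its qualified name, that is, the namespace URI in curly braces followed by the attribute name: {namespace-URI}attribute-name. Referencing char.attrib or op.attrib on its own returns a dictionary of all the element's attributes and their values, which is why the namespace shows up in your DataFrame. Here is an example of retrieving the attribute values: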
import xml.etree.ElementTree as ET
import pandas as pd
from collections import defaultdict

ab = {"adrmsg": "http://www.eurocontrol.int/cfmu/b2b/ADRMessage",
      "gml": "http://www.opengis.net/gml/3.2",
      "adrext": "http://www.aixm.aero/schema/5.1.1/extensions/EUR/ADR",
      "aixm": "http://www.aixm.aero/schema/5.1.1",
      "xlink": "http://www.w3.org/1999/xlink",
      "id": "http://www.opengis.net/gml/3.2",
      "href": "http://www.w3.org/1999/xlink"
      }

# parse XML; `xml` is assumed to be a string holding the document from the question
# (use ET.parse('Airspace.xml').getroot() to read it from the file instead)
root_node = ET.fromstring(xml)

# create dictionary to store parsed data, one list per output column
data = defaultdict(list)
for tag in root_node.findall(".//aixm:Airspace", ab):
    value = tag.find("gml:identifier", ab)
    for char in tag.findall(".//aixm:AirspaceTimeSlice", ab):
        for per in char.findall(".//aixm:type", ab):
            for ir in char.findall(".//aixm:name", ab):
                for epa in char.findall(".//aixm:designator", ab):
                    for op in char.findall(".//aixm:theAirspace[@xlink:href]", ab):
                        data['uuid'].append(value.text)
                        # index attrib with the qualified name to get only the value
                        data['id'].append(char.attrib['{http://www.opengis.net/gml/3.2}id'])
                        data['type'].append(per.text)
                        data['name'].append(ir.text)
                        data['designator'].append(epa.text)
                        # the value still carries the "urn:uuid:" prefix from the XML
                        data['contributorAirspace'].append(op.attrib['{http://www.w3.org/1999/xlink}href'])
df = pd.DataFrame(data)
Note the expressions char.attrib['{http://www.opengis.net/gml/3.2}id'] and op.attrib['{http://www.w3.org/1999/xlink}href']. They address each attribute by its qualified name and return only the attribute value.
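The same lookup can be sketched without typing the full URI each time, by building the qualified name from the prefix map. The helper qname and the tiny inline document below are hypothetical and exist only for illustration; Element.get is the standard ElementTree accessor and returns None when the attribute is missing, whereas indexing attrib raises KeyError.

```python
import xml.etree.ElementTree as ET

NS = {
    "gml": "http://www.opengis.net/gml/3.2",
    "xlink": "http://www.w3.org/1999/xlink",
    "aixm": "http://www.aixm.aero/schema/5.1.1",
}


def qname(prefixed, ns=NS):
    """Turn 'gml:id' into '{http://www.opengis.net/gml/3.2}id' (hypothetical helper)."""
    prefix, local = prefixed.split(":", 1)
    return "{%s}%s" % (ns[prefix], local)


# Tiny stand-in for one time slice of the document in the question.
SAMPLE = (
    '<aixm:AirspaceTimeSlice xmlns:aixm="http://www.aixm.aero/schema/5.1.1" '
    'xmlns:gml="http://www.opengis.net/gml/3.2" '
    'xmlns:xlink="http://www.w3.org/1999/xlink" gml:id="ID_1">'
    '<aixm:theAirspace xlink:href="urn:uuid:5831b5a2-4861-4bf5-ae99-d31413234cdb"/>'
    '</aixm:AirspaceTimeSlice>'
)

elem = ET.fromstring(SAMPLE)

# Attribute value only, no namespace dictionary around it.
slice_id = elem.get(qname("gml:id"))

# Same idea for xlink:href, then drop the "urn:uuid:" prefix if it is present.
href = elem.find("aixm:theAirspace", NS).get(qname("xlink:href"))
prefix = "urn:uuid:"
uuid_only = href[len(prefix):] if href.startswith(prefix) else href
```

Stripping the urn:uuid: prefix is what turns the href into the bare UUID shown in the desired tables.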
This example also uses a defaultdict instead of two lists, but that is a matter of personal preference.
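A third option is a list of dictionaries, one per time slice, which also makes it easy to reach the "ideal" layout from the question: a single row with the contributor UUIDs joined by commas. The following is only a sketch under some assumptions: the function name airspace_rows is made up for illustration, the sample is a trimmed-down stand-in for the real file, and type, name and designator are assumed to be direct children of AirspaceTimeSlice, as they are in the excerpt.

```python
import xml.etree.ElementTree as ET

NS = {
    "adrmsg": "http://www.eurocontrol.int/cfmu/b2b/ADRMessage",
    "gml": "http://www.opengis.net/gml/3.2",
    "aixm": "http://www.aixm.aero/schema/5.1.1",
    "xlink": "http://www.w3.org/1999/xlink",
}
GML_ID = "{%s}id" % NS["gml"]
XLINK_HREF = "{%s}href" % NS["xlink"]
URN_PREFIX = "urn:uuid:"


def airspace_rows(root):
    """Return one dict per AirspaceTimeSlice (hypothetical helper)."""
    rows = []
    for airspace in root.iterfind(".//aixm:Airspace", NS):
        uuid = airspace.findtext("gml:identifier", namespaces=NS)
        for ts in airspace.iterfind(".//aixm:AirspaceTimeSlice", NS):
            contributors = []
            for ref in ts.iterfind(".//aixm:theAirspace", NS):
                href = ref.get(XLINK_HREF)
                if href is None:
                    continue  # skip references without an xlink:href
                if href.startswith(URN_PREFIX):
                    href = href[len(URN_PREFIX):]
                contributors.append(href)
            rows.append({
                "uuid": uuid,
                "id": ts.get(GML_ID),
                # findtext returns None when the child element is missing
                "type": ts.findtext("aixm:type", namespaces=NS),
                "name": ts.findtext("aixm:name", namespaces=NS),
                "designator": ts.findtext("aixm:designator", namespaces=NS),
                "contributorAirspace": ",".join(contributors),
            })
    return rows


# Trimmed-down stand-in for the document in the question.
SAMPLE = """\
<adrmsg:ADRMessage xmlns:adrmsg="http://www.eurocontrol.int/cfmu/b2b/ADRMessage"
    xmlns:gml="http://www.opengis.net/gml/3.2"
    xmlns:aixm="http://www.aixm.aero/schema/5.1.1"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <adrmsg:hasMember>
    <aixm:Airspace gml:id="ID_197112_1650420171084_93332">
      <gml:identifier codeSpace="urn:uuid:">3271922d-6b7a-4953-a6ff-599b17ab785e</gml:identifier>
      <aixm:timeSlice>
        <aixm:AirspaceTimeSlice gml:id="ID_197112_1650420171084_93333">
          <aixm:type>RAS</aixm:type>
          <aixm:designator>EDGGNFRA</aixm:designator>
          <aixm:name>EDGG NON FRA</aixm:name>
          <aixm:theAirspace xlink:href="urn:uuid:5831b5a2-4861-4bf5-ae99-d31413234cdb"/>
          <aixm:theAirspace xlink:href="urn:uuid:dcd8301c-de12-4e6c-992f-fd8de781ab58"/>
        </aixm:AirspaceTimeSlice>
      </aixm:timeSlice>
    </aixm:Airspace>
  </adrmsg:hasMember>
</adrmsg:ADRMessage>
"""

rows = airspace_rows(ET.fromstring(SAMPLE))
```

Passing the result to pd.DataFrame(rows) should give one row per time slice with the six columns from the question. Because each field is read once per time slice instead of through nested loops, a time slice with no contributor airspace still produces a row (with an empty contributorAirspace), where the nested-loop version would silently drop it.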
