I need to remove certain parts of an XML file, for example this file:
<dict>
<key>Images</key>
<array>
<dict>
<key>ImageIndex</key>
<integer>0</integer>
<key>NumberOfROIs</key>
<integer>42</integer>
<key>ROIs</key>
<array>
<dict>
<key>Area</key>
<real>0.0</real>
<key>Center</key>
<string>(0.000000, 0.000000, 0.000000)</string>
<key>Dev</key>
<real>0.0</real>
<key>IndexInImage</key>
<integer>0</integer>
<key>Max</key>
<real>1358</real>
<key>Mean</key>
<real>1358</real>
<key>Min</key>
<real>1358</real>
<key>Name</key>
<string>Calcification</string>
<key>NumberOfPoints</key>
<integer>1</integer>
<key>Point_mm</key>
<array>
<string>(0.000000, 0.000000, 0.000000)</string>
</array>
<key>Point_px</key>
<array>
<string>(2964.620117, 3427.979980)</string>
</array>
<key>Total</key>
<real>1358</real>
<key>Type</key>
<integer>19</integer>
</dict>
<dict>
<key>Area</key>
<real>0.0</real>
<key>Center</key>
<string>(0.000000, 0.000000, 0.000000)</string>
<key>Dev</key>
<real>0.0</real>
<key>IndexInImage</key>
<integer>1</integer>
<key>Max</key>
<real>1401</real>
<key>Mean</key>
<real>1401</real>
<key>Min</key>
<real>1401</real>
<key>Name</key>
<string>Calcification</string>
<key>NumberOfPoints</key>
<integer>1</integer>
<key>Point_mm</key>
<array>
<string>(0.000000, 0.000000, 0.000000)</string>
</array>
<key>Point_px</key>
<array>
<string>(2993.159912, 3403.550049)</string>
</array>
<key>Total</key>
<real>1401</real>
<key>Type</key>
<integer>19</integer>
</dict>
<dict>
<key>Area</key>
<real>1.3665732145309448</real>
<key>Center</key>
<string>(0.000000, 0.000000, 0.000000)</string>
<key>Dev</key>
<real>66.487342834472656</real>
<key>IndexInImage</key>
<integer>36</integer>
<key>Max</key>
<real>1836</real>
<key>Mean</key>
<real>1583.29638671875</real>
<key>Min</key>
<real>1313</real>
<key>Name</key>
<string>Mass</string>
<key>NumberOfPoints</key>
<integer>89</integer>
<key>Point_mm</key>
<array>
<string>(0.000000, 0.000000, 0.000000)</string>
<string>(0.000000, 0.000000, 0.000000)</string>
</array>
<key>Point_px</key>
<array>
<string>(3196.290039, 1048.599976)</string>
<string>(3203.560059, 1046.170044)</string>
<string>(3211.330078, 1042.780029)</string>
<string>(3189.500000, 1050.540039)</string>
</array>
<key>Total</key>
<real>44457380</real>
<key>Type</key>
<integer>15</integer>
</dict>
</array>
</dict>
</array>
</dict>
</plist>
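I want to remove everything between <dict> and </dict>, the tags included, for every <dict> that contains <string>Calcification</string>. In other words, I only want the parts without calcifications. This is the resulting file I want: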
<dict>
<key>Images</key>
<array>
<dict>
<key>ImageIndex</key>
<integer>0</integer>
<key>NumberOfROIs</key>
<integer>42</integer>
<key>ROIs</key>
<array>
<dict>
<key>Area</key>
<real>1.3665732145309448</real>
<key>Center</key>
<string>(0.000000, 0.000000, 0.000000)</string>
<key>Dev</key>
<real>66.487342834472656</real>
<key>IndexInImage</key>
<integer>36</integer>
<key>Max</key>
<real>1836</real>
<key>Mean</key>
<real>1583.29638671875</real>
<key>Min</key>
<real>1313</real>
<key>Name</key>
<string>Mass</string>
<key>NumberOfPoints</key>
<integer>89</integer>
<key>Point_mm</key>
<array>
<string>(0.000000, 0.000000, 0.000000)</string>
<string>(0.000000, 0.000000, 0.000000)</string>
</array>
<key>Point_px</key>
<array>
<string>(3196.290039, 1048.599976)</string>
<string>(3203.560059, 1046.170044)</string>
<string>(3211.330078, 1042.780029)</string>
<string>(3189.500000, 1050.540039)</string>
</array>
<key>Total</key>
<real>44457380</real>
<key>Type</key>
<integer>15</integer>
</dict>
</array>
</dict>
</array>
</dict>
</plist>
This is what I tried:
data = r"C:\Users\vinc\Desktop\ExemploXML.xml"
import xml.etree.ElementTree as ET
tree = ET.parse(data)
root = tree.getroot()
for e in root.findall(".//string"):
    if e.text == 'Calcification':
        print(e)
        root.remove(e)
    else:
        pass
tree.write(r'C:\Users\vinc\Desktop\out.xml')
Output:
<Element 'string' at 0x000002B085002EA0>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-43-d417d00038ed> in <module>
      8
      9         print(e)
---> 10         root.remove(e)
     11     else:
     12         pass

ValueError: list.remove(x): x not in list
For context, these XML files contain semantic segmentation information, and I want to remove the Calcification class annotations.
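For reference, the ValueError comes from how Element.remove() works: it only removes direct children of the element it is called on. root.findall(".//string") matches <string> elements anywhere in the tree, none of which is a direct child of root; and even if the call succeeded, it would remove only the <string>, not the enclosing ROI <dict>. Below is a minimal sketch of the same approach using a child-to-parent map. It is not taken from the answers; SAMPLE is a hypothetical, trimmed stand-in for the file above (without the unmatched </plist>), and remove_calcifications is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample with the same nesting as the question's file.
SAMPLE = """<dict>
<key>Images</key>
<array>
<dict>
<key>ROIs</key>
<array>
<dict><key>Name</key><string>Calcification</string></dict>
<dict><key>Name</key><string>Calcification</string></dict>
<dict><key>Name</key><string>Mass</string></dict>
</array>
</dict>
</array>
</dict>"""


def remove_calcifications(root):
    # Map every element to its parent: Element.remove() only works on
    # direct children, so the parent of each ROI <dict> is needed.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # Collect the ROI <dict>s first (the parent of each matching <string>),
    # then remove them, so the tree is not modified while it is traversed.
    rois = {parent_of[s] for s in root.iter("string") if s.text == "Calcification"}
    for roi in rois:
        parent_of[roi].remove(roi)  # remove the <dict> from its <array>
    return root


root = remove_calcifications(ET.fromstring(SAMPLE))
print([s.text for s in root.iter("string")])  # ['Mass']
```

Using a set for rois means a <dict> that happened to contain the text twice would still be removed only once.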
Answer:
Here is an XSLT-based solution.
The XSLT below follows the so-called identity transform pattern.
A single-line template removes the unwanted <dict> elements:
<xsl:template match="dict[string='Calcification']"/>
How to transform an XML file using XSLT in Python?
XSLT
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="dict[string='Calcification']"/>
</xsl:stylesheet>
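To run the stylesheet above from Python, one option is the third-party lxml package (assumed to be installed, e.g. with pip install lxml), since the standard library has no XSLT processor. The sketch below compiles the stylesheet with lxml.etree.XSLT and applies it to a trimmed sample; strip_calcifications and SAMPLE are hypothetical names used only for illustration.

```python
from lxml import etree  # third-party package, assumed installed

# The stylesheet from the answer above, unchanged.
XSLT = b"""<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" encoding="utf-8" indent="yes" omit-xml-declaration="yes"/>
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="dict[string='Calcification']"/>
</xsl:stylesheet>"""

# Hypothetical trimmed sample with the same nesting as the question's file.
SAMPLE = b"""<dict><key>Images</key><array><dict><key>ROIs</key><array>
<dict><key>Name</key><string>Calcification</string></dict>
<dict><key>Name</key><string>Mass</string></dict>
</array></dict></array></dict>"""


def strip_calcifications(xml_bytes):
    # Compile the stylesheet into a callable transform.
    transform = etree.XSLT(etree.fromstring(XSLT))
    # Apply it; bytes() serializes the result using the xsl:output settings.
    return bytes(transform(etree.fromstring(xml_bytes)))


print(strip_calcifications(SAMPLE).decode())
```

For a real file, etree.parse("in.xml") can be used instead of etree.fromstring(...), and the serialized bytes can be written with open("out.xml", "wb").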
Answer:
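See [Python.Docs]: xml.etree.ElementTree - The ElementTree XML API.
I always prefer to search for nodes via XPATH, specifying the node path as completely as possible. The drawback, of course, is that if the XML structure changes, the node paths in the code must be adjusted accordingly.
Also, as a general pattern (not sure whether it applies here), never remove elements from a container while iterating over it.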
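Both points can be combined with ElementTree's limited XPath support, which also understands the [tag='text'] predicate used in the XSLT answer. A minimal sketch, assuming the nesting shown in the question; drop_rois and SAMPLE are hypothetical names:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample with the same nesting as the question's file.
SAMPLE = """<dict>
<key>Images</key>
<array>
<dict>
<key>ROIs</key>
<array>
<dict><key>Name</key><string>Calcification</string></dict>
<dict><key>Name</key><string>Mass</string></dict>
<dict><key>Name</key><string>Calcification</string></dict>
</array>
</dict>
</array>
</dict>"""


def drop_rois(root, name):
    # Fully specified path to the ROI arrays.
    for roi_array in root.findall("./array/dict/array"):
        # [tag='text'] keeps only <dict> children that have a <string> child
        # whose text equals name. findall() returns a new list, so removing
        # from roi_array does not disturb this loop.
        for roi in roi_array.findall(f"dict[string='{name}']"):
            roi_array.remove(roi)
    return root


root = drop_rois(ET.fromstring(SAMPLE), "Calcification")
print([s.text for s in root.iter("string")])  # ['Mass']
```

Because the ROI name is interpolated into the path, a name containing a single quote would break the expression.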
I saved your source XML in file00.xml (and also removed the last, unmatched, tag: "</plist>").
code00.py:
#!/usr/bin/env python
import xml.etree.ElementTree as ET
import sys
def main(*argv):
    xml_file_name = "./file00.xml"
    tree = ET.parse(xml_file_name)
    root = tree.getroot()
    inner_array_nodes = root.findall("./array/dict/array")  # XPATH
    to_remove = []
    for parent_node in inner_array_nodes:
        for dict_node in parent_node:
            string_nodes = dict_node.findall("string")
            for string_node in string_nodes:
                if string_node.text == "Calcification":
                    to_remove.append((parent_node, dict_node))
                    break  # one match is enough; avoids queuing the same node twice
    for parent, child in to_remove:
        parent.remove(child)
    print(b"".join(ET.tostringlist(root)).decode())
if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.")
    sys.exit(rc)
Output:
[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q070442605]> "e:\Work\Dev\VEnvs\py_pc064_03.08.07_test0\Scripts\python.exe" code00.py
Python 3.8.7 (tags/v3.8.7:6503f05, Dec 21 2020, 17:59:51) [MSC v.1928 64 bit (AMD64)] 064bit on win32

<dict>
<key>Images</key>
<array>
<dict>
<key>ImageIndex</key>
<integer>0</integer>
<key>NumberOfROIs</key>
<integer>42</integer>
<key>ROIs</key>
<array>
<dict>
<key>Area</key>
<real>1.3665732145309448</real>
<key>Center</key>
<string>(0.000000, 0.000000, 0.000000)</string>
<key>Dev</key>
<real>66.487342834472656</real>
<key>IndexInImage</key>
<integer>36</integer>
<key>Max</key>
<real>1836</real>
<key>Mean</key>
<real>1583.29638671875</real>
<key>Min</key>
<real>1313</real>
<key>Name</key>
<string>Mass</string>
<key>NumberOfPoints</key>
<integer>89</integer>
<key>Point_mm</key>
<array>
<string>(0.000000, 0.000000, 0.000000)</string>
<string>(0.000000, 0.000000, 0.000000)</string>
</array>
<key>Point_px</key>
<array>
<string>(3196.290039, 1048.599976)</string>
<string>(3203.560059, 1046.170044)</string>
<string>(3211.330078, 1042.780029)</string>
<string>(3189.500000, 1050.540039)</string>
</array>
<key>Total</key>
<real>44457380</real>
<key>Type</key>
<integer>15</integer>
</dict>
</array>
</dict>
</array>
</dict>

Done.
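Answer:
Your XML has an extra (unmatched) </plist> tag.
Even if your code did work, it would only try to remove the <string> tags containing the "Calcification" text, not the <dict> elements you were trying to remove.
Here is a working solution. It may not be the most optimized code, and I have only tested it against your input: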
import xml.etree.ElementTree as ET
tree = ET.parse("sample.xml")
root = tree.getroot()
dict_list = []
array = root.find("./array/dict/array")
for each_dict in array.iter('dict'):
    for each_string in each_dict.iter('string'):
        if each_string.text == "Calcification":
            dict_list.append(each_dict)
            break  # avoid adding the same <dict> twice
for each_dict in dict_list:
    array.remove(each_dict)
tree.write('sample3.xml')
