How to remove extra whitespace from comment lines with lxml
I tried to comment out the Necessary tags with the following code:
tc.getparent().replace(tc,etree.Comment(etree.tostring(tc)))
print(etree.tostring(doc2).decode())
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>
-->
</Next_Item>
</Price>
</Item>
</List>
I have also tried BeautifulSoup, but the comments still contain whitespace:
from bs4 import BeautifulSoup

soup = BeautifulSoup(open('XML1.xml', 'r'), 'xml')
for elem in soup.find_all():
    if elem.string is not None:
        elem.string = elem.string.strip()
The desired XML is:
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
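My problem is the extra newline inside the comment, between <Necessary/> and "-->", which pushes "-->" onto the next line.
Any help would be greatly appreciated.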
Reply from a uj5u.com user:
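The "extra" newline after the comment belongs to the element you used as the comment text. By default, etree.tostring(ele) also serializes the element's tail text, so that string already contains the extra whitespace, including the indentation of the next element.
Serializing with with_tail=False, then keeping the original tail text and applying it to the comment, solves the problem.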
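To make the cause concrete, here is a minimal sketch that compares the two serializations of a single element. The inline snippet is a hypothetical stand-in for the question's file; the only lxml calls used are etree.fromstring, find and etree.tostring with its with_tail flag.

```python
from lxml import etree

# Hypothetical sample mirroring one <Next_Item> from the question.
xml = b"""<Next_Item>
<Name>Apple</Name>
<Necessary/>
</Next_Item>"""

root = etree.fromstring(xml)
tc = root.find("Necessary")

# tostring() includes the element's tail text (here the newline before
# </Next_Item>) unless with_tail=False is passed.
with_tail = etree.tostring(tc)
without_tail = etree.tostring(tc, with_tail=False)

print(repr(with_tail))     # b'<Necessary/>\n'
print(repr(without_tail))  # b'<Necessary/>'
```

The trailing newline in the first result is exactly what ends up inside the comment and pushes "-->" onto the next line.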
>>> doc = etree.parse('test.xml')
>>> for ele in doc.xpath('//Necessary'):
...     t = ele.tail
...     c = etree.Comment(etree.tostring(ele, with_tail=False))
...     c.tail = t
...     ele.getparent().replace(ele, c)
...
>>> print(etree.tostring(doc).decode())
Result:
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
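The same fix can be packaged as a standalone script instead of an interactive session. This is a sketch that assumes an inline, indented sample in place of test.xml; the helper name comment_out is hypothetical and not part of lxml.

```python
from lxml import etree

# Hypothetical inline sample standing in for the answer's test.xml.
SAMPLE = b"""<List>
  <Item>
    <Price>
      <Amount>100</Amount>
      <Next_Item>
        <Name>Apple</Name>
        <Necessary/>
      </Next_Item>
      <Next_Item>
        <Name>Orange</Name>
        <Necessary/>
      </Next_Item>
    </Price>
  </Item>
</List>"""


def comment_out(root, tag):
    """Replace every <tag> element under root with a comment holding its markup."""
    # Materialize the matches first because the tree is modified in the loop.
    for ele in list(root.iter(tag)):
        # Serialize without the tail so the comment body has no trailing whitespace.
        markup = etree.tostring(ele, with_tail=False).decode()
        comment = etree.Comment(markup)
        # Reuse the original tail so the surrounding indentation is preserved.
        comment.tail = ele.tail
        ele.getparent().replace(ele, comment)


root = etree.fromstring(SAMPLE)
comment_out(root, "Necessary")
result = etree.tostring(root).decode()
print(result)
```

Setting the comment's tail before calling replace() matters because lxml moves tail text together with its element, so the tail of the removed <Necessary/> would otherwise disappear with it.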
Reply from a uj5u.com user:
您可以通過呼叫選擇所有評論Comment并將其替換為剝離版本:
for c in soup.find_all(text=lambda text:isinstance(text, Comment)):
c.replace_with(Comment(c.strip()))
Example:
from bs4 import BeautifulSoup
from bs4 import Comment
xml = '''
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>
-->
</Next_Item>
</Price>
</Item>
</List>
'''
soup = BeautifulSoup(xml, 'xml')
# Replace each comment node with a whitespace-stripped copy
for c in soup.find_all(string=lambda text: isinstance(text, Comment)):
    c.replace_with(Comment(c.strip()))
print(soup)
Output:
<?xml version="1.0" encoding="utf-8"?>
<List>
<Item>
<Price>
<Amount>100</Amount>
<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>-->
</Next_Item>
<Next_Item>
<Name>Orange</Name>
<!--<Necessary/>-->
</Next_Item>
</Price>
</Item>
</List>
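The BeautifulSoup approach of stripping the comment text after the fact also has a direct lxml counterpart, useful if a file already contains the padded comments shown in the question. A minimal sketch, assuming a hypothetical inline snippet; it relies on lxml's iter() accepting the etree.Comment factory so that only comment nodes are visited.

```python
from lxml import etree

# Hypothetical input whose comment already carries a trailing newline,
# like the question's output.
PADDED = b"""<Next_Item>
<Name>Apple</Name>
<!--<Necessary/>
-->
</Next_Item>"""

root = etree.fromstring(PADDED)

# iter(etree.Comment) yields only comment nodes; editing .text does not
# change the tree structure, so iterating directly is safe.
for c in root.iter(etree.Comment):
    if c.text is not None:
        c.text = c.text.strip()

result = etree.tostring(root).decode()
print(result)
```

Only the comment text is touched here; the comment's tail (the newline before </Next_Item>) is left alone, so the layout around it is unchanged.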
Please credit the source when reposting. Link to this article: https://www.uj5u.com/shujuku/517707.html
