I have an XML file of dissertation metadata, and I am trying to get each author's name as a single string. The names in the XML look like this:
<DISS_name>
<DISS_surname>Clark</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle/>
<DISS_suffix/>
</DISS_name>
Every name has a first name and a surname, but only some have a middle name and/or a suffix. Here is my code:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle')
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix')
if author_mname is not None and author_suffix is not None:
    author_name = author_surname + ', ' + author_fname + author_mname.text + ', ' + author_suffix.text
if author_mname is not None and author_suffix is None:
    author_name = author_surname + ', ' + author_fname + author_mname.text
if author_mname is None and author_suffix is None:
    author_name = author_surname + ', ' + author_fname
Why am I getting this output, and how can I fix it?
Traceback (most recent call last):
  File "C:\Users\bpclark2\pythonProject3\prqXML-to-dcCSV.py", line 185, in <module>
    author_name = author_surname + ', ' + author_fname + author_mname.text + author_suffix.text
TypeError: can only concatenate str (not "NoneType") to str
Revised code:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
author_name = author_surname + ', ' + author_fname + ' ' + str(author_mname.strip().title()) + str(', ' + author_suffix.strip().title())
row.append(author_name)
This gets me the output I was looking for:
author_surname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_surname').text.strip().title()
author_fname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_fname').text.strip().title()
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle').text or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix').text or ''
author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
if author_mname != '' and author_suffix != '':
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title() + ', ' + author_suffix.strip().title()
    row.append(author_name)
if author_mname != '' and author_suffix == '':
    author_name = author_surname + ', ' + author_fname + ' ' + author_mname.strip().title()
    row.append(author_name)
if author_mname == '' and author_suffix != '':
    author_name = author_surname + ', ' + author_fname + ', ' + author_suffix.strip().title()
    row.append(author_name)
if author_mname == '' and author_suffix == '':
    author_name = author_surname + ', ' + author_fname
    row.append(author_name)
uj5u.com user reply:
How about changing your code to this:
author_mname = record.find('DISS_authorship/DISS_author/DISS_name/DISS_middle') or ''
author_suffix = record.find('DISS_authorship/DISS_author/DISS_name/DISS_suffix') or ''
You can also add a str cast, like this:
... str(author_suffix.text)
If you are on a recent version of Python, use f-strings! Life is much easier with them.
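For context on the traceback in the question: `find()` does return an element for an empty tag such as `<DISS_middle/>`, so the `is not None` checks pass, but that element's `.text` is `None`, and concatenating `None` to a string raises the TypeError. Two caveats about the suggestions above, as far as I understand them: `str(None)` yields the literal string 'None', and the truth value of an `Element` has historically depended on whether it has children (and is deprecated in recent Python versions), so applying `or ''` to the element itself may not behave as intended. The sketch below applies `or ''` to `.text` instead and builds the name with f-strings. The `<record>` wrapper and the `base` variable are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Sample record modeled on the XML in the question; the <record> wrapper is assumed.
record = ET.fromstring("""\
<record>
  <DISS_authorship>
    <DISS_author>
      <DISS_name>
        <DISS_surname>Clark</DISS_surname>
        <DISS_fname>Brian</DISS_fname>
        <DISS_middle/>
        <DISS_suffix/>
      </DISS_name>
    </DISS_author>
  </DISS_authorship>
</record>""")

base = 'DISS_authorship/DISS_author/DISS_name/'  # hypothetical helper variable

middle_node = record.find(base + 'DISS_middle')
# The empty element exists, so find() does not return None...
assert middle_node is not None
# ...but it has no text, so .text is None, which breaks string concatenation.
assert middle_node.text is None

# Apply `or ''` to .text rather than to the element itself.
surname = record.find(base + 'DISS_surname').text.strip().title()
fname = record.find(base + 'DISS_fname').text.strip().title()
middle = (middle_node.text or '').strip().title()
suffix = (record.find(base + 'DISS_suffix').text or '').strip().title()

# Build the name with f-strings, adding the optional parts only when present.
author_name = f'{surname}, {fname}'
if middle:
    author_name += f' {middle}'
if suffix:
    author_name += f', {suffix}'

print(author_name)
```

Because the optional parts are appended only when they are non-empty, the four separate `if` branches from the question collapse into two.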
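uj5u.com user reply:
I would keep everything simple and make only minor edits to the code. You can use the XPath .//DISS_name to find all <DISS_name> nodes, then unpack each node's children into separate, correspondingly named variables. Code: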
import xml.etree.ElementTree as ET

data = """\
<DISS_authorship>
  <DISS_author>
    <DISS_name>
      <DISS_surname>Clark</DISS_surname>
      <DISS_fname>Brian</DISS_fname>
      <DISS_middle/>
      <DISS_suffix/>
    </DISS_name>
  </DISS_author>
</DISS_authorship>"""

root = ET.fromstring(data)
row = []
for name_node in root.iterfind(".//DISS_name"):
    surname, fname, middle, suffix = name_node  # 4 child nodes in this order
    name_str = surname.text + ", " + fname.text
    if middle.text:
        name_str += " " + middle.text
    if suffix.text:
        name_str += ", " + suffix.text
    row.append(name_str)
Or shorter:
import xml.etree.ElementTree as ET

data = ...
root = ET.fromstring(data)
row = []
for (surname, fname, middle, suffix) in root.iterfind(".//DISS_name"):
    name_str = surname.text + ", " + fname.text
    if middle.text:
        name_str += " " + middle.text
    if suffix.text:
        name_str += ", " + suffix.text
    row.append(name_str)
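The unpacking above relies on every <DISS_name> having exactly four children in a fixed order. If that cannot be guaranteed, one possible variation is to look each part up by tag with `findtext()`, which returns the given default when the tag is missing and an empty string when the element has no text. The sketch below is only an illustration: the `format_name` helper is hypothetical, and the sample data (reordered children, a "Jr." suffix, no middle-name tag) is invented for the example.

```python
import xml.etree.ElementTree as ET


def format_name(name_node):
    """Return 'Surname, First Middle, Suffix', skipping absent or empty parts."""
    # findtext() gives the default for a missing tag and '' for an empty one.
    surname = name_node.findtext('DISS_surname', default='').strip()
    fname = name_node.findtext('DISS_fname', default='').strip()
    middle = name_node.findtext('DISS_middle', default='').strip()
    suffix = name_node.findtext('DISS_suffix', default='').strip()

    name_str = f'{surname}, {fname}'
    if middle:
        name_str += f' {middle}'
    if suffix:
        name_str += f', {suffix}'
    return name_str


# Invented sample: children in a different order and no <DISS_middle> at all.
data = """\
<DISS_authorship>
  <DISS_author>
    <DISS_name>
      <DISS_fname>Brian</DISS_fname>
      <DISS_surname>Clark</DISS_surname>
      <DISS_suffix>Jr.</DISS_suffix>
    </DISS_name>
  </DISS_author>
</DISS_authorship>"""

root = ET.fromstring(data)
row = [format_name(node) for node in root.iterfind('.//DISS_name')]
print(row)
```

The trade-off is four lookups per name instead of one unpacking step, in exchange for not depending on the child order.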
uj5u.com user reply:
Here is a short proof of concept:
import xml.etree.ElementTree as ET

xml = '''<r><DISS_name>
<DISS_surname>Clark</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle/>
<DISS_suffix/>
</DISS_name>
<DISS_name>
<DISS_surname>Jack</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle>Smith</DISS_middle>
<DISS_suffix/>
</DISS_name>
</r>'''

root = ET.fromstring(xml)
for name in root.findall('.//DISS_name'):
    # Keep only the fields that actually contain text
    parts = [name.find(f'DISS_{f}').text
             for f in ['surname', 'fname', 'middle', 'suffix']
             if name.find(f'DISS_{f}').text is not None]
    print(", ".join(parts))
Output
Clark, Brian
Jack, Brian, Smith
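Note that joining every part with ", " puts a comma before the middle name, whereas the question's format separates the first and middle names with a space. A possible adjustment, sketched here under that assumption, is to join the given names with spaces first and then join the remaining groups with commas. The `join_name` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

xml = '''<r><DISS_name>
<DISS_surname>Clark</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle/>
<DISS_suffix/>
</DISS_name>
<DISS_name>
<DISS_surname>Jack</DISS_surname>
<DISS_fname>Brian</DISS_fname>
<DISS_middle>Smith</DISS_middle>
<DISS_suffix/>
</DISS_name>
</r>'''


def join_name(name):
    """Build 'Surname, First Middle, Suffix' from a <DISS_name> element."""
    # Missing tags give None and empty tags give ''; both collapse to ''.
    text = {f: (name.findtext(f'DISS_{f}') or '').strip()
            for f in ('surname', 'fname', 'middle', 'suffix')}
    # Given names are separated by a space, the other groups by a comma.
    given = ' '.join(p for p in (text['fname'], text['middle']) if p)
    return ', '.join(p for p in (text['surname'], given, text['suffix']) if p)


root = ET.fromstring(xml)
names = [join_name(name) for name in root.findall('.//DISS_name')]
for n in names:
    print(n)
```

With the sample data above this prints "Clark, Brian" and "Jack, Brian Smith".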
