Parsing XML files to CSV - NameError

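I am parsing a large project of thousands of XML files for specific elements and attributes. I managed to print all the elements and attributes I want, but I can't write them to a CSV table. Ideally, every occurrence of each element/attribute would appear under its own header. The problem is that I get "NameError: name 'X' is not defined", and I don't know how to restructure the code. My variables seem fine until I try to move them into the CSV.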

from logging import root
import xml.etree.ElementTree as ET
import csv
import os
path = r'C:\Users\briefe\V'

f = open('jp-elements.csv', 'w', encoding="utf-8")
writer = csv.writer(f)
writer.writerow(["Note", "Supplied", "@Certainty", "@Source"])


    #opening files in folder for project
for filename in os.listdir(path):
        if filename.endswith(".xml"):
            fullpath = os.path.join(path, filename)
        #getting the root of each file as my starting point
        for file in fullpath:
            tree = ET.parse(fullpath)
            root = tree.getroot()
            try:
                for note in root.findall('.//note'):
                    notes = note.attrib, note.text
                for supplied in root.findall(".//supplied"):
                    print(supplied.attrib)
                    for suppliedChild in supplied.findall(".//*"):
                        supplies = suppliedChild.tag, suppliedChild.attrib
                #attribute search
                for responsibility in root.findall(".//*[@resp]"):
                    responsibilities = responsibility.tag, responsibility.attrib, responsibility.text
                for certainty in root.findall(".//*[@cert]"):
                    certainties = certainty.tag, certainty.attrib, certainty.text
                writer.writerow([notes, supplies, responsibilities, certainties])
            finally:
                f.close()
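The NameError can be reproduced in isolation. The sketch below uses a tiny hypothetical sample string (not one of the project files) with the same default TEI namespace: `root.findall('.//note')` matches nothing, so the loop body never runs and `notes` is never assigned.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document with the same default TEI namespace as the project files.
SAMPLE = '<TEI xmlns="http://www.tei-c.org/ns/1.0"><note type="ig">Kopie</note></TEI>'
root = ET.fromstring(SAMPLE)

# The un-qualified path looks for <note> in *no* namespace, so the list is empty.
unqualified = root.findall('.//note')
for note in unqualified:
    notes = note.attrib, note.text  # never executed

# Referencing the name afterwards fails the same way writer.writerow([notes, ...]) does.
try:
    result = notes
except NameError as exc:
    result = f'NameError: {exc}'
print(result)

# The fully qualified tag does match.
qualified = root.findall('.//{http://www.tei-c.org/ns/1.0}note')
print(len(unqualified), len(qualified))  # 0 1
```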

As kindly suggested, I am trying to save the results, which look like this:

{http://www.tei-c.org/ns/1.0}add {'resp': '#MB', 'status': 'unremarkable'} Nach H gedruckt IV. Abt., V, Anhang Nr.
                     10.
{http://www.tei-c.org/ns/1.0}date {'cert': 'medium', 'when': '1805-04-09'} 9. April 1805

I am trying to save these mixtures of tuples and dict items as strings in CSV fields, but I get, for example, "NameError: name 'notes' is not defined".
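One way to keep every occurrence, instead of only the last value a loop assigned, is to collect a list per header and pad the shorter columns when writing. This is a sketch under assumptions, not the answer's method: the `TEI` namespace string, the `describe` format (local tag, attribute dict, stripped text), and mapping `@Certainty` to `cert` and `@Source` to `resp` are all hypothetical choices. `csv.writer` would also accept a raw tuple, but it converts non-string values with `str()`, so formatting the string explicitly keeps the cells readable.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest

# Assumption: every project file uses the TEI default namespace.
TEI = '{http://www.tei-c.org/ns/1.0}'


def describe(el):
    """Render one match as 'tag {attributes} text', with the namespace removed."""
    tag = el.tag.rpartition('}')[2]
    text = (el.text or '').strip()
    return f'{tag} {el.attrib} {text}'.strip()


def write_columns(root, out):
    """Write one column per header; each occurrence gets its own row."""
    # Lists accumulate every match, so nothing is overwritten, and a header
    # with no matches is simply [] instead of an undefined variable.
    columns = {
        'Note': [describe(e) for e in root.iter(TEI + 'note')],
        'Supplied': [describe(e) for e in root.iter(TEI + 'supplied')],
        '@Certainty': [describe(e) for e in root.findall('.//*[@cert]')],  # hypothetical mapping
        '@Source': [describe(e) for e in root.findall('.//*[@resp]')],     # hypothetical mapping
    }
    writer = csv.writer(out)
    writer.writerow(list(columns))
    # zip_longest pads the shorter columns with '' so the rows stay aligned.
    writer.writerows(zip_longest(*columns.values(), fillvalue=''))


SAMPLE = '''<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <note type="ig">Kopie</note>
  <name cert="high" type="reg">Baucis</name>
  <name cert="low" type="reg">Philemon</name>
</body></text></TEI>'''

buf = io.StringIO()
write_columns(ET.fromstring(SAMPLE), buf)
print(buf.getvalue())
```

For a real file, pass a handle opened with `open(..., 'w', newline='', encoding='utf-8')` instead of `io.StringIO`. Each call writes its own header row, so for thousands of files you would either write one CSV per file or move the header out of the function.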

XML code sample:

<?xml version="1.0" encoding="UTF-8"?><TEI xmlns="http://www.tei-c.org/ns/1.0" type="letter" xml:id="V_100">
   <teiHeader>
</teiHeader>
   <text>
      <body>
         <div type="letter">
            <note type="ig">Kopie</note>
            <p>Erlauben Sie mir, in Ihre Ehrenpforte noch einige Zwick<lb xml:id="V_39-7" rendition="#hyphen"/>steinchen einzuschieben. Philemon und Baucis müssen —
                  wenn<note corresp="#V_39-8">
                  <listPerson type="lineReference">
                     <person corresp="#JP-000228">
                        <persName>
                           <name cert="high" type="reg">Baucis</name>
                        </persName>
                     </person>
                     <person corresp="#JP-003214" ana="?">
                        <persName>
                           <name cert="low" type="reg">Philemon</name>
                        </persName>
                     </person>
                  </listPerson>
            <p>
               <hi rendition="#aq">Der Brief ist vielleicht nicht an den Minister Hardenberg
                  gerichtet,<lb/>
            </p>
            <lb/>
         </div>
      </body>
   </text>
</TEI>

Answer:

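As posted, the XML has a default namespace at the root, which must be accounted for in every named reference to an element such as <note>. Consider this adjustment, with which notes will be assigned properly: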

nsmp = "http://www.tei-c.org/ns/1.0"

for note in root.findall(f'.//{{{nsmp}}}note'):
    notes = note.attrib, note.text

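The triple curly braces ensure that the interpolated string value is wrapped in literal curly braces, which are also the delimiters F-strings use. Note that your code will also fail on supplies: the sample has no <supplied> element, so that loop never runs and supplies is never assigned either.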
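ElementTree also offers two alternatives to building the `{uri}tag` string by hand. The sketch below compares them on a small hypothetical sample: a prefix mapping passed as the `namespaces` argument of `findall` (the prefix name `tei` is an arbitrary choice), and, on Python 3.8 or later, the `{*}` wildcard, which matches a local name in any (or no) namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two <note> elements in the TEI default namespace.
SAMPLE = '''<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><note type="ig">Kopie</note><note corresp="#V_39-8"/></body></text>
</TEI>'''
root = ET.fromstring(SAMPLE)
nsmp = "http://www.tei-c.org/ns/1.0"

# 1. The answer's f-string: '{{' and '}}' are literal braces around the interpolated URI.
path = f'.//{{{nsmp}}}note'
print(path)  # .//{http://www.tei-c.org/ns/1.0}note

# 2. Prefix mapping passed to findall; the prefix name 'tei' is arbitrary.
by_prefix = root.findall('.//tei:note', {'tei': nsmp})

# 3. Python 3.8+: '{*}' matches the local name in any (or no) namespace.
by_wildcard = root.findall('.//{*}note')

print(len(root.findall(path)), len(by_prefix), len(by_wildcard))  # 2 2 2
```

The prefix mapping is convenient when one dict is reused across many queries. The wildcard is shorter, but it would also match a `<note>` from some unrelated namespace, which may or may not be wanted.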

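However, based on your comments, consider a dynamic solution that does not hard-code any element names. Instead, it parses all elements and attributes and flattens the output into CSV format. The code below uses nested list/dict comprehensions to parse the XML data and migrates it to CSV with csv.DictWriter, which maps dictionaries to the CSV's fieldnames. It also uses a context manager, with, to write the file, so no close() call is needed.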

import csv

# `root` is assumed to come from your file loop, e.g. root = ET.parse(fullpath).getroot()
with open('Output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(
        f, fieldnames=['element_or_attribute', 'text_or_value']
    )

    # MERGES DICTIONARIES OF ELEMENTS AND ATTRIBUTES
    # DICT KEYS DROP NAMESPACES; TEXT IS CHECKED FOR None BEFORE strip()
    # ATTRIBUTES ARE PREFIXED WITH PARENT ELEMENT NAME
    xml_dicts = [{
        **{el.tag.split('}')[1]: (
            el.text.strip() if el.text is not None else el.text
        )},
        **{(
            el.tag.split('}')[1] + '_' + k.split('}')[1]
            if '}' in k
            else el.tag.split('}')[1] + '_' + k): v
           for k, v in el.attrib.items()}
    } for el in root.findall('.//*')]

    # COMBINES ABOVE DICTS INTO FLATTER FORMAT
    csv_dicts = [
        {'element_or_attribute': k, 'text_or_value': v}
        for d in xml_dicts
        for k, v in d.items()
    ]

    writer.writeheader()
    writer.writerows(csv_dicts)

The above should be integrated into your file loop, where one XML file is processed into one CSV.
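A sketch of that integration, assuming one CSV per input file, written to a separate folder under the same stem (for example V_100.xml becomes V_100.csv). The helper names `local_name`, `flatten`, and `xml_folder_to_csvs` are hypothetical. The flattening mirrors the comprehension above but yields one row at a time, and it uses `rpartition('}')` instead of `split('}')[1]`, so a tag without a namespace does not raise IndexError.

```python
import csv
import xml.etree.ElementTree as ET
from pathlib import Path

FIELDNAMES = ['element_or_attribute', 'text_or_value']


def local_name(name):
    """'{uri}note' -> 'note'; names without a namespace pass through unchanged."""
    return name.rpartition('}')[2]


def flatten(root):
    """Yield one row dict per element and per attribute, like the comprehension above."""
    for el in root.findall('.//*'):
        tag = local_name(el.tag)
        text = el.text.strip() if el.text is not None else None
        yield {'element_or_attribute': tag, 'text_or_value': text}
        for key, value in el.attrib.items():
            yield {'element_or_attribute': f'{tag}_{local_name(key)}', 'text_or_value': value}


def xml_folder_to_csvs(src_dir, out_dir):
    """Write out_dir/<stem>.csv for every src_dir/<stem>.xml; return the CSV paths."""
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for xml_path in sorted(src_dir.glob('*.xml')):
        root = ET.parse(xml_path).getroot()
        csv_path = out_dir / f'{xml_path.stem}.csv'
        # newline='' stops csv from doubling line endings on Windows;
        # None values are written as empty cells.
        with open(csv_path, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
            writer.writeheader()
            writer.writerows(flatten(root))
        written.append(csv_path)
    return written
```

With the folder from the question this would be called as `xml_folder_to_csvs(r'C:\Users\briefe\V', 'csv_out')`, where `csv_out` is a placeholder output folder.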

CSV Output

element_or_attribute	text_or_value
teiHeader
text
body
div
div_type	letter
note	Kopie
note_type	ig
p	"Erlauben Sie mir, in Ihre Ehrenpforte noch einige Zwick"
lb
lb_id	V_39-7
lb_rendition	#hyphen
note
note_corresp	#V_39-8
listPerson
listPerson_type	lineReference
person
person_corresp	#JP-000228
persName
name	Baucis
name_cert	high
name_type	reg
person
person_corresp	#JP-003214
person_ana	?
persName
name	Philemon
name_cert	low
name_type	reg
p
hi	"Der Brief ist vielleicht nicht an den Minister Hardenberg\n gerichtet,"
hi_rendition	#aq
lb
lb
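Because `.//*` selects descendants only, this output starts at `teiHeader`, and the root `<TEI>` element's own `type` and `xml:id` attributes are not written. If those matter, for example to tell rows from different letters apart, `root.iter()` includes the root itself. A sketch on a trimmed hypothetical sample, with the hypothetical helper name `pairs`:

```python
import xml.etree.ElementTree as ET

# Hypothetical trimmed sample: only the root and one child.
SAMPLE = '''<TEI xmlns="http://www.tei-c.org/ns/1.0" type="letter" xml:id="V_100">
  <teiHeader></teiHeader>
</TEI>'''
root = ET.fromstring(SAMPLE)


def pairs(elements):
    """Flatten elements into (name, value) pairs with namespaces stripped."""
    out = []
    for el in elements:
        tag = el.tag.rpartition('}')[2]
        out.append((tag, el.text.strip() if el.text is not None else None))
        for key, value in el.attrib.items():
            attr = key.rpartition('}')[2]  # '{...XML/1998/namespace}id' -> 'id'
            out.append((f'{tag}_{attr}', value))
    return out


print(pairs(root.findall('.//*')))  # descendants only
print(pairs(root.iter()))           # root first, then descendants in document order
```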

XML Input (corrected for reproducibility)

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" type="letter" xml:id="V_100">
   <teiHeader></teiHeader>
   <text>
      <body>
         <div type="letter">
            <note type="ig">Kopie</note>
            <p>Erlauben Sie mir, in Ihre Ehrenpforte noch einige Zwick<lb xml:id="V_39-7" rendition="#hyphen"/>steinchen einzuschieben. Philemon und Baucis müssen —
                  wenn<note corresp="#V_39-8"/>
                  <listPerson type="lineReference">
                     <person corresp="#JP-000228">
                        <persName>
                           <name cert="high" type="reg">Baucis</name>
                        </persName>
                     </person>
                     <person corresp="#JP-003214" ana="?">
                        <persName>
                           <name cert="low" type="reg">Philemon</name>
                        </persName>
                     </person>
                  </listPerson>
            </p>
            <p>
               <hi rendition="#aq">Der Brief ist vielleicht nicht an den Minister Hardenberg
                  gerichtet,<lb/></hi>
            </p>
            <lb/>
         </div>
      </body>
   </text>
</TEI>


Tags: Python, xml, CSV, parsing
