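Is there a loop that iterates over sibling elements and enters null/NA when it reaches a StudentScreening (see below) that is missing one of the tags?
Here is the content of my XML file [studentinfo.xml]: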
<?xml version="1.0" encoding="UTF-8"?>
<StudentBreakdown>
  <Studentdata>
    <StudentScreening>
      <name>Sam Davies</name>
      <age>15</age>
      <hair>Black</hair>
      <eyes>Blue</eyes>
      <grade>10</grade>
      <teacher>Draco Malfoy</teacher>
      <dorm>Innovation Hall</dorm>
    </StudentScreening>
    <StudentScreening>
      <name>Cassie Stone</name>
      <age>14</age>
      <hair>Science</hair>
      <grade>9</grade>
      <teacher>Luna Lovegood</teacher>
    </StudentScreening>
    <StudentScreening>
      <name>Derek Brandon</name>
      <age>17</age>
      <eyes>green</eyes>
      <teacher>Ron Weasley</teacher>
      <dorm>Hogtie Manor</dorm>
    </StudentScreening>
  </Studentdata>
</StudentBreakdown>
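My code iterates over the studentinfo.xml file and uses pandas to enter the information into a DataFrame (df1), based on the columns I have mapped the tags to.
Here is a sample of my code: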
import pandas as pd
from bs4 import BeautifulSoup

with open('studentinfo.xml', 'r') as f:
    file = f.read()

def parse_xml(file):
    soup = BeautifulSoup(file, 'xml')
    df1 = pd.DataFrame(columns=['StudentName', 'Age', 'Hair', 'Eyes', 'Grade', 'Teacher', 'Dorm'])
    # Tag names are given without angle brackets so they match the file above
    all_items = soup.find_all('StudentScreening')
    items_length = len(all_items)
    for index, info in enumerate(all_items):
        # find() returns None for a missing tag, so .text raises AttributeError
        StudentName = info.find('name').text
        Age = info.find('age').text
        Hair = info.find('hair').text
        Eyes = info.find('eyes').text
        Grade = info.find('grade').text
        Teacher = info.find('teacher').text
        Dorm = info.find('dorm').text
        row = {
            'StudentName': StudentName,
            'Age': Age,
            'Hair': Hair,
            'Eyes': Eyes,
            'Grade': Grade,
            'Teacher': Teacher,
            'Dorm': Dorm
        }
        # DataFrame.append was removed in pandas 2.0; this call needs pandas < 2.0
        df1 = df1.append(row, ignore_index=True)
        print('Appending row %s of %s' % (index + 1, items_length))
    return df1
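When I try to run the code, I get this error: "AttributeError: 'NoneType' object has no attribute 'text'". My conclusion is that I get this error because not every StudentScreening uses the same child tags.
What condition can I add to my code so that, while looping, a missing element tag is entered as null in the DataFrame and the loop continues through the file?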
Answer from a uj5u.com user:
Since you are already using pandas, simply use its pandas.read_xml() (available since pandas 1.3):
pd.read_xml(xml, xpath='.//StudentScreening')
Example
import pandas as pd
xml = '''
<StudentBreakdown>
<Studentdata>
<StudentScreening>
<name>Sam Davies</name>
<age>15</age>
<hair>Black</hair>
<eyes>Blue</eyes>
<grade>10</grade>
<teacher>Draco Malfoy</teacher>
<dorm>Innovation Hall</dorm>
</StudentScreening>
<StudentScreening>
<name>Cassie Stone</name>
<age>14</age>
<hair>Science</hair>
<grade>9</grade>
<teacher>Luna Lovegood</teacher>
</StudentScreening>
<StudentScreening>
<name>Derek Brandon</name>
<age>17</age>
<eyes>green</eyes>
<teacher>Ron Weasley</teacher>
<dorm>Hogtie Manor</dorm>
</StudentScreening>
</Studentdata>
</StudentBreakdown>'''
pd.read_xml(xml, xpath='.//StudentScreening')
Output
| | name | age | hair | eyes | grade | teacher | dorm |
|---|---|---|---|---|---|---|---|
| 0 | Sam Davies | 15 | Black | Blue | 10 | Draco Malfoy | Innovation Hall |
| 1 | Cassie Stone | 14 | Science | | 9 | Luna Lovegood | |
| 2 | Derek Brandon | 17 | | green | nan | Ron Weasley | Hogtie Manor |
Answer from a uj5u.com user:
You can iterate over your XML file with ElementTree to build a list of dictionaries, then convert that list to a DataFrame:
import pandas as pd
import xml.etree.ElementTree as ET
tree = ET.parse('studentinfo.xml')
root = tree.getroot()
arr = []
for student_screening in root.iterfind('.//StudentScreening'):
    # One dict per student; only the child tags that are present become keys
    arr.append({el.tag: el.text for el in student_screening})
df = pd.DataFrame(arr)
print(df)
Output:
name age hair eyes grade teacher dorm
0 Sam Davies 15 Black Blue 10 Draco Malfoy Innovation Hall
1 Cassie Stone 14 Science NaN 9 Luna Lovegood NaN
2 Derek Brandon 17 NaN green NaN Ron Weasley Hogtie Manor
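If the fixed column names from the question are wanted (and in a fixed order), one possible variation on the ElementTree approach is to look up each tag explicitly with findtext(), which returns None when the child tag is absent. The following is only a sketch under assumptions: the COLUMNS mapping and the screening_rows helper are hypothetical names introduced here, and the XML is a trimmed copy of the sample data.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data from the question (two of the three students)
XML = """<StudentBreakdown>
  <Studentdata>
    <StudentScreening>
      <name>Sam Davies</name>
      <age>15</age>
      <hair>Black</hair>
      <eyes>Blue</eyes>
      <grade>10</grade>
      <teacher>Draco Malfoy</teacher>
      <dorm>Innovation Hall</dorm>
    </StudentScreening>
    <StudentScreening>
      <name>Cassie Stone</name>
      <age>14</age>
      <hair>Science</hair>
      <grade>9</grade>
      <teacher>Luna Lovegood</teacher>
    </StudentScreening>
  </Studentdata>
</StudentBreakdown>"""

# Hypothetical mapping from XML tag to the column names used in the question
COLUMNS = {
    "name": "StudentName",
    "age": "Age",
    "hair": "Hair",
    "eyes": "Eyes",
    "grade": "Grade",
    "teacher": "Teacher",
    "dorm": "Dorm",
}

def screening_rows(xml_text):
    """Return one dict per StudentScreening, with None for missing tags."""
    root = ET.fromstring(xml_text)
    rows = []
    for screening in root.iterfind(".//StudentScreening"):
        # findtext() returns None (its default) when the child tag is absent
        rows.append({col: screening.findtext(tag) for tag, col in COLUMNS.items()})
    return rows

rows = screening_rows(XML)
print(rows[1])
```

Passing rows to pd.DataFrame(rows) should then give a frame whose missing entries are null, since every row carries all seven keys.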
Answer from a uj5u.com user:
Try:
import pandas as pd
from bs4 import BeautifulSoup
html_doc = """\
<?xml version="1.0" encoding="UTF-8"?>
<StudentBreakdown>
<Studentdata>
<StudentScreening>
<name>Sam Davies</name>
<age>15</age>
<hair>Black</hair>
<eyes>Blue</eyes>
<grade>10</grade>
<teacher>Draco Malfoy</teacher>
<dorm>Innovation Hall</dorm>
</StudentScreening>
<StudentScreening>
<name>Cassie Stone</name>
<age>14</age>
<hair>Science</hair>
<grade>9</grade>
<teacher>Luna Lovegood</teacher>
</StudentScreening>
<StudentScreening>
<name>Derek Brandon</name>
<age>17</age>
<eyes>green</eyes>
<teacher>Ron Weasley</teacher>
<dorm>Hogtie Manor</dorm>
</StudentScreening>
</Studentdata>
</StudentBreakdown>"""
soup = BeautifulSoup(html_doc, "xml")
all_data = []
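for s in soup.select("StudentScreening"):
    # Each value is a Tag, or None when the student has no such child tag
    all_data.append(
        {
            "name": s.find("name"),
            "age": s.age,
            "eyes": s.eyes,
            "grade": s.grade,
            "teacher": s.teacher,
            "dorm": s.dorm,
        }
    )
# Replace each Tag with its text and each missing tag (None) with "N/A"
df = pd.DataFrame(all_data).apply(lambda x: [v.text if v else "N/A" for v in x])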
print(df)
Prints:
name age eyes grade teacher dorm
0 Sam Davies 15 Blue 10 Draco Malfoy Innovation Hall
1 Cassie Stone 14 N/A 9 Luna Lovegood N/A
2 Derek Brandon 17 green N/A Ron Weasley Hogtie Manor
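For readers who prefer to keep the per-field loop from the question, the condition being asked about can be sketched as a small helper that checks the lookup result for None before reading .text. The helper name text_or_default is hypothetical. The demonstration below uses the standard library's ElementTree so that it runs without extra packages; BeautifulSoup's find() likewise returns None for a missing tag, so the same guard should apply there. The check is written as "is not None" rather than a truthiness test because an ElementTree element without children evaluates as false.

```python
import xml.etree.ElementTree as ET

def text_or_default(node, default=None):
    """Return node.text, or default when the lookup found nothing (node is None)."""
    return node.text if node is not None else default

# A cut-down version of the third record from the question (no hair or grade)
screening = ET.fromstring(
    "<StudentScreening><name>Derek Brandon</name><age>17</age>"
    "<eyes>green</eyes></StudentScreening>"
)

row = {
    tag: text_or_default(screening.find(tag), "N/A")
    for tag in ("name", "hair", "eyes", "grade")
}
print(row)
# {'name': 'Derek Brandon', 'hair': 'N/A', 'eyes': 'green', 'grade': 'N/A'}
```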
