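I am using the clinicaltrials.gov API to pull a list of clinical trials into an XML file, then parsing the data so I can eventually export it to an Excel dataset.
The URL in my code returns 9 results, but my code only extracts data for 5 of the 9. I realized this is because one of the fields (detaileddescription) is only present for some of the trials. When I removed detaileddescription and used only two fields (nctid and briefsummary), I got 9/9. Short of something messy like creating a separate dataframe for the detailed descriptions and merging it back in, what else can I do here?
Bottom line: I am extracting 3 fields (nctid, briefsummary and detaileddescription) from an XML file that contains 9 clinical trials, but my output only includes 5 of the 9 trials. How can I get all 9/9 without dropping the detaileddescription field from my output?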
import requests
from bs4 import BeautifulSoup
import pandas as pd
out = []
url = 'https://clinicaltrials.gov/api/query/full_studies?expr=diabetes telehealth peer support& AREA[StartDate] EXPAND[Term] RANGE[01/01/2020, 09/01/2020]&min_rnk=1&max_rnk=50&fmt=xml'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
nctids = soup.find_all("field", {"name" : "NCTId"})
briefsummaries = soup.find_all("field", {"name" : "BriefSummary"}) if soup.find_all("field", {"name" : "BriefSummary"}) is not None else 'nothing'
detaileddescriptions = soup.find_all("field", {"name" : "DetailedDescription"}) if soup.find_all("field", {"name" : "DetailedDescription"}) is not None else 'nothing'
for nctid, briefsummary, detaileddescription in zip(nctids, briefsummaries, detaileddescriptions):
    data = {'nctid': nctid, 'briefsummary': briefsummary, 'detaileddescription': detaileddescription}
    out.append(data)
df = pd.DataFrame(out)
df.to_excel('clinicaltrialstresults.xlsx')
Answer:
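You can iterate over the list of studies instead, with a small change to your code. zip() stops at the shortest of the lists it is given, so the number of rows is capped by the field that is missing from some trials. Looking up each field inside its own study keeps one row per trial: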
import requests
from bs4 import BeautifulSoup
import pandas as pd
out = []
url = 'https://clinicaltrials.gov/api/query/full_studies?expr=diabetes telehealth peer support& AREA[StartDate] EXPAND[Term] RANGE[01/01/2020, 09/01/2020]&min_rnk=1&max_rnk=50&fmt=xml'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
study_list = soup.find_all("fullstudy")
for study in study_list:
    # Search within this one study, so a missing field cannot shift the other rows
    nctid = study.find("field", {"name" : "NCTId"})
    briefsummary = study.find("field", {"name" : "BriefSummary"}) if study.find("field", {"name" : "BriefSummary"}) is not None else 'nothing'
    detaileddescription = study.find("field", {"name" : "DetailedDescription"}) if study.find("field", {"name" : "DetailedDescription"}) is not None else 'nothing'
    data = {'nctid': nctid, 'briefsummary': briefsummary, 'detaileddescription': detaileddescription}
    out.append(data)
df = pd.DataFrame(out)
df.to_excel('clinicaltrialstresults.xlsx', index=False)
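To see the difference without calling the API, here is a minimal offline sketch of the same per-study idea. It makes several assumptions: the sample XML, its NCT IDs and its texts are made up for illustration, the `FullStudy` / `Field Name="..."` layout is only an approximation of the real response, and the helper names `field_text`, `extract_rows` and `zipped_row_count` are hypothetical. It also swaps BeautifulSoup for the standard library's `xml.etree.ElementTree` so it runs with no extra packages, and it stores the field text rather than the tag object.

```python
import xml.etree.ElementTree as ET

# Made-up sample data: three studies, the second has no DetailedDescription.
SAMPLE_XML = """
<FullStudyList>
  <FullStudy>
    <Field Name="NCTId">NCT00000001</Field>
    <Field Name="BriefSummary">Summary A</Field>
    <Field Name="DetailedDescription">Details A</Field>
  </FullStudy>
  <FullStudy>
    <Field Name="NCTId">NCT00000002</Field>
    <Field Name="BriefSummary">Summary B</Field>
  </FullStudy>
  <FullStudy>
    <Field Name="NCTId">NCT00000003</Field>
    <Field Name="BriefSummary">Summary C</Field>
    <Field Name="DetailedDescription">Details C</Field>
  </FullStudy>
</FullStudyList>
"""


def field_text(study, name, default="nothing"):
    """Return the text of the first Field with this Name inside one study."""
    for field in study.iter("Field"):
        if field.get("Name") == name:
            return (field.text or "").strip()
    return default


def extract_rows(xml_text):
    """Build one row per study, filling missing fields with a default."""
    root = ET.fromstring(xml_text)
    rows = []
    for study in root.iter("FullStudy"):
        rows.append({
            "nctid": field_text(study, "NCTId"),
            "briefsummary": field_text(study, "BriefSummary"),
            "detaileddescription": field_text(study, "DetailedDescription"),
        })
    return rows


def zipped_row_count(xml_text):
    """Count the rows the zip-based approach would produce, for comparison."""
    root = ET.fromstring(xml_text)

    def by_name(name):
        return [f for f in root.iter("Field") if f.get("Name") == name]

    zipped = zip(by_name("NCTId"), by_name("BriefSummary"), by_name("DetailedDescription"))
    return len(list(zipped))


rows = extract_rows(SAMPLE_XML)
for row in rows:
    print(row)
print("rows per study:", len(rows), "| rows with zip:", zipped_row_count(SAMPLE_XML))
```

On this sample the per-study loop keeps all three trials, while the zip-based version yields only two rows and pairs the second NCT ID with the third trial's description. That misalignment is a second reason to look fields up study by study rather than in three document-wide lists.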
