如何從R中復雜的嵌套.json檔案中提取元素？-有解無憂

我的最終目標是從 US HealthData 網站讀取一系??列每日 .xlsx 檔案（名為“Community_Profile_Report_YYYYMMDD_Public.xlsx”）：
如何從 R 中復雜的嵌套 .json 檔案中提取元素？

我的問題是，如何提取關聯的 .xlsx 檔案的所有assetIDs 和 thefilenames并將它們放入串列或資料框中？

這是一個示例檔案的 URL（我需要的多個日期中的一個），以及我在讀入它后打算如何處理它。目錄路徑中“/files/”后面的字母數字短語對應于assetID.

url <- "https://healthdata.gov/api/views/gqxm-d9w9/files/3bf3ff9c-67d3-47af-aad3-799e21de6e3d?download=true&filename=Community Profile Report 20210903.xlsx"

df2 <- read.xlsx(url, sheet=6) %>% select(1,2,6,8,16,76,77) %>% .[-1, ]
colnames(df2) <- c("county","FIPS","state","pop","cases_last7","vaccinated","vacc_prop")
cols.name <- c("FIPS","pop","cases_last7","vaccinated","vacc_prop")
df2[cols.name] <- sapply(df2[cols.name], as.numeric)
df2 <- df2 %>% filter(complete.cases(.)) %>% mutate(cases_last7_100k = cases_last7/pop * 100000) %>%   
        mutate(vacc_prop = vacc_prop*100) %>% filter(state != "PR" & vacc_prop < 100)

謝謝，大衛

uj5u.com熱心網友回復：

一種方式是通過函式parse_json 從jsonlite包。

第一個元素的快速而丑陋的示例：

require(jsonlite)
require(magrittr)
dt <- read_json(path = 'https://healthdata.gov/resource/6hii-ae4f.json')

# as an example, consider only first item from the list 
dt2 <- dt[[1]]$metadata_published %>% parse_json

# str(dt2) to see the parsed object structure.
# we want to loop through the `attachments`
dt2$attachments %>% lapply(.,function(attachment){
  c(filename = attachment$filename, assetId = attachment$assetId)
}) %>% do.call(rbind,.) -> extracted_list

# filter for `xlsx` entries
subset(extracted_list, grepl('*.xlsx', extracted_list[,'filename']))

對于一般情況，我們可以將其包裝為一個函式

'extract_attachments' <- function(x) {
  require(jsonlite)
  require(magrittr)
  x$metadata_published %>% parse_json %>% {
    .$attachments
  } %>%
    lapply(., function(attachment) {
      c(filename = attachment$filename,
        assetId = attachment$assetId)
    }) %>% do.call(rbind, .) %>%
    subset(., grepl('*.xlsx', .[, 'filename']))
}

并遍歷整個json：

lapply(dt,extract_attachments) %>% do.call(rbind,.)

轉載請註明出處，本文鏈接：https://www.uj5u.com/qiye/336890.html

標籤：r json

上一篇：更改一組堆疊條形圖中的軸標簽

下一篇：防止某些元素在for回圈中被`sample()`處理