I have a dataset like this:
| ICD_10 | diagnosis |
|---|---|
| A00 | Cholera |
| A01-A03 | Other Intestinal infectious diseases |
| A15 | Respiratory tuberculosis |
| A17-A19 | Other tuberculosis |
...
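Rows 2 and 4 each contain a range of ICD-10 codes. I would like to expand them into multiple rows, like this:
| ICD_10 | diagnosis |
|---|---|
| A00 | Cholera |
| A01 | Other Intestinal infectious diseases |
| A02 | Other Intestinal infectious diseases |
| A03 | Other Intestinal infectious diseases |
| A15 | Respiratory tuberculosis |
| A17 | Other tuberculosis |
| A18 | Other tuberculosis |
| A19 | Other tuberculosis |
How can I do this in R using the tidyverse?
Thanks for your help!
Answer from a uj5u.com user: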
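library(dplyr)

# Expand entries such as "A01-A03" into c("A01", "A02", "A03").
fun <- function(vec) {
  ltr <- substring(vec, 1, 1)  # leading letter of each entry
  # keep only digits and "-", then split each entry into its numeric bounds
  L <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)
  # "%02i" zero-pads to two digits; SIMPLIFY = FALSE always returns a list
  mapply(function(ltr, z) sprintf("%s%02i", ltr, if (length(z) > 1) seq(z[1], z[2]) else z),
         ltr, L, SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

quux %>%
  mutate(ICD_10 = fun(ICD_10)) %>%
  tidyr::unnest(ICD_10)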
# # A tibble: 8 x 2
# ICD_10 diagnosis
# <chr> <chr>
# 1 A00 Cholera
# 2 A01 Other Intestinal infectious diseases
# 3 A02 Other Intestinal infectious diseases
# 4 A03 Other Intestinal infectious diseases
# 5 A15 Respiratory tuberculosis
# 6 A17 Other tuberculosis
# 7 A18 Other tuberculosis
# 8 A19 Other tuberculosis
Data
quux <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))
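To see what the helper does at each step, here is a minimal base-R walk-through on two of the entries. It is only a sketch: it assumes a single-letter prefix shared by both ends of a range and two-digit numbers, and `vec`, `nums` and `expanded` are names chosen for illustration.

```r
vec <- c("A00", "A01-A03")

# Step 1: the leading letter of each entry
ltr <- substring(vec, 1, 1)

# Step 2: drop everything except digits and "-", then split on "-"
# "A00" gives 0; "A01-A03" gives c(1, 3)
nums <- lapply(strsplit(gsub("[^-0-9]", "", vec), "-"), as.integer)

# Step 3: two numbers mean a range, so expand it; then re-attach the
# letter and zero-pad the number to two digits
expanded <- Map(function(l, z) {
  n <- if (length(z) > 1) seq(z[1], z[2]) else z
  sprintf("%s%02i", l, n)
}, ltr, nums)

# Map() names the result after the letters; drop the names for printing
expanded <- unname(expanded)
print(expanded)
```

Each list element holds the codes for one input row, which is why `tidyr::unnest()` can then turn the list column into one row per code.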
Answer from a uj5u.com user:
Using the dedicated icd package:
#data
d <- structure(list(ICD_10 = c("A00", "A01-A03", "A15", "A17-A19"), diagnosis = c("Cholera", "Other Intestinal infectious diseases", "Respiratory tuberculosis", "Other tuberculosis")), class = "data.frame", row.names = c(NA, -4L))
#remotes::install_github("jackwasey/icd")
library(icd)
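To avoid creating non-existent codes, or missing existing ones, inside a range, we use expand_range. For example, the call below returns 33 codes, not the 3 we would get by filling in A01, A02, A03 sequentially, which would be wrong.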
expand_range("A01", "A03")
# [1] "A01" "A010" "A0100" "A0101" "A0102" "A0103" "A0104" "A0105"
# [9] "A0109" "A011" "A012" "A013" "A014" "A02" "A020" "A021"
# [17] "A022" "A0220" "A0221" "A0222" "A0223" "A0224" "A0225" "A0229"
# [25] "A028" "A029" "A03" "A030" "A031" "A032" "A033" "A038"
# [33] "A039"
We also use explain_code to get descriptions for the newly created codes. Example usage:
explain_code("A01")
# [1] "Typhoid and paratyphoid fevers"
Now, combine the two functions into one to get a nice output:
# custom function using expand_range
f <- function(icd10, diagnosis){
  x <- unlist(strsplit(icd10, "-"))
  if(length(x) == 1){
    ICD10 <- x
  } else {
    ICD10 <- expand_range(x[1], x[2])
  }
  data.frame(
    icd10 = icd10,
    diagnosis = diagnosis,
    icd10range = ICD10,
    desc = explain_code(ICD10))
}
Then loop through the codes to expand them, and row-bind the results:
# loop through rows, and rowbind
res <- do.call(rbind,
mapply(f, d$ICD_10, d$diagnosis,
SIMPLIFY = FALSE, USE.NAMES = FALSE))
head(res)
# icd10 diagnosis icd10range desc
# 1 A00 Cholera A00 Cholera
# 2 A01-A03 Other Intestinal infectious diseases A01 Typhoid and paratyphoid fevers
# 3 A01-A03 Other Intestinal infectious diseases A010 Typhoid fever
# 4 A01-A03 Other Intestinal infectious diseases A0100 Typhoid fever, unspecified
# 5 A01-A03 Other Intestinal infectious diseases A0101 Typhoid meningitis
# 6 A01-A03 Other Intestinal infectious diseases A0102 Typhoid fever with heart involvement
As expected, A01-A03 now expands into 33 rows:
table(res$icd10)
# A00 A01-A03 A15 A17-A19
# 1 33 1 53
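The question asks only for the three-character categories (A01, A02, A03), whereas the expansion above also returns the longer subdivision codes. One possible way to reconcile the two is to keep only codes that are three characters long. The sketch below uses `codes`, a hand-copied subset of the `expand_range("A01", "A03")` output shown earlier, as a stand-in so that it runs without the icd package.

```r
# Hand-copied subset of the expand_range("A01", "A03") output shown above
codes <- c("A01", "A010", "A0100", "A011", "A02",
           "A020", "A0220", "A03", "A030", "A039")

# ICD-10 categories are three characters long; longer codes are subdivisions
categories <- codes[nchar(codes) == 3]
print(categories)
```

Applied to the `res` data frame, the same filter would be `res[nchar(res$icd10range) == 3, ]`.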
Answer from a uj5u.com user:
One option:
library(dplyr)
library(tidyr)
library(readr)  # parse_number()

tibble::tribble(
  ~ICD_10, ~diagnosis,
  "A00", "Cholera",
  "A01-A03", "Other Intestinal infectious diseases",
  "A15", "Respiratory tuberculosis",
  "A17-A19", "Other tuberculosis"
) |>
  tidyr::separate_rows(ICD_10, sep = "-") |>
  mutate(id = parse_number(ICD_10)) |>
  group_by(diagnosis) |>
  complete(id = min(id):max(id)) |>
  # zero-pad to two digits so that 0 becomes "A00" rather than "A0"
  mutate(ICD_10 = sprintf("A%02d", as.integer(id))) |>
  ungroup()
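The pipeline above hard-codes the letter "A", which is fine for this data. If the codes could start with other letters, one possible variation is to read the prefix from each row instead. This is only a sketch: it assumes every code is a single uppercase letter followed by a two-digit number, and `df`, `prefix`, `bounds`, `num` and `out` are names introduced for illustration.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- tibble::tribble(
  ~ICD_10,   ~diagnosis,
  "A00",     "Cholera",
  "A01-A03", "Other Intestinal infectious diseases",
  "A15",     "Respiratory tuberculosis",
  "A17-A19", "Other tuberculosis"
)

out <- df |>
  mutate(
    prefix = str_extract(ICD_10, "^[A-Z]"),      # leading letter of the row
    bounds = str_extract_all(ICD_10, "[0-9]+"),  # one or two numbers per row
    # expand from the first to the last number (a single code expands to itself)
    num = lapply(bounds, function(b) {
      b <- as.integer(b)
      seq(b[1], b[length(b)])
    })
  ) |>
  unnest(num) |>
  mutate(ICD_10 = sprintf("%s%02d", prefix, num)) |>
  select(ICD_10, diagnosis)

print(out)
```

Because there is no grouping step, the rows stay in their original order.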
