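I have a dataset covering multiple diseases, where 0 means the person does not have the disease and 1 means they do.
To illustrate with an example: I am interested in disease A, and in whether the people in the dataset have it on its own or as a consequence of another disease. So I want to create a new variable "Type" with the values "NotDiseasedWithA", "Primary" and "Secondary". The diseases that can cause A are listed in the vector "SecondaryCauses":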
SecondaryCauses = c("DiseaseB", "DiseaseD")
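"NotDiseasedWithA" means they do not have disease A. "Primary" means they have disease A but none of the known diseases that can cause it. "Secondary" means they have disease A together with at least one disease that can cause it.
Sample data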
ID DiseaseA DiseaseB DiseaseC DiseaseD DiseaseE
1 0 1 0 0 0
2 1 0 0 0 1
3 1 0 1 1 0
4 1 0 1 1 1
5 0 0 0 0 0
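My questions are:
- How do I select the columns I am interested in? I have more than 20 columns, in no particular order, which is why I created the vector.
- How do I build a condition based on the values of the diseases I am interested in?
I tried something like the following, but it does not work: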
DF %>% mutate(Type = ifelse(DiseaseA == 0, "NotDiseasedWithA", ifelse(sum(names(DF) %in% SecondaryCauses) > 0, "Secondary", "Primary")))
So in the end I would like to get this result:
ID DiseaseA DiseaseB DiseaseC DiseaseD DiseaseE Type
1 0 1 0 0 0 NotDiseasedWithA
2 1 0 0 0 1 Primary
3 1 0 1 1 0 Secondary
4 1 0 1 1 1 Secondary
5 0 0 0 0 0 NotDiseasedWithA
Answer from a uj5u.com user:
Using data.table
df <- structure(list(ID = 1:5, DiseaseA = c(0L, 1L, 1L, 1L, 0L), DiseaseB = c(1L,
0L, 0L, 0L, 0L), DiseaseC = c(0L, 0L, 1L, 1L, 0L), DiseaseD = c(0L,
0L, 1L, 1L, 0L), DiseaseE = c(0L, 1L, 0L, 1L, 0L)), row.names = c(NA,
-5L), class = c("data.frame"))
library(data.table)
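setDT(df) # convert df to a data.table by reference
SecondaryCauses <- c("DiseaseB", "DiseaseD")
# Rows without disease A get a fixed label
df[DiseaseA == 0, Type := "NotDiseasedWithA"]
# Rows with disease A: "Secondary" if any column in SecondaryCauses is 1, else "Primary"
df[DiseaseA == 1,
   Type := ifelse(rowSums(.SD) > 0, "Secondary", "Primary"),
   .SDcols = SecondaryCauses]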
df
# ID DiseaseA DiseaseB DiseaseC DiseaseD DiseaseE Type
# 1: 1 0 1 0 0 0 NotDiseasedWithA
# 2: 2 1 0 0 0 1 Primary
# 3: 3 1 0 1 1 0 Secondary
# 4: 4 1 0 1 1 1 Secondary
# 5: 5 0 0 0 0 0 NotDiseasedWithA
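As a side note on why the attempt in the question does not work: sum(names(DF) %in% SecondaryCauses) counts how many column names match the vector (always 2 with this data), so it is the same for every row and never looks at the 0/1 values. The same row-wise idea as in the data.table answer can be sketched in base R, without any packages. This is only a minimal sketch on the sample data; the helper name has_cause is made up for illustration.

```r
# Sample data from the question
df <- data.frame(
  ID       = 1:5,
  DiseaseA = c(0L, 1L, 1L, 1L, 0L),
  DiseaseB = c(1L, 0L, 0L, 0L, 0L),
  DiseaseC = c(0L, 0L, 1L, 1L, 0L),
  DiseaseD = c(0L, 0L, 1L, 1L, 0L),
  DiseaseE = c(0L, 1L, 0L, 1L, 0L)
)
SecondaryCauses <- c("DiseaseB", "DiseaseD")

# Select the columns of interest by name; drop = FALSE keeps a data frame
# even if SecondaryCauses holds a single column name.
# has_cause (hypothetical name) is TRUE for rows where at least one cause is 1.
has_cause <- rowSums(df[, SecondaryCauses, drop = FALSE]) > 0

# Per-row classification: check disease A first, then the cause flag
df$Type <- ifelse(df$DiseaseA == 0, "NotDiseasedWithA",
                  ifelse(has_cause, "Secondary", "Primary"))
df
```

Because has_cause is a per-row logical vector, it could also stand in for the sum(...) > 0 test in the nested ifelse from the question. Note that rowSums returns NA for rows with missing values unless na.rm = TRUE is passed, so this sketch assumes the disease columns are complete.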
