# Using the spread of the data to determine a (descriptive) position in the dataset
The code is as follows:
jobs_df <- jobs_df %>%
  mutate(description = if_else(quan_value < 'q1', "Lowest",
                       if_else(quan_value < 'q2', "Low",
                       if_else(quan_value < 'q3', "Medium",
                       if_else(quan_value < 'q4', "High",
                       if_else(quan_value < 'q5', "Highest", NA_character_))))))
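Here the "description" for each row of the data frame should be one of Lowest, Low, Medium, High, or Highest, and q1, q2, q3, q4, q5 refer to the quintiles of the distribution of the "quan_value" column.
The data frame (jobs_df) looks like this: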
  jobs         quan_value    q1    q2    q3    q4    q5
  <chr>             <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Banker              1.3     2     4     6     8     1
2 Accountant          2.4     2     4     6     8     1
3 Waiter              4.2     2     4     6     8     1
4 Barista             6.3     2     4     6     8     1
5 Train driver        9.1     2     4     6     8     1
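"description" is the new column I want, based on the if_else statements, but it mostly just returns "Medium" as the result.
Answer from a uj5u.com user:
Whenever I see more than two nested if_else (or ifelse or fifelse) calls, I lean toward case_when: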
jobs_df %>%
  mutate(description = case_when(
    quan_value < q1 ~ "Lowest",
    quan_value < q2 ~ "Low",
    quan_value < q3 ~ "Medium",
    quan_value < q4 ~ "High",
    quan_value < q5 ~ "Highest",
    TRUE ~ NA_character_)
  )
#           jobs quan_value q1 q2 q3 q4 q5 description
# 1       Banker        1.3  2  4  6  8  1      Lowest
# 2   Accountant        2.4  2  4  6  8  1         Low
# 3       Waiter        4.2  2  4  6  8  1      Medium
# 4      Barista        6.3  2  4  6  8  1        High
# 5 Train driver        9.1  2  4  6  8  1        <NA>
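A side note on why the original nested if_else likely misbehaved: writing 'q1' in quotes makes R compare quan_value with the literal string "q1" instead of the q1 column, so the numbers are coerced to character and compared as text. Below is a minimal base R sketch of the difference, assuming the sample values above with q1 equal to 2. The outcome of the text comparison depends on the locale's collation rules, so it is not shown here.

```r
# Sample values taken from the question's data frame
quan_value <- c(1.3, 2.4, 4.2, 6.3, 9.1)
q1 <- 2

# Quoted: the numbers are coerced to character and compared with the text "q1"
quoted <- quan_value < "q1"

# Unquoted: an ordinary numeric comparison against the value of q1
unquoted <- quan_value < q1

# The quoted form is the same as comparing strings explicitly
same_as_text <- identical(quoted, as.character(quan_value) < "q1")

print(unquoted)
print(same_as_text)
```

Dropping the quotes, as in the case_when call above, is what lets dplyr look the names up as columns.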
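Update: since you said your column names are somewhat non-standard, I'll demonstrate using jobs_df2 (which I think is closer to your real column names). Notably, you need to wrap non-syntactic object/column names in backticks: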
jobs_df2 %>%
  mutate(description = case_when(
    quan_value < `20%` ~ "Lowest",
    quan_value < `40%` ~ "Low",
    quan_value < `60%` ~ "Medium",
    quan_value < `80%` ~ "High",
    quan_value < `100%` ~ "Highest",
    TRUE ~ NA_character_)
  )
#           jobs quan_value 20% 40% 60% 80% 100% description
# 1       Banker        1.3   2   4   6   8    1      Lowest
# 2   Accountant        2.4   2   4   6   8    1         Low
# 3       Waiter        4.2   2   4   6   8    1      Medium
# 4      Barista        6.3   2   4   6   8    1        High
# 5 Train driver        9.1   2   4   6   8    1        <NA>
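If the backticks feel awkward, or the column names are held in strings, one possible alternative is the .data pronoun that dplyr exports, which looks a column up by its name as a string. This is only a sketch and not part of the answer above. It rebuilds a copy of the sample data (here called jobs_demo, a name made up for this example) with check.names = FALSE so the non-syntactic names survive.

```r
library(dplyr)

# Hypothetical copy of the sample data; check.names = FALSE keeps names like "20%"
jobs_demo <- data.frame(
  jobs = c("Banker", "Accountant", "Waiter", "Barista", "Train driver"),
  quan_value = c(1.3, 2.4, 4.2, 6.3, 9.1),
  `20%` = 2, `40%` = 4, `60%` = 6, `80%` = 8, `100%` = 1,
  check.names = FALSE
)

# .data[["name"]] refers to a column by string instead of by backticked symbol
res <- jobs_demo %>%
  mutate(description = case_when(
    quan_value < .data[["20%"]]  ~ "Lowest",
    quan_value < .data[["40%"]]  ~ "Low",
    quan_value < .data[["60%"]]  ~ "Medium",
    quan_value < .data[["80%"]]  ~ "High",
    quan_value < .data[["100%"]] ~ "Highest",
    TRUE ~ NA_character_
  ))

print(res$description)
```

As with the backtick version, the last row stays NA because the sample `100%` column holds 1, so 9.1 matches none of the conditions.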
Data
jobs_df <- structure(list(jobs = c("Banker", "Accountant", "Waiter", "Barista", "Train driver"), quan_value = c(1.3, 2.4, 4.2, 6.3, 9.1), q1 = c(2L, 2L, 2L, 2L, 2L), q2 = c(4L, 4L, 4L, 4L, 4L), q3 = c(6L, 6L, 6L, 6L, 6L), q4 = c(8L, 8L, 8L, 8L, 8L), q5 = c(1L, 1L, 1L, 1L, 1L)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")
jobs_df2 <- structure(list(jobs = c("Banker", "Accountant", "Waiter", "Barista", "Train driver"), quan_value = c(1.3, 2.4, 4.2, 6.3, 9.1), "20%" = c(2L, 2L, 2L, 2L, 2L), "40%" = c(4L, 4L, 4L, 4L, 4L), "60%" = c(6L, 6L, 6L, 6L, 6L), "80%" = c(8L, 8L, 8L, 8L, 8L), "100%" = c(1L, 1L, 1L, 1L, 1L)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")
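Since q1 to q5 are described as quintiles of quan_value, another route (a different technique from the case_when answer above, sketched here under that assumption) is to compute the break points directly with quantile() and let cut() assign the labels. The sample columns above hold placeholder thresholds, so this sketch derives its own breaks from the quan_value vector rather than reusing them.

```r
# Sample values taken from the question's data frame
quan_value <- c(1.3, 2.4, 4.2, 6.3, 9.1)

# Break points at 0%, 20%, ..., 100% of the observed distribution
breaks <- quantile(quan_value, probs = seq(0, 1, by = 0.2))

labels <- c("Lowest", "Low", "Medium", "High", "Highest")

# include.lowest = TRUE keeps the minimum value inside the first interval
description <- as.character(
  cut(quan_value, breaks = breaks, labels = labels, include.lowest = TRUE)
)

print(data.frame(quan_value, description))
```

One caveat: cut() requires unique break points, so data with many tied values may need the breaks deduplicated or a different binning approach.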
