我有像這樣的年齡列是虛擬編碼的。如何使用 dplyr 將這些列轉換為一列?
輸入:
age_0-10 age_11-20 age_21-30 age_31-40 age_41-50 age_51-60 gender
1 0 1 0 0 0 0 0
2 0 0 1 0 0 0 1
3 0 0 0 1 0 0 0
4 0 1 0 0 0 0 1
5 0 0 0 0 0 1 1
預期輸出:
age gender
1 11-20 0
2 21-30 1
3 31-40 0
4 11-20 1
5 51-60 1
uj5u.com熱心網友回復:
由于@Adam 的評論,現在有一個可能的解決方案names_prefix:
library(tidyverse)
df <- data.frame(
check.names = FALSE,
`age_0-10` = c(0L, 0L, 0L, 0L, 0L),
`age_11-20` = c(1L, 0L, 0L, 1L, 0L),
`age_21-30` = c(0L, 1L, 0L, 0L, 0L),
`age_31-40` = c(0L, 0L, 1L, 0L, 0L),
`age_41-50` = c(0L, 0L, 0L, 0L, 0L),
`age_51-60` = c(0L, 0L, 0L, 0L, 1L),
gender = c(0L, 1L, 0L, 1L, 1L)
)
df %>%
pivot_longer(col=starts_with("age"), names_to="age", names_prefix="age_") %>%
filter(value==1) %>%
select(age, gender, -value)
#> # A tibble: 5 × 2
#> age gender
#> <chr> <int>
#> 1 11-20 0
#> 2 21-30 1
#> 3 31-40 0
#> 4 11-20 1
#> 5 51-60 1
uj5u.com熱心網友回復:
這是dplyr使用c_across().
library(dplyr)
library(stringr)
df %>%
rowwise() %>%
mutate(age = str_remove(names(.)[which(c_across(starts_with("age")) == 1)], "^age_")) %>%
ungroup() %>%
select(age, gender)
# # A tibble: 5 x 2
# age gender
# <chr> <int>
# 1 11-20 0
# 2 21-30 1
# 3 31-40 0
# 4 11-20 1
# 5 51-60 1
uj5u.com熱心網友回復:
嘗試使用下面的基本 R 代碼 max.col
cbind(
age = head(names(df), -1)[max.col(df[-ncol(df)])],
df[ncol(df)]
)
這使
age gender
1 age_11-20 0
2 age_21-30 1
3 age_31-40 0
4 age_11-20 1
5 age_51-60 1
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/388785.html
上一篇:洗掉R中的單引號
下一篇:如何淘汰學生少于20人的學校?
