I have the following table in R:
id<-c(1,2,3,4)
medal<-c("2021-2020-2018","NA","2019","2015-2014-2012")
df<-data.frame(id,medal)
id medal
1 2021-2020-2018
2 NA
3 2019
4 2015-2014-2012
I would like to split the medal column into multiple dummy variables, one per year, for each id, as shown below:
id medal 2021 2020 2019 2018 2015 2014 2012
1 2021-2020-2018 1 1 0 1 0 0 0
2 NA 0 0 0 0 0 0 0
3 2019 0 0 1 0 0 0 0
4 2015-2014-2012 0 0 0 0 1 1 1
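I would greatly appreciate any help with this.
uj5u.com user reply:
qdapTools has a function that does this (mtabulate); you only need to split the medal column first: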
library(qdapTools)
df <- cbind(df, mtabulate(strsplit(df$medal, "-")))
df[, names(df) != "NA"]
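If installing qdapTools is not an option, the same split-then-tabulate idea can be sketched in base R. This is only a sketch, assuming medal is a character column with years separated by "-" and missing entries stored as the literal string "NA", as in the question. The names parts, years, dummies and result are illustrative and do not come from the original answers.

```r
id <- c(1, 2, 3, 4)
medal <- c("2021-2020-2018", "NA", "2019", "2015-2014-2012")
df <- data.frame(id, medal, stringsAsFactors = FALSE)

# Split each medal string into its individual years
parts <- strsplit(df$medal, "-")

# Distinct years across all rows, dropping the literal "NA", newest first
years <- sort(setdiff(unique(unlist(parts)), "NA"), decreasing = TRUE)

# One row of 0/1 flags per id: is each year among that id's parts?
dummies <- do.call(rbind, lapply(parts, function(p) as.integer(years %in% p)))
colnames(dummies) <- years

# cbind() on a data frame keeps the numeric-looking column names unchanged
result <- cbind(df, dummies)
result
```

Matching whole tokens with %in% rather than searching for substrings means a year is only flagged when it appears as a complete entry between the separators.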
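uj5u.com user reply:
Here is a tidyverse solution. Note that I have changed the "NA" string in the input data to NA_character_.
The idea is to use tidyr::separate and then reshape the data with gather and spread. I also convert id to a factor so that ids with no medals remain in the output.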
library(tidyverse)
id <- c(1,2,3,4)
medal <- c("2021-2020-2018",NA_character_,"2019","2015-2014-2012")
df <- data.frame(id,medal)
df %>%
  separate(medal, sep = "-", into = as.character(1:3), fill = "right") %>%
  gather(dummy, year, -id) %>%
  select(-dummy) %>%
  mutate(val = 1, id = factor(id)) %>%
  filter(!is.na(year)) %>%
  spread(year, val, fill = 0, drop = FALSE)
id 2012 2014 2015 2018 2019 2020 2021
1 1 0 0 0 1 0 1 1
2 2 0 0 0 0 0 0 0
3 3 0 0 0 0 1 0 0
4 4 1 1 1 0 0 0 0
uj5u.com user reply:
Alternatively:
library(tidyverse)
id<-c(1,2,3,4)
medal<-c("2021-2020-2018","NA","2019","2015-2014-2012")
df<-data.frame(id,medal)
tmp <- unlist(str_split(df$medal, "-"))
tmp <- sort(tmp[tmp != "NA"])
tmp <- set_names(tmp, tmp)
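# Use df$medal explicitly: bind_cols() does not evaluate inside df,
# so a bare `medal` would resolve to the global vector instead
df %>%
  bind_cols(map_dfc(tmp, ~as.integer(str_detect(df$medal, .x))))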
# id medal 2012 2014 2015 2018 2019 2020 2021
#1 1 2021-2020-2018 0 0 0 1 0 1 1
#2 2 NA 0 0 0 0 0 0 0
#3 3 2019 0 0 0 0 1 0 0
#4 4 2015-2014-2012 1 1 1 0 0 0 0
uj5u.com user reply:
Or use unnest():
library(tidyverse)
id<-c(1,2,3,4)
medal<-c("2021-2020-2018","NA","2019","2015-2014-2012")
df<-data.frame(id,medal)
df %>%
  mutate(medal_2 = str_split(medal, "-")) %>%
  unnest(medal_2) %>%
  mutate(value = 1) %>%
  # id_cols is passed by name; recent tidyr versions deprecate passing it by position
  pivot_wider(
    id_cols = c("id", "medal"), names_from = medal_2, values_from = value
  ) %>%
  replace(is.na(.), 0)
#> # A tibble: 4 x 10
#> id medal `2021` `2020` `2018` `NA` `2019` `2015` `2014` `2012`
#> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2021-2020-2018 1 1 1 0 0 0 0 0
#> 2 2 NA 0 0 0 1 0 0 0 0
#> 3 3 2019 0 0 0 0 1 0 0 0
#> 4 4 2015-2014-2012 0 0 0 0 0 1 1 1
Created on 2021-10-21 by the reprex package (v2.0.0)
