我正在處理一個大型資料集,其中大部分資料都輸入了兩次。這意味著許多變數由成對的串列示:column.1一個人輸入的資料,以及另一個人輸入column.2的相同資料。我想創建一個簡單的“主”列column,它首先從 中提取column.1,如果column.1是NA,則從 中提取column.2。
這是我嘗試使用虛構資料執行的操作的示例:
mydata <- data.frame(name = c("Sarah","Ella","Carmen","Dinah","Billie"),
cheese.1 = c(1,4,NA,6,NA),
cheese.2 = c(1,4,3,5,NA),
milk.1 = c(NA,2,0,4,NA),
milk.2 = c(1,2,1,4,2),
tofu.1 = c("yum","yum",NA,"gross", NA),
tofu.2 = c("gross", "yum", "yum", NA, "gross"))
例如,下面的代碼顯示了我想要對一對列執行的操作的示例。
mydata %>% mutate(cheese = ifelse(is.na(cheese.1), cheese.2, cheese.1))
#OUTPUT:
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 cheese
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
但是,我想自動化該程序而不是手動完成每個程序。下面是我嘗試自動化該程序,使用col.list我要為其創建新“主”列的列對的串列 ( ):
col.list = c("cheese","milk","tofu")
lapply(col.list, FUN = function(x) {
v <- as.name({{x}})
v.1 <- as.name(paste0({{x}}, ".1"))
v.2 <- as.name(paste0(({{x}}), ".2"))
mydata %>% mutate(v = ifelse(is.na({{v.1}}), {{v.2}}, {{v.1}}))
})
#OUTPUT:
[[1]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
[[2]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 2
3 Carmen NA 3 0 1 <NA> yum 0
4 Dinah 6 5 4 4 gross <NA> 4
5 Billie NA NA NA 2 <NA> gross 2
[[3]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 v
1 Sarah 1 1 NA 1 yum gross yum
2 Ella 4 4 2 2 yum yum yum
3 Carmen NA 3 0 1 <NA> yum yum
4 Dinah 6 5 4 4 gross <NA> gross
5 Billie NA NA NA 2 <NA> gross gross
這種嘗試的問題是:
- 新列不正確命名的(他們應該被命名
cheese,milk并且tofu不是所有的呼叫v) - 新列不會添加到原始資料框中。我想要的是讓程式向我的資料框中添加一系列新的“主”列(在 中標識的每對列都有一個新列
col.list)。
uj5u.com熱心網友回復:
(1) 您必須包裝v到 curl-curly 運算子并使用:=:
library(dplyr)
col.list <- c("cheese","milk","tofu")
lapply(col.list, FUN = function(x) {
v <- as.name({{x}})
v.1 <- as.name(paste0({{x}}, ".1"))
v.2 <- as.name(paste0(({{x}}), ".2"))
mydata %>% mutate({{ v }} = ifelse(is.na({{v.1}}), {{v.2}}, {{v.1}}))
})
回傳
[[1]]
name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2 cheese
1 Sarah 1 1 NA 1 yum gross 1
2 Ella 4 4 2 2 yum yum 4
3 Carmen NA 3 0 1 <NA> yum 3
4 Dinah 6 5 4 4 gross <NA> 6
5 Billie NA NA NA 2 <NA> gross NA
[...]
這離您想要的輸出更近了一步。
(2) 但是為了得到你想要的輸出,我建議使用purrr:
library(purrr)
library(dplyr)
col.list %>%
map(~mydata %>%
select(name, starts_with(.x)) %>%
mutate({{ .x }} := ifelse(
is.na(!!sym(paste0(.x, ".1"))),
!!sym(paste0(.x, ".2")),
!!sym(paste0(.x, ".1"))
)
)
) %>%
reduce(left_join, by = "name")
這回傳
name cheese.1 cheese.2 cheese milk.1 milk.2 milk tofu.1 tofu.2 tofu
1 Sarah 1 1 1 NA 1 1 yum gross yum
2 Ella 4 4 4 2 2 2 yum yum yum
3 Carmen NA 3 3 0 1 0 <NA> yum yum
4 Dinah 6 5 6 4 4 4 gross <NA> gross
5 Billie NA NA NA NA 2 2 <NA> gross gross
uj5u.com熱心網友回復:
這是一個非常簡單和動態的選項。由于它使用tidyselect,如果有不止兩列(例如cheese.1、cheese.2、和 cheese.3),這仍然有效。如果列組不平衡(例如 3 個奶酪列,但只有 2 個牛奶列),這也將起作用:
library(purrr)
library(stringr)
library(rlang)
library(dplyr)
col.list <- c("cheese","milk","tofu")
express <- map(set_names(col.list), ~
str_glue("coalesce(!!!across(starts_with(\"{.x}\")))") %>%
parse_expr())
mydata %>%
mutate(!!! express, .keep = "unused")
輸出
其他列已被洗掉.keep = "unused"。如果要保留所有列,請洗掉該引數。
name cheese milk tofu
1 Sarah 1 1 yum
2 Ella 4 2 yum
3 Carmen 3 0 yum
4 Dinah 6 4 gross
5 Billie NA 2 gross
這個怎么運作
- 使用的
map和set_names非常重要,因為這將創建一個名為名單,這是大爆炸的重要!!!運營商以后。map創建一個命名的運算式串列。 - 使用
across和coalesce允許動態整齊選擇列。 - 該
!!!操作力接頭物件的串列和列的名稱是從建立使用串列的名稱map和set_names。
uj5u.com熱心網友回復:
這是我會做的一種方式。首先轉換為長格式,然后重新整形回寬格式但只有 2 個值列1和2
library(dplyr)
library(tidyr)
mydata <- data.frame(name = c("Sarah","Ella","Carmen","Dinah","Billie"),
cheese.1 = c(1,4,NA,6,NA),
cheese.2 = c(1,4,3,5,NA),
milk.1 = c(NA,2,0,4,NA),
milk.2 = c(1,2,1,4,2),
tofu.1 = c("yum","yum",NA,"gross", NA),
tofu.2 = c("gross", "yum", "yum", NA, "gross"))
mydata_long <- mydata %>%
mutate(across(where(is.numeric), as.character)) %>%
pivot_longer(-name,
names_to = c("food", "nr"),
names_sep = "\\.")
mydata_long
#> # A tibble: 30 x 4
#> name food nr value
#> <chr> <chr> <chr> <chr>
#> 1 Sarah cheese 1 1
#> 2 Sarah cheese 2 1
#> 3 Sarah milk 1 <NA>
#> 4 Sarah milk 2 1
#> 5 Sarah tofu 1 yum
#> 6 Sarah tofu 2 gross
#> 7 Ella cheese 1 4
#> 8 Ella cheese 2 4
#> 9 Ella milk 1 2
#> 10 Ella milk 2 2
#> # ... with 20 more rows
ifelse()轉換回不同的寬格式后應用函式
mydata_wide <- mydata_long %>%
pivot_wider(names_from = nr,
values_from = value) %>%
mutate(final_val = ifelse(is.na(`1`), `2`, `1`)) %>%
arrange(food)
mydata_wide
#> # A tibble: 15 x 5
#> name food `1` `2` final_val
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Sarah cheese 1 1 1
#> 2 Ella cheese 4 4 4
#> 3 Carmen cheese <NA> 3 3
#> 4 Dinah cheese 6 5 6
#> 5 Billie cheese <NA> <NA> <NA>
#> 6 Sarah milk <NA> 1 1
#> 7 Ella milk 2 2 2
#> 8 Carmen milk 0 1 0
#> 9 Dinah milk 4 4 4
#> 10 Billie milk <NA> 2 2
#> 11 Sarah tofu yum gross yum
#> 12 Ella tofu yum yum yum
#> 13 Carmen tofu <NA> yum yum
#> 14 Dinah tofu gross <NA> gross
#> 15 Billie tofu <NA> gross gross
mydata_wide2 <- mydata_wide %>%
pivot_wider(-c(`1`, `2`),
names_from = food,
values_from = final_val)
mydata_wide2
#> # A tibble: 5 x 4
#> name cheese milk tofu
#> <chr> <chr> <chr> <chr>
#> 1 Sarah 1 1 yum
#> 2 Ella 4 2 yum
#> 3 Carmen 3 0 yum
#> 4 Dinah 6 4 gross
#> 5 Billie <NA> 2 gross
由reprex 包(v2.0.1)于 2021 年 10 月 29 日創建
uj5u.com熱心網友回復:
我會在這里使用purrr::map_dfc和coalesce。看起來很簡單。
library(purrr)
library(dplyr)
library(stringr)
mydata %>% mutate(map2_dfc(select(., ends_with('1')),
select(., ends_with('2')),
~coalesce(.x, .y)))%>%
select(-ends_with('2'))%>%
rename_with(~str_remove(.x, '\\.\\d $'))
name cheese milk tofu
1 Sarah 1 1 yum
2 Ella 4 2 yum
3 Carmen 3 0 yum
4 Dinah 6 4 gross
5 Billie NA 2 gross
uj5u.com熱心網友回復:
以下是您完成任務的方法:
- 定義你的對(如果你有數百列,這可以自動化。
- 用于
imap_dfc應用coalesce定義的對 - 系結到原始數??據框
library(dplyr)
library(purrr)
pairs <- list(cheese = c(2, 3), milk = c(4, 5), tofu = c(6, 7))
imap_dfc(pairs, ~mydata[, .x] %>% transmute(!!.y := coalesce(!!!syms(names(mydata)[.x])))) %>%
bind_cols(mydata)
cheese milk tofu name cheese.1 cheese.2 milk.1 milk.2 tofu.1 tofu.2
1 1 1 yum Sarah 1 1 NA 1 yum gross
2 4 2 yum Ella 4 4 2 2 yum yum
3 3 0 yum Carmen NA 3 0 1 <NA> yum
4 6 4 gross Dinah 6 5 4 4 gross <NA>
5 NA 2 gross Billie NA NA NA 2 <NA> gross
uj5u.com熱心網友回復:
另一個 tidyverse 選項。這里的優點是它保留了原始資料型別,并且不會將所有內容都轉換為字符值。
library(tidyverse)
mydata %>%
pivot_longer(cols = -name,
names_pattern = '(.*)(\\..)',
names_to = c('.value', 'number')) %>%
group_by(name) %>%
mutate(across(-number, ~if_else(is.na(.[1]), .[2], .[1]))) %>%
ungroup() %>%
filter(number == '.1') %>%
select(-number)
這使
# A tibble: 5 x 4
name cheese milk tofu
<chr> <dbl> <dbl> <chr>
1 Sarah 1 1 yum
2 Ella 4 2 yum
3 Carmen 3 0 yum
4 Dinah 6 4 gross
5 Billie NA 2 gross
替代解決方案coalesce:
mydata %>%
pivot_longer(cols = -name,
names_pattern = '(.*)(\\..)',
names_to = c('.value', 'number')) %>%
group_by(name) %>%
mutate(across(-number, ~coalesce(.[1], .[2]))) %>%
ungroup() %>%
filter(number == '.1') %>%
select(-number)
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/342953.html
