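I'm trying to merge a list of data frames while keeping names unique (that is, without repeating the names that recur across the data frames). I have 4 data frames, each with a different number of rows, and I need to combine them all so that I end up with the participants who took all the tests.
What I get instead contains NAs.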

Variables:
class1Time1 : took VarA and VarB tests in 2021
class2Time1 : took VarA and VarB tests in 2021
class1Time2 : took VarA and VarB tests in 2022
class2Time2 : took VarA and VarB tests in 2022
So "Var_Year" holds each participant's grade on that test in the given year, but note that
not every participant took all tests.
Here is a simplified version of my data:
### create a similar data frame:
names1 <- c("Mary","John","Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")
class1Time1 <- data.frame(ID = names1[3:8], VarA_21 = sample(1:20, 6), VarB_21 = sample(1:20, 6))
class2Time1 <- data.frame(ID = names2[1:7], VarA_21 = sample(1:20, 7), VarB_21 = sample(1:20, 7))
class1Time2 <- data.frame(ID = names1[2:8], VarA_22 = sample(1:20, 7), VarB_22 = sample(1:20, 7))
class2Time2 <- data.frame(ID = names2[1:4], VarA_22 = sample(1:20, 4), VarB_22 = sample(1:20, 4))
### So, the only students that took all tests were "Harry" "Hermione" "Rony" "Dobby"
### Ok, now I'm taking all dataframes from the environment and putting them into a list:
together <- grep("class",names(.GlobalEnv), value=TRUE)
##### put into a list
my_list <- do.call("list", mget(together))
### Now I need ONLY the same names from all dataframes
test <- Reduce(function(...) full_join(...), my_list) ### doesn't work (full_join() is from dplyr)
### I've tried merge(), rbind(), etc...
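Note: I tried to reproduce my actual data, but I couldn't do that without changing the participants' real names, which is why I made up this fictional version. My actual data looks like this:
[screenshot of the actual data not included]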

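Question 1: How can I join all the data frames so that each participant who took all the tests ends up with exactly one row?
Question 2: If I can get Q1, then I believe a simple filter afterwards will keep only the complete cases. Is that right?
I've seen a lot of solutions here, including merge(), reduce, rbind, and join (by = "ID"), but none of them seems to help me (I tried them all). Thanks in advance.
Edit: I think I'm getting closer with test <- Reduce(function(...) merge(..., all = TRUE, by="ID"), my_list), but it doesn't keep the original column names; it now duplicates the columns.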
A uj5u.com user replied:
You can use an inner join here, as long as you tell merge() to join only on ID:
Reduce(\(a,b) merge(a,b, by="ID"),my_list)
Output:
        ID VarA_21.x VarB_21.x VarA_22.x VarB_22.x VarA_21.y VarB_21.y VarA_22.y VarB_22.y
1    Dobby         8         5        19         9         7        13         3         3
2    Harry        14         1         4        16        12         4        20         9
3 Hermione         4         4        14         4         1        17        14         6
4     Rony        18        18         7        18         2         9         7         2
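Note: Reduce(merge, my_list) will by default inner-join each data frame in turn on every column name they share, and it ends up with no matching rows. ID is not the only column name in common: the score columns share names too, and the scores in those columns differ.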
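To see why the bare call comes back empty, it can help to look at which columns merge() joins on when `by` is not given. The sketch below uses two small made-up data frames (`a` and `b` are illustrative names with invented scores, not the question's data). It relies on the documented default, `by = intersect(names(x), names(y))`.

```r
# Two hypothetical data frames that share ID *and* a score column name
a <- data.frame(ID = c("Harry", "Dobby"), VarA_21 = c(14, 8))
b <- data.frame(ID = c("Harry", "Dobby"), VarA_21 = c(12, 7))

# Columns merge() joins on when `by` is not supplied
default_by <- intersect(names(a), names(b))
default_by
#> [1] "ID"      "VarA_21"

# Joining on both columns: no row has the same ID *and* the same score
no_match <- merge(a, b)
nrow(no_match)
#> [1] 0

# Joining on ID only: both people match; the clashing score column
# gets the default .x / .y suffixes
id_only <- merge(a, b, by = "ID")
names(id_only)
#> [1] "ID"        "VarA_21.x" "VarA_21.y"
```

The `.x` / `.y` suffixes are also why the column names in the output above no longer say which class they came from.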
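A uj5u.com user replied:
As mentioned in the discussion above, it's hard to know what in your data might be causing all the rows to be dropped. However, there are a few things you can probably do to help pinpoint or solve the problem: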
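- At the moment your four data frames have duplicated column names, and nothing inside each data frame "knows" which class it belongs to. That is why the columns get duplicated in the example above. Tidying the data by inserting the "class" name into each data frame before merging will help.
- Pivoting to reshape the data makes bind_rows a simple way to combine all comparable values across times/tests.
- Joining everything first lets you visually check that the expected people are present in each part; then you can filter across all columns.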
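The first and third points can also be sketched in base R, without reshaping. This is only an illustration under some assumptions: the list is built by hand rather than with mget(), `set.seed()` is added so the made-up scores are reproducible, and the helper names (`tagged`, `all_rows`, `complete_rows`) are invented for the example.

```r
set.seed(1)  # only so the made-up scores are reproducible

names1 <- c("Mary", "John", "Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")

my_list <- list(
  class1Time1 = data.frame(ID = names1[3:8], VarA_21 = sample(1:20, 6), VarB_21 = sample(1:20, 6)),
  class2Time1 = data.frame(ID = names2[1:7], VarA_21 = sample(1:20, 7), VarB_21 = sample(1:20, 7)),
  class1Time2 = data.frame(ID = names1[2:8], VarA_22 = sample(1:20, 7), VarB_22 = sample(1:20, 7)),
  class2Time2 = data.frame(ID = names2[1:4], VarA_22 = sample(1:20, 4), VarB_22 = sample(1:20, 4))
)

# Prefix every score column with its class ("class1" / "class2") so that
# column names are unique across the four data frames
tagged <- Map(function(df, nm) {
  cls <- sub("Time[0-9]+$", "", nm)          # "class1Time2" -> "class1"
  score_cols <- names(df) != "ID"
  names(df)[score_cols] <- paste(cls, names(df)[score_cols], sep = "_")
  df
}, my_list, names(my_list))

# Full join on ID: everyone is kept, and a missing test shows up as NA
all_rows <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE), tagged)

# Keep only the participants who have a score for every test
complete_rows <- all_rows[complete.cases(all_rows), ]
complete_rows$ID
```

Because the column names are unique before merging, merge() has no reason to add `.x` / `.y` suffixes, and `all_rows` can be inspected before the final filter.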
Here is a worked example on the data above, using purrr and dplyr to manipulate the list and the data frames:
library(tidyverse)
names1 <- c("Mary","John","Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")
class1Time1 <- data.frame(ID = names1[3:8], VarA_21 = sample(1:20, 6), VarB_21 = sample(1:20, 6))
class2Time1 <- data.frame(ID = names2[1:7], VarA_21 = sample(1:20, 7), VarB_21 = sample(1:20, 7))
class1Time2 <- data.frame(ID = names1[2:8], VarA_22 = sample(1:20, 7), VarB_22 = sample(1:20, 7))
class2Time2 <- data.frame(ID = names2[1:4], VarA_22 = sample(1:20, 4), VarB_22 = sample(1:20, 4))
together <- grep("class",names(.GlobalEnv), value=TRUE)
my_list <- do.call("list", mget(together))
my_list |>
  # Tag each data frame with its class, taken from the list element's name
  imap(~ mutate(.x, class = str_extract(.y, "class\\d"))) |>
  map(
    pivot_longer,
    starts_with("Var"),
    names_to = "test_time",
    values_to = "score"
  ) |>
  reduce(bind_rows) |>
  pivot_wider(names_from = c(class, test_time),
              values_from = score) |>
  # This final line reduces down to only full rows. Can be cut out for checking.
  filter(if_all(everything(), ~ !is.na(.x)))
#> # A tibble: 4 × 9
#>   ID       class2_VarA_21 class2_VarB_21 class2_VarA_22 class2_VarB_22
#>   <chr>             <int>          <int>          <int>          <int>
#> 1 Harry                13             13              8              4
#> 2 Hermione              4              6              9              2
#> 3 Rony                 15             15             16              8
#> 4 Dobby                17             19             13             18
#> # … with 4 more variables: class1_VarA_21 <int>, class1_VarB_21 <int>,
#> #   class1_VarA_22 <int>, class1_VarB_22 <int>
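As a cross-check on which participants should survive the final filter, the IDs alone can be intersected across the data frames. The sketch below is a stripped-down stand-in for the question's list: it keeps only the ID column, and `common_ids` and `appearances` are names invented for the example.

```r
names1 <- c("Mary", "John", "Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")

# Same ID layout as the question's data frames, without the score columns
my_list <- list(
  class1Time1 = data.frame(ID = names1[3:8]),
  class2Time1 = data.frame(ID = names2[1:7]),
  class1Time2 = data.frame(ID = names1[2:8]),
  class2Time2 = data.frame(ID = names2[1:4])
)

# IDs present in every data frame
common_ids <- Reduce(intersect, lapply(my_list, `[[`, "ID"))
common_ids
#> [1] "Harry"    "Hermione" "Rony"     "Dobby"

# How many of the four data frames each ID appears in
appearances <- table(unlist(lapply(my_list, `[[`, "ID")))
appearances[appearances < length(my_list)]
```

If the real data returns fewer names than expected here, the IDs themselves (spelling, whitespace, case) are a likely place to look before debugging the join.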
