Merging a list of data frames while keeping unique names (avoiding duplicate names and NAs)

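I'm trying to merge a list of data frames while keeping unique names (that is, avoiding names repeated across the data frames). I have 4 data frames, each with a different number of rows, and I need to combine all of them so that I end up with only the participants who took all the tests.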

This is what I get, with NAs:

[Image: merged result with repeated names and NAs]

Variables:

class1Time1 : took VarA and VarB tests in 2021
class2Time1 : took VarA and VarB tests in 2021

class1Time2 : took VarA and VarB tests in 2022
class2Time2 : took VarA and VarB tests in 2022

so, "Var_Year" stands for the grades of each subject in the target year, but note that
not every subject took all tests

Here is a simplified version of my data:

### create a similar data frame:

names1 <- c("Mary","John","Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")

class1Time1 <- data.frame(ID = names1[3:8], VarA_21 = sample(1:20, 6), VarB_21 = sample(1:20, 6))
class2Time1 <- data.frame(ID = names2[1:7], VarA_21 = sample(1:20, 7), VarB_21 = sample(1:20, 7))
class1Time2 <- data.frame(ID = names1[2:8], VarA_22 = sample(1:20, 7), VarB_22 = sample(1:20, 7))
class2Time2 <- data.frame(ID = names2[1:4], VarA_22 = sample(1:20, 4), VarB_22 = sample(1:20, 4))

### So, the only students that took all tests were "Harry" "Hermione" "Rony" "Dobby"  

### Ok, now I'm taking all dataframes from the environment and putting them into a list:

together <- grep("class",names(.GlobalEnv), value=TRUE)

##### put into a list 

my_list <- do.call("list", mget(together))

### Now I need ONLY the same names from all dataframes

test <- Reduce(function(...) full_join(...), my_list) ### doesn't work

### I've tried merge(), rbind(), etc...

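Note: I tried to reproduce my actual data, but I can't do that without changing the participants' real names, which is why I made a fictional version. My actual data looks like this:

[Image: screenshot of the actual data]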

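Question 1: How can I join all the data frames so that I get a single row for each participant who took all the tests?

Question 2: If I can get Q1 working, I believe a simple filter afterwards will keep only the complete rows. Is that right?

I have seen plenty of solutions here, including merge(), reduce, rbind and join (by = "ID"), but none of them seemed to help (I tried them all). Thanks in advance.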

Edit: I think I'm getting closer with test <- Reduce(function(...) merge(..., all = TRUE, by="ID"), my_list), but it doesn't keep the original column names; it now duplicates the columns.

Answer:

You can use an inner join here, as long as you tell merge() to join on ID only:

Reduce(\(a,b) merge(a,b, by="ID"),my_list)

Output:

        ID VarA_21.x VarB_21.x VarA_22.x VarB_22.x VarA_21.y VarB_21.y VarA_22.y VarB_22.y
1    Dobby         8         5        19         9         7        13         3         3
2    Harry        14         1         4        16        12         4        20         9
3 Hermione         4         4        14         4         1        17        14         6
4     Rony        18        18         7        18         2         9         7         2
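
The `.x`/`.y` suffixes in the output above appear because the data frames share score column names. One possible way to keep the columns distinguishable is to prefix each score column with the name of its data frame before merging. The sketch below assumes the list is named after the data frames (as `mget()` produces) and uses a fixed seed; `prefix_scores` is a made-up helper name, not something from the answer.

```r
# Rebuild the example data with a fixed seed so the sketch is reproducible.
set.seed(1)
names1 <- c("Mary", "John", "Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")

my_list <- list(
  class1Time1 = data.frame(ID = names1[3:8], VarA_21 = sample(1:20, 6), VarB_21 = sample(1:20, 6)),
  class2Time1 = data.frame(ID = names2[1:7], VarA_21 = sample(1:20, 7), VarB_21 = sample(1:20, 7)),
  class1Time2 = data.frame(ID = names1[2:8], VarA_22 = sample(1:20, 7), VarB_22 = sample(1:20, 7)),
  class2Time2 = data.frame(ID = names2[1:4], VarA_22 = sample(1:20, 4), VarB_22 = sample(1:20, 4))
)

# Hypothetical helper: prefix every column except ID with the data frame's name.
prefix_scores <- function(df, nm) {
  is_score <- names(df) != "ID"
  names(df)[is_score] <- paste(nm, names(df)[is_score], sep = "_")
  df
}

renamed <- Map(prefix_scores, my_list, names(my_list))

# Inner join on ID only; the column names are now unique, so no suffixes are added.
all_tests <- Reduce(function(a, b) merge(a, b, by = "ID"), renamed)

names(all_tests)  # ID, class1Time1_VarA_21, class1Time1_VarB_21, ...
all_tests$ID      # the four participants present in every data frame
```

Because the join is still an inner join on `ID`, only participants present in every data frame survive, so no NA filtering should be needed afterwards.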

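Note: Reduce(merge, my_list) will, by default, inner join each data frame in turn, but you end up with no matching rows. That is because ID is not the only column name the data frames share: merge() joins on every shared column by default, and the scores in those shared columns differ.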
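
To make the note above concrete, here is a small sketch with two made-up data frames (`a`, `b` and their scores are illustrative only). It shows that `merge()` without `by` joins on all shared column names, and that the `suffixes` argument can label clashing columns when joining on `ID` alone.

```r
# Two tiny data frames that share ID *and* the score column name.
a <- data.frame(ID = c("Harry", "Dobby"), VarA_21 = c(14, 8))
b <- data.frame(ID = c("Harry", "Dobby"), VarA_21 = c(12, 7))

# By default, merge() joins on every column name the inputs share.
shared <- intersect(names(a), names(b))
shared                      # "ID" "VarA_21"

# The scores differ, so no row matches on ID and VarA_21 together.
default_join <- merge(a, b)
nrow(default_join)          # 0

# Restricting the key to ID matches both students; the clashing score
# columns get suffixes, which can be customised via `suffixes`.
by_id <- merge(a, b, by = "ID", suffixes = c("_class1", "_class2"))
names(by_id)                # "ID" "VarA_21_class1" "VarA_21_class2"
```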

Answer:

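As mentioned in the discussion above, it's hard to know what is happening in your data that makes all the rows drop out. However, there are a few things you can probably do to help pinpoint or solve the problem: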

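At present your four data frames have repeated column names, and nothing inside each data frame "knows" which class it belongs to. That is why the columns get duplicated in the example above. Tidying the data by inserting the class name into each data frame before merging will help.
Pivoting to reshape the data makes bind_rows an easy way to combine all comparable values across times and tests.
Joining everything first lets you visually check whether the expected people are present in each part; after that you can filter across all columns.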

Here is a worked version on the data above, using purrr and dplyr to manipulate the list and the data frames:

library(tidyverse)

names1 <- c("Mary","John","Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")

class1Time1 <- data.frame(ID = names1[3:8], VarA_21 = sample(1:20, 6), VarB_21 = sample(1:20, 6))
class2Time1 <- data.frame(ID = names2[1:7], VarA_21 = sample(1:20, 7), VarB_21 = sample(1:20, 7))
class1Time2 <- data.frame(ID = names1[2:8], VarA_22 = sample(1:20, 7), VarB_22 = sample(1:20, 7))
class2Time2 <- data.frame(ID = names2[1:4], VarA_22 = sample(1:20, 4), VarB_22 = sample(1:20, 4))

together <- grep("class",names(.GlobalEnv), value=TRUE)
my_list <- do.call("list", mget(together))

my_list |>
  # Record which class each data frame came from, taken from its list name.
  imap(~ mutate(.x, class = str_extract(.y, "class\\d"))) |>
  # Reshape each data frame to one row per participant and test.
  map(~ pivot_longer(.x, starts_with("Var"),
                     names_to = "test_time",
                     values_to = "score")) |>
  bind_rows() |>
  pivot_wider(names_from = c(class, test_time),
              values_from = score) |>
  # This final line reduces down to only full rows. Can be cut out for checking.
  filter(if_all(everything(), ~ !is.na(.x)))

#> # A tibble: 4 × 9
#>   ID       class2_VarA_21 class2_VarB_21 class2_VarA_22 class2_VarB_22
#>   <chr>             <int>          <int>          <int>          <int>
#> 1 Harry                13             13              8              4
#> 2 Hermione              4              6              9              2
#> 3 Rony                 15             15             16              8
#> 4 Dobby                17             19             13             18
#> # … with 4 more variables: class1_VarA_21 <int>, class1_VarB_21 <int>,
#> #   class1_VarA_22 <int>, class1_VarB_22 <int>
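
As a cross-check on a filtered result like the one above, the participants present in every data frame can be computed directly from the ID columns. A full join followed by a complete-case filter should then return the same people, which is what Question 2 asks about. This is a base R sketch with made-up scores, not part of the original answer; `common_ids`, `full` and `complete_rows` are illustrative names.

```r
names1 <- c("Mary", "John", "Kate", "Bea", "Harry", "Hermione", "Rony", "Dobby")
names2 <- c("Harry", "Hermione", "Rony", "Dobby", "Dumbledore", "Snape", "Sirius")

# Made-up scores; only the ID membership matters for this check.
my_list <- list(
  class1Time1 = data.frame(ID = names1[3:8], VarA_21 = 1:6),
  class2Time1 = data.frame(ID = names2[1:7], VarA_21 = 1:7),
  class1Time2 = data.frame(ID = names1[2:8], VarA_22 = 1:7),
  class2Time2 = data.frame(ID = names2[1:4], VarA_22 = 1:4)
)

# IDs that occur in every data frame.
common_ids <- Reduce(intersect, lapply(my_list, function(df) df$ID))
common_ids  # "Harry" "Hermione" "Rony" "Dobby"

# Full join on ID (keeps everyone, filling gaps with NA) ...
full <- Reduce(function(a, b) merge(a, b, by = "ID", all = TRUE), my_list)

# ... then keep only the rows without any NA.
complete_rows <- full[complete.cases(full), ]

setequal(complete_rows$ID, common_ids)  # TRUE
```

If the two sets ever disagree on real data, a likely cause is an NA in a score column for someone who did appear in every data frame, or inconsistent spelling of an ID.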


Tags: r, dataframe, join, dplyr, tidyverse
