我有一個包含許多不同植物物種名稱(列MTmatch)的資料集,其中一些重復出現。其中每一個都有一個列 ( ReadSum),其中包含與其相關的總和(以及許多其他資訊)。如何組合/聚合所有冗余植物物種并將ReadSum與每個物種相關聯的總和,同時保留非冗余行?
我想采用這樣的資料集,或者對其進行轉換,以便每個樣本具有組合行的聚合,或者至少有一個附加列顯示ReadSum組合冗余物種的列總和。抱歉,如果這令人困惑,我不知道如何問這個問題。
我一直在搞亂 dplyr,使用group_by()and summarise(),但這似乎是對整個專欄的總結,而不僅僅是新組。
structure(list(ESVID = c("ESV_000090", "ESV_000682", "ESV_000028",
"ESV_000030", "ESV_000010", "ESV_000182", "ESV_000040", "ESV_000135",
"ESV_000383"), S026401.R1 = c(0.222447727, 0, 0, 0, 0, 0, 0.029074432,
0, 0), S026404.R1 = c(0.022583349, 0, 0, 0, 0, 0, 0.016390389,
0.001257217, 0), S026406.R1 = c(0.360895503, 0, 0, 0.00814677,
0, 0, 0.01513888, 0, 0.00115466)), row.names = c(NA, -9L), class = "data.frame")
> dput(samp5[1:9])
structure(list(ESVID = c("ESV_000090", "ESV_000682", "ESV_000028",
"ESV_000030", "ESV_000010", "ESV_000182", "ESV_000040", "ESV_000135",
"ESV_000383"), S026401.R1 = c(0.222447727, 0, 0, 0, 0, 0, 0.029074432,
0, 0), S026404.R1 = c(0.022583349, 0, 0, 0, 0, 0, 0.016390389,
0.001257217, 0), S026406.R1 = c(0.360895503, 0, 0, 0.00814677,
0, 0, 0.01513888, 0, 0.00115466), S026409.R1 = c(0.221175955,
0, 0, 0, 0, 0, 0.005146173, 0, 0), S026412.R1 = c(0.026058888,
0, 0, 0, 0, 0, 0, 0, 0), MAX = c(0.400577608, 0.009933177, 0.124412855,
0.00814677, 0.009824944, 0.086475106, 0.154850408, 0.015593835,
0.008340888), ReadSum = c(3.54892343, 0.012059346, 0.203303936,
0.021075546, 0.009824944, 0.128007863, 0.859687787, 0.068159534,
0.050266853), SPECIES = c("Abies ", "Abies ", "Acer", "Alnus",
"Berberis", "Betula ", "Boykinia", "Boykinia", "Boykinia")), row.names = c(NA,
-9L), class = "data.frame")
uj5u.com熱心網友回復:
這些方法中的任何一個是否會產生您預期的結果?
資料:
df <- structure(list(ESVID = c("ESV_000090", "ESV_000682", "ESV_000028",
"ESV_000030", "ESV_000010", "ESV_000182", "ESV_000040", "ESV_000135",
"ESV_000383"), S026401.R1 = c(0.222447727, 0, 0, 0, 0, 0, 0.029074432,
0, 0), S026404.R1 = c(0.022583349, 0, 0, 0, 0, 0, 0.016390389,
0.001257217, 0), S026406.R1 = c(0.360895503, 0, 0, 0.00814677,
0, 0, 0.01513888, 0, 0.00115466), S026409.R1 = c(0.221175955,
0, 0, 0, 0, 0, 0.005146173, 0, 0), S026412.R1 = c(0.026058888,
0, 0, 0, 0, 0, 0, 0, 0), MAX = c(0.400577608, 0.009933177, 0.124412855,
0.00814677, 0.009824944, 0.086475106, 0.154850408, 0.015593835,
0.008340888), ReadSum = c(3.54892343, 0.012059346, 0.203303936,
0.021075546, 0.009824944, 0.128007863, 0.859687787, 0.068159534,
0.050266853), SPECIES = c("Abies ", "Abies ", "Acer", "Alnus",
"Berberis", "Betula ", "Boykinia", "Boykinia", "Boykinia")), row.names = c(NA,
-9L), class = "data.frame")
創建一個新列“combined_ReadSum”(第二列),它是每個“SPECIES”的“ReadSum”之和:
library(dplyr)
df %>%
group_by(SPECIES) %>%
summarise(combined_ReadSum = sum(ReadSum)) %>%
left_join(df, by = "SPECIES")
#> # A tibble: 9 × 10
#> SPECIES combi…1 ESVID S0264…2 S0264…3 S0264…? S0264…? S0264…? MAX ReadSum
#> <chr> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 "Abies " 3.56 ESV_… 0.222 0.0226 0.361 0.221 0.0261 0.401 3.55
#> 2 "Abies " 3.56 ESV_… 0 0 0 0 0 0.00993 0.0121
#> 3 "Acer" 0.203 ESV_… 0 0 0 0 0 0.124 0.203
#> 4 "Alnus" 0.0211 ESV_… 0 0 0.00815 0 0 0.00815 0.0211
#> 5 "Berber… 0.00982 ESV_… 0 0 0 0 0 0.00982 0.00982
#> 6 "Betula… 0.128 ESV_… 0 0 0 0 0 0.0865 0.128
#> 7 "Boykin… 0.978 ESV_… 0.0291 0.0164 0.0151 0.00515 0 0.155 0.860
#> 8 "Boykin… 0.978 ESV_… 0 0.00126 0 0 0 0.0156 0.0682
#> 9 "Boykin… 0.978 ESV_… 0 0 0.00115 0 0 0.00834 0.0503
#> # … with abbreviated variable names 1?combined_ReadSum, 2?S026401.R1,
#> # 3?S026404.R1, ??S026406.R1, ??S026409.R1, ??S026412.R1
或者,通過對每個獨特物種的值求和來匯總列:
library(dplyr)
df %>%
group_by(SPECIES) %>%
summarise(across(where(is.numeric), sum))
#> # A tibble: 6 × 8
#> SPECIES S026401.R1 S026404.R1 S026406.R1 S026409.R1 S0264…1 MAX ReadSum
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 "Abies " 0.222 0.0226 0.361 0.221 0.0261 0.411 3.56
#> 2 "Acer" 0 0 0 0 0 0.124 0.203
#> 3 "Alnus" 0 0 0.00815 0 0 0.00815 0.0211
#> 4 "Berberis" 0 0 0 0 0 0.00982 0.00982
#> 5 "Betula " 0 0 0 0 0 0.0865 0.128
#> 6 "Boykinia" 0.0291 0.0176 0.0163 0.00515 0 0.179 0.978
#> # … with abbreviated variable name 1?S026412.R1
由reprex 包于 2022-10-28 創建(v2.0.1)
uj5u.com熱心網友回復:
謝謝賈里德mamrot!像魅力一樣作業。兩者都有幫助,但我將使用第二個建議來重新排序一些資料。
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/522289.html
上一篇:反應背景關系的生命周期是多少
