How do I find the mean across rows, grouped by the values in the first row?

       S1   S2  S3  S4
Cohort  1    2   1   1
G1     23   44  67  13
G2     11   78  88  30
G3     45   46  56  66
G4     67   77  22  45

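This is a demo dataset I am working with, where S1, S2, ... are samples, Cohort is the grouping variable (either 1 or 2), and G1, G2, ... are genes. The values are expression values.

I want to find the mean expression in cohort 1 and in cohort 2.

I tried using an if statement, if(data$cohort == 1), but it gave me an error: the condition has length > 1. Is there a simple way to solve this?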

A uj5u.com user replied:

Data frames are built around columns, not rows. I would first tidy the data into a column-based long format:

library(tidyr)
library(dplyr)
library(tibble)
df = t(data) |> 
  as.data.frame() |> 
  rownames_to_column(var = "sample") |>
  pivot_longer(cols = starts_with("G"), names_to = "gene", values_to = "expression")
df
# # A tibble: 16 × 4
#    sample Cohort gene  expression
#    <chr>   <int> <chr>      <int>
#  1 S1          1 G1            23
#  2 S1          1 G2            11
#  3 S1          1 G3            45
#  4 S1          1 G4            67
#  5 S2          2 G1            44
#  6 S2          2 G2            78
#  7 S2          2 G3            46
#  8 S2          2 G4            77
#  9 S3          1 G1            67
# 10 S3          1 G2            88
# ...

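Now that we have a clear grouping column and a value column, we can use any of the methods from the FAQ on calculating means by group. Here is the dplyr approach:

df |>
  group_by(Cohort) |>
  summarize(mean_ex = mean(expression))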
# # A tibble: 2 × 2
#   Cohort mean_ex
#    <int>   <dbl>
# 1      1    44.4
# 2      2    61.2

(Use group_by(Cohort, gene) instead if you want the mean for each combination of the two. It isn't clear from your question what output you want.)

Using this sample data:
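As a minimal sketch of the group_by(Cohort, gene) variant mentioned above, assuming the same demo data as in the question: the names long and by_cohort_gene are illustrative, and .groups = "drop" is added here only so the result comes back ungrouped.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Same demo data as in the question
data <- read.table(text = "       S1   S2  S3  S4
Cohort  1    2   1   1
G1     23   44  67  13
G2     11   78  88  30
G3     45   46  56  66
G4     67   77  22  45", header = TRUE)

# Transpose so samples become rows, then reshape to one row per sample/gene pair
long <- t(data) |>
  as.data.frame() |>
  rownames_to_column(var = "sample") |>
  pivot_longer(cols = starts_with("G"), names_to = "gene", values_to = "expression")

# One mean per cohort and gene (2 cohorts x 4 genes = 8 rows)
by_cohort_gene <- long |>
  group_by(Cohort, gene) |>
  summarize(mean_ex = mean(expression), .groups = "drop")

by_cohort_gene
```

This gives one row per cohort and gene rather than a single overall mean per cohort.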

data = read.table(text = '       S1   S2  S3  S4
Cohort  1    2   1   1
G1     23   44  67  13
G2     11   78  88  30
G3     45   46  56  66
G4     67   77  22  45', header = T)
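On the error in the question: if() expects a single TRUE or FALSE, while a comparison against a whole vector of cohort labels returns one value per sample. A base R sketch that uses that logical vector for indexing instead might look like the following. The names cohort, genes, mean_cohort1 and mean_cohort2 are illustrative, and the layout is assumed to be the one shown above, with the cohort labels stored in a row named Cohort.

```r
# Same demo data as above
data <- read.table(text = "       S1   S2  S3  S4
Cohort  1    2   1   1
G1     23   44  67  13
G2     11   78  88  30
G3     45   46  56  66
G4     67   77  22  45", header = TRUE)

# The cohort labels live in the first row, one label per sample column
cohort <- unlist(data["Cohort", ])

# Everything except the Cohort row is expression data
genes <- as.matrix(data[rownames(data) != "Cohort", ])

# cohort == 1 is a logical vector with one entry per sample,
# so it selects columns rather than serving as an if() condition
mean_cohort1 <- mean(genes[, cohort == 1])
mean_cohort2 <- mean(genes[, cohort == 2])

mean_cohort1
mean_cohort2
```

The two values should agree with the dplyr result above (about 44.4 and 61.2 after rounding).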

A uj5u.com user replied:

Transpose your data, then group by Cohort and summarize across all gene columns with dplyr::across():

library(dplyr)

data %>%
  t() %>%
  as.data.frame() %>%
  group_by(Cohort) %>%
  summarize(across(G1:G4, mean))

# # A tibble: 2 × 5
#   Cohort    G1    G2    G3    G4
#    <dbl> <dbl> <dbl> <dbl> <dbl>
# 1      1  34.3    43  55.7  44.7
# 2      2  44      78  46    77

A uj5u.com user replied:

Here is another possibility:

library(dplyr)
library(tidyr)
library(purrr)
df %>% pivot_longer(-Cohort) %>% 
  nest(data = -Cohort) %>% 
  mutate(mean = map(data, ~mean(.$value))) %>% 
  unnest(mean)
#> # A tibble: 2 × 3
#>   Cohort data               mean
#>    <int> <list>            <dbl>
#> 1      1 <tibble [12 × 2]>  44.4
#> 2      2 <tibble [4 × 2]>   61.2

Data:

df <- read.table(text = "
       S1   S2  S3  S4
Cohort  1    2   1   1
G1     23   44  67  13
G2     11   78  88  30
G3     45   46  56  66
G4     67   77  22  45", header =T) %>% 
  t() %>% 
  as.data.frame()
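A possible variant of the nested approach, sketched under the assumption that purrr is available: map_dbl() returns a numeric vector directly, so the unnest() step is not needed. The name res is illustrative.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Same transposed demo data as above: one row per sample
df <- read.table(text = "
       S1   S2  S3  S4
Cohort  1    2   1   1
G1     23   44  67  13
G2     11   78  88  30
G3     45   46  56  66
G4     67   77  22  45", header = TRUE) %>%
  t() %>%
  as.data.frame()

res <- df %>%
  pivot_longer(-Cohort) %>%                        # columns: Cohort, name, value
  nest(data = -Cohort) %>%                         # one nested tibble per cohort
  mutate(mean = map_dbl(data, ~ mean(.x$value)))   # plain numeric column, no unnest()

res
```

The result keeps the nested data column alongside the per-cohort mean, as in the version above.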
