Calculating a ratio matrix in R

I'd like to know whether there is a simple way to calculate a ratio matrix for every element in a data frame. Example:

gene sample1 sample2 sample3 sample4 .....
aa     2       2       3      2
aa     1       5       2      1
aa     4       1       2      3
bb     1       2       1      2
bb     2       1       1      2

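For each element in sample1 through sample4, I want its ratio to the total of that column over all rows that share the same gene. The calculation looks like this: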

gene sample1 sample2 sample3 sample4 .....
aa     2/7     2/8     3/7      2/6
aa     1/7     5/8     2/7      1/6
aa     4/7     1/8     2/7      3/6
bb     1/3     2/3     1/2      2/4
bb     2/3     1/3     1/2      2/4

The result would look like this (values truncated to two decimal places):

gene  sample1  sample2  sample3  sample4 .....
aa     .28       .25       .42      .33
aa     .14       .62       .28      .16
aa     .57       .12       .28      .5
bb     .33       .66       .5       .5
bb     .66       .33       .5       .5

What I have tried so far is:

library(dplyr)

tf <- dd %>%
        group_by(symbol) %>%
        summarise_if(is.numeric, mean)

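But this summarises each group instead of computing a value for every element, so it does not keep the same dimensions as the original data frame (dd in this case). Any suggestions would be appreciated.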

Answer:

You could do:

library(dplyr)

dat %>%
  group_by(gene) %>%
  mutate(across(everything(), proportions)) %>% 
  ungroup()

# A tibble: 5 x 5
  gene  sample1 sample2 sample3 sample4
  <chr>   <dbl>   <dbl>   <dbl>   <dbl>
1 aa      0.286   0.25    0.429   0.333
2 aa      0.143   0.625   0.286   0.167
3 aa      0.571   0.125   0.286   0.5  
4 bb      0.333   0.667   0.5     0.5  
5 bb      0.667   0.333   0.5     0.5
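Why this keeps the original shape: inside a grouped data frame, mutate() returns one value per input row, while summarise() (as in the question's attempt) returns one row per group. proportions(), available in base R since 4.0.0, simply divides a vector by its sum. A small comparison using only the gene and sample1 columns of the example data (assumes dplyr 1.0 or later):

```r
library(dplyr)

# gene and sample1 columns from the question's example
toy <- data.frame(gene    = c("aa", "aa", "aa", "bb", "bb"),
                  sample1 = c(2, 1, 4, 1, 2))

# summarise(): one row per group, so 5 rows collapse to 2
collapsed <- toy %>%
  group_by(gene) %>%
  summarise(sample1 = mean(sample1))

# mutate(): one value per input row, so all 5 rows are kept
kept <- toy %>%
  group_by(gene) %>%
  mutate(sample1 = proportions(sample1)) %>%
  ungroup()

nrow(collapsed)  # 2
nrow(kept)       # 5

# Within each gene the ratios add back up to 1
kept %>% group_by(gene) %>% summarise(total = sum(sample1))
```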

If you have missing values that you want to ignore, use:

dat %>%
  group_by(gene) %>%
  mutate(across(everything(), ~ .x / sum(.x, na.rm = TRUE))) %>%
  ungroup()
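To see what na.rm = TRUE changes, here is a sketch in which one value of the example data is replaced by NA (a hypothetical modification, not in the question). With the na.rm version the missing cell stays NA and the other cells of that gene are divided by the sum of the non-missing values; with plain proportions() the sum itself becomes NA, so the whole group turns into NA.

```r
library(dplyr)

# Hypothetical variant of the example data with one missing value
dat_na <- data.frame(
  gene    = c("aa", "aa", "aa", "bb", "bb"),
  sample1 = c(2, NA, 4, 1, 2),
  sample2 = c(2, 5, 1, 2, 1)
)

# Ignore NA when computing each group's total
res_na <- dat_na %>%
  group_by(gene) %>%
  mutate(across(everything(), ~ .x / sum(.x, na.rm = TRUE))) %>%
  ungroup()

# proportions() uses sum(x) without na.rm, so NA propagates to the group
res_plain <- dat_na %>%
  group_by(gene) %>%
  mutate(across(everything(), proportions)) %>%
  ungroup()

res_na$sample1     # values: 2/6, NA, 4/6, 1/3, 2/3
res_plain$sample1  # values: NA, NA, NA, 1/3, 2/3
```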

Data:

dat <- structure(list(gene = c("aa", "aa", "aa", "bb", "bb"), sample1 = c(2, 
1, 4, 1, 2), sample2 = c(2, 5, 1, 2, 1), sample3 = c(3, 2, 2, 
1, 1), sample4 = c(2, 1, 3, 2, 2)), class = "data.frame", row.names = c(NA, 
-5L))
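If you would rather avoid extra packages, a minimal base-R sketch of the same calculation is possible with ave(), which applies a function within each group and returns a vector of the same length as its input, so no rows are collapsed. It rebuilds the example as dat (same values as the data above) and assumes every column other than gene is a sample column:

```r
# Example data from the question (the trailing "....." columns are omitted)
dat <- data.frame(
  gene    = c("aa", "aa", "aa", "bb", "bb"),
  sample1 = c(2, 1, 4, 1, 2),
  sample2 = c(2, 5, 1, 2, 1),
  sample3 = c(3, 2, 2, 1, 1),
  sample4 = c(2, 1, 3, 2, 2)
)

# Assumption: every column except gene holds sample values
sample_cols <- setdiff(names(dat), "gene")

ratios <- dat
# Divide each value by the total of its column within the same gene
ratios[sample_cols] <- lapply(
  dat[sample_cols],
  function(x) ave(x, dat$gene, FUN = function(v) v / sum(v))
)

print(round(ratios[sample_cols], 2))
```

Note that round() rounds rather than truncates, so for example 2/7 prints as 0.29 rather than the .28 in the question's expected result.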

Answer:

Here is an option using data.table:

library(data.table)

setDT(dat)[, lapply(.SD, proportions), by = gene]
   gene   sample1   sample2   sample3   sample4
1:   aa 0.2857143 0.2500000 0.4285714 0.3333333
2:   aa 0.1428571 0.6250000 0.2857143 0.1666667
3:   aa 0.5714286 0.1250000 0.2857143 0.5000000
4:   bb 0.3333333 0.6666667 0.5000000 0.5000000
5:   bb 0.6666667 0.3333333 0.5000000 0.5000000
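If the real table has other non-numeric columns besides gene, or contains missing values, the same data.table call can be restricted with .SDcols and given an NA-aware function instead of proportions. A sketch under those assumptions (the numeric-column filter and the na.rm handling are additions, not part of the answer above):

```r
library(data.table)

dt <- data.table(
  gene    = c("aa", "aa", "aa", "bb", "bb"),
  sample1 = c(2, 1, 4, 1, 2),
  sample2 = c(2, 5, 1, 2, 1),
  sample3 = c(3, 2, 2, 1, 1),
  sample4 = c(2, 1, 3, 2, 2)
)

# Assumption: only numeric columns are samples to be converted
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# Within each gene, divide by the column total, ignoring NA values
res <- dt[, lapply(.SD, function(x) x / sum(x, na.rm = TRUE)),
          by = gene, .SDcols = num_cols]
print(res)
```

One design point: by returns rows grouped by gene in order of first appearance, so if genes are interleaved in the input, the row order of the result differs from the original. With the example data, which is already sorted by gene, the order is unchanged.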

Source: https://www.uj5u.com/gongcheng/455759.html

Tags: r, dataframe, group-by
