R: Count how many times in each column a condition is met and the row name appears in a list

I have a data frame (df1) containing count information

rownames	sample1	sample2	sample3
m1	0	5	1
m2	1	7	5
m3	6	2	0
m4	3	1	0

and a second one with sample information (df2)

rownames	batch	total count
sample1	a	10
sample2	b	15
sample3	a	6

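I also have two lists with information about the m values (they could easily be converted into another data frame if needed, but I would rather not add them to the count information because it is very large). There is no pattern (such as even and odd); I am just using a very simple example.

x <- c("m1", "m3") and y <- c("m2", "m4")

What I want to do is add two more columns to the sample information: for each sample, the number of m values that are greater than 5 (the expected output below also counts a value of exactly 5) and that appear in list x or y.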

rownames	batch	total count	X	Y
sample1	a	10	1	0
sample2	b	15	1	1
sample3	a	6	0	1

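My current strategy is to build a list of values for x and for y, and then append them to df2. These are my attempts so far:

numX <- colSums(df1[sum(rownames(df1)>10 %in% x),]) and numX <- colSums(df1[sum(rownames(df1)>10 %in% x),]) both return a list of 0s

numX <- colSums(df1[rownames(df1)>10 %in% x,]) returns a list with the sum of the count values that meet the condition in each column

numX <- length(df1[rownames(df1)>10 %in% novel,]) returns the number of times the condition is met (2L in this case)

I am not sure how to approach this, so I have been experimenting. I tried searching for an answer, but perhaps I am just struggling to find the right wording.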
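A likely reason the attempts above do not behave as intended is operator precedence: in R, `%in%` binds more tightly than `>`, so `rownames(df1) > 10 %in% x` is parsed as `rownames(df1) > (10 %in% x)` rather than as two separate conditions. In addition, in the data given at the end of this page the m labels sit in a column named `rownames`, not in the data frame's actual row names. A minimal sketch, assuming that layout and a threshold of `>= 5` (which is what the expected output implies), tests the two conditions separately. The names `num_x` and `num_y` are only illustrative.

```r
# Example data, copied from the data section of this page.
df1 <- data.frame(
  rownames = c("m1", "m2", "m3", "m4"),
  sample1 = c(0L, 1L, 6L, 3L),
  sample2 = c(5L, 7L, 2L, 1L),
  sample3 = c(1L, 5L, 0L, 0L)
)
df2 <- data.frame(
  rownames = c("sample1", "sample2", "sample3"),
  batch = c("a", "b", "a"),
  totalcount = c(10L, 15L, 6L)
)
x <- c("m1", "m3")
y <- c("m2", "m4")

# Condition 1: keep only the rows whose label is in the list.
# Condition 2: compare each remaining count with the threshold.
# colSums() on the resulting logical matrix counts the TRUEs per sample.
num_x <- colSums(df1[df1$rownames %in% x, -1, drop = FALSE] >= 5)
num_y <- colSums(df1[df1$rownames %in% y, -1, drop = FALSE] >= 5)

# Look the counts up by sample name so the row order of df2 does not matter.
df2$x <- unname(num_x[df2$rownames])
df2$y <- unname(num_y[df2$rownames])
df2
```

Indexing `num_x` by `df2$rownames` rather than relying on position is a deliberate choice here: it keeps the result correct even if the samples in `df2` are listed in a different order from the columns of `df1`.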

uj5u.com user reply:

How about using dplyr and reshape2::melt?

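library(dplyr)
library(reshape2)

df3 <- df1 %>%
  melt(id.vars = "rownames") %>%   # long format: rownames, variable (sample name), value
  filter(value >= 5) %>%           # keep only the counts that meet the threshold
  mutate(x = as.numeric(rownames %in% c("m1", "m3")),
         y = as.numeric(rownames %in% c("m2", "m4"))) %>%
  select(-rownames, -value) %>%
  group_by(variable) %>%
  summarise(x = sum(x), y = sum(y))

# a sample with no value >= 5 is absent from df3 and gets NA after the join
df2 %>% left_join(df3, by = c("rownames" = "variable"))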

  rownames batch total_count x y
1  sample1     a          10 1 0
2  sample2     b          15 1 1
3  sample3     a           6 0 1

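uj5u.com user reply:

You can create a named list of vectors and, for each value in rownames, count how many of the x and y values are >= 5 in the respective sample.

Base R option -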

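list_vec <- list(x = x, y = y)

# For each sample (a column of df1), count the values >= 5 among the rows
# whose label is in each list; rbind() stacks the per-sample results.
cbind(df2, do.call(rbind, lapply(df2$rownames, function(smp) 
  sapply(list_vec, function(ids) {
    sum(df1[[smp]][df1$rownames %in% ids] >= 5)
}))))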

#  rownames batch total.count x y
#1  sample1     a          10 1 0
#2  sample2     b          15 1 1
#3  sample3     a           6 0 1
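Since the question mentions that the count table is very large, a vectorised variant may be worth considering. The sketch below is not part of the answer above; it assumes the same `df1`, `df2`, `x`, and `y` as in the data section and expresses the count as a product of two 0/1 indicator matrices. The names `hits`, `member`, and `counts` are illustrative only.

```r
# Example data, copied from the data section of this page.
df1 <- data.frame(
  rownames = c("m1", "m2", "m3", "m4"),
  sample1 = c(0L, 1L, 6L, 3L),
  sample2 = c(5L, 7L, 2L, 1L),
  sample3 = c(1L, 5L, 0L, 0L)
)
df2 <- data.frame(
  rownames = c("sample1", "sample2", "sample3"),
  batch = c("a", "b", "a"),
  totalcount = c(10L, 15L, 6L)
)
list_vec <- list(x = c("m1", "m3"), y = c("m2", "m4"))

# hits[i, j] is 1 when row i of df1 has a value >= 5 in sample j.
hits <- (as.matrix(df1[-1]) >= 5) * 1

# member[i, k] is 1 when the label of row i belongs to list k.
member <- sapply(list_vec, function(ids) df1$rownames %in% ids) * 1

# crossprod(hits, member) is t(hits) %*% member:
# one row per sample, one column per list, each cell a count.
counts <- crossprod(hits, member)

# Add one column per list, matching samples by name.
for (nm in colnames(counts)) {
  df2[[nm]] <- unname(counts[df2$rownames, nm])
}
df2
```

The loop over `colnames(counts)` means further lists can be added to `list_vec` without changing the rest of the code.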

Using tidyverse -

library(dplyr)
library(purrr)

list_vec <- lst(x, y)

df2 %>%
  bind_cols(map_df(df2$rownames, function(x) 
    map(list_vec, ~sum(df1[[x]][df1$rownames %in% .x] >= 5))))

uj5u.com user reply:

We can do this with rowwise

library(dplyr)
df2 %>% 
   rowwise %>%
    # +(...) converts TRUE/FALSE to 1/0; the test is applied to the group total
    mutate(x = +(sum(df1[[rownames]][df1$rownames %in% x]) >= 5), 
           y = +(sum(df1[[rownames]][df1$rownames %in% y]) >= 5)) %>%
    ungroup

-output

# A tibble: 3 × 5
  rownames batch totalcount     x     y
  <chr>    <chr>      <int> <int> <int>
1 sample1  a             10     1     0
2 sample2  b             15     1     1
3 sample3  a              6     0     1

Or, depending on the data, a base R option is

out <- aggregate(. ~ grp, FUN = sum, 
     transform(df1,  grp = c('x', 'y')[1 + (rownames %in% y)] )[-1])
df2[out$grp] <- +(t(out[-1]) >= 5)

-output

> df2
  rownames batch totalcount x y
1  sample1     a         10 1 0
2  sample2     b         15 1 1
3  sample3     a          6 0 1
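Both versions in this answer test whether the total of a group's values is at least 5, whereas the question asks how many individual values are at least 5. On the example data the two readings happen to produce the same 0/1 columns, but they can diverge. A small sketch with made-up values (the vectors `vals_a` and `vals_b` are hypothetical and not taken from the question's data):

```r
# Hypothetical counts for one sample, restricted to the rows of one list.
vals_a <- c(3L, 3L)
vals_b <- c(6L, 7L)

# Reading used by sum(...) >= 5: is the group total at least 5?
total_flag_a <- as.integer(sum(vals_a) >= 5)  # 3 + 3 = 6, so the flag is 1
total_flag_b <- as.integer(sum(vals_b) >= 5)  # 6 + 7 = 13, so the flag is 1

# Reading asked for in the question: how many values are each at least 5?
n_hits_a <- sum(vals_a >= 5)  # neither 3 qualifies, so the count is 0
n_hits_b <- sum(vals_b >= 5)  # both values qualify, so the count is 2
```

If a count is what is needed, moving the comparison inside `sum()`, as the base R and tidyverse answers earlier on this page do, gives the second reading.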

data

df1 <- structure(list(rownames = c("m1", "m2", "m3", "m4"), sample1 = c(0L, 
1L, 6L, 3L), sample2 = c(5L, 7L, 2L, 1L), sample3 = c(1L, 5L, 
0L, 0L)), class = "data.frame", row.names = c(NA, -4L))

df2 <- structure(list(rownames = c("sample1", "sample2", "sample3"), 
    batch = c("a", "b", "a"), totalcount = c(10L, 15L, 6L)), 
class = "data.frame", row.names = c(NA, 
-3L))
