您好,我正在嘗試弄清楚如何通過前綴匹配來合并資料框的行(并對列求和):
示例資料框:
set.seed(42) ## for sake of reproducibility
df <- data.frame(col1=c(sprintf("gene%s", 1:3), sprintf("protein%s", 1:5), sprintf("lipid%s", 1:3)),
counts=runif(11, min=10, max=70))
df
# col1 counts
# 1 gene1 64.88836
# 2 gene2 66.22452
# 3 gene3 27.16837
# 4 protein1 59.82686
# 5 protein2 48.50473
# 6 protein3 41.14576
# 7 protein4 54.19530
# 8 protein5 18.08000
# 9 lipid1 49.41954
# 10 lipid2 52.30389
# 11 lipid3 37.46451
所以我希望所有以“基因”開頭的行都合并為一行,并且與蛋白質和脂質行相同。
期望的輸出:
col1 counts
gene 158.2813
lipid 139.1879
protein 221.7526
uj5u.com熱心網友回復:
gsub數字消失,然后aggregate使用公式
aggregate(counts ~ gsub('\\d ', '', col1), df, sum)
# gsub("\\\\d", "", col1) counts
# 1 gene 158.2813
# 2 lipid 139.1879
# 3 protein 221.7526
或list符號。
with(df, aggregate(list(counts=counts), list(col1=gsub('\\d ', '', col1)), sum))
# col1 counts
# 1 gene 158.2813
# 2 lipid 139.1879
# 3 protein 221.7526
關于字串生成的旁注:您也可以使用paste0數字后綴。
paste0("gene", 1:3)
# [1] "gene1" "gene2" "gene3"
資料:
df <- structure(list(col1 = c("gene1", "gene2", "gene3", "protein1",
"protein2", "protein3", "protein4", "protein5", "lipid1", "lipid2",
"lipid3"), counts = c(64.8883626097813, 66.2245247978717, 27.1683720871806,
59.8268575640395, 48.5047311335802, 41.145756947808, 54.195298878476,
18.0799958342686, 49.4195374241099, 52.303887042217, 37.4645065749064
)), class = "data.frame", row.names = c(NA, -11L))
uj5u.com熱心網友回復:
df %>%
group_by(col1 = str_remove(col1, "\\d "))%>%
summarise(counts = sum(counts))
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/482645.html
