I made a small data frame to illustrate my problem; my real dataset is much larger.
gene <- c("a", "b", "c", "a", "b", "c", "a", "b", "c")
sample <- c("a", "a", "a", "b", "b", "b", "c", "c", "c")
expression <- c("5", "6", "8", "3", "5", "7", "7", "8", "9")
data.frame(gene, sample, expression)
  gene sample expression
1    a      a          5
2    b      a          6
3    c      a          8
4    a      b          3
5    b      b          5
6    c      b          7
7    a      c          7
8    b      c          8
9    c      c          9
and
gene2 <- c("a", "b", "c", "a", "b", "c", "a", "b", "c")
sample2 <- c("1", "1", "1", "2", "2", "2", "3", "3", "3")
expression2 <- c("5.4", "6.3", "8", "3.2", "5.4", "7.2", "7.1", "8.2", "9.4")
data.frame(gene2, sample2, expression2)
  gene2 sample2 expression2
1     a       1         5.4
2     b       1         6.3
3     c       1           8
4     a       2         3.2
5     b       2         5.4
6     c       2         7.2
7     a       3         7.1
8     b       3         8.2
9     c       3         9.4
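So I have 2 different data frames with different sample identifiers, but the expression data is (or should be) the same. What I want to do is find, for each sample, the closest matching expression values and report the corresponding sample identifier. The result might look like this: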
  gene sample sample2 expression expression2
1    a      a       1          5         5.4
2    b      a       1          6         6.3
3    c      a       1          8           8
4    a      b       2          3         3.2
5    b      b       2          5         5.4
6    c      b       2          7         7.2
7    a      c       3          7         7.1
8    b      c       3          8         8.2
9    c      c       3          9         9.4
I thought a rolling join might work, but I'm a bit lost.
Answer from a uj5u.com user:
You can do a rolling join with data.table:
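library(data.table)

# Assumption: df1 and df2 are the two data frames from the question
df1 <- data.frame(gene, sample, expression, stringsAsFactors = FALSE)
df2 <- data.frame(gene2, sample2, expression2, stringsAsFactors = FALSE)

# Convert the expression values from character to numeric
setDT(df1)[, expression := as.numeric(expression)]
# Map sample2 ("1", "2", "3") onto df1's sample labels by position and add the join columns
setDT(df2)[, `:=`(sample = unique(df1$sample)[as.numeric(sample2)],
                  gene = gene2,
                  expression = as.numeric(expression2))]
# Match gene and sample exactly, and roll expression to the nearest value
df <- df2[df1, on = .(gene, sample, expression), roll = "nearest"][, gene2 := NULL][]
setcolorder(df, rev(seq_along(df)))
df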
#    gene expression sample expression2 sample2
# 1:    a          5      a         5.4       1
# 2:    b          6      a         6.3       1
# 3:    c          8      a           8       1
# 4:    a          3      b         3.2       2
# 5:    b          5      b         5.4       2
# 6:    c          7      b         7.2       2
# 7:    a          7      c         7.1       3
# 8:    b          8      c         8.2       3
# 9:    c          9      c         9.4       3
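Note that the join pairs sample2 with sample by position (unique(df1$sample)[as.numeric(sample2)]), so it presumes the correspondence between the two sets of identifiers is already known. If it is not, one possible sketch in base R is to compare whole samples rather than single rows: build a gene-by-sample matrix for each data frame, sum the absolute expression differences over genes for every pair of samples, and keep the closest df2 sample for each df1 sample. The names m1, m2, d, key and res are illustrative only, and the sketch assumes every gene is measured exactly once in every sample.

```r
# Data from the question, with expression stored as numeric
df1 <- data.frame(
  gene = rep(c("a", "b", "c"), times = 3),
  sample = rep(c("a", "b", "c"), each = 3),
  expression = c(5, 6, 8, 3, 5, 7, 7, 8, 9),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  gene2 = rep(c("a", "b", "c"), times = 3),
  sample2 = rep(c("1", "2", "3"), each = 3),
  expression2 = c(5.4, 6.3, 8, 3.2, 5.4, 7.2, 7.1, 8.2, 9.4),
  stringsAsFactors = FALSE
)

# Gene-by-sample matrices (rows: genes, columns: samples)
m1 <- tapply(df1$expression, list(df1$gene, df1$sample), mean)
m2 <- tapply(df2$expression2, list(df2$gene2, df2$sample2), mean)

# d[i, j]: summed absolute difference over genes between
# df1 sample i and df2 sample j
d <- sapply(colnames(m2), function(s2) {
  colSums(abs(m1 - m2[rownames(m1), s2]))
})

# For each df1 sample, the df2 sample with the smallest total difference
key <- data.frame(
  sample = rownames(d),
  sample2 = colnames(d)[apply(d, 1, which.min)],
  stringsAsFactors = FALSE
)

# Attach the matched identifier, then bring in the df2 expression values
res <- merge(
  merge(df1, key, by = "sample"),
  df2,
  by.x = c("gene", "sample2"),
  by.y = c("gene2", "sample2")
)
res
```

Summing over genes is only one choice of distance; a correlation-based measure may suit real expression data better, and nothing here prevents two df1 samples from being assigned to the same df2 sample.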
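Answer from a uj5u.com user:
You can use split (to compare within each gene), outer (to build a distance matrix), and apply (to find, for each row, the column with the smallest value). mapply lets you wrap it all together:
Data: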
df1 <- data.frame(gene, sample, expression, stringsAsFactors = FALSE)
df2 <- data.frame(gene2, sample2, expression2, stringsAsFactors = FALSE)
df1$expression <- as.numeric(df1$expression)
df2$expression2 <- as.numeric(df2$expression2)
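Code:
do.call(
  rbind,
  mapply(
    function(x, y) {
      # x and y hold the rows of df1 and df2 for one gene.
      # Rows of the distance matrix follow x, columns follow y.
      j <- apply(
        abs(outer(x$expression, y$expression2, FUN = "-")), 1, which.min
      )
      # Pair each row of x with its nearest row of y
      cbind(x, y[j, ])
    },
    split(df1, df1$gene),
    split(df2, df2$gene2),
    SIMPLIFY = FALSE
  )
)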
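To make the outer and which.min step concrete, here is a small worked sketch for gene "a" only, using the values from the question. The names x, y, d, j and nearest are illustrative and not part of the answer's code.

```r
# Expression values for gene "a", named by sample identifier
x <- c(a = 5, b = 3, c = 7)                # from df1
y <- c("1" = 5.4, "2" = 3.2, "3" = 7.1)    # from df2

# Pairwise absolute differences: rows follow x, columns follow y
d <- abs(outer(x, y, FUN = "-"))

# For each row, the index of the column with the smallest difference
j <- apply(d, 1, which.min)

# Closest df2 sample for each df1 sample
nearest <- setNames(names(y)[j], names(x))
nearest
```

Because which.min returns the first minimum, ties are resolved in favour of the earlier column.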
