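I have several data frames in the format shown below. I want to join/merge the species and kmers columns from all of them, so that the output has one species column and several kmers columns, one per file. Each kmers column should then be named after the file it came from.
df1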
reads taxReads kmers species
232 2323 23234 Bacteria
555 12 4545 Virus
df2
reads taxReads kmers species
12 23 56 Bacteria
932 1213 12 Virus
Desired output:
species df1 df2
Bacteria 23234 56
Virus 4545 12
I tried writing a script with join_all, but it does not pick the correct column (kmers):
file_list = list.files(pattern="tsv$")
datalist = lapply(file_list, function(x){
dat = read.csv(file=x, header=T, sep = "\t")
names(dat)[2] = x
return(dat)
})
joined <- join_all(dfs = datalist,by = "species",type ="full" )
Answer from a uj5u.com user:
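I'm assuming you have read the files into a list of frames named by the basename of each file (with the extension removed). Calling that list of frames dfs, we have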
dfs <- list(df1 = structure(list(reads = c(232L, 555L), taxReads = c(2323L, 12L), kmers = c(23234L, 4545L), species = c("Bacteria", "Virus")), class = "data.frame", row.names = c(NA, -2L)), df2 = structure(list(reads = c(12L, 932L), taxReads = c(23L, 1213L), kmers = c(56L,12L), species = c("Bacteria", "Virus")), class = "data.frame", row.names = c(NA, -2L)))
dfs
# $df1
# reads taxReads kmers species
# 1 232 2323 23234 Bacteria
# 2 555 12 4545 Virus
# $df2
# reads taxReads kmers species
# 1 12 23 56 Bacteria
# 2 932 1213 12 Virus
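The answer takes the named list as given. As a minimal sketch of how such a list might be built from disk, assuming tab-separated files with a header row: the helper name read_tsv_list, the temporary directory, and the variable dfs_from_disk are hypothetical and only serve the demonstration.

```r
# Hypothetical helper: read every .tsv file in `dir` into a list of data
# frames, named by file basename without the extension.
read_tsv_list <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.tsv$", full.names = TRUE)
  out <- lapply(files, read.delim, stringsAsFactors = FALSE)
  # "some/path/df1.tsv" -> "df1"
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# Demo: write the two example tables to a throwaway directory, then read them back.
tmp <- tempfile("kmers_demo")
dir.create(tmp)
write.table(
  data.frame(reads = c(232L, 555L), taxReads = c(2323L, 12L),
             kmers = c(23234L, 4545L), species = c("Bacteria", "Virus")),
  file.path(tmp, "df1.tsv"), sep = "\t", quote = FALSE, row.names = FALSE
)
write.table(
  data.frame(reads = c(12L, 932L), taxReads = c(23L, 1213L),
             kmers = c(56L, 12L), species = c("Bacteria", "Virus")),
  file.path(tmp, "df2.tsv"), sep = "\t", quote = FALSE, row.names = FALSE
)

dfs_from_disk <- read_tsv_list(tmp)
names(dfs_from_disk)
# [1] "df1" "df2"
```

Naming the list up front is what lets the later renaming step use the list names directly instead of carrying file names around separately.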
From here, it takes two steps:

1. Rename the kmers column to the file name (no extension) and drop the columns you don't need:

dfs <- Map(function(x, nm) {
  names(x)[names(x) == "kmers"] <- nm
  x[, c("species", nm)]
}, dfs, names(dfs))
dfs
# $df1
#    species   df1
# 1 Bacteria 23234
# 2    Virus  4545
# $df2
#    species df2
# 1 Bacteria  56
# 2    Virus  12

2. Reduce the list with merge:

Reduce(function(d1, d2) merge(d1, d2, by = "species", all = TRUE), dfs)
#    species   df1 df2
# 1 Bacteria 23234  56
# 2    Virus  4545  12

Here this could be written as just Reduce(merge, dfs), but I spelled it out with a two-argument anonymous function so that you can control some of merge's options.
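To illustrate why controlling the merge options can matter, here is a small sketch of what all = TRUE changes when a species is missing from one file. The data frames a and b are made-up inputs (including the Archaea row), already reduced to species plus one renamed kmers column.

```r
# Hypothetical inputs: "Virus" appears only in a, "Archaea" only in b.
a <- data.frame(species = c("Bacteria", "Virus"), df1 = c(23234L, 4545L))
b <- data.frame(species = c("Bacteria", "Archaea"), df2 = c(56L, 7L))

# Full outer join: keep every species, fill missing counts with NA.
full <- Reduce(function(d1, d2) merge(d1, d2, by = "species", all = TRUE),
               list(a, b))

# Default inner join: keep only species present in every data frame.
inner <- Reduce(function(d1, d2) merge(d1, d2, by = "species"),
                list(a, b))

full   # three rows: Archaea, Bacteria, Virus
inner  # one row: Bacteria
```

With the default inner join, any species absent from even one file drops out of the result, so all = TRUE is usually the safer choice when the files do not all report the same species.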
