Add a column to a data frame based on whether a value falls within the ranges in another data frame

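I have one file with positions 1..16569 (1-based) and another file with feature information, i.e. gene names and so on. I want to build a data frame based on whether each position in dataframe_positions falls within the range given by dataframe_features$start and dataframe_features$end.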

I have reduced the values below to save space.

df_positions = data.frame(
 chromosome = rep('MT', 10),
 positions = 1:10,
 depth = c(rep(6,3), rep(7,3), rep(8,2), rep(10,2)),
 stringsAsFactors = FALSE
)

df_features = data.frame(
 chromosome = rep('MT', 2),
 start = c(1,4),
 end = c(3,10),
 feature = c('TRNF', 'RNR1'),
 stringsAsFactors = FALSE
)

This is what I would like the data to look like afterwards:

chromosome	positions	depth	feature
MT	1	6	TRNF
MT	2	6	TRNF
MT	3	6	TRNF
MT	4	7	RNR1
MT	5	7	RNR1
MT	6	7	RNR1
MT	7	8	RNR1
MT	8	8	RNR1
MT	9	10	RNR1
MT	10	10	RNR1

This is what I have tried:

x <- df_positions %>% mutate(feature = ifelse(between(df_positions$positions, df_features$start, df_features$end), df_features$feature, ''))

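This doesn't work. I think the dplyr functions don't know to check every tuple. Is there a way to do this in R? I'm looking into plyr::mapvalues, and after that I will probably try a for loop.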

Thanks.
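
The "doesn't check every tuple" suspicion can be illustrated with a minimal sketch. It uses base R comparison operators rather than dplyr::between, whose handling of vector bounds may differ between dplyr versions, so this is an analogy under that assumption and not a trace of the exact call above. The names `recycled`, `inside` and `covered` are illustrative only.

```r
positions <- 1:10
start <- c(1, 4)
end <- c(3, 10)

# Element-wise comparison recycles start/end along positions: position i is
# compared with only one feature (feature 1 for odd i, feature 2 for even i),
# not with every feature.
recycled <- positions >= start & positions <= end

# Comparing every position with every feature needs an explicit cross
# comparison, for example with outer(): one row per position, one column
# per feature.
inside <- outer(positions, start, ">=") & outer(positions, end, "<=")
covered <- rowSums(inside) > 0

print(recycled)
print(covered)
```

In the recycled version, positions 2, 5, 7 and 9 come out FALSE even though each lies inside one of the two ranges, while the cross comparison marks all ten positions as covered.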

A uj5u.com user replied:

df_positions <- data.frame(
  chromosome = rep('MT', 10),
  positions = 1:10,
  depth = c(rep(6, 3), rep(7, 3), rep(8, 2), rep(10, 2)),
  stringsAsFactors = FALSE
)

df_features <- data.frame(
  chromosome = rep('MT', 2),
  start = c(1, 4),
  end = c(3, 10),
  feature = c('TRNF', 'RNR1'),
  stringsAsFactors = FALSE
)

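# apply() passes each row as a character vector, so positions is converted back to integer.
df_positions$feature <- apply(df_positions, 1, function(x) {
  # Rows of df_features on the same chromosome whose [start, end] range contains this position
  idx <- which(df_features$chromosome == x[ 'chromosome' ] &
                 df_features$start <= as.integer(x[ 'positions' ]) &
                 df_features$end >= as.integer(x[ 'positions' ]))
  # Keep the first matching feature; this is NA when no range matches
  df_features[ idx, 'feature' ][ 1 ]
})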

View(df_positions)
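
The same lookup can also be sketched without a row-wise apply(), using base R's findInterval(). This is an alternative technique and not the approach above. It assumes a single chromosome and features that are sorted by start and do not overlap. The helper names `idx` and `inside` are illustrative.

```r
df_positions <- data.frame(
  chromosome = rep("MT", 10),
  positions = 1:10,
  depth = c(rep(6, 3), rep(7, 3), rep(8, 2), rep(10, 2)),
  stringsAsFactors = FALSE
)
df_features <- data.frame(
  chromosome = rep("MT", 2),
  start = c(1, 4),
  end = c(3, 10),
  feature = c("TRNF", "RNR1"),
  stringsAsFactors = FALSE
)

# Assumption: one chromosome, features sorted by start, no overlapping ranges.
# findInterval() returns, for each position, the index of the last start
# that is <= the position (0 if the position precedes every start).
idx <- findInterval(df_positions$positions, df_features$start)
idx[idx == 0] <- NA

# A position belongs to that feature only if it also does not exceed its end.
inside <- df_positions$positions <= df_features$end[idx]
df_positions$feature <- ifelse(inside, df_features$feature[idx], NA_character_)

print(df_positions)
```

Positions that fall in a gap between features, or before the first start, end up as NA instead of a feature name.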

A uj5u.com user replied:

I solved this with a loop, but if anyone has a more R-like solution, I would be happy to see it!

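library(dplyr)

data <- c()
# Column names follow the example data above (positions, feature)
data_to_map <- df_positions %>% pull(positions)

# Outer loop over features, inner loop over positions. The result lines up with
# df_positions only if features are sorted by start, do not overlap, and cover every position.
for (row in seq_len(nrow(df_features))) {
  check <- df_features[row, ]
  for (i in data_to_map) {
    if (i >= check$start && i <= check$end) {
      data <- c(data, check$feature)
    }
  }
}

df_positions$feature <- data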

Done.
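
As one possible loop-free variant, here is a minimal base R sketch that joins the two data frames on chromosome and then keeps only the rows where the position lies inside the feature's range. The names `merged` and `hits` are illustrative. Positions that match no feature are dropped by this approach, which differs from the loop only when such positions exist.

```r
df_positions <- data.frame(
  chromosome = rep("MT", 10),
  positions = 1:10,
  depth = c(rep(6, 3), rep(7, 3), rep(8, 2), rep(10, 2)),
  stringsAsFactors = FALSE
)
df_features <- data.frame(
  chromosome = rep("MT", 2),
  start = c(1, 4),
  end = c(3, 10),
  feature = c("TRNF", "RNR1"),
  stringsAsFactors = FALSE
)

# Pair every position with every feature on the same chromosome.
merged <- merge(df_positions, df_features, by = "chromosome")

# Keep only the pairs where the position falls inside [start, end].
hits <- merged[merged$positions >= merged$start & merged$positions <= merged$end, ]

# Restore the original order and drop the helper columns.
hits <- hits[order(hits$positions), c("chromosome", "positions", "depth", "feature")]
rownames(hits) <- NULL

print(hits)
```

The intermediate table has one row per position and feature pair, so for long chromosomes with many features this can use a lot of memory. For the small example here that cost is negligible.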

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/519824.html
