I have a data frame like this:
# generate data frame
df = as.data.frame(cbind(c('Chr1', 'Chr1', 'Chr1', 'Chr2', 'Chr2', 'Chr2', 'Chr3', 'Chr3', 'Chr4', 'Chr4', 'Chr5'),
c(121, 1567, 2489, 23, 565, 1789, 551, 1987, 25, 2356, 1111)))
colnames(df) = c('Chr', 'Pos')
df$Pos = as.numeric(df$Pos)
df
    Chr  Pos
1  Chr1  121
2  Chr1 1567
3  Chr1 2489
4  Chr2   23
5  Chr2  565
6  Chr2 1789
7  Chr3  551
8  Chr3 1987
9  Chr4   25
10 Chr4 2356
11 Chr5 1111
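Now I want to change the values in the Pos column based on their current value. For example, if the value in Pos is <= 1000, it should be set to 500; if the value is <= 2000 but > 1000, it should be set to 1500; and so on.
A straightforward way to do this on df looks like this: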
# alter dataframe
df$Pos = ifelse(df$Pos <= 1000, 500, df$Pos)
df$Pos = ifelse(df$Pos <= 2000 & df$Pos > 1000, 1500, df$Pos)
df$Pos = ifelse(df$Pos <= 3000 & df$Pos > 2000, 2500, df$Pos)
df
    Chr  Pos
1  Chr1  500
2  Chr1 1500
3  Chr1 2500
4  Chr2  500
5  Chr2  500
6  Chr2 1500
7  Chr3  500
8  Chr3 1500
9  Chr4  500
10 Chr4 2500
11 Chr5 1500
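This produces the desired output. However, my real data set is much larger, and I cannot add an extra condition for every value range that needs to be reset. I am therefore looking for a more efficient solution. Here is my attempt at one: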
# generate reference vectors
bin = seq(from = 1000, by = 1000, length.out = 3)
pos = seq(from = 500, by = 1000, length.out = 3)
# reset values
df$Pos = ifelse(df$Pos <= bin & df$Pos > bin-1000, pos, df$Pos)
df
However, this raises warning messages:
Warning messages:
1: In df$Pos <= bin :
  longer object length is not a multiple of shorter object length
2: In df$Pos > bin - 1000 :
  longer object length is not a multiple of shorter object length
And the output looks wrong (some values have been reset, others have not):
> df
    Chr  Pos
1  Chr1  500
2  Chr1 1500
3  Chr1 2500
4  Chr2  500
5  Chr2  565
6  Chr2 1789
7  Chr3  500
8  Chr3 1500
9  Chr4   25
10 Chr4 2356
11 Chr5 1500
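The warning and the partial reset are both consistent with R's vector recycling: when an 11-element vector is compared with a 3-element one, the shorter vector is repeated to length 11, so each row is tested against only one bin, chosen by row order. A minimal sketch of that effect, using plain vectors rather than the data frame (the names x, recycled and in_bin are introduced here for illustration only):

```r
# Same positions and bin edges as in the attempt above
x   <- c(121, 1567, 2489, 23, 565, 1789, 551, 1987, 25, 2356, 1111)
bin <- c(1000, 2000, 3000)

# rep_len() makes explicit what recycling does implicitly to `bin`
recycled <- rep_len(bin, length(x))
recycled
# [1] 1000 2000 3000 1000 2000 3000 1000 2000 3000 1000 2000

# Each position is compared with exactly one bin, not with all three
in_bin <- x <= recycled & x > recycled - 1000
in_bin
# [1]  TRUE  TRUE  TRUE  TRUE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE

# The recycled comparison gives the same result as the original expression
same <- identical(suppressWarnings(x <= bin), x <= recycled)
```

The FALSE entries are rows 5, 6, 9 and 10, which are exactly the rows left unchanged in the output above.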
I also tried to solve the problem with the Map function, but that did not work either. See my Map attempt below:
df2 = Map(function(bin, bin2, pos) {
df2 = ifelse(df$Pos <= 1000 & df$Pos > bin2, pos, df$Pos)
}, bin, bin-1000, pos)
df2
[[1]]
 [1]  500 1567 2489  500  500 1789  500 1987  500 2356 1111

[[2]]
 [1]  121 1567 2489   23  565 1789  551 1987   25 2356 1111

[[3]]
 [1]  121 1567 2489   23  565 1789  551 1987   25 2356 1111
I feel like I am approaching this problem from completely the wrong angle. Does anyone know how to fix this code?
Reply from a uj5u.com user:
You can use cut or findInterval.
bin = c(0, seq(from = 1000, by = 1000, length.out = 3))
pos = seq(from = 500, by = 1000, length.out = 3)
df$new_value <- cut(df$Pos, bin, pos)
#cut returns factor output, to change to numbers use the below code
df$new_value <- as.numeric(as.character(df$new_value))
df
# Chr Pos new_value
#1 Chr1 121 500
#2 Chr1 1567 1500
#3 Chr1 2489 2500
#4 Chr2 23 500
#5 Chr2 565 500
#6 Chr2 1789 1500
#7 Chr3 551 500
#8 Chr3 1987 1500
#9 Chr4 25 500
#10 Chr4 2356 2500
#11 Chr5 1111 1500
I created a new column, new_value, to keep the answer clear and easy to follow; if you prefer, you can overwrite the original Pos column instead.
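The answer mentions findInterval but only shows cut. A minimal sketch of the findInterval variant follows, using a plain vector Pos in place of df$Pos; the names idx, new_value and new_value_arith are introduced here for illustration. It assumes every value lies in (0, 3000]; the last line is an alternative that additionally assumes all bins have the same width of 1000.

```r
# Same positions and breakpoints as in the answer above
Pos <- c(121, 1567, 2489, 23, 565, 1789, 551, 1987, 25, 2356, 1111)
bin <- c(0, seq(from = 1000, by = 1000, length.out = 3))  # 0 1000 2000 3000
pos <- seq(from = 500, by = 1000, length.out = 3)         # 500 1500 2500

# left.open = TRUE gives intervals (0, 1000], (1000, 2000], ... as cut() does
idx <- findInterval(Pos, bin, left.open = TRUE)

# Use the interval index to look up the replacement value
new_value <- pos[idx]
new_value
# [1]  500 1500 2500  500  500 1500  500 1500  500 2500 1500

# Assumption: equal-width bins only; round up to the bin edge, then step back
new_value_arith <- ceiling(Pos / 1000) * 1000 - 500
```

Unlike cut, findInterval returns integer indices directly, so no factor-to-numeric conversion is needed. Values above the last breakpoint would yield NA here, and values at or below 0 would get index 0, which silently drops elements when used as a subscript, so out-of-range data needs separate handling.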
