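I am trying to write a function that takes a vector and subsets it in several steps:
- Throw away any unwanted values.
- Remove duplicates.
- Return the indexes into the original vector after accounting for steps (1) and (2).
For example, given the following input vector: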
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
and
throw_away_val <- "cat"
I want my function get_indexes(x = vec_animals, y = throw_away_val) to return:
# [1] 1 6 # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")
Another example:
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003
should return:
# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).
My initial attempt
The following function returns the indexes but does not account for duplicates:
get_index <- function(x, throw_away) {
  which(x != throw_away)
}
It returns indexes into the original vector, for example for vec_animals:
get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7
If we use this output to subset vec_animals, we get:
vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog" "dog" "dog" "dog" "dolphin" "dolphin"
You might suggest operating on this output, for example:
vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog" "dolphin"
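But no, I need get_index() to return the correct indexes right away (in this case 1 and 6).
Edit
A related procedure exists that gives the index of the first occurrence of each duplicated value: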
library(bit64)
vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8
or, more generally:
which(!duplicated(vec_num))
#> [1] 1 2 4 8
A solution like this would be great if unwanted values did not also have to be thrown away.
Answer from a uj5u.com user:
Try:
get_index <- function(x, throw_away) {
  which(!duplicated(x) & x != throw_away)
}
> get_index(vec_animals, "cat")
[1] 1 6
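This works because `!duplicated(x)` marks the first occurrence of every value in the original vector, and dropping one value does not change which occurrence of the other values comes first. As a minimal sketch, the same idea can be checked against both examples from the question. The name `get_index_multi` and the use of `%in%` (so that several values can be thrown away at once) are assumptions added here for illustration, not part of the original answer.

```r
# Sketch: same logic as above, with %in% instead of != so that
# `throw_away` may hold more than one value (an assumed extension).
get_index_multi <- function(x, throw_away) {
  # TRUE only for first occurrences that are not among the discarded values
  which(!duplicated(x) & !(x %in% throw_away))
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_index_multi(vec_animals, "cat")            # 1 6
get_index_multi(vec_years, 2003)               # 4 6
get_index_multi(vec_animals, c("cat", "dog"))  # 6
```

Using `%in%` also avoids the `NA` results that `!=` produces when `x` contains missing values.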
Answer from a uj5u.com user:
Here is a simple hand-written function that provides the required information.
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
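get_indexes <- function(x, throw_away) {
  # Unique values of x, excluding the value to throw away
  elements <- unique(x)
  elements <- elements[elements != throw_away]
  # For each remaining value, collect every position where it occurs in x
  # (seq_along() is safe when nothing remains, unlike 1:length())
  index <- lapply(seq_along(elements), function(i) which(x %in% elements[i]))
  # Keep only the first (smallest) position of each value
  index2return <- c()
  for (j in seq_along(index)) {
    index2return <- c(index2return, min(index[[j]]))
  }
  return(index2return)
}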
get_indexes(x = vec_animals, throw_away = "cat")
[1] 1 6
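The loop above looks up the first position of each remaining unique value. As a hedged alternative sketch, base R's `match()` does that lookup directly, since it returns the position of the first match for each element. The name `get_indexes_match` is hypothetical and used only for this illustration.

```r
# Sketch: match() returns the position of the first occurrence of each value,
# so the lapply()/for pair can be collapsed into a single call.
get_indexes_match <- function(x, throw_away) {
  keep <- setdiff(unique(x), throw_away)  # unique values minus the discarded ones
  match(keep, x)                          # first position of each kept value in x
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_indexes_match(vec_animals, "cat")  # 1 6
get_indexes_match(vec_years, 2003)     # 4 6
```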
Answer from a uj5u.com user:
My approach:
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
throw_away_val <- "cat"
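my_function <- function(x, y) {
  # Pair every value with its position in the original vector
  my_df <- data.frame("Origin" = x,
                      "Position" = seq_along(x),
                      stringsAsFactors = FALSE)
  # Drop the rows holding the unwanted value(s), if any are present
  my_var <- which(my_df$Origin %in% y)
  if (length(my_var)) {
    my_df <- my_df[-my_var, ]
  }
  # Keep only the first occurrence of each remaining value
  my_df <- my_df[!duplicated(my_df$Origin), ]
  return(my_df)
}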
my_df <- my_function(vec_animals, throw_away_val)
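Note that this approach returns a data frame rather than a vector of indexes; the indexes asked for in the question sit in its Position column. A minimal self-contained sketch of the same steps, assuming the same Origin/Position layout, shows how to pull them out:

```r
# Sketch: rebuild the Origin/Position table inline and extract the indexes.
vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
my_df <- data.frame(Origin = vec_animals,
                    Position = seq_along(vec_animals),
                    stringsAsFactors = FALSE)
my_df <- my_df[!(my_df$Origin %in% "cat"), ]  # drop the unwanted value
my_df <- my_df[!duplicated(my_df$Origin), ]   # keep the first occurrence of each value
my_df$Position                                # 1 6
```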
