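Write a function that takes a vector as input, discards unwanted values, removes duplicates, and returns the corresponding indexes of the original vector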

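I am trying to write a function that takes a vector and subsets it in several steps:

1. Discard any unwanted values.
2. Remove duplicates.
3. Return the indexes of the original vector that remain after steps (1) and (2).

For example, given the following input vector: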

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

and

throw_away_val <- "cat"

I want my function get_indexes(x = vec_animals, y = throw_away_val) to return:

# [1] 1 6   # `1` is the index of the 1st unique ("dog") in `vec_animals`, `6` is the index of the 2nd unique ("dolphin")

Another example:

vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)
throw_away_val <- 2003

should return:

# [1] 4 6 # `4` is the position of 1st unique (`2007`) after throwing away unwanted val; `6` is the position of 2nd unique (`2011`).

My initial attempt

The following function returns indexes but does not account for duplicates:

get_index <- function(x, throw_away) {
  which(x != throw_away)
}

It returns the original indexes. For vec_animals, for example:

get_index(vec_animals, "cat")
#> [1] 1 2 3 4 6 7

If we use this output to subset vec_animals, we get:

vec_animals[get_index(vec_animals, "cat")]
#> [1] "dog"     "dog"     "dog"     "dog"     "dolphin" "dolphin"

You might suggest post-processing this output, for example:

vec_animals[get_index(vec_animals, "cat")] |> unique()
#> [1] "dog"     "dolphin"

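But no, I need get_index() to return the correct indexes directly (in this case 1 and 6).

EDIT

A related procedure was suggested, which gives the index of the first occurrence of each duplicated value: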

library(bit64)

vec_num <- as.integer64(c(4, 2, 2, 3, 3, 3, 3, 100, 100))
unipos(vec_num)
#> [1] 1 2 4 8

or, more generally:

which(!duplicated(vec_num))
#> [1] 1 2 4 8

Such a solution would be great if there were no need to also discard the unwanted values.

uj5u.com user reply:

Try:

get_index <- function(x, throw_away) {
  which(!duplicated(x) & x != throw_away)
}

get_index(vec_animals, "cat")
#> [1] 1 6
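
As a sketch of a possible variant (the name get_index_multi and the switch to %in% are assumptions of this example, not part of the answer above): x != throw_away yields NA for missing values, which which() silently drops, and it is meant for a single throw-away value. Using %in% instead should handle several throw-away values and treat NA as a value of its own.

```r
# Hypothetical variant of get_index(): accepts several values to discard.
get_index_multi <- function(x, throw_away) {
  # !duplicated(x) marks the first occurrence of each value;
  # %in% never returns NA, so missing values in x are not silently dropped.
  which(!duplicated(x) & !(x %in% throw_away))
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_index_multi(vec_animals, "cat")
#> [1] 1 6
get_index_multi(vec_years, 2003)
#> [1] 4 6
get_index_multi(vec_animals, c("cat", "dog"))
#> [1] 6
```

Testing for duplicates on the full vector is safe here because every copy of a discarded value is removed anyway, so a discarded value can never hide the first occurrence of a kept one.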

uj5u.com user reply:

Here is a simple hand-written function that provides the required information.

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")

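get_indexes <- function(x, throw_away) {
  # Unique values of x, minus the value to throw away
  elements <- unique(x)
  elements <- elements[elements != throw_away]
  # For each remaining value, all positions where it occurs in x
  # (seq_along() avoids the 1:0 trap when nothing remains)
  index <- lapply(seq_along(elements), function(i) which(x %in% elements[i]))
  # Keep only the first (smallest) position of each value
  index2return <- c()
  for (j in seq_along(index)) {
    index2return <- c(index2return, min(index[[j]]))
  }
  return(index2return)
}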

get_indexes(x = vec_animals, throw_away = "cat")
#> [1] 1 6
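
The same idea (take the unique values, drop the unwanted one, find each first position) can probably be written without the loop. The following is only a sketch: the name get_indexes_match is hypothetical, and it relies on setdiff() returning values in order of first appearance, which is how it behaves in practice.

```r
# Hypothetical loop-free alternative to get_indexes().
get_indexes_match <- function(x, throw_away) {
  # setdiff() returns the unique values of x that are not in throw_away.
  kept <- setdiff(x, throw_away)
  # match() gives the position of the first occurrence of each kept value in x.
  match(kept, x)
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_indexes_match(vec_animals, "cat")
#> [1] 1 6
get_indexes_match(vec_years, 2003)
#> [1] 4 6
```

Because match() only ever reports the first match, no explicit min() over all positions is needed.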

uj5u.com user reply:

My approach:

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
throw_away_val <- "cat"

my_function <- function(x, y) {
  # Pair each value with its position in the original vector
  my_df <- data.frame("Origin" = x,
                      "Position" = seq_along(x),
                      stringsAsFactors = FALSE)
  # Drop the rows holding unwanted values, if there are any
  my_var <- which(my_df$Origin %in% y)
  if (length(my_var)) {
    my_df <- my_df[-my_var, ]
  }
  # Keep the first occurrence of each remaining value
  my_df <- my_df[!duplicated(my_df$Origin), ]
  return(my_df)
}

my_df <- my_function(vec_animals, throw_away_val)
my_df$Position  # indexes in the original vector
#> [1] 1 6
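
The filter-then-deduplicate logic of the data-frame approach above can also be sketched on plain vectors, which returns the indexes directly instead of a data frame. The name get_indexes_steps is hypothetical; this is one possible reading of the three steps in the question, not the answerer's own code.

```r
# Hypothetical vector-only version of the filter-then-deduplicate logic.
get_indexes_steps <- function(x, y) {
  # Step 1: original positions of the values that are kept.
  keep <- which(!(x %in% y))
  # Step 2: among the kept values, retain only first occurrences;
  # indexing `keep` maps them back to positions in the original vector.
  keep[!duplicated(x[keep])]
}

vec_animals <- c("dog", "dog", "dog", "dog", "cat", "dolphin", "dolphin")
vec_years <- c(2003, 2003, 2003, 2007, 2007, 2011, 2011, 2011)

get_indexes_steps(vec_animals, "cat")
#> [1] 1 6
get_indexes_steps(vec_years, 2003)
#> [1] 4 6
```

Carrying the vector of kept positions through both steps plays the same role as the Position column in the data frame.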
