Replace multiple words in multiple strings

I want to replace words in a vector based on the original and replacement words stored in another data frame. For example:

Vector of strings to change:

my_words <- c("example r", "example River", "example R", "anthoer river",
        "now a creek", "and another Ck", "example river tributary")

Data frame of the words to replace and their corresponding replacements:

my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", 'R', "C", "C", "C"))

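I want to replace every occurrence of a word from my_replace$original in the vector my_words with the corresponding value from my_replace$replacement. I tried stringr::str_replace_all(), but it replaces every instance of the letter or word, not just whole words (e.g. "another" becomes "anotheR"), which is not what I want.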
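To illustrate the difference, here is a minimal sketch in base R (the variable names are illustrative only). A bare pattern matches "r" anywhere, while wrapping it in \b word-boundary anchors restricts it to whole words. str_replace_all() behaves the same way with these two patterns.

```r
# A bare pattern matches "r" anywhere, including the last letter of "another"
substring_hit <- gsub("r", "R", "another r")
substring_hit
#> [1] "anotheR R"

# \\b anchors the pattern to word edges, so only the standalone "r" changes
whole_word <- gsub("\\br\\b", "R", "another r", perl = TRUE)
whole_word
#> [1] "another R"
```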

Pseudocode of what I want to do:

str_replace_all(my_words, my_replace$original, my_replace$replacement)

Desired output:

"example R", "example R", "example R", "anthoer R", "now a C", "and another C", "example R tributary"

I did find a solution using a for loop, but my dataset is large and the for-loop version is too slow. Any suggestions are greatly appreciated.

Reply from a uj5u.com user:

Here is an approach that performs all the replacements in a single gsub call:

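my_words <- c("example r", "example River", "example R", "anthoer river",
    "now a creek", "and another Ck", "example river tributary")

# Alternative 1 matches the whole words r, R, river, River;
# alternative 2 matches creek, Creek, ck, Ck. Only the first letter is captured,
# and \U in the replacement upper-cases it (this requires perl = TRUE).
output <- gsub("\\b([rR])(?:iver)?\\b|\\b([cC])(?:ree)?k\\b", "\\U\\1\\U\\2", my_words, perl=TRUE)
output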

[1] "example R"           "example R"           "example R"
[4] "anthoer R"           "now a C"             "and another C"
[7] "example R tributary"

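Because every river variant is replaced by R and every creek variant by C, we can capture the first letter of each possible match and replace the whole match with the uppercase version of that letter.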
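A quick way to check what the pattern does and does not touch is to run it on a small, made-up vector (x and y are illustrative names). Both the anchors and the optional groups matter here: "cork" and "rivers" should stay as they are.

```r
# Hypothetical test vector: creek variants plus words that must not change
x <- c("creek", "Ck", "Creek", "cork", "rivers", "another river")

# Capture only the first letter; \\U upper-cases it in the replacement.
# The unused capture group in each alternative contributes an empty string.
y <- gsub("\\b([rR])(?:iver)?\\b|\\b([cC])(?:ree)?k\\b", "\\U\\1\\U\\2", x, perl = TRUE)
y
# gives c("C", "C", "C", "cork", "rivers", "another R")
```

This trick relies on every replacement being the capitalised first letter of its match. If the lookup table maps words to arbitrary strings, a table-driven approach like the ones below is needed.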

Reply from a uj5u.com user:

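You need to build a dynamic, word-boundary-based pattern from the words in my_replace$original, and then use stringr::str_replace_all to replace each match with its corresponding value. Note that the original phrases must be sorted by length in descending order so that longer strings are matched first: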

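library(stringr)
my_words <- c("example r", "example River", "example R", "anthoer river", "now a creek", "and another Ck", "example river tributary")
my_replace <- data.frame(original = c("r", "River", "R", "river", "Ck", "creek", "Creek"), replacement = c("R", "R", "R", "R", "C", "C", "C"), stringsAsFactors = FALSE)
# Longer originals first, so that multi-word phrases win over their prefixes
sort.by.length.desc <- function (v) v[order(-nchar(v))]
# One alternation wrapped in word boundaries, e.g. \b(River|river|creek|...)\b
pattern <- paste0("\\b(", paste(sort.by.length.desc(my_replace$original), collapse="|"), ")\\b")
# Look up the replacement for each matched word (match() also works on a vector of matches)
str_replace_all(my_words, pattern, function(word) my_replace$replacement[match(word, my_replace$original)])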

輸出：

[1] "example R"           "example R"           "example R"           "anthoer R"           "now a C"             "and another C"       "example R tributary"
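The length sort only changes the result when one original is a prefix of another, such as a multi-word phrase. Below is a minimal sketch with made-up originals ("river" and "river delta"). It uses a hypothetical helper, make_pattern, and base R with perl = TRUE. PCRE, like the ICU engine that stringr uses through stringi, takes the first alternative that succeeds rather than the longest one.

```r
# Hypothetical originals where one phrase is a prefix of another
originals <- c("river", "river delta")
text <- "the river delta"

# Hypothetical helper: wrap the alternatives in word boundaries, as in the answer above
make_pattern <- function(v) paste0("\\b(", paste(v, collapse = "|"), ")\\b")

unsorted <- make_pattern(originals)
sorted   <- make_pattern(originals[order(-nchar(originals))])

# The first alternative that matches wins, so order decides which phrase is found
first_unsorted <- regmatches(text, regexpr(unsorted, text, perl = TRUE))
first_sorted   <- regmatches(text, regexpr(sorted, text, perl = TRUE))
first_unsorted  # "river"
first_sorted    # "river delta"
```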

Reply from a uj5u.com user:

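library(stringi)

# %s+% is stringi's concatenation operator: builds "\\br\\b", "\\bRiver\\b", ...
# vectorize_all = FALSE applies every pattern/replacement pair to every string, one pair after another
stri_replace_all_regex(my_words, "\\b" %s+% my_replace$original %s+% "\\b", my_replace$replacement, vectorize_all = FALSE)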

[1] "example R" "example R" "example R" "anthoer R" "now a C" "and another C" "example R tributary"
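With vectorize_all = FALSE, stringi applies the pattern/replacement pairs one after another to the whole vector. For readers who prefer base R, here is a rough sketch of the same sequential idea using Reduce() and gsub(). The helper name replace_words is hypothetical, and it assumes the originals contain no regex metacharacters. The loop runs once per pattern (seven times here), not once per string, and each gsub() call is vectorized over all of my_words.

```r
my_words <- c("example r", "example River", "example R", "anthoer river",
              "now a creek", "and another Ck", "example river tributary")
my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", "R", "C", "C", "C"),
  stringsAsFactors = FALSE)

# Hypothetical helper: fold over the lookup table, applying each
# (original, replacement) pair as a whole-word gsub() to the entire vector.
# Assumes the originals contain no regex metacharacters.
replace_words <- function(x, original, replacement) {
  Reduce(function(acc, i) {
    gsub(paste0("\\b", original[i], "\\b"), replacement[i], acc, perl = TRUE)
  }, seq_along(original), init = x)
}

result <- replace_words(my_words, my_replace$original, my_replace$replacement)
result
```

Because the pairs are applied in sequence, a replacement can be matched again by a later pattern. Here that is harmless (R becomes R), but it is worth checking when replacements overlap with originals.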

Source: https://www.uj5u.com/yidong/380667.html

Tags: r, regex, string, replace
