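I want to replace words in a vector based on the original and replacement words stored in another data frame. For example:
The vector of strings to change: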
my_words <- c("example r", "example River", "example R", "anthoer river",
"now a creek", "and another Ck", "example river tributary")
The data frame of words to replace and their corresponding replacements:
my_replace <- data.frame(
original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
replacement = c("R", "R", "R", 'R', "C", "C", "C"))
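I want to replace every occurrence of a word from my_replace$original in the vector my_words with the corresponding value from my_replace$replacement. I tried stringr::str_replace_all(), but it replaces every instance of the letter or word, not just whole words (for example, "another" becomes "anotheR"), which is not what I want.
Pseudocode for what I want to do: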
str_replace_all(my_words, my_replace$original, my_replace$replacement)
Desired output:
"example R", "example R", "example R", "another R", "now a C", "and another C", "example R tributary"
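I did find a solution using a for loop, but my dataset is large and the for loop is too slow. Any suggestions are much appreciated.
Reply from a uj5u.com user:
Here is a gsub approach that makes all the replacements with a single call: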
my_words <- c("example r", "example River", "example R", "anthoer river",
"now a creek", "and another Ck", "example river tributary")
output <- gsub("\\b([rR])(?:iver)?\\b|\\b([cC])(?:ree)?k\\b", "\\U\\1\\U\\2", my_words, perl=TRUE)
output
[1] "example R" "example R" "example R"
[4] "anthoer R" "now a C" "and another C"
[7] "example R tributary"
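Since every river variant is replaced by R and every creek variant by C, we can capture the first letter of each possible match and replace the match with the uppercase version of that letter.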
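To make the capture-and-uppercase idea above easier to see in isolation, here is a minimal sketch on a small made-up vector (demo is a hypothetical input, not part of the question's data). It assumes base R's gsub with perl = TRUE, where \\U in the replacement uppercases what follows and a capture group that did not take part in the match contributes an empty string.

```r
# Same pattern as in the answer: only one alternative matches at a time,
# so one of \\1 / \\2 is empty and the other holds the captured first letter.
pattern <- "\\b([rR])(?:iver)?\\b|\\b([cC])(?:ree)?k\\b"

# Hypothetical inputs; "driver" contains "river" but not as a whole word.
demo <- c("river", "Ck", "creek bank", "driver")

# \\U uppercases the captured letter (perl = TRUE is required for \\U).
result <- gsub(pattern, "\\U\\1\\U\\2", demo, perl = TRUE)
print(result)
```

Because of the \\b anchors, "driver" should be left untouched while the whole-word variants collapse to R or C. This only works because the replacements here happen to be the uppercased first letters; it does not generalize to arbitrary replacement tables.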
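Reply from a uj5u.com user:
You need to build a dynamic, word-boundary-based pattern from the words in my_replace$original, and then use stringr::str_replace_all to replace each match with its corresponding value. Note that the original phrases need to be sorted by length in descending order so that longer strings are matched first: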
my_words <- c("example r", "example River", "example R", "anthoer river", "now a creek", "and another Ck", "example river tributary")
my_replace <- data.frame(original = c("r", "River", "R", "river", "Ck", "creek", "Creek"), replacement = c("R", "R", "R", 'R', "C", "C", "C"))
sort.by.length.desc <- function (v) v[order( -nchar(v)) ]
library(stringr)
regex <- paste0("\\b(",paste(sort.by.length.desc(my_replace$original), collapse="|"), ")\\b")
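# match() looks up the replacement for each matched word; it works whether the function receives one match or a vector of matches
str_replace_all(my_words, regex, function(word) as.character(my_replace$replacement)[match(word, my_replace$original)])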
Output:
[1] "example R" "example R" "example R" "anthoer R" "now a C" "and another C" "example R tributary"
The regular expression will be \b(River|river|creek|Creek|Ck|r|R)\b, which matches any of the listed words only when it appears as a whole word.
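For anyone who wants the same idea without stringr, here is a rough base R sketch of the alternation-plus-lookup technique. It is not the answer's code: it swaps in gregexpr() and regmatches<-() for str_replace_all(), and the names alts, hits, and found are made up for illustration. The input is a shortened version of the question's vector.

```r
my_words <- c("example r", "anthoer river", "now a creek", "and another Ck")
my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", "R", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# One alternation wrapped in word boundaries, longest alternatives first.
alts <- my_replace$original[order(-nchar(my_replace$original))]
pattern <- paste0("\\b(", paste(alts, collapse = "|"), ")\\b")

# Locate every whole-word match in every string.
hits <- gregexpr(pattern, my_words, perl = TRUE)
found <- regmatches(my_words, hits)

# Swap each matched word for its replacement via a match() lookup.
result <- my_words
regmatches(result, hits) <- lapply(
  found,
  function(w) my_replace$replacement[match(w, my_replace$original)]
)
print(result)
```

The pattern is compiled once and each string is scanned once, so the cost should not grow with the number of rows in my_replace the way a pattern-by-pattern loop does. Note that the words in original are pasted into the regex as-is, so entries containing regex metacharacters would need escaping first.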
Reply from a uj5u.com user:
library(stringi)
stri_replace_all_regex(my_words, "\\b" %s+% my_replace$original %s+% "\\b", my_replace$replacement, vectorize_all = FALSE)
[1] "example R" "example R" "example R" "anthoer R" "now a C" "and another C" "example R tributary"
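The stringi call above applies each pattern/replacement pair in turn to the whole vector. If stringi is not available, roughly the same sequential behavior can be sketched in base R with Reduce() and gsub(). This is an illustrative equivalent, not the answer's code, and the name patterns is made up here.

```r
my_words <- c("example r", "example River", "example R", "anthoer river",
              "now a creek", "and another Ck", "example river tributary")
my_replace <- data.frame(
  original = c("r", "River", "R", "river", "Ck", "creek", "Creek"),
  replacement = c("R", "R", "R", "R", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# One word-boundary pattern per row of the lookup table.
patterns <- paste0("\\b", my_replace$original, "\\b")

# Apply the pairs one after another; each gsub() call is vectorized
# over the whole character vector, so the loop runs once per pattern,
# not once per string.
result <- Reduce(
  function(text, i) gsub(patterns[i], my_replace$replacement[i], text, perl = TRUE),
  seq_along(patterns),
  my_words
)
print(result)
```

Because the pairs are applied in sequence, a later pattern can match text produced by an earlier replacement. That is harmless with this table, but it is worth checking for other lookup tables.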
