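How can I use a regular expression in R to remove all duplicate words, together with the comma and space that follow them?
So far I have come up with the following regular expression. It matches the duplicates, but not the comma and space: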
(\b\w+\b)(?=[\S\s]*\b\1\b)
An example list is:
blue, red, blue, yellow, green, blue
The output should look like this:
blue, red, yellow, green
So in this case it has to match two of the "blue" entries, plus the comma and space that follow each of them (if any).
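For reference, here is a minimal sketch of how the pattern above behaves in R, assuming it is applied with gsub() and perl = TRUE (lookaheads need the PCRE engine). The second pattern is an assumed extension, not something from the original post: it also consumes the comma and whitespace after each matched word. Because the lookahead matches words that occur again later, this keeps the last occurrence of each word rather than the first, so the order differs from the desired output.

```r
s <- "blue, red, blue, yellow, green, blue"

# Pattern from the question: a whole word that appears again later in the string.
pat_word <- "(\\b\\w+\\b)(?=[\\S\\s]*\\b\\1\\b)"
res_word <- gsub(pat_word, "", s, perl = TRUE)
res_word
# [1] ", red, , yellow, green, blue"

# Assumed extension: also consume the comma and any whitespace after the word.
pat_sep <- "(\\b\\w+\\b),\\s*(?=[\\S\\s]*\\b\\1\\b)"
res_sep <- gsub(pat_sep, "", s, perl = TRUE)
res_sep
# [1] "red, yellow, green, blue"
```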
Answer from a uj5u.com user:
It depends on whether your list really is a list/vector or a single string containing commas.
# your data is actually already a list/vector
v <- c("blue", "red", "blue", "yellow", "green", "blue")
unique(v)
[1] "blue" "red" "yellow" "green"
# if your data is actually a comma separated string
s <- "blue, red, blue, yellow, green, blue"
# if output needs to be a vector
unique(strsplit(s, ", ")[[1]])
[1] "blue" "red" "yellow" "green"
# if output needs to be a string again
paste(unique(strsplit(s, ", ")[[1]]), collapse = ", ")
[1] "blue, red, yellow, green"
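The split on ", " above assumes every comma is followed by exactly one space. If the spacing may be irregular, one possible variation is to split on the bare comma and trim each piece. The helper name dedupe_csv and the messy input are hypothetical and only serve as an illustration.

```r
# Hypothetical helper (not from the answer): tolerate irregular spacing around commas.
dedupe_csv <- function(s) {
  # Split on the literal comma, then strip leading/trailing whitespace per piece.
  parts <- trimws(strsplit(s, ",", fixed = TRUE)[[1]])
  # unique() keeps the first occurrence of each value, in order.
  paste(unique(parts), collapse = ", ")
}

messy <- "blue,red ,  blue, yellow,green, blue"
dedupe_csv(messy)
# [1] "blue, red, yellow, green"
```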
Example based on a list column in a data.table or data.frame
# data.table must be loaded before data.table() can be called
library(data.table)
dt <- data.table(
  id = 1:5,
  colors = list(
    c("blue", "red", "blue", "yellow", "green", "blue"),
    c("blue", "blue", "yellow", "green", "blue"),
    c("blue", "red", "blue", "yellow"),
    c("red", "red", "yellow", "yellow", "green", "blue"),
    c("black")
  )
)
## using data.table
# setDT() is only needed if dt is a plain data.frame; it is a no-op here
setDT(dt)
# assign to colors instead of clean_list to overwrite the existing column
dt[, clean_list := lapply(colors, unique)]
## using dplyr
library(dplyr)
# use colors instead of clean_list to just fix the existing column
dt %>% mutate(clean_list = lapply(colors, function(x) unique(x)))
dt
#    id                           colors            clean_list
# 1:  1  blue,red,blue,yellow,green,blue blue,red,yellow,green
# 2:  2      blue,blue,yellow,green,blue     blue,yellow,green
# 3:  3             blue,red,blue,yellow       blue,red,yellow
# 4:  4 red,red,yellow,yellow,green,blue red,yellow,green,blue
# 5:  5                            black                 black
# or just simply in base
dt$colors <- lapply(dt$colors, function(x) unique(x))
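The examples above assume a list column. If the column instead holds one comma-separated string per row, a base R sketch along the following lines may work. The data frame df and the column name colors_clean are made up for illustration, and the split again assumes a ", " separator.

```r
# Hypothetical data: colors stored as one comma-separated string per row.
df <- data.frame(
  id = 1:3,
  colors = c("blue, red, blue", "red, red, yellow", "black"),
  stringsAsFactors = FALSE
)

# Split each string, drop repeats, and glue the pieces back together.
# vapply() guarantees a character vector with one element per row.
df$colors_clean <- vapply(
  strsplit(df$colors, ", ", fixed = TRUE),
  function(x) paste(unique(x), collapse = ", "),
  character(1)
)
df$colors_clean
# expected values: "blue, red", "red, yellow", "black"
```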
Answer from a uj5u.com user:
We can use paste together with unique and the collapse argument:
paste(unique(string), collapse = ", ")
[1] "blue, red, yellow, green"
Data:
string <- c("blue", "red", "blue", "yellow", "green", "blue")
