Remove duplicate words, commas and spaces

How can I use a regular expression in R to remove all duplicate words, together with the comma and space that follow them?

So far I have come up with the following regex. It matches the duplicates, but not the comma and space after them:

    (\b\w+\b)(?=[\S\s]*\b\1\b)

An example list is:

    blue, red, blue, yellow, green, blue

The output should look like this:

    blue, red, yellow, green

So in this case, the regex has to match both duplicate "blue"s, as well as the comma and space that follow them (if any).
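If a regex-only solution is really required, here is a minimal sketch of two options. It assumes every item is a single word (\w+) and items are separated by exactly ", ". Extending the question's lookahead so it also consumes the trailing ", " removes every word that appears again later, which keeps the last occurrence rather than the first. Repeatedly substituting away a later ", word" keeps the first occurrence and gives the requested output. The helper name dedupe_words_regex is hypothetical.

```r
s <- "blue, red, blue, yellow, green, blue"

# Option 1: the question's lookahead, extended to consume the ", " after the
# word. A word is dropped if the same word occurs again later, so the LAST
# occurrence of each word survives.
keep_last <- gsub("\\b(\\w+)\\b, (?=[\\S\\s]*\\b\\1\\b)", "", s, perl = TRUE)
keep_last
# [1] "red, yellow, green, blue"

# Option 2 (hypothetical helper): remove a later ", <word>" whenever the same
# word appeared earlier, repeating until nothing changes, so the FIRST
# occurrence of each word survives.
dedupe_words_regex <- function(x) {
  # (\w+) captures a word, (.*?) lazily spans up to its next repetition,
  # which is removed together with the ", " in front of it.
  pattern <- "\\b(\\w+)\\b(.*?), \\1\\b"
  repeat {
    out <- gsub(pattern, "\\1\\2", x, perl = TRUE)
    if (identical(out, x)) return(out)  # no duplicates left
    x <- out
  }
}

dedupe_words_regex(s)
# [1] "blue, red, yellow, green"
```

The loop is needed because gsub() does not revisit text it has already matched, so a word repeated three times needs more than one pass. Items made of several words (for example "dark blue") would break the single-word assumption.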

Answer:

It depends on whether your list is an actual list/vector or a single string containing commas:

    # your data is actually already a list/vector
    v <- c("blue", "red", "blue", "yellow", "green", "blue")

    unique(v)
    # [1] "blue"   "red"    "yellow" "green"

    # if your data is actually a comma-separated string
    s <- "blue, red, blue, yellow, green, blue"

    # if the output needs to be a vector
    unique(strsplit(s, ", ")[[1]])
    # [1] "blue"   "red"    "yellow" "green"

    # if the output needs to be a string again
    paste(unique(strsplit(s, ", ")[[1]]), collapse = ", ")
    # [1] "blue, red, yellow, green"
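strsplit(s, ", ") assumes every separator is exactly a comma followed by one space. A slightly more forgiving sketch, assuming commas are the only delimiter and whitespace around items is not significant, trims each item before deduplicating. The name dedupe_csv is hypothetical; because it uses vapply() over the split pieces, it also accepts a character vector of several such strings.

```r
# Hypothetical helper: deduplicate the items of one or more comma-separated
# strings, ignoring whitespace around each item.
dedupe_csv <- function(x) {
  parts <- strsplit(x, ",", fixed = TRUE)  # one character vector per input string
  vapply(parts,
         function(p) paste(unique(trimws(p)), collapse = ", "),
         character(1))
}

dedupe_csv("blue,red , blue,  yellow,green,blue")
# [1] "blue, red, yellow, green"
```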

An example based on a list column in a data.table or data.frame:

    library(data.table)  # must be loaded before data.table() is called

    dt <- data.table(
      id = 1:5,
      colors = list(
        c("blue", "red", "blue", "yellow", "green", "blue"),
        c("blue", "blue", "yellow", "green", "blue"),
        c("blue", "red", "blue", "yellow"),
        c("red", "red", "yellow", "yellow", "green", "blue"),
        c("black")
      )
    )

    ## using data.table (modifies dt by reference)
    setDT(dt)  # only needed if dt starts out as a plain data.frame
    # use colors instead of clean_list to overwrite the existing column
    dt[, clean_list := lapply(colors, unique)]

    ## using dplyr (returns a new object; assign it if you want to keep it)
    library(dplyr)
    # use colors instead of clean_list to overwrite the existing column
    dt %>% mutate(clean_list = lapply(colors, unique))

    dt
    #    id                           colors            clean_list
    # 1:  1  blue,red,blue,yellow,green,blue blue,red,yellow,green
    # 2:  2      blue,blue,yellow,green,blue     blue,yellow,green
    # 3:  3             blue,red,blue,yellow       blue,red,yellow
    # 4:  4 red,red,yellow,yellow,green,blue red,yellow,green,blue
    # 5:  5                            black                 black

    # or simply in base R
    dt$colors <- lapply(dt$colors, unique)
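If neither data.table nor dplyr is available, the same idea works on a plain data.frame. This base-R sketch builds a list column named as in the answer above (the three rows are a subset of its data) and then, assuming a downstream step needs text rather than vectors, collapses each deduplicated vector back into one string. The column name clean_str is hypothetical.

```r
# Base R only: a data.frame with a list column
df <- data.frame(id = 1:3)
df$colors <- list(
  c("blue", "red", "blue", "yellow", "green", "blue"),
  c("red", "red", "yellow", "yellow", "green", "blue"),
  c("black")
)

# Deduplicate each element of the list column
df$clean_list <- lapply(df$colors, unique)

# Collapse each deduplicated vector into a single "a, b, c" string
df$clean_str <- vapply(df$clean_list, paste, character(1), collapse = ", ")
df$clean_str
```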

Answer:

We can use paste() on the result of unique(), with collapse:

    paste(unique(string), collapse = ", ")
    # [1] "blue, red, yellow, green"

Data:

    string <- c("blue", "red", "blue", "yellow", "green", "blue")
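unique() keeps the first occurrence of each value. If you also want to see which positions count as duplicates (the elements the question's regex was trying to match), base R's duplicated() returns one logical flag per element. With fromLast = TRUE it keeps the last occurrence instead, which is what removing the matches of a "occurs again later" lookahead would produce. A short sketch using the same data:

```r
string <- c("blue", "red", "blue", "yellow", "green", "blue")

dup <- duplicated(string)  # TRUE for the 2nd and later occurrences
which(dup)
# [1] 3 6

# Same result as paste(unique(string), collapse = ", ")
paste(string[!dup], collapse = ", ")
# [1] "blue, red, yellow, green"

# Keep the LAST occurrence of each value instead
paste(string[!duplicated(string, fromLast = TRUE)], collapse = ", ")
# [1] "red, yellow, green, blue"
```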

