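I want to merge rows that share the same word in another column. The solution should be in base R. The table entries are comma-separated strings (character), not lists. So, as shown below, the shades of the same color should be combined into one string in a single row rather than spread across several rows. In addition, the color shades column should contain no duplicates.
I have already tried: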
aggregate(df["Color shades"], df["Color"], paste, collapse=", ")
and also:
aggregate(`Color shades` ~ Color, df, toString)
but neither produced the desired result.
Data frame:
df <- data.frame(colorshades = c("turquoise, babyblue", "royal blue, true blue",
"navy blue, true blue"), colors = c("blue", "blue", "blue"))
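Current:
| Color shades | Color |
|---|---|
| turquoise, babyblue | blue |
| royal blue, true blue | blue |
| navy blue, true blue | blue |
Expected output:
| Color shades | Color |
|---|---|
| turquoise, babyblue, royal blue, true blue, navy blue | blue |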
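A note on why both attempts fall short: paste and toString join the existing strings as they are, so the repeated shade is never seen as a separate element. The sketch below illustrates this on a plain character vector; the variable names are made up for illustration and are not part of the question.

```r
# Hypothetical vector mirroring the question's shade strings.
shades <- c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue")

# toString() glues the three strings together unchanged, so "true blue"
# appears twice in the result.
joined <- toString(shades)
joined

# Splitting on the comma first exposes the individual shades,
# which lets unique() drop the repeat before re-joining.
pieces <- trimws(unlist(strsplit(shades, ",")))
deduped <- toString(unique(pieces))
deduped
```

The point is only that de-duplication has to happen on the split pieces, not on the original strings.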
Reply from a uj5u.com user:
Convert "Color shades" to a list-column:
lapply(strsplit(df[["Color shades"]], ","), trimws)
# [[1]]
# [1] "turquoise" "babyblue"
# [[2]]
# [1] "royal blue" "true blue"
# [[3]]
# [1] "navy blue" "true blue"
df[["Color shades"]] <- lapply(strsplit(df[["Color shades"]], ","), trimws)
df
#            Color shades Color
# 1   turquoise, babyblue  blue
# 2 royal blue, true blue  blue
# 3  navy blue, true blue  blue
Aggregate with unique:
aggregate(df["Color shades"], df["Color"], function(z) paste(unique(unlist(z)), collapse=", "))
#   Color                                          Color shades
# 1  blue turquoise, babyblue, royal blue, true blue, navy blue
Or, staying consistent with the list-column approach:
aggregate(df["Color shades"], df["Color"], function(z) list(unique(unlist(z))))
#   Color                                          Color shades
# 1  blue turquoise, babyblue, royal blue, true blue, navy blue
str(aggregate(df["Color shades"], df["Color"], function(z) list(unique(unlist(z)))))
# 'data.frame': 1 obs. of 2 variables:
#  $ Color       : chr "blue"
#  $ Color shades:List of 1
#   ..$ : chr "turquoise" "babyblue" "royal blue" "true blue" ...
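Working with list-columns instead of comma-separated values often (but not always) has advantages. If your use case regularly needs to look at individual elements inside one of these fields, you will find yourself writing deeper regular expressions and/or calling strsplit on the delimiter over and over. With list-columns you can use tools like unique and %in% freely (though admittedly you need to be more comfortable with lapply/sapply, and many base-R aggregation tools do not always behave consistently with list-columns).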
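To make the comparison concrete, here is a small sketch of an element lookup in both representations. The object names are hypothetical and the data simply mirrors the shades from the question.

```r
# Hypothetical list-column content: one character vector of shades per row.
shade_list <- list(
  c("turquoise", "babyblue"),
  c("royal blue", "true blue"),
  c("navy blue", "true blue")
)

# The same rows stored as comma-separated strings.
shade_strings <- vapply(shade_list, paste, character(1), collapse = ", ")

# %in% compares whole elements, so no row contains the shade "blue".
exact <- vapply(shade_list, function(x) "blue" %in% x, logical(1))
exact

# A plain substring search on the strings hits every row,
# because "babyblue", "royal blue" and so on all contain "blue".
naive <- grepl("blue", shade_strings, fixed = TRUE)
naive
```

Getting an exact match out of the string form would need a pattern that accounts for the delimiter, which is the extra regex work the paragraph above refers to.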
Data
df <- structure(list(`Color shades` = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"), Color = c("blue", "blue", "blue")), class = "data.frame", row.names = c(NA, -3L))
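If the column should stay a plain string, the split and de-duplication can also happen inside the aggregation function. The following is a minimal sketch under assumptions: the helper name collapse_unique, the column names, and the "green" rows are invented here to show that it works across more than one group.

```r
# Hypothetical data with two color groups (the green rows are made up).
df2 <- data.frame(
  shades = c("turquoise, babyblue", "royal blue, true blue",
             "navy blue, true blue", "lime, olive", "olive, mint"),
  color = c("blue", "blue", "blue", "green", "green"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split on commas, trim whitespace,
# drop duplicates (keeping first-seen order), and re-join.
collapse_unique <- function(x) {
  toString(unique(trimws(unlist(strsplit(x, ",")))))
}

# Same aggregate() call shape as in the answer, one output row per color.
res <- aggregate(df2["shades"], df2["color"], collapse_unique)
res
```

The input data frame is left untouched, and the result column is an ordinary character column rather than a list-column.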
Reply from a uj5u.com user:
If you can use the "dplyr" library, you could also do it like this:
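library(dplyr)
library(stringr)  # str_split() comes from stringr, not dplyr
df <- data.frame("Colorshade" = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"),
                 "Color" = c(rep("blue", 3)),
                 stringsAsFactors = FALSE)
# Per color: join the group's strings, split into shades, sort alphabetically,
# drop duplicates, and re-join (uses the grouped column, not df$Colorshade)
my_df <- df %>%
  group_by(Color) %>%
  summarise(Colorshade = paste(unique(sort(str_split(string = paste(Colorshade, collapse = ", "), pattern = ", ", simplify = TRUE))), collapse = ", "))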
Reply from a uj5u.com user:
A data.table solution:
library(data.table)
setDT(df)[, .(Color_shades = paste0(unique(unlist(strsplit(colorshades, ", "))),
collapse = ", ")),
by = .(colors)]
# colors Color_shades
# 1: blue turquoise, babyblue, royal blue, true blue, navy blue
Reply from a uj5u.com user:
You could also use tidytext with unnest_tokens:
library(dplyr)
library(tidytext)
color_df <- tibble(color= rep("blue", times = 3),
color_shades = c("turquoise, babyblue", "royal blue, true blue", "navy blue, true blue"))
color_shades_agg <- color_df %>%
  unnest_tokens(word, color_shades, token = 'regex', pattern = ", ") %>%
  group_by(color) %>%
  distinct() %>%
  summarise(color_shades = paste0(sort(word), collapse = ", "))
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qukuanlian/378700.html
