Paste only part of a vector based on a T/F column

Let's suppose I have a data frame of species and the colors they display.

df <- data.frame(name = paste("spec", 1:5),
                 ind = c("blue;green", "red", "green", "red;green;blue", ""))

Some of these colors (blue and red) are actually meaningful. I can simply grepl() them and get a T/F column.

df$isredorblue<-grepl("blue|red", df$ind)

But now I would like a column that shows which of the meaningful colors are displayed.

The desired result is:

> df
    name            ind isredorblue searchcolor
1 spec 1     blue;green        TRUE        blue
2 spec 2            red        TRUE         red
3 spec 3          green       FALSE       other
4 spec 4 red;green;blue        TRUE    red;blue
5 spec 5                      FALSE       other

I tried gsub with [^], but that doesn't work, because a character class matches individual letters, so it keeps "r" or "e" or "d" rather than the word "red"...

> gsub("[^red]", "", df$ind)
[1] "eree"    "red"     "ree"     "redreee" ""

Now I'm thinking of using strsplit... but I can't seem to figure out my next step.

blabla <- strsplit(df$ind, split = ";")
blabla <- blabla[-which(!blabla %in% c("red", "blue"))]
> blabla
[[1]]
[1] "red"

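Keep in mind this is a reprex. My actual data frame is much larger, and different indicator "colors" matter for different things, so I need to be able to produce these columns in as few steps as possible.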
A uj5u.com user replied:

Here are two approaches.

Using regex -

This creates a regex pattern from color and extracts it from the ind column in the data. If no values are extracted, we replace the blank with 'other'.

color <- c('red', 'blue')
pat <- paste0(color, collapse = '|')
df$is_color_present <- grepl(pat, df$ind)
df$searchcolor <- sapply(stringr::str_extract_all(df$ind, pat), paste0, collapse = ';')
df$searchcolor[df$searchcolor == ''] <- 'other'
df

#    name            ind is_color_present searchcolor
#1 spec 1     blue;green             TRUE        blue
#2 spec 2            red             TRUE         red
#3 spec 3          green            FALSE       other
#4 spec 4 red;green;blue             TRUE    red;blue
#5 spec 5                           FALSE       other
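
Since the question mentions several indicator sets on a larger data frame, the regex idea above could be wrapped in a small reusable function. The following is only a sketch in base R (regmatches() and gregexpr() instead of stringr); the helper name extract_targets and its sep and none arguments are hypothetical and not part of the answer.

```r
# Hypothetical helper (not from the answer): pull the values listed in
# `targets` out of each element of `x`, using base R only.
extract_targets <- function(x, targets, sep = ";", none = "other") {
  pat <- paste0(targets, collapse = "|")
  # gregexpr() locates every match; regmatches() returns the matched text
  # as one character vector per element of x
  hits <- regmatches(x, gregexpr(pat, x))
  out <- vapply(hits, paste0, character(1), collapse = sep)
  # elements without any match collapse to "", which we relabel
  out[out == ""] <- none
  out
}

df <- data.frame(name = paste("spec", 1:5),
                 ind = c("blue;green", "red", "green", "red;green;blue", ""))
df$searchcolor <- extract_targets(df$ind, c("red", "blue"))
df$searchcolor
# [1] "blue"     "red"      "other"    "red;blue" "other"
```

Calling the same function with a different targets vector would produce the column for another indicator set in one step.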

Using tidyverse without regex -

We get the data in long format by splitting on ;, and keep only those values that are present in color.

library(dplyr)
library(tidyr)

df %>%
  separate_rows(ind, sep = ';') %>%
  group_by(name) %>%
  summarise(is_color_present = any(ind %in% color),
            searchcolor = paste0(ind[ind %in% color], collapse = ';'),
            searchcolor = replace(searchcolor, searchcolor == '', 'other'))
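
The same split-and-filter idea can also be sketched in base R, which finishes the strsplit attempt from the question. This is an illustrative alternative, not the answerer's code; the intermediate names parts and kept are made up for the example.

```r
df <- data.frame(name = paste("spec", 1:5),
                 ind = c("blue;green", "red", "green", "red;green;blue", ""))
color <- c("red", "blue")

# Split each entry on ";" (fixed = TRUE treats the separator literally)
parts <- strsplit(df$ind, split = ";", fixed = TRUE)
# Within each entry, keep only the pieces that exactly equal a target color
kept <- lapply(parts, function(p) p[p %in% color])

df$is_color_present <- lengths(kept) > 0
df$searchcolor <- vapply(kept, paste0, character(1), collapse = ";")
df$searchcolor[!df$is_color_present] <- "other"
df$searchcolor
# [1] "blue"     "red"      "other"    "red;blue" "other"
```

Because %in% compares whole pieces, an entry such as "darkred" would not count as "red" here, unlike the plain regex pattern.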

A uj5u.com user replied:

Here is a concise solution:

library(dplyr)
library(stringr)

First define all the target colors as a vector:

targets <- c('red', 'blue')

Now use the vector, converted into a regex alternation pattern, to extract the desired colors into a new column:

df %>%
  mutate(colors = str_extract_all(ind, paste0(targets, collapse = "|")))

    name            ind    colors
1 spec 1     blue;green      blue
2 spec 2            red       red
3 spec 3          green
4 spec 4 red;green;blue red, blue
5 spec 5

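If you have many color names, some of which may share the same letters (such as "red" and "darkred"), you may want to wrap the color names in word boundaries:

df %>%
  mutate(colors = str_extract_all(ind, paste0("\\b(", paste0(targets, collapse = "|"), ")\\b")))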
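
To see what the word boundaries change, here is a small base R sketch on a made-up input vector x (not part of the original data) that contains "darkred". It uses regmatches() and gregexpr() with perl = TRUE rather than stringr, purely for illustration.

```r
# Made-up example input; "darkred" contains the letters of "red"
x <- c("red;darkred", "darkred", "blue;red")
targets <- c("red", "blue")

plain <- paste0(targets, collapse = "|")   # "red|blue"
bounded <- paste0("\\b(", plain, ")\\b")   # "\\b(red|blue)\\b"

# Without boundaries, the "red" inside "darkred" is also matched
plain_hits <- regmatches(x, gregexpr(plain, x, perl = TRUE))
# With boundaries, only stand-alone "red" and "blue" are matched
bounded_hits <- regmatches(x, gregexpr(bounded, x, perl = TRUE))

lengths(plain_hits)
# [1] 2 1 2
lengths(bounded_hits)
# [1] 1 0 2
```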

Here is another dplyr solution (though not the most concise):

df %>%
  mutate(
    blue = ifelse(grepl("blue", ind), "blue", "other"),
    red = ifelse(grepl("red", ind), "red", "other"),
    target = ifelse(blue == "blue" | red == "red", paste(red, blue), "other"),
    target = sub("^other\\s(?=blue|red)|(?<=blue|red)\\sother$", "", target, perl = TRUE)) %>%
  select(-c(blue, red))

    name            ind   target
1 spec 1     blue;green     blue
2 spec 2            red      red
3 spec 3          green    other
4 spec 4 red;green;blue red blue
5 spec 5                   other

Data:

df <- data.frame(name = paste("spec", 1:5),
                 ind = c("blue;green", "red", "green", "red;green;blue", ""))

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/324470.html