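My data
I have a vector of words like the one below. This is a simplified version; my real vector has more than 600 words:
myvec <- c("cat", "dog", "bird")
I have a data frame with the following structure:
df <- structure(list(id = c(1, 2, 3), onetext = c("cat furry pink british",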
"dog cat fight", "bird cat issues"), cop= c("Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014",
"Dogs have soft fur and tails so do cats Do cats like to chase their tails",
"A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
), text3 = c("On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.",
"there are many fights going on and this is just an example text",
"Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))
It looks like this:
[Image: the data frame displayed as a table]
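My question
For each keyword in the vector myvec, I need to go through the dataset and check the columns onetext, cop and text3. If the keyword is found in any of these three columns, it should be appended to a new column. The desired result looks like this:
[Image: the data frame with an added column listing the keywords found in each row]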
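My original dataset is very large (the last column holds the longest text), so running several nested loops (which is what I tried) is not ideal.
Edit: note that a single occurrence of a word anywhere in the row is enough for it to be listed. All matching keywords should be listed.
How can I do this? I'm using the tidyverse, so my dataset is actually a tibble.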
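For comparison with the nested loops mentioned above, here is a minimal base-R sketch of the requirement. It loops once over the keywords and lets grepl() test every row at once, so there is no loop over rows. The helper name find_keywords() is hypothetical, and the sketch assumes a case-sensitive literal substring match (so "cats" counts as "cat", but "Cat" and "Dogs" do not), which matches how the pattern in the answer below behaves for these words.

```r
# Keywords and the sample data from the question
myvec <- c("cat", "dog", "bird")

df <- data.frame(
  id = c(1, 2, 3),
  onetext = c("cat furry pink british", "dog cat fight", "bird cat issues"),
  cop = c(
    "Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014",
    "Dogs have soft fur and tails so do cats Do cats like to chase their tails",
    "A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
  ),
  text3 = c(
    "On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.",
    "there are many fights going on and this is just an example text",
    "Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, list the keywords found in any of `cols`
find_keywords <- function(data, keywords, cols, sep = ",") {
  # One combined string per row, with the searched columns newline-separated
  combined <- do.call(paste, c(unname(as.list(data[cols])), sep = "\n"))
  # One vectorised grepl() per keyword; fixed = TRUE treats keywords literally
  hits <- vapply(keywords, function(k) grepl(k, combined, fixed = TRUE),
                 logical(length(combined)))
  # Force a rows-by-keywords matrix (vapply returns a plain vector for one row)
  hits <- matrix(hits, nrow = length(combined))
  # Matched keywords in keyword order; "" for rows with no match
  apply(hits, 1, function(h) paste(keywords[h], collapse = sep))
}

df$topic <- find_keywords(df, myvec, c("onetext", "cop", "text3"))
df$topic
# expected: "cat", "cat,dog", "cat,bird"
```

With 600 keywords this is 600 vectorised passes over the rows instead of a loop per row and keyword. Each keyword is listed at most once per row, as the edit above asks.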
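Similar posts (but not quite the same)
The following posts are somewhat similar, but not identical:
- If column contains string, then enter value for that row
- R column check if contains value from another column
- Add new column if range of columns contains string in R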
Answer:
Update: if a list is preferred, use str_extract_all():
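library(dplyr)
library(stringr)  # str_detect(), str_extract_all()
df %>%  # pattern is built from myvec as in the main answer below
  transmute(across(-id, ~ case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}"))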
This gives:
new_colonetext new_colcop new_coltext3
<list> <list> <list>
1 <chr [1]> <NULL> <chr [2]>
2 <chr [2]> <chr [2]> <NULL>
3 <chr [2]> <chr [4]> <chr [5]>
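Here is how to get the result:
- Build a regex pattern from the vector.
- Use mutate() with across() to check the required columns.
- Where a keyword is detected, extract it into a new column.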
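library(dplyr)
library(stringr)  # str_detect(), str_extract_all()
library(tidyr)    # unite()
myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse = "|")  # regex "cat|dog|bird"
df %>%
  # For each text column, store the matches (a list) in a new column named new_col<column>
  mutate(across(-id, ~ case_when(str_detect(., pattern) ~ str_extract_all(., pattern)), .names = "new_col{col}")) %>%
  # Paste the new columns into one comma-separated topic column
  unite(topic, starts_with('new'), na.rm = TRUE, sep = ',')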
id onetext cop text3 topic
<dbl> <chr> <chr> <chr> <chr>
1 1 cat furry pink british Little Grey Cat is the nickname given to a kitten of the British Shorthai~ On October 4th the first single topic blog devoted to the little grey cat was lau~ "cat,NULL,c(\"cat\", \"cat\")"
2 2 dog cat fight Dogs have soft fur and tails so do cats Do cats like to chase their tails there are many fights going on and this is just an example text "c(\"dog\", \"cat\"),c(\"cat\", \"cat\"),~
3 3 bird cat issues A cat and bird can coexist in a home but you will have to take certain me~ Some cats will not care about a pet bird at all while others will make it its lif~ "c(\"bird\", \"cat\"),c(\"cat\", \"bird\"~
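In the topic column above, each list cell appears to be converted to text as a whole, so unmatched cells show up as NULL and repeated matches as c("cat", "cat"). If each keyword should appear once per row, as the edit in the question asks, one option is to extract from the three columns together and keep the unique matches. This is a minimal sketch, not the answerer's code. It assumes dplyr and stringr are installed and keeps the same case-sensitive substring pattern, so "cats" is matched but "Cat" and "Dogs" are not.

```r
library(dplyr)
library(stringr)

myvec <- c("cat", "dog", "bird")
pattern <- paste(myvec, collapse = "|")  # regex "cat|dog|bird", case-sensitive

# Same three rows as in the question
df <- tibble(
  id = c(1, 2, 3),
  onetext = c("cat furry pink british", "dog cat fight", "bird cat issues"),
  cop = c(
    "Little Grey Cat is the nickname given to a kitten of the British Shorthair breed that rose to viral fame on Tumblr through a variety of musical tributes and photoshopped parodies in late September 2014",
    "Dogs have soft fur and tails so do cats Do cats like to chase their tails",
    "A cat and bird can coexist in a home but you will have to take certain measures to ensure that a cat cannot physically get to the bird at any point"
  ),
  text3 = c(
    "On October 4th the first single topic blog devoted to the little grey cat was launched On October 20th Tumblr blogger Torridgristle shared a cutout exploitable image of the cat, which accumulated over 21000 notes in just over three months.",
    "there are many fights going on and this is just an example text",
    "Some cats will not care about a pet bird at all while others will make it its life mission to get at a bird You will need to assess the personalities of your pets and always remain on guard if you allow your bird and cat to interact"
  )
)

res <- df %>%
  mutate(
    # Every keyword hit in the row, searching the three text columns together
    hits = str_extract_all(paste(onetext, cop, text3), pattern),
    # Each keyword once per row, in order of first appearance; "" if none
    topic = vapply(hits, function(m) paste(unique(m), collapse = ","), character(1))
  ) %>%
  select(-hits)

res$topic
# expected: "cat", "dog,cat", "bird,cat"
```

Searching one pasted string per row avoids the intermediate list columns and the unite() step; the trade-off is that the result no longer records which column each keyword came from.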
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/383094.html
