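I need to identify specific words, and combinations of specific words, in a free-text description column. My dataset has two columns: a reference number and a description. The data relates to repairs. For each reference number, I need to work out which room the repair took place in. The rooms could include "kitchen", "bathroom", "dining room" and so on.
The dataset looks like this: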
| reference | description             |
|-----------|-------------------------|
| 123456    | repair light in kitchen |
The output I need looks like this:
| reference | Room    |
|-----------|---------|
| 123456    | kitchen |
Any help is much appreciated.
uj5u.com user reply:
This extracts the first match from room_vector in each description.
room_vector = c("kitchen", "bathroom", "dining room")
library(stringr)
# Join the room names into one alternation: "kitchen|bathroom|dining room"
your_data$room = str_extract(your_data$description, paste(room_vector, collapse = "|"))
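As a rough sketch of how this behaves on a few rows: the data frame below is made up for illustration (only the first row comes from the question), and the word boundaries and case-insensitive matching are my own additions, on the assumption that descriptions may contain capitalised room names.

```r
library(stringr)

# Hypothetical sample data; only the first row appears in the question
your_data <- data.frame(
  reference = c(123456, 123457, 123458),
  description = c("repair light in kitchen",
                  "replace tap in Bathroom",
                  "paint hallway"),
  stringsAsFactors = FALSE
)

room_vector <- c("kitchen", "bathroom", "dining room")

# \\b keeps a room name from matching inside a longer word;
# ignore_case = TRUE lets "Bathroom" match "bathroom"
pattern <- regex(
  paste0("\\b(", paste(room_vector, collapse = "|"), ")\\b"),
  ignore_case = TRUE
)

# str_extract() returns NA where no room is found; str_to_lower() keeps NA as NA
your_data$Room <- str_to_lower(str_extract(your_data$description, pattern))

your_data[, c("reference", "Room")]
```

Rows without any listed room (the "paint hallway" row here) end up with NA in Room, so they are easy to filter out or review by hand.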
uj5u.com user reply:
This version takes the combination with the word repair into account:
library(dplyr)
library(stringr)
my_vector <- c("kitchen", "bathroom", "dining room")
pattern <- paste(my_vector, collapse = "|")
# Room is filled only when the description contains both "repair" and a room name; otherwise it is NA
df %>%
  mutate(Room = case_when(
    str_detect(description, "repair") &
      str_detect(description, pattern) ~ str_extract(description, pattern)))
If you apply the code to this data frame:
  reference              description
1    123456 live in light in kitchen
you get:
  reference              description Room
1    123456 live in light in kitchen <NA>
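To see both outcomes side by side, here is a small self-contained run of the same idea. The second row and its reference number are invented for the comparison; everything else follows the code above.

```r
library(dplyr)
library(stringr)

# Two rows: one mentions "repair", the other (hypothetical reference) does not
df <- data.frame(
  reference = c(123456, 123457),
  description = c("repair light in kitchen", "live in light in kitchen"),
  stringsAsFactors = FALSE
)

my_vector <- c("kitchen", "bathroom", "dining room")
pattern <- paste(my_vector, collapse = "|")

# case_when() without a fallback branch yields NA for rows that fail the condition
result <- df %>%
  mutate(Room = case_when(
    str_detect(description, "repair") &
      str_detect(description, pattern) ~ str_extract(description, pattern)
  ))

result$Room
# "kitchen" NA
```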
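The first version did not take the combination with the word repair into account. It is similar to Gregor Thomas's solution: because str_extract returns NA when no room matches, the | condition gives the same result as calling str_extract on its own.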
library(dplyr)
library(stringr)
my_vector <- c("kitchen", "bathroom", "dining room")
pattern <- paste(my_vector, collapse = "|")
df %>%
  mutate(Room = case_when(
    str_detect(description, "repair") |
      str_detect(description, pattern) ~ str_extract(description, pattern)))
  reference             description    Room
1    123456 repair light in kitchen kitchen
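One thing to keep in mind: str_extract only returns the first match, so a description that mentions two rooms loses the second one. A possible variation, assuming you want every matched room in a single comma-separated string (the Rooms column name and the sample rows are made up here):

```r
library(stringr)

# Hypothetical rows: one mentions two rooms, one mentions none
df <- data.frame(
  reference = c(123456, 123457),
  description = c("repair light in kitchen and bathroom", "repair door"),
  stringsAsFactors = FALSE
)

my_vector <- c("kitchen", "bathroom", "dining room")
pattern <- paste(my_vector, collapse = "|")

# str_extract_all() returns a list with one character vector per description
matches <- str_extract_all(df$description, pattern)

# Collapse each vector to a single string, using NA when nothing matched
df$Rooms <- vapply(
  matches,
  function(m) if (length(m) == 0) NA_character_ else paste(unique(m), collapse = ", "),
  character(1)
)

df$Rooms
# "kitchen, bathroom" NA
```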
uj5u.com user reply:
Using Base R:
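rooms <- c("kitchen", "bathroom", "dining room")
# The first alternative captures a room that follows "repair"; the fallback .* matches anything and leaves \\1 empty
pat <- sprintf('.*repair.*(%s).*|.*', paste0(rooms, collapse = '|'))
transform(df, room = sub(pat, '\\1', reference))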
                  reference        room
1           repair bathroom    bathroom
2             live bathroom
3  repair lights in kitchen     kitchen
4           food in kitchen
5         tv in dining room
6 table repair dining room  dining room
Data:
df <- structure(list(reference = c("repair bathroom", "live bathroom",
"repair lights in kitchen", "food in kitchen", "tv in dining room",
"table repair dining room ")), class = "data.frame", row.names = c(NA,
-6L))
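Note that the pattern above only captures a room that comes after the word repair. If the room can appear before it, one option in base R is to test for "repair" and locate the room separately. This is only a sketch: the desc vector and variable names are invented for the example, and unmatched rows get NA rather than an empty string.

```r
rooms <- c("kitchen", "bathroom", "dining room")
room_pat <- paste0(rooms, collapse = "|")

# Hypothetical descriptions; the third puts "repair" after the room name
desc <- c("repair bathroom", "live bathroom", "kitchen tap repair")

# Does the description mention "repair" anywhere?
has_repair <- grepl("repair", desc, fixed = TRUE)

# regexpr() gives the start of the first room match (-1 if none) and its length
m <- regexpr(room_pat, desc)
start <- as.integer(m)
len <- attr(m, "match.length")

# Cut out the matched room, then keep it only for rows that also mention "repair"
first_room <- ifelse(start > 0, substring(desc, start, start + len - 1), NA_character_)
room <- ifelse(has_repair, first_room, NA_character_)

data.frame(description = desc, room = room)
```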
