Create column values based on a matching regular expression

I have the following string in a column called Sentences in df:

I like an apple

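I want to create a second column, called Type, whose value is determined by the matching string. I want to take the regular expression \bapple\b, match it against the sentence and, if it matches, add the value Fruit_apple to the Type column.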
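For this single-pattern case, a minimal base-R sketch could look like the following. It assumes the data frame is called df with the column Sentences; the second and third sentences are made-up examples added only to show what the word boundaries do, and perl = TRUE is an assumption chosen so that \b is handled by the PCRE engine.

```r
# Toy data: the question's sentence plus two made-up ones for illustration
df <- data.frame(
    Sentences = c("I like an apple", "I like pineapples", "Apple pie is sweet"),
    stringsAsFactors = FALSE
)

# \\b is a word boundary, so the "apple" inside "pineapples" does not match
is_apple <- grepl("\\bapple\\b", df$Sentences, ignore.case = TRUE, perl = TRUE)

# Matching rows get "Fruit_apple"; all other rows stay NA
df$Type <- ifelse(is_apple, "Fruit_apple", NA_character_)
df
```

Because of ignore.case = TRUE, "Apple pie is sweet" also matches; drop that argument for a case-sensitive match.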

In the long run, I want to do this with several other strings and types.

Is there a simple way to do this with a function?

Dataset (survey_1):

structure(list(slider_8.response = c(1L, 1L, 3L, 7L, 7L, 7L, 
1L, 3L, 2L, 1L, 1L, 7L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 1L, 7L, 
7L, 7L, 1L, 1L, 7L, 6L, 6L, 1L, 1L, 7L, 1L, 7L, 7L, 1L, 7L, 7L, 
7L, 7L, 7L, 6L, 7L, 7L, 7L, 1L, 1L, 6L, 1L, 1L, 1L, 1L, 7L, 2L
), Sentences = c("He might could do it.", "I ever see the film.", 
"I may manage to come visit soon.", "She’ll never be forgotten.", 
"They might find something special.", "It might not be a good buy.", 
"Maybe my pain will went away.", "Stephen maybe should fix your bicycle.", 
"It used to didn’t matter if you walked in late.", "He’d could climb the stairs.", 
"Only Graeme would might notice that.", "I used to cycle a lot. ", 
"Your dad belongs to disagree with this. ", "We can were pleased to see her.", 
"He may should take us to the city.", "I could never forgot his deep voice.", 
"I should can turn this thing over to Ann.", "They must knew who they really are.", 
"We used to runs down three flights.", "I don’t care what he may be up to. ", 
"That’s something I ain’t know about.", "That must be quite a skill.", 
"We must be able to invite Jim.", "She used to play with a trolley.", 
"He is done gone. ", "You might can check this before making a decision.", 
"It would have a positive effect on the team. ", "Ruth can maybe look for it later.", 
"You should tag along at the dance.", "They’re finna leave town.", 
"A poem should looks like that.", "I can tell you didn’t do your homework. ", 
"I can driving now.", "They should be able to put a blanket over it.", 
"We could scarcely see each other.", "I might says I was never good at maths.", 
"The next dance will be a quickstep. ", "I might be able to find myself a seat in this place.", 
"Andrew thinks we shouldn’t do it.", "Jack could give a hand.", 
"She’ll be able to come to the event.", "She’d maybe keep the car the way it is.", 
"Sarah used to be able to agree with this proposal.", "I’d like to see your lights working. ", 
"I’d be able to get a little bit more sleep.", "John may has a second name.", 
"You must can apply for this job.", "I maybe could wait till the 8 o’clock train.", 
"She used to could go if she finished early.", "That would meaned something else, eh?", 
"You’ll can enjoy your holiday.", "We liketa drowned that day. ", 
"I must say it’s a nice feeling.", "I eaten my lunch."), construct = c(NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA)), row.names = c(NA, 54L), class = "data.frame")

List of types:

list("DM_will_can"=c("ll can","will can"), "DM_would_could"=c("d could","would could"),
                  "DM_might_can"="might can","DM_might_could"="might could","DM_used_to_could"="used to could",
                  "DM_should_can"="should can","DM_would_might"=c("d might", "would might"),"DM_may_should"="may should",
                  "DM_must_can"="must can", "SP_will_be_able"=c("ll be able","will be able"),
                  "SP_would_be_able"=c("d be able","would be able"),"SP_might_be_able"="might be able",
                  "SP_maybe_could"="maybe could","SP_used_to_be_able"="used to be able","SP_should_be_able"=
                    "should be able","SP_would_maybe"=c("d maybe", "would maybe"), "SP_maybe_should"="maybe should",
                  "SP_must_be_able"="must be able", "Filler_will_a"="quickstep","Filler_will_b"="forgotten",
                  "Filler_would_a"="lights working","Filler_would_b"="positive effect","Filler_can_a"="homework",
                  "Filler_can_b"="Ruth","Filler_could_a"="scarcely","Filler_could_b"="Jack", "Filler_may_a"="may be up to",
                  "Filler_may_b"="visit soon", "Filler_might_a"="good buy","Filler_might_be"="something special",
                  "Filler_should_a"="tag along","Filler_should_b"="Andrew","Filler_used_to_a"="trolley",
                  "Filler_used_to_b"="cycle a lot","Filler_must_a"="quite a skill","Filler_must_b"="nice feeling",
                  "Dist_gram_will_went"="will went","Dist_gram_meaned"="meaned","Dist_gram_can_were"="can were",
                  "Dist_gram_forgot"="never forgot", "Dist_gram_may_has"="may has", 
                  "Dist_gram_might_says"="might says","Dist_gram_used_to_runs"="used to runs",
                  "Dist_gram_should_looks"="should looks","Dist_gram_must_knew"="must knew","Dist_dial_liketa"="liketa",
                  "Dist_dial_belongs"="belongs to disagree","Dist_dial_finna"="finna","Dist_dial_used_to_didnt"="used to didn't matter",
                  "Dist_dial_eaten"="I eaten", "Dist_dial_can_driving"="can driving","Dist_dial_aint_know"="That's something",
                  "Dist_dial_ever_see"="ever see the film","Dist_dial_done_gone"="done gone")

Answer:

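I'd do this with a Python dictionary, but since we're talking about R, I've more or less translated that approach. There is probably a more idiomatic way to do this in R than two for loops, but this should work: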

# Define data
df <- data.frame(
    id = c(1:5),
    sentences = c("I like apples", "I like dogs", "I have cats", "Dogs are cute", "I like fish")
)

#   id     sentences
# 1  1 I like apples
# 2  2   I like dogs
# 3  3   I have cats
# 4  4 Dogs are cute
# 5  5   I like fish

type_list <- list(
    "fruit" = c("apples", "oranges"),
    "animals" = c("dogs", "cats")
)

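types <- names(type_list)

# Start with empty (NA) columns for the matched type and item
df$type <- NA_character_
df$item <- NA_character_

# Test every item of every type against all sentences; a later match
# overwrites an earlier one, so the order of type_list matters
for (type in types) {
    for (item in type_list[[type]]) {
        matches <- grep(item, df$sentences, ignore.case = TRUE)
        df[matches, "type"] <- type
        df[matches, "item"] <- item
    }
}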


# Output:
#   id     sentences    type   item
# 1  1 I like apples   fruit apples
# 2  2   I like dogs animals   dogs
# 3  3   I have cats animals   cats
# 4  4 Dogs are cute animals   dogs
# 5  5   I like fish    <NA>   <NA>
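The answer mentions that R probably offers something more idiomatic than two nested for loops. One possible sketch (not the answerer's code) flattens the list into a lookup table with rep(), lengths() and unlist(), then finds the matching row for each sentence with vapply(). To mirror the loop, where a later entry in the list overwrites an earlier one, it keeps the last matching row rather than the first.

```r
# Toy data and type list from the example above
df <- data.frame(
    id = 1:5,
    sentences = c("I like apples", "I like dogs", "I have cats", "Dogs are cute", "I like fish"),
    stringsAsFactors = FALSE
)
type_list <- list(
    "fruit" = c("apples", "oranges"),
    "animals" = c("dogs", "cats")
)

# Flatten the list into a lookup table with one row per (type, item) pair
lookup <- data.frame(
    type = rep(names(type_list), lengths(type_list)),
    item = unlist(type_list, use.names = FALSE),
    stringsAsFactors = FALSE
)

# For each sentence, index of the last lookup row whose item matches (NA if none)
last_match <- vapply(df$sentences, function(s) {
    hits <- which(vapply(lookup$item, grepl, logical(1), x = s, ignore.case = TRUE))
    if (length(hits) == 0) NA_integer_ else hits[[length(hits)]]
}, integer(1), USE.NAMES = FALSE)

# Indexing with NA yields NA, so unmatched sentences get NA in both columns
df$type <- lookup$type[last_match]
df$item <- lookup$item[last_match]
df
```

On this toy data it fills the type and item columns exactly as the loop does.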

Edit

Added after you posted your data. If I read in your data as df and your list of types as type_list, the following works:


types <- names(type_list)

df$type <- NA_character_
df$item <- NA_character_

for (type in types) {
    for (item in type_list[[type]]) {
        matches <- grep(item, df$Sentences, ignore.case = TRUE)
        df[matches, "type"] <- type
        df[matches, "item"] <- item
    }
}

This is exactly the same as my earlier code, except that Sentences is capitalized in your data frame.
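Since the question asks for a function, the same loop can be wrapped in a reusable helper. This is an illustrative sketch, not part of the original answer: the name assign_types and the whole_words argument are hypothetical. It makes two assumptions worth checking against the real data. First, the sentences use curly apostrophes (’) while several patterns in the type list use straight ones (for example "That's something"), so the helper converts ’ to ' before matching. Second, following the question's \bapple\b, it wraps each pattern in word boundaries by default.

```r
# Hypothetical helper: label each sentence with the name of the last entry in
# type_list whose pattern matches it, or NA if no pattern matches
assign_types <- function(sentences, type_list, whole_words = TRUE) {
    # Assumption: curly apostrophes (U+2019) should match straight ones
    sentences <- gsub("\u2019", "'", sentences, fixed = TRUE)
    result <- rep(NA_character_, length(sentences))
    for (type in names(type_list)) {
        for (pattern in type_list[[type]]) {
            # Restrict matches to whole words, as in \bapple\b
            if (whole_words) pattern <- paste0("\\b", pattern, "\\b")
            hit <- grepl(pattern, sentences, ignore.case = TRUE, perl = TRUE)
            # Later types overwrite earlier ones, as in the loop above
            result[hit] <- type
        }
    }
    result
}

# A few sentences and types taken from the question
sentences <- c("That\u2019s something I ain\u2019t know about.",
               "They should be able to put a blanket over it.",
               "I eaten my lunch.",
               "I like an apple")
types <- list(
    SP_would_be_able = c("d be able", "would be able"),
    SP_should_be_able = "should be able",
    Dist_dial_eaten = "I eaten",
    Dist_dial_aint_know = "That's something"
)
assign_types(sentences, types)
```

With word boundaries, "d be able" no longer matches inside "should be able", so that sentence no longer depends on the order of the list. On the question's data the call would be df$Type <- assign_types(df$Sentences, type_list).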
