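I have a dataset in which I have created a separate column for each cocktail ingredient, so that each column holds exactly one ingredient. I now have variables like this: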
ingredients <- c("1 1/2 oz Plymouth gin", "1 oz egg white", "3/4 oz lemon juice", "2 oz rye (50% abv)", "2 oz white rum (40% abv)", "3/4 oz lime juice", "3/4 oz honey syrup")
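and so on.
I need to clean these up by removing all quantities (e.g. 1/2 oz, 2 dashes) and alcohol-content indicators (e.g. 47.3% abv). I tried doing it one piece at a time (removing the digits, then "1/2" and "3/4", then "oz", "dashes", "()", "%", and "abv"):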
df %>%
  mutate(ingredient1 = str_remove(ingredient1, "[[:digit:]]+")) %>%
  mutate(ingredient1 = str_remove(ingredient1, "oz"))
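but that is a lot of work, and I am fairly sure there is a more elegant and efficient solution.
I am looking for a way to tell R to remove everything up to and including "oz" or "dashes", and also to remove everything from the opening "(" onwards.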
Answer from a uj5u.com user:
Here is a starting point for how you could accomplish the task:
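library(dplyr)
library(stringr)
df %>%
  # drop everything up to and including "oz " (lowercase, then uppercase)
  mutate(across(everything(), ~sub(".*oz ", '', .))) %>%
  mutate(across(everything(), ~sub(".*OZ ", '', .))) %>%
  # drop a parenthesised note such as " (40% abv)"
  mutate(across(everything(), ~str_replace(., "\\s*\\([^\\)]+\\)", "")))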
   ingredient1          ingredient2      ingredient3
   <chr>                <chr>            <chr>
 1 pisco                egg white        lime juice
 2 Plymouth gin         egg white        lemon juice
 3 Plymouth gin         egg white        Dolin dry vermo
 4 rye                  simple syrup     lemon juice
 5 white rum            lime juice       simple syrup
 6 white rum            lime juice       honey syrup
 7 white rum            lime juice       simple syrup
 8 Scotch               Cherry Herring   sweet vermouth
 9 Cognac               heavy cream      Demerara syrup
10 white rum            lime juice       grapefruit juice
11 bourbon              grapefruit juice honey syrup
12 Absolut Citron vodka Cointreau        cranberry juice
13 bourbon              lemon juice      honey syrup
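The three substitutions above can also be folded into one helper. What follows is only a sketch in base R, not the answerer's method: `clean_ingredient` is a hypothetical name, the unit list (`oz`, `dash`, `dashes`) is an assumption based on the quantities mentioned in the question, and `"2 dashes Angostura bitters"` is a made-up test value rather than a row from the data.

```r
# Hypothetical helper: strip a leading "quantity + unit" and a trailing "(...)" note.
clean_ingredient <- function(x) {
  # Remove everything up to and including the first unit word.
  # Assumed units: "oz", "dash", "dashes"; matched case-insensitively.
  x <- sub("^.*?\\b(oz|dash(es)?)\\s+", "", x, ignore.case = TRUE, perl = TRUE)
  # Remove a trailing parenthesised note such as " (40% abv)".
  x <- sub("\\s*\\([^)]*\\)\\s*$", "", x, perl = TRUE)
  trimws(x)
}

# Sample values; the last entry is invented for illustration.
sample_ingredients <- c("1 1/2 oz Plymouth gin", "2 oz rye (50% abv)",
                        "2 OZ bourbon (47% abv)", "2 dashes Angostura bitters")
cleaned <- clean_ingredient(sample_ingredients)
print(cleaned)
```

To apply it to every column of the data frame, `df[] <- lapply(df, clean_ingredient)` should work, assuming all columns are character vectors.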
Data:
structure(list(ingredient1 = c("2 oz pisco (40% abv)", "1 1/2 oz Plymouth gin",
"2 oz Plymouth gin", "2 oz rye (50% abv)", "2 oz white rum (40% abv)",
"2 oz white rum (40% abv)", "2 oz white rum (40% abv)", "1 oz Scotch (43% abv)",
"2 oz Cognac (41% abv)", "2 oz white rum (40% abv)", "2 oz bourbon (45% abv)",
"1 1/2 oz Absolut Citron vodka", "2 OZ bourbon (47% abv)"), ingredient2 = c("1 oz egg white",
"1 oz egg white", "1 oz egg white", "3/4 oz simple syrup", "0.875 oz lime juice",
"3/4 oz lime juice", "3/4 oz lime juice", "3/4 oz Cherry Herring",
"1 oz heavy cream", "3/4 oz lime juice", "1 oz grapefruit juice",
"3/4 oz Cointreau", "3/4 oz lemon juice"), ingredient3 = c("3/4 oz lime juice",
"3/4 oz lemon juice", "1/2 oz Dolin dry vermo", "0.625 oz lemon juice",
"3/4 oz simple syrup", "3/4 oz honey syrup", "3/4 oz simple syrup",
"3/4 oz sweet vermouth", "1/4 oz Demerara syrup", "1/2 oz grapefruit juice",
"1/2 oz honey syrup", "3/4 oz cranberry juice", "3/4 oz honey syrup"
)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))
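Answer from a uj5u.com user:
Why do this in several lines when you can pull out the target information in a single line with str_extract, using its delimiters (oz on the left, and either the end of the string or ( on the right) as lookaround expressions?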
library(stringr)
str_extract(ingredients, "(?<=oz\\s).*?(?=\\s\\(|$)")
[1] "Plymouth gin" "egg white" "lemon juice" "rye" "white rum" "lime juice"
[7] "honey syrup"
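One thing to keep in mind with the extraction approach: the pattern is case-sensitive, so an entry such as "2 OZ bourbon (47% abv)" from the data in the first answer would not match, and str_extract returns NA for entries without a match. The sketch below is a base R counterpart that adds the inline `(?i)` flag. `extract_ingredient` is a hypothetical name, and the "dashes" entry is an invented value used only to show the no-match case.

```r
# Hypothetical base R counterpart of the lookaround extraction.
extract_ingredient <- function(x) {
  # (?i) makes the match case-insensitive; the lookbehind requires "oz" plus one
  # whitespace character, the lookahead stops before " (" or at end of string.
  pattern <- "(?i)(?<=\\boz\\s).*?(?=\\s\\(|$)"
  m <- regexpr(pattern, x, perl = TRUE)
  hit <- !is.na(m) & m != -1
  out <- rep(NA_character_, length(x))
  # regmatches() returns only the matched pieces, in order.
  out[hit] <- regmatches(x, m)
  out
}

# The last entry is invented; it has no "oz" and therefore yields NA.
extracted <- extract_ingredient(c("3/4 oz lemon juice", "2 OZ bourbon (47% abv)",
                                  "2 dashes Angostura bitters"))
print(extracted)
```

Whether NA or the untouched input is the better fallback depends on how the unmatched rows should be handled afterwards.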
