How do I remove part of a character variable in R?

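I have a dataset in which I have created separate columns for cocktail ingredients, so that each column holds one ingredient. My variables now look like this: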

ingredients <- c("1 1/2 oz Plymouth gin", "1 oz egg white", "3/4 oz lemon juice", "2 oz rye (50% abv)", "2 oz white rum (40% abv)", "3/4 oz lime juice", "3/4 oz honey syrup")

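and so on.

I need to clean this up by removing all quantities (e.g. "1/2 oz", "2 dashes") and alcohol content indicators (e.g. "(47.3% abv)"). I tried doing it one piece at a time (removing digits, then "1/2" and "3/4", then "oz", "dashes", "()", "%" and " abv"):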

df %>%
  mutate(ingredient1 = str_remove(ingredient1, "[[:digit:]]+")) %>%
  mutate(ingredient1 = str_remove(ingredient1, "oz"))

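But this is a lot of work, and I am fairly sure there is a more elegant and efficient solution.

I am looking for a way to tell R to remove everything up to and including "oz" or "dashes", and also to remove everything from "(" onward.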

A uj5u.com user replied:

Here is a starting point for how you could accomplish the task:

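library(dplyr)
library(stringr)

df %>% 
  # drop everything up to and including "oz " (lower case)
  mutate(across(everything(), ~sub(".*oz ", '', .))) %>%
  # same for the upper-case spelling "OZ "
  mutate(across(everything(), ~sub(".*OZ ", '', .))) %>% 
  # drop a parenthesised note such as " (40% abv)"
  mutate(across(everything(), ~str_replace(., " \\s*\\([^\\)]+\\)", "")))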

   ingredient1          ingredient2      ingredient3     
   <chr>                <chr>            <chr>           
 1 pisco                egg white        lime juice      
 2 Plymouth gin         egg white        lemon juice     
 3 Plymouth gin         egg white        Dolin dry vermo 
 4 rye                  simple syrup     lemon juice     
 5 white rum            lime juice       simple syrup    
 6 white rum            lime juice       honey syrup     
 7 white rum            lime juice       simple syrup    
 8 Scotch               Cherry Herring   sweet vermouth  
 9 Cognac               heavy cream      Demerara syrup  
10 white rum            lime juice       grapefruit juice
11 bourbon              grapefruit juice honey syrup     
12 Absolut Citron vodka Cointreau        cranberry juice 
13 bourbon              lemon juice      honey syrup
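
The three `mutate()` calls above could also be folded into a single pass. The following is only a sketch under stated assumptions: the small `demo` tibble and the `qty_or_abv` pattern name are made up for illustration, and the pattern assumes every quantity ends in "oz" (in any letter case) followed by whitespace.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row sample in the same shape as the data below
demo <- tibble(
  ingredient1 = c("2 oz pisco (40% abv)", "1 1/2 oz Plymouth gin",
                  "2 OZ bourbon (47% abv)"),
  ingredient2 = c("1 oz egg white", "0.875 oz lime juice",
                  "3/4 oz lemon juice")
)

# One pattern with two alternatives:
#   ^.*\boz\s+     everything from the start through "oz" and the spaces after it
#   \s*\([^)]+\)   a parenthesised note such as " (40% abv)"
# ignore_case = TRUE lets the same pattern cover "oz" and "OZ".
qty_or_abv <- regex("^.*\\boz\\s+|\\s*\\([^)]+\\)", ignore_case = TRUE)

cleaned <- demo %>%
  mutate(across(everything(), ~ str_remove_all(.x, qty_or_abv)))
```

`str_remove_all()` is used rather than `str_remove()` because both alternatives may need to be removed from the same string.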

Data:

df <- structure(list(ingredient1 = c("2 oz pisco (40% abv)", "1 1/2 oz Plymouth gin", 
"2 oz Plymouth gin", "2 oz rye (50% abv)", "2 oz white rum (40% abv)", 
"2 oz white rum (40% abv)", "2 oz white rum (40% abv)", "1 oz Scotch (43% abv)", 
"2 oz Cognac (41% abv)", "2 oz white rum (40% abv)", "2 oz bourbon (45% abv)", 
"1 1/2 oz Absolut Citron vodka", "2 OZ bourbon (47% abv)"), ingredient2 = c("1 oz egg white", 
"1 oz egg white", "1 oz egg white", "3/4 oz simple syrup", "0.875 oz lime juice", 
"3/4 oz lime juice", "3/4 oz lime juice", "3/4 oz Cherry Herring", 
"1 oz heavy cream", "3/4 oz lime juice", "1 oz grapefruit juice", 
"3/4 oz Cointreau", "3/4 oz lemon juice"), ingredient3 = c("3/4 oz lime juice", 
"3/4 oz lemon juice", "1/2 oz Dolin dry vermo", "0.625 oz lemon juice", 
"3/4 oz simple syrup", "3/4 oz honey syrup", "3/4 oz simple syrup", 
"3/4 oz sweet vermouth", "1/4 oz Demerara syrup", "1/2 oz grapefruit juice", 
"1/2 oz honey syrup", "3/4 oz cranberry juice", "3/4 oz honey syrup"
)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"))

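A uj5u.com user replied:

Why do this in several steps when `str_extract` can pull out the target text in one line? Use its delimiters as lookaround expressions: `oz` on the left, and either the end of the string or `(` on the right.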

library(stringr)
str_extract(ingredients, "(?<=oz\\s).*?(?=\\s\\(|$)")
[1] "Plymouth gin" "egg white"    "lemon juice"  "rye"          "white rum"    "lime juice"  
[7] "honey syrup"
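
The question also mentions quantities given in "dashes", which the pattern above does not cover. As a rough sketch (it uses removal rather than lookarounds), a small helper could handle both units. The name `clean_ingredient`, the "dash"/"dashes" spellings, and the sample strings with "OZ" and "2 dashes" are assumptions made for illustration, not part of the original answer.

```r
library(stringr)

# Hypothetical sample that adds an upper-case "OZ" and a "dashes" entry
sample_ingredients <- c("1 1/2 oz Plymouth gin", "2 oz rye (50% abv)",
                        "2 OZ bourbon (47% abv)", "2 dashes Angostura bitters")

# Hypothetical helper: strip the leading quantity, then any trailing "(...)"
clean_ingredient <- function(x) {
  # Remove everything up to and including the first unit word and its spaces
  x <- str_remove(x, regex("^.*?\\b(oz|dash(es)?)\\s+", ignore_case = TRUE))
  # Remove an opening parenthesis and everything after it
  str_remove(x, "\\s*\\(.*$")
}

cleaned <- clean_ingredient(sample_ingredients)
```

Other unit words could be added to the alternation in the same way if the real data contains them.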


Tags: r, regex
