Counting unique list items

Suppose I have a data table dt.recipes that contains a list of ingredients for each recipe, for example:

recipe_id     ingredients
1             apple, banana, cucumber, water
2             apple, meat, water
3             water

How can I create a table that counts how often each unique item appears in dt.recipes$ingredients? In other words, I am looking for a result similar to this:

ingredient    count
water         3
apple         2
banana        1
cucumber      1
meat          1

Any pointers would be greatly appreciated. Thanks in advance!

Answer:

You can do:

 as.data.frame(table(unlist(strsplit(df$ingredients, ", "))))
#>       Var1 Freq
#> 1    apple    2
#> 2   banana    1
#> 3 cucumber    1
#> 4     meat    1
#> 5    water    3

Data

df <- structure(list(recipe_id = 1:3, 
               ingredients = c("apple, banana, cucumber, water", 
                               "apple, meat, water", 
                               "water")), 
          class = "data.frame", row.names = c(NA, -3L))

df
#>   recipe_id                    ingredients
#> 1         1 apple, banana, cucumber, water
#> 2         2             apple, meat, water
#> 3         3                          water

Created on 2022-03-07 by the reprex package (v2.0.1)
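The question asked for columns named ingredient and count, sorted by frequency, while the output above uses Var1/Freq in alphabetical order. A minimal base R sketch that reuses the same split-and-tabulate idea, then renames and sorts the result. Breaking ties alphabetically is a choice made here, not part of the original answer.

```r
# Example data from the question, as a plain character column
df <- data.frame(
  recipe_id = 1:3,
  ingredients = c("apple, banana, cucumber, water",
                  "apple, meat, water",
                  "water"),
  stringsAsFactors = FALSE
)

# Split each string on ", ", flatten to one vector, and tabulate;
# responseName sets the name of the count column directly
counts <- as.data.frame(table(unlist(strsplit(df$ingredients, ", "))),
                        responseName = "count",
                        stringsAsFactors = FALSE)
names(counts)[1] <- "ingredient"

# Sort by count (descending), breaking ties alphabetically
counts <- counts[order(-counts$count, counts$ingredient), ]
rownames(counts) <- NULL
counts
```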

Answer:

With the tidyverse:

library(tidyverse)
df %>% 
  separate_rows(ingredients, sep = ",\\s*") %>% 
  count(ingredients, name = "count") %>% 
  arrange(desc(count))

# A tibble: 5 x 2
#  ingredients count
#  <chr>       <int>
#1 water           3
#2 apple           2
#3 banana          1
#4 cucumber        1
#5 meat            1
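Note that when sep is not given, separate_rows() splits on the regular expression "[^[:alnum:].]+", so any space or punctuation acts as a separator and a multi-word ingredient such as "olive oil" would be split into "olive" and "oil". Splitting only on commas avoids this. A base R sketch of that idea, using hypothetical data that contains a two-word ingredient:

```r
# Hypothetical data: recipe 2 contains the two-word ingredient "olive oil"
ingredients <- c("apple, banana, water",
                 "apple, olive oil, water",
                 "water")

# Split only on a comma followed by optional whitespace,
# so spaces inside an ingredient name are kept
items <- unlist(strsplit(ingredients, ",\\s*"))

# Count each ingredient and sort from most to least frequent
counts <- sort(table(items), decreasing = TRUE)
counts
```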

Answer:

A data.table approach, assuming ingredients is stored as a list column (see the data below), could be:

library(data.table)
dt[, .(table(unlist(ingredients)))]
#         V1 N
#1:    apple 2
#2:   banana 1
#3: cucumber 1
#4:     meat 1
#5:    water 3

Data

dt <- data.table(
  "recipe_id" = 1:3,
  "ingredients" = list(
    c("apple", "banana", "cucumber", "water"),
    c("apple", "meat", "water"),
    c("water")
  )
)
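The data.table answer assumes ingredients is already a list column, whereas the question stores each recipe's ingredients as one comma-separated string. strsplit() returns exactly that kind of list, one character vector per row, so in data.table the column could first be converted with dt[, ingredients := strsplit(ingredients, ", ", fixed = TRUE)]. A base R check, under that assumption, that the split produces the same list as the data above:

```r
# The question's format: one comma-separated string per recipe
ingredients_chr <- c("apple, banana, cucumber, water",
                     "apple, meat, water",
                     "water")

# strsplit() returns a list with one character vector per recipe;
# fixed = TRUE treats ", " as a literal separator, not a regex
ingredients_lst <- strsplit(ingredients_chr, ", ", fixed = TRUE)

# Same shape and contents as the list column in the data.table example
expected <- list(c("apple", "banana", "cucumber", "water"),
                 c("apple", "meat", "water"),
                 c("water"))
identical(ingredients_lst, expected)

# Flattening and tabulating the list gives the per-ingredient counts
table(unlist(ingredients_lst))
```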

Answer:

A base R option using scan, table and as.data.frame:

with(df, as.data.frame(table(trimws(scan(text = ingredients, what = "", sep = ",", quiet = TRUE)))))
#>       Var1 Freq
#> 1    apple    2
#> 2   banana    1
#> 3 cucumber    1
#> 4     meat    1
#> 5    water    3
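To see what scan() contributes here, this sketch prints the intermediate vector. With sep = ",", scan() reads each string as a line of comma-separated fields and, because its strip.white argument defaults to FALSE, keeps the space that follows each comma, which is why the answer wraps the result in trimws():

```r
x <- c("apple, banana, cucumber, water",
       "apple, meat, water",
       "water")

# Each element of x is read as one line; fields are split on ","
raw <- scan(text = x, what = "", sep = ",", quiet = TRUE)
raw

# trimws() removes the leading space left after each comma
clean <- trimws(raw)
as.data.frame(table(clean))
```

Passing strip.white = TRUE to scan() should have the same effect as the trimws() call.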
