多串列示一個值時的頻率表(R)-有解無憂

我有一個這樣的資料集：

ID    color1   color2  color3   shape1       shape2        size
55    red     blue     NA       circle       triangle      small
67    yellow  NA       NA       triangle     NA            medium
83    blue    yellow   NA       circle       NA            large
78    red     yellow   blue     square       circle        large
43    green   NA       NA       square       circle        small
29    yellow  green    NA       circle       triangle      medium

我想創建一個資料框，其中包含每個變數的頻率和百分比，但我遇到了麻煩，因為在某些情況下同一變數有多個列。


Variable      Level        Freq        Percent 
 
color         blue          3           27.27
              red           2           18.18
              yellow        4           36.36
              green         2           18.18
              total         11          100.00

shape         circle        5           50.0       
              triangle      3           30.0
              square        2           20.0
              total         10          100.0

size          small         2           33.3
              medium        2           33.3
              large         2           33.3
              total         6           100.0

我相信我需要將這些變數轉換為 long，然后使用匯總/變異來獲取頻率，但我似乎無法弄清楚。任何幫助是極大的贊賞。

uj5u.com熱心網友回復：

您可以使用tidyverse包將資料轉換為長格式，然后匯總所需的統計資訊。

library(tidyverse)

df |> 
  # Transform all columns into a long format
  pivot_longer(cols = -ID,
               names_pattern = "([A-z] )",
               names_to = c("variable")) |>
  # Drop NA entries
  drop_na(value) |>
  # Group by variable
  group_by(variable) |>
  # Count
  count(value) |>
  # Calculate percentage as n / sum of n by variable
  mutate(perc = 100* n / sum(n))

# A tibble: 10 x 4
# Groups:   variable [3]
#   variable value        n  perc
#   <chr>    <chr>    <int> <dbl>
# 1 color    blue         3  27.3
# 2 color    green        2  18.2
# 3 color    red          2  18.2
# 4 color    yellow       4  36.4
# 5 shape    circle       5  50  
# 6 shape    square       2  20  
# 7 shape    triangle     3  30  
# 8 size     large        2  33.3
# 9 size     medium       2  33.3
#10 size     small        2  33.3

uj5u.com熱心網友回復：

組合并添加到：

在 R 中合并多個頻率表
為堆疊頻率表中的每個組添加一列總計 n


library(dplyr)
library(tidyr)
library(janitor)

options(digits = 3)

df %>% 
  pivot_longer(
    -ID,
    names_to = "Variable",
    values_to = "Level"
  ) %>% 
  mutate(Variable = str_extract(Variable, '[A-Za-z]*')) %>% 
  group_by(Variable, Level) %>% 
  count(Level, name = "Freq") %>% 
  na.omit() %>% 
  group_by(Variable) %>% 
  mutate(Percent = Freq/sum(Freq)*100) %>% 
  group_split() %>% 
  adorn_totals() %>% 
  bind_rows() %>% 
  mutate(Level = ifelse(Level == last(Level), last(Variable), Level)) %>% 
  mutate(Variable = ifelse(duplicated(Variable) |
                             Variable == "Total", NA, Variable))

 Variable    Level Freq Percent
    color     blue    3    27.3
     <NA>    green    2    18.2
     <NA>      red    2    18.2
     <NA>   yellow    4    36.4
     <NA>    Total   11   100.0
    shape   circle    5    50.0
     <NA>   square    2    20.0
     <NA> triangle    3    30.0
     <NA>    Total   10   100.0
     size    large    2    33.3
     <NA>   medium    2    33.3
     <NA>    small    2    33.3
     <NA>    Total    6   100.0

uj5u.com熱心網友回復：

嘗試這種matrix在list在基礎R

uniq <- unique( sub( "[0-9]","", colnames(dat[,-1]) ) )
uniq
[1] "color" "shape" "size"

sapply( uniq, function(x){ tbl <- table( unlist( dat[,grep( x, colnames(dat) )] ) ); 
  rbind( cbind( Percent=round( tbl/sum(tbl)*100, digits=2 ), Freq=tbl ), 
         cbind( sum(tbl/sum(tbl)*100), sum(tbl) ) ) } )
$color
         Percent Freq
blue       27.27    3
green      18.18    2
red        18.18    2
yellow     36.36    4
          100.00   11

$shape
         Percent Freq
circle        50    5
square        20    2
triangle      30    3
             100   10

$size
         Percent Freq
large      33.33    2
medium     33.33    2
small      33.33    2
          100.00    6

資料

dat <- structure(list(ID = c(55L, 67L, 83L, 78L, 43L, 29L), color1 = c("red", 
"yellow", "blue", "red", "green", "yellow"), color2 = c("blue", 
NA, "yellow", "yellow", NA, "green"), color3 = c(NA, NA, NA, 
"blue", NA, NA), shape1 = c("circle", "triangle", "circle", "square", 
"square", "circle"), shape2 = c("triangle", NA, NA, "circle", 
"circle", "triangle"), size = c("small", "medium", "large", "large", 
"small", "medium")), class = "data.frame", row.names = c(NA, 
-6L))

轉載請註明出處，本文鏈接：https://www.uj5u.com/qukuanlian/383141.html

標籤：r

上一篇：根據最高和最低列值過濾行

下一篇：有沒有辦法使用CPACK NSIS指定添加/洗掉圖示？