Creating custom quantiles in a data frame?

Suppose I have the following table:

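library(dplyr)  # also provides tibble() and %>%

# Assign the table to df so the snippets that follow can refer to it
df <- tibble(year = c("2020", "2020", "2020", "2021", "2021", "2021"),
             website = c("facebook", "google", "youtube", "facebook", "google", "youtube"),
             method = c("laptop", "laptop", "laptop", "mobile", "mobile", "mobile"),
             values = c(10, 30, 60, 90, 25, 40))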

How can I create a column based on custom quantiles of the numbers in the values column?

For example, say I have the following custom quantile conditions:

Risky: > 50%
Neither: 25-50%
Safe: < 25%

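These conditions apply to the numbers in the values column: each number should be ranked according to the quantile conditions above and given a rank value of 1, 2, or 3 accordingly.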

The final table should look like this:

tibble(year = c("2020", "2020", "2020","2021", "2021", "2021"),
       website  = c("facebook", "google", "youtube","facebook", "google", "youtube"), 
       method = c("laptop", "laptop", "laptop", "mobile", "mobile", "mobile"), 
       values = c(10,30,60, 90,25, 40), 
       rank = c(3,2,1,1,3,2))

I know the table has to be grouped by year and method, so the code would look something like this:

df %>% group_by(year, method) %>% mutate(rank = quantile(???))

Reply from a uj5u.com user:

You can use quantile(x, c(0.25, 0.5)) to get the cut points and pass them to findInterval(). Note that findInterval() is similar to cut(*, labels = FALSE) but more efficient.

library(dplyr)

df %>%
  group_by(year, method) %>%
  mutate(rank = findInterval(-values, quantile(-values, c(0.25, 0.5)), left.open = TRUE) + 1) %>%
  ungroup()

# # A tibble: 6 × 5
#   year  website  method values  rank
#   <chr> <chr>    <chr>   <dbl> <dbl>
# 1 2020  facebook laptop     10     3
# 2 2020  google   laptop     30     2
# 3 2020  youtube  laptop     60     1
# 4 2021  facebook mobile     90     1
# 5 2021  google   mobile     25     3
# 6 2021  youtube  mobile     40     2
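To see why negating the values gives rank 1 to the largest number, it may help to trace the findInterval() approach on a single group. The following is a minimal base R sketch using the 2020 / laptop values from the question; the variable names (neg, cuts, idx, rnk) are only illustrative.

```r
# Values of the 2020 / laptop group from the question
values <- c(10, 30, 60)

# Negate so that the largest value falls into the lowest interval
neg <- -values

# 25% and 50% cut points of the negated values (default quantile type)
cuts <- quantile(neg, c(0.25, 0.5))

# Interval index 0, 1, or 2; left.open = TRUE makes the intervals (a, b]
idx <- findInterval(neg, cuts, left.open = TRUE)

# Shift the index so that ranks start at 1
rnk <- idx + 1

print(unname(cuts))  # -45 -30
print(rnk)           # 3 2 1
```

With left.open = TRUE a value that sits exactly on a cut point is assigned to the interval below it, which is why -30 (the original 30) lands in the middle band rather than the top one.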

If you want labels instead of ranks, use cut():

df %>%
  group_by(year, method) %>%
  mutate(rank = cut(values, quantile(values, c(0, 0.25, 0.5, 1)),
                    c("Safe", "Neither", "Risky"), include.lowest = TRUE)) %>%
  ungroup()

# # A tibble: 6 × 5
#   year  website  method values rank   
#   <chr> <chr>    <chr>   <dbl> <fct>  
# 1 2020  facebook laptop     10 Safe   
# 2 2020  google   laptop     30 Neither
# 3 2020  youtube  laptop     60 Risky  
# 4 2021  facebook mobile     90 Risky  
# 5 2021  google   mobile     25 Safe   
# 6 2021  youtube  mobile     40 Neither
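If the same banding is needed in several places, the cut() call could be wrapped in a small function. This is only a sketch: quantile_band() is a hypothetical helper name, not something from dplyr or from the answer, and the default cut points and labels are simply the ones used in the question.

```r
# Hypothetical helper (not part of any package): label a numeric vector
# by quantile band. `probs` are the inner cut points; `labels` run from
# the lowest band to the highest.
quantile_band <- function(x, probs = c(0.25, 0.5),
                          labels = c("Safe", "Neither", "Risky")) {
  stopifnot(length(labels) == length(probs) + 1)
  # Add the 0% and 100% quantiles so every value falls inside a band
  breaks <- quantile(x, c(0, probs, 1), names = FALSE)
  cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
}

# The 2021 / mobile group from the question
bands <- quantile_band(c(90, 25, 40))
print(as.character(bands))  # "Risky" "Safe" "Neither"
```

Inside a grouped pipeline this would be called as mutate(rank = quantile_band(values)). Note that cut() stops with an error when the break points are not unique, for example when all values in a group are equal, so such groups would need separate handling.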

Reply from a uj5u.com user:

You can use the ntile() function from dplyr to create quantile groups:

library(dplyr)
df %>%
  group_by(year, method) %>%
  mutate(rank = ntile(values, 4))

Output:

# A tibble: 6 × 5
# Groups:   year, method [2]
  year  website  method values  rank
  <chr> <chr>    <chr>   <dbl> <int>
1 2020  facebook laptop     10     1
2 2020  google   laptop     30     2
3 2020  youtube  laptop     60     3
4 2021  facebook mobile     90     3
5 2021  google   mobile     25     1
6 2021  youtube  mobile     40     2
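The ntile() output above numbers the buckets from the smallest value upward, which is the reverse of the ranks requested in the question. One possible adjustment, sketched here under the assumption that every group has exactly three rows, is to rank on desc(values) with three buckets. The trimmed-down df below keeps only the columns needed for the illustration.

```r
library(dplyr)

# Reduced version of the question's data (year and values only)
df <- tibble(
  year   = c("2020", "2020", "2020", "2021", "2021", "2021"),
  values = c(10, 30, 60, 90, 25, 40)
)

res <- df %>%
  group_by(year) %>%
  # desc() reverses the ordering, so the largest value gets bucket 1
  mutate(rank = ntile(desc(values), 3)) %>%
  ungroup()

print(res$rank)  # 3 2 1 1 3 2
```

This matches the expected table only because each group holds three rows. ntile() splits rows into buckets of roughly equal size by count, so for larger groups it will generally not reproduce the 25% / 50% cut points from the question.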

Reply from a uj5u.com user:

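df %>%
  group_by(year, method) %>%
  # cut() with labels = FALSE returns the band index (1 = lowest band);
  # rank(-index) then flips it so the highest band gets rank 1
  mutate(rank = rank(-cut(values,
                          breaks = c(-Inf, quantile(values, probs = c(0.25, 0.50), names = FALSE), Inf),
                          labels = FALSE)))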

# # A tibble: 6 x 5
# # Groups:   year, method [2]
#   year  website  method values  rank
#   <chr> <chr>    <chr>   <dbl> <dbl>
# 1 2020  facebook laptop     10     3
# 2 2020  google   laptop     30     2
# 3 2020  youtube  laptop     60     1
# 4 2021  facebook mobile     90     1
# 5 2021  google   mobile     25     3
# 6 2021  youtube  mobile     40     2
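The rank() wrapper returns 3, 2, 1 here because each band contains exactly one value per group. With larger groups several values can share a band, and rank() then averages the tied positions. The base R sketch below uses a made-up six-value group to show the difference, and compares it with subtracting the band index from 4, which is one possible alternative if the intent is a fixed 1 to 3 scale (an assumption about the intent, not something stated in the answer).

```r
# Hypothetical group with six values (not from the question)
values <- c(10, 20, 30, 40, 50, 60)

# Band index from low to high: (-Inf, q25], (q25, q50], (q50, Inf)
band <- cut(values,
            breaks = c(-Inf, quantile(values, c(0.25, 0.5), names = FALSE), Inf),
            labels = FALSE)

print(band)         # 1 1 2 3 3 3
print(rank(-band))  # 5.5 5.5 4.0 2.0 2.0 2.0  (ties are averaged)
print(4 - band)     # 3 3 2 1 1 1
```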

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/474527.html

Tags: r, data frame, dplyr, tidyverse
