I have a dataset that is not normally distributed and contains many 0 values. I would like to compute tertiles for each column.
library(dplyr)  # dplyr also re-exports tribble() and %>%

df <- tribble(
  ~shop, ~products, ~sales,
  'A', 300, 100,
  'B', 0, 0,
  'C', 10, 2000,
  'D', 0, 0,
  'E', 0, 0,
  'F', 0, 0,
  'G', 20, 10,
  'H', 0, 0,
  'J', 700, 50,
  'K', 0, 0,
)
Thanks to @AlexB's answer to that question, I tried computing the tertiles with the following code:
df %>%
  arrange(products) %>%
  mutate(tertiles = ntile(products, 3)) %>%
  mutate(tertiles = if_else(tertiles == 1, 'Low',
                            if_else(tertiles == 2, 'Medium', 'High')))
However, the output becomes "High" even for rows where the value is 0. How can I compute this more accurately?
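For context, here is a minimal sketch of why this can happen, assuming the products column from the example above. ntile() assigns buckets from the rank of each row, not from the value itself, so tied values (here the zeros) can be spread across more than one bucket.

```r
library(dplyr)

# products column from the example data, sorted
products <- c(0, 0, 0, 0, 0, 0, 10, 20, 300, 700)

# ntile() ranks the values and splits the ranks into 3 nearly equal-sized groups
buckets <- ntile(products, 3)

# Count how many zeros land in each bucket; identical values can end up
# in different buckets because ties are broken by position
table(bucket = buckets[products == 0])
```

With real data that has even more zeros, the same effect can push some of them into the top bucket.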
Answer from a uj5u.com user:
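I think what you are looking for can be done with cut rather than ntile. Use the breaks argument of cut to set the boundaries of the three groups, and the labels argument to name them.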
df %>%
  arrange(products) %>%
  mutate(tertile = cut(products,
                       breaks = c(-1, 1, 100, Inf),
                       labels = c("low", "medium", "high")))
#> # A tibble: 10 x 4
#> shop products sales tertile
#> <chr> <dbl> <dbl> <fct>
#> 1 B 0 0 low
#> 2 D 0 0 low
#> 3 E 0 0 low
#> 4 F 0 0 low
#> 5 H 0 0 low
#> 6 K 0 0 low
#> 7 C 10 2000 medium
#> 8 G 20 10 medium
#> 9 A 300 100 high
#> 10 J 700 50 high
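To see which intervals these breaks produce, here is a small sketch using base R only (the vector x is copied from the products column). With the default right = TRUE, cut() builds intervals that exclude the lower bound and include the upper bound. Note that these cut points are hand-picked thresholds, not quantiles estimated from the data.

```r
x <- c(0, 0, 0, 0, 0, 0, 10, 20, 300, 700)

# Without labels, cut() names each level after its interval
intervals <- cut(x, breaks = c(-1, 1, 100, Inf))
levels(intervals)

# Same breaks with the labels used in the answer
labelled <- cut(x, breaks = c(-1, 1, 100, Inf),
                labels = c("low", "medium", "high"))
table(labelled)
```

The first break is set to -1 rather than 0 so that the zeros fall inside the lowest interval instead of becoming NA.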
Addendum
To apply the same approach to every column, we can do this:
f <- function(x) cut(x, c(-1, 1, 100, Inf), c("low", "medium", "high"))
df %>%
  arrange(products) %>%
  mutate(across(c("products", "sales"), .fns = f, .names = "{.col}_tertile"))
#> # A tibble: 10 x 5
#> shop products sales products_tertile sales_tertile
#> <chr> <dbl> <dbl> <fct> <fct>
#> 1 B 0 0 low low
#> 2 D 0 0 low low
#> 3 E 0 0 low low
#> 4 F 0 0 low low
#> 5 H 0 0 low low
#> 6 K 0 0 low low
#> 7 C 10 2000 medium high
#> 8 G 20 10 medium medium
#> 9 A 300 100 high medium
#> 10 J 700 50 high medium
Created on 2022-01-23 by the reprex package (v2.0.1)
Answer from a uj5u.com user:
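Update:
It seems I was the first to arrive at the right approach, but Allan Cameron completed it perfectly. That is fine with me, because I learned a lot from Allan Cameron.
Here is my final solution: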
df %>%
  mutate(across(c(products, sales),
                ~cut(., breaks = 3, labels = c("low", "medium", "high")),
                .names = "tertile_{.col}"))
shop products sales tertile_products tertile_sales
<chr> <dbl> <dbl> <fct> <fct>
1 A 300 100 medium low
2 B 0 0 low low
3 C 10 2000 low high
4 D 0 0 low low
5 E 0 0 low low
6 F 0 0 low low
7 G 20 10 low low
8 H 0 0 low low
9 J 700 50 high low
10 K 0 0 low low
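One caveat worth checking: when breaks is a single number, cut() splits the range of the values into that many equal-width intervals, which is not the same as putting a third of the rows into each group. A quick sketch with the products column (base R only):

```r
products <- c(300, 0, 10, 0, 0, 0, 20, 0, 700, 0)

# breaks = 3 divides the range 0..700 into three intervals of equal width
groups <- cut(products, breaks = 3, labels = c("low", "medium", "high"))

# Count rows per group: the groups are far from equal in size
table(groups)
```

This is why 10 and 20 are labelled "low" in the output above: they sit in the same third of the value range as the zeros.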
First answer:
For the products column:
df %>%
  arrange(products) %>%
  mutate(tertiles = cut(products, breaks = 3, labels = c(1:3))) %>%
  mutate(tertiles = case_when(tertiles == 1 ~ "Low",
                              tertiles == 2 ~ "Medium",
                              tertiles == 3 ~ "High",
                              TRUE ~ NA_character_))
shop products sales tertiles
<chr> <dbl> <dbl> <chr>
1 B 0 0 Low
2 D 0 0 Low
3 E 0 0 Low
4 F 0 0 Low
5 H 0 0 Low
6 K 0 0 Low
7 C 10 2000 Low
8 G 20 10 Low
9 A 300 100 Medium
10 J 700 50 High
