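I have the data below. I want to impute large values of income with 10 times the median. This is done using the following code:
df$income_imputed <- ifelse(df$income > 10 * median(df$income, na.rm = TRUE), 10 * median(df$income, na.rm = TRUE), df$income)
However, I want to do this separately for each country and year rather than over the whole data set. I know group_by might help with this kind of task, but I am not sure how to combine the two.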
country year income
<dbl> <dbl> <dbl>
1 1 1999 5000
2 1 1999 5000
3 1 1999 10000000
4 1 1999 3000
5 1 2000 4000
6 1 2000 4000
7 1 2000 20000000
8 1 2000 4000
9 2 1999 10000
10 2 1999 10000
11 2 1999 30000000
12 2 1999 4000
13 2 2000 12000
14 2 2000 12000
15 2 2000 40000000
df <- structure(list(country = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2), year = c(1999, 1999, 1999, 1999, 2000, 2000, 2000,
2000, 1999, 1999, 1999, 1999, 2000, 2000, 2000), income = c(5000,
5000, 1e+07, 3000, 4000, 4000, 2e+07, 4000, 10000, 10000, 3e+07,
4000, 12000, 12000, 4e+07), income2 = c(5000, 5000, 1e+05, 3000,
4000, 4000, 1e+05, 4000, 10000, 10000, 1e+05, 4000, 12000, 12000,
1e+05)), row.names = c(NA, -15L), class = c("tbl_df", "tbl",
"data.frame"))
Answer from a uj5u.com user:
I believe this is what you are looking for:
library(dplyr)
df <- df %>%
  group_by(country, year) %>%
  mutate(income_imputed = ifelse(income > 10 * median(income, na.rm = TRUE), 10 * median(income, na.rm = TRUE), income))
Answer from a uj5u.com user:
df
country year income income2
<dbl> <dbl> <dbl> <dbl>
1 1 1999 5000 5000
2 1 1999 5000 5000
3 1 1999 10000000 100000
4 1 1999 3000 3000
5 1 2000 4000 4000
6 1 2000 4000 4000
7 1 2000 20000000 100000
8 1 2000 4000 4000
9 2 1999 10000 10000
10 2 1999 10000 10000
11 2 1999 30000000 100000
12 2 1999 4000 4000
13 2 2000 12000 12000
14 2 2000 12000 12000
15 2 2000 40000000 100000
Using dplyr:
df %>% group_by(country, year) %>% mutate(income_imputed = ifelse(income > 10 * median(income, na.rm = TRUE), 10 * median(income, na.rm = TRUE), income))
Output:
country year income income2 income_imputed
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 1999 5000 5000 5000
2 1 1999 5000 5000 5000
3 1 1999 10000000 100000 50000
4 1 1999 3000 3000 3000
5 1 2000 4000 4000 4000
6 1 2000 4000 4000 4000
7 1 2000 20000000 100000 40000
8 1 2000 4000 4000 4000
9 2 1999 10000 10000 10000
10 2 1999 10000 10000 10000
11 2 1999 30000000 100000 100000
12 2 1999 4000 4000 4000
13 2 2000 12000 12000 12000
14 2 2000 12000 12000 12000
15 2 2000 40000000 100000 120000
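Because the replacement is a cap, the same result can probably be written more compactly with pmin(), which takes the element-wise minimum of the income and the group's limit. The following is a minimal sketch rather than the answer's own code: the data frame is rebuilt by hand from the question's values (without income2), and the name capped is only illustrative.

```r
library(dplyr)

# Illustrative data: the same country/year/income values as in the question
df <- data.frame(
  country = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  year    = c(1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000,
              1999, 1999, 1999, 1999, 2000, 2000, 2000),
  income  = c(5000, 5000, 1e7, 3000, 4000, 4000, 2e7, 4000,
              10000, 10000, 3e7, 4000, 12000, 12000, 4e7)
)

capped <- df %>%
  group_by(country, year) %>%
  # Inside mutate(), median() is evaluated once per country/year group;
  # pmin() then replaces anything above 10 times that median with the cap
  mutate(income_imputed = pmin(income, 10 * median(income, na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps are not computed per group
```

Like the ifelse() version, this leaves a missing income as NA, since pmin() does not remove NA values by default.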
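Answer from a uj5u.com user:
Putting group_by() before the calculation should do the trick:
library(dplyr)
# Note: this overwrites the existing income2 column of df;
# use a new name such as income_imputed to keep the original.
df <- df %>%
  group_by(country, year) %>%
  mutate(income2 = ifelse(income > 10 * median(income, na.rm = TRUE),
                          yes = 10 * median(income, na.rm = TRUE),
                          no = income))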
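If dplyr is not available, a similar per-group cap can be sketched in base R with ave(), which returns a vector as long as its input with each row holding its group's summary. This is an alternative under stated assumptions, not something the answers above propose: the small data frame and the name group_median are made up for illustration and use only the country 1 rows from the question.

```r
# Illustrative data: the country 1 rows from the question
df <- data.frame(
  country = c(1, 1, 1, 1, 1, 1, 1, 1),
  year    = c(1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000),
  income  = c(5000, 5000, 1e7, 3000, 4000, 4000, 2e7, 4000)
)

# ave() applies FUN within each country/year combination and
# returns one value per row (that row's group median)
group_median <- ave(df$income, df$country, df$year,
                    FUN = function(x) median(x, na.rm = TRUE))

# Cap each income at 10 times its own group's median
df$income_imputed <- pmin(df$income, 10 * group_median)
```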
