I have a panel dataset that might look like this:
set.seed(123)
df <- data.frame(
  year    = rep(2011:2020, 5),
  county  = rep(c("a", "b", "c", "d", "e"), each = 10),
  state   = rep(c("A", "B", "C", "D", "E"), each = 10),
  country = rep(c("AA", "BB", "CC", "DD", "EE"), each = 10),
  var1    = runif(50, 0, 50),
  var2    = runif(50, 50, 100)
)
I want to convert the panel dataset into 5-year averages per county:
library(dplyr)
df <- df %>%
  mutate(period = cut(year, seq(2011, 2021, by = 5), right = FALSE)) %>%
  group_by(county, period) %>%
  summarise_all(mean)
The resulting dataset looks like this:
county period year state country var1 var2
<chr> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 a [2011,2016) 2013 NA NA 33.1 69.7
2 a [2016,2021) 2018 NA NA 24.7 73.6
3 b [2011,2016) 2013 NA NA 27.6 72.3
4 b [2016,2021) 2018 NA NA 24.7 83.1
5 c [2011,2016) 2013 NA NA 38.7 75.7
6 c [2016,2021) 2018 NA NA 22.8 66.8
7 d [2011,2016) 2013 NA NA 33.8 72.2
8 d [2016,2021) 2018 NA NA 20.0 83.7
9 e [2011,2016) 2013 NA NA 14.9 71.0
10 e [2016,2021) 2018 NA NA 19.6 70.4
The warning message is, for example:
In mean.default(state) :
argument is not numeric or logical: returning NA
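Is there a smart way (ideally without merging, since I have many character columns) to keep the time-invariant character columns for each county after the transformation? What I am after is: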
county period year state country var1 var2
<chr> <fct> <dbl> <chr> <chr> <dbl> <dbl>
1 a [2011,2016) 2013 A AA 33.1 69.7
2 a [2016,2021) 2018 A AA 24.7 73.6
3 b [2011,2016) 2013 B BB 27.6 72.3
4 b [2016,2021) 2018 B BB 24.7 83.1
5 c [2011,2016) 2013 C CC 38.7 75.7
6 c [2016,2021) 2018 C CC 22.8 66.8
7 d [2011,2016) 2013 D DD 33.8 72.2
8 d [2016,2021) 2018 D DD 20.0 83.7
9 e [2011,2016) 2013 E EE 14.9 71.0
10 e [2016,2021) 2018 E EE 19.6 70.4
Thanks in advance!
Answer:
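The warning arises because summarise_all(mean) computes the mean not only of var1 and var2 but also of state and country, which are character columns. If you want to keep state and country, put them into group_by() as grouping columns: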
library(dplyr)
df %>%
  group_by(county, state, country,
           period = cut(year, seq(2011, 2021, by = 5), right = FALSE)) %>%
  summarise_all(mean) %>%
  ungroup()
# # A tibble: 10 × 7
# county state country period year var1 var2
# <chr> <chr> <chr> <fct> <dbl> <dbl> <dbl>
# 1 a A AA [2011,2016) 2013 33.1 69.7
# 2 a A AA [2016,2021) 2018 24.7 73.6
# 3 b B BB [2011,2016) 2013 27.6 72.3
# 4 b B BB [2016,2021) 2018 24.7 83.1
# 5 c C CC [2011,2016) 2013 38.7 75.7
# 6 c C CC [2016,2021) 2018 22.8 66.8
# 7 d D DD [2011,2016) 2013 33.8 72.2
# 8 d D DD [2016,2021) 2018 20.0 83.7
# 9 e E EE [2011,2016) 2013 14.9 71.0
# 10 e E EE [2016,2021) 2018 19.6 70.4
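For readers on a current dplyr release, where summarise_all() is superseded, the same grouping idea can be sketched with across(). This is only an illustration, assuming dplyr 1.0 or later; the name res is made up for the example, and .groups = "drop" stands in for the trailing ungroup().

```r
library(dplyr)

# Same toy panel as in the question
set.seed(123)
df <- data.frame(
  year    = rep(2011:2020, 5),
  county  = rep(c("a", "b", "c", "d", "e"), each = 10),
  state   = rep(c("A", "B", "C", "D", "E"), each = 10),
  country = rep(c("AA", "BB", "CC", "DD", "EE"), each = 10),
  var1    = runif(50, 0, 50),
  var2    = runif(50, 50, 100)
)

# Grouping by state and country keeps them out of the mean() call;
# everything() only selects the remaining non-grouping columns
# (year, var1, var2).
res <- df %>%
  group_by(county, state, country,
           period = cut(year, seq(2011, 2021, by = 5), right = FALSE)) %>%
  summarise(across(everything(), mean), .groups = "drop")
```

Because state and country are grouping columns here, they come through unchanged as character columns and no warning is raised.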
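If the grouping columns are just county and period, and the other categorical variables are unique within each group, you can keep them by taking their first value with first() inside summarise():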
df %>%
  group_by(county,
           period = cut(year, seq(2011, 2021, by = 5), right = FALSE)) %>%
  summarise(across(!where(is.numeric), first),
            across( where(is.numeric), mean)) %>%
  ungroup()
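Since first() silently picks one value, it may be worth confirming beforehand that the character columns really are constant within each group. The following is a minimal sketch of such a check, assuming the same toy data as in the question; check and all_constant are illustrative names, not part of the original answer.

```r
library(dplyr)

# Same toy panel as in the question
set.seed(123)
df <- data.frame(
  year    = rep(2011:2020, 5),
  county  = rep(c("a", "b", "c", "d", "e"), each = 10),
  state   = rep(c("A", "B", "C", "D", "E"), each = 10),
  country = rep(c("AA", "BB", "CC", "DD", "EE"), each = 10),
  var1    = runif(50, 0, 50),
  var2    = runif(50, 50, 100)
)

# Count the distinct values of every non-numeric column
# within each county/period group.
check <- df %>%
  group_by(county,
           period = cut(year, seq(2011, 2021, by = 5), right = FALSE)) %>%
  summarise(across(!where(is.numeric), n_distinct), .groups = "drop")

# TRUE only if each character column has exactly one value per group
all_constant <- all(select(check, !c(county, period)) == 1)
all_constant
```

If all_constant comes back FALSE, first() would discard information, and adding the varying columns to group_by() is likely the safer route.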
