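Based on the data below, how can I sum Inflow, Outflow, NetMigration, InAGI and OutAGI across two time periods, 2011-2015 and 2016-2020? Some counties may have no data for a particular fiscal year, but that does not matter, because the idea is to add up whatever data falls within each of the two periods. Naturally, some NAs will appear in the final data set. I am using FIPS because several counties share the same name. For that reason the Key column is no longer needed, since it is just the concatenation of FIPS and Year.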
Desired output schema/columns:
FIPS County State TotInflow TotOutflow TotNetMigration TotInAGI TotOutAGI Time_Period
12001 Alachua County FL 2011-2015
12001 Alachua County FL 2016-2020
08001 Adams County CO 2011-2015
08001 Adams County CO 2016-2020
Sample data:
df = structure(list(Key = c("080012020", "120012020", "120012018",
"120012017", "080012017", "120012016", "120012015", "080012014",
"120012013", "120012012", "080012012", "080012011", "080012016"
), County = c("Adams County", "Alachua County", "Alachua County",
"Alachua County", "Adams County", "Alachua County", "Alachua County",
"Adams County", "Alachua County", "Alachua County", "Adams County",
"Adams County", "Adams County"), State = c("CO", "FL", "FL",
"FL", "CO", "FL", "FL", "CO", "FL", "FL", "CO", "CO", "CO"),
FIPS = c("08001", "12001", "12001", "12001", "08001", "12001",
"12001", "08001", "12001", "12001", "08001", "08001", "08001"
), Inflow = c(38L, 261L, 321L, 339L, 58L, 288L, 254L, 46L,
413L, 433L, 30L, 42L, NA), InAGI = c(1817L, 6287L, 8423L,
8364L, 1865L, 14720L, 5224L, 1074L, 11774L, 10151L, 921L,
500L, NA), FiscalYear = c("2019- 2020", "2019- 2020", "2017 - 2018",
"2016 - 2017", "2016 - 2017", "2015 - 2016", "2014 - 2015",
"2013 - 2014", "2012 - 2013", "2011 - 2012", "2011 - 2012",
"2010 - 2011", "2015 - 2016"), Year = c(2020L, 2020L, 2018L,
2017L, 2017L, 2016L, 2015L, 2014L, 2013L, 2012L, 2012L, 2011L,
2016L), Outflow = c(54L, 447L, 444L, 558L, 44L, 436L, 334L,
49L, 466L, 495L, 39L, 31L, 51L), OutAGI = c(1879L, 13106L,
15409L, 16496L, 2408L, 12675L, 7448L, 733L, 10309L, 11677L,
847L, 605L, 1114L), NetMigration = c(-16L, -186L, -123L,
-219L, 14L, -148L, -80L, -3L, -53L, -62L, -9L, 11L, NA)), row.names = c(NA,
-13L), class = "data.frame")
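Since Key is simply FIPS followed by Year (for example, 08001 and 2020 give 080012020), it can be verified and dropped before grouping. A minimal base-R sketch on the first three rows of the sample data:

```r
# First three rows of the sample data (Key, FIPS and Year only)
df_small <- data.frame(
  Key  = c("080012020", "120012020", "120012018"),
  FIPS = c("08001", "12001", "12001"),
  Year = c(2020L, 2020L, 2018L),
  stringsAsFactors = FALSE
)

# Key should equal FIPS pasted directly onto Year
key_is_redundant <- identical(df_small$Key, paste0(df_small$FIPS, df_small$Year))
print(key_is_redundant)  # prints [1] TRUE

# Drop the redundant column; FIPS + Year still identify each row
df_small$Key <- NULL
```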
Answer:
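Group by FIPS, County, State and a Time_Period column created from Year, labelling each row according to whether Year falls between a given start and end year. Then loop across the columns of interest to get their sums, renaming the results with the .names argument of across: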
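library(dplyr)
df %>%
  # group by the county identifiers plus a period label derived from Year
  # (years outside both ranges would get Time_Period = NA)
  group_by(FIPS, County, State,
           Time_Period = case_when(between(Year, 2011, 2015) ~ '2011-2015',
                                   between(Year, 2016, 2020) ~ '2016-2020')) %>%
  # sum each measure within its group, skipping NAs, and prefix names with "Total"
  summarise(across(c(Inflow, InAGI, Outflow, OutAGI, NetMigration),
                   ~ sum(.x, na.rm = TRUE), .names = "Total{.col}"),
            .groups = "drop")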
-output
# A tibble: 4 × 9
FIPS County State Time_Period TotalInflow TotalInAGI TotalOutflow TotalOutAGI TotalNetMigration
<chr> <chr> <chr> <chr> <int> <int> <int> <int> <int>
1 08001 Adams County CO 2011-2015 118 2495 119 2185 -1
2 08001 Adams County CO 2016-2020 96 3682 149 5401 -2
3 12001 Alachua County FL 2011-2015 1100 27149 1295 29434 -195
4 12001 Alachua County FL 2016-2020 1209 37794 1885 57686 -676
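With na.rm = TRUE, sum() returns 0 for a group in which every value is missing, so a county with no data at all for one period would show 0 instead of the NA the question expects. In the sample data every period has at least one non-missing value per column, so the output above is unaffected. If NAs should be kept for such groups, one option is a small helper; sum_or_na below is a hypothetical name, and the toy data is made up for illustration:

```r
library(dplyr)

# Hypothetical helper: NA if every value is missing, otherwise the NA-ignoring sum
sum_or_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

# Made-up toy data: county 99999 has no Inflow at all in 2016-2020
toy <- data.frame(
  FIPS   = c("99999", "99999", "99999", "99999"),
  Year   = c(2012L, 2014L, 2017L, 2019L),
  Inflow = c(10L, 5L, NA, NA),
  stringsAsFactors = FALSE
)

res <- toy %>%
  group_by(FIPS,
           Time_Period = case_when(between(Year, 2011, 2015) ~ "2011-2015",
                                   between(Year, 2016, 2020) ~ "2016-2020")) %>%
  # the all-NA group yields NA; the other group yields 10 + 5
  summarise(across(Inflow, sum_or_na, .names = "Total{.col}"), .groups = "drop")

print(res)
```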
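For comparison, the same totals can be computed in base R. This is a swapped-in alternative, not part of the answer above: cut() labels each Year with its period and the formula method of aggregate() sums each column. The na.action = na.pass argument matters in this sketch, because the default (na.omit) would drop every row containing an NA in any column, e.g. Adams County's 2016 row together with its Outflow of 51.

```r
# Sample data from the question (Key and FiscalYear omitted; not needed here)
df <- data.frame(
  County = c("Adams County", "Alachua County", "Alachua County", "Alachua County",
             "Adams County", "Alachua County", "Alachua County", "Adams County",
             "Alachua County", "Alachua County", "Adams County", "Adams County",
             "Adams County"),
  State = c("CO", "FL", "FL", "FL", "CO", "FL", "FL", "CO", "FL", "FL", "CO", "CO", "CO"),
  FIPS = c("08001", "12001", "12001", "12001", "08001", "12001", "12001",
           "08001", "12001", "12001", "08001", "08001", "08001"),
  Inflow = c(38L, 261L, 321L, 339L, 58L, 288L, 254L, 46L, 413L, 433L, 30L, 42L, NA),
  InAGI = c(1817L, 6287L, 8423L, 8364L, 1865L, 14720L, 5224L, 1074L, 11774L,
            10151L, 921L, 500L, NA),
  Year = c(2020L, 2020L, 2018L, 2017L, 2017L, 2016L, 2015L, 2014L, 2013L,
           2012L, 2012L, 2011L, 2016L),
  Outflow = c(54L, 447L, 444L, 558L, 44L, 436L, 334L, 49L, 466L, 495L, 39L, 31L, 51L),
  OutAGI = c(1879L, 13106L, 15409L, 16496L, 2408L, 12675L, 7448L, 733L, 10309L,
             11677L, 847L, 605L, 1114L),
  NetMigration = c(-16L, -186L, -123L, -219L, 14L, -148L, -80L, -3L, -53L, -62L,
                   -9L, 11L, NA),
  stringsAsFactors = FALSE
)

# Intervals (2010, 2015] -> "2011-2015" and (2015, 2020] -> "2016-2020"
df$Time_Period <- cut(df$Year, breaks = c(2010, 2015, 2020),
                      labels = c("2011-2015", "2016-2020"))

# na.action = na.pass keeps rows with NAs so sum(..., na.rm = TRUE) can skip them
totals <- aggregate(
  cbind(Inflow, InAGI, Outflow, OutAGI, NetMigration) ~ FIPS + County + State + Time_Period,
  data = df, FUN = sum, na.rm = TRUE, na.action = na.pass
)
print(totals)
```

On the sample data this gives the same four rows of totals as the dplyr output above, although aggregate() keeps the original column names (Inflow, InAGI, ...) without the Total prefix.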