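I have a balanced panel dataset in which each firm, identified by an ID (cnpjcei), is recorded every year along with its total number of employees (empreg). My goal is to compute, for every year in the dataset, the difference between the number of employees in year t and in year t-1, that is, empreg(t) - empreg(t-1).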
# A tibble: 386,763 x 3
ano cnpjcei empreg
<dbl> <chr> <dbl>
1 2006 1000786001505 10
2 2007 1000786001505 12
3 2008 1000786001505 16
4 2009 1000786001505 19
5 2010 1000786001505 7
6 2011 1000786001505 7
7 2012 1000786001505 7
8 2013 1000786001505 7
9 2014 1000786001505 8
10 2015 1000786001505 9
# ... with 386,753 more rows
The result I am after looks something like this:
# A tibble: 386,763 x 4
ano cnpjcei empreg variation_empreg
<dbl> <chr> <dbl>
1 2006 1000786001505 10
2 2007 1000786001505 12 2
3 2008 1000786001505 16 4
4 2009 1000786001505 19 3
5 2010 1000786001505 7 -12
6 2011 1000786001505 7 0
7 2012 1000786001505 7 0
8 2013 1000786001505 7 0
9 2014 1000786001505 8 1
10 2015 1000786001505 9 1
# ... with 386,753 more rows
Does anyone have any ideas? Thanks :)
Answer:
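You can use diff, which returns the successive differences of a vector. Prepend NA for the first year, which has no previous value. Note that this treats the whole column as a single series, so with more than one cnpjcei you would add group_by(cnpjcei) before mutate to keep differences from crossing firms: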
library(dplyr)
df %>% mutate(variation_empreg = c(NA, diff(empreg)))
#> ano cnpjcei empreg variation_empreg
#> 1 2006 1000786001505 10 NA
#> 2 2007 1000786001505 12 2
#> 3 2008 1000786001505 16 4
#> 4 2009 1000786001505 19 3
#> 5 2010 1000786001505 7 -12
#> 6 2011 1000786001505 7 0
#> 7 2012 1000786001505 7 0
#> 8 2013 1000786001505 7 0
#> 9 2014 1000786001505 8 1
#> 10 2015 1000786001505 9 1
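Since the full panel holds many firms, here is a minimal sketch of a grouped version, assuming the rows may not already be sorted by firm and year. The second firm ID and its employee counts are made up purely for illustration, and the names panel and result are hypothetical.

```r
library(dplyr)

# Hypothetical two-firm panel; the second ID and its values are invented for illustration
panel <- data.frame(
  ano     = rep(2006:2008, times = 2),
  cnpjcei = rep(c("1000786001505", "9999999999999"), each = 3),
  empreg  = c(10, 12, 16, 50, 45, 47)
)

result <- panel %>%
  arrange(cnpjcei, ano) %>%   # put years in order within each firm
  group_by(cnpjcei) %>%       # compute differences separately for each firm
  mutate(variation_empreg = empreg - dplyr::lag(empreg)) %>%
  ungroup()

result
```

Here empreg - dplyr::lag(empreg) gives the same values as c(NA, diff(empreg)); the first year of each firm is NA because there is no earlier year to compare against.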
Data
df <- structure(list(ano = 2006:2015, cnpjcei = c("1000786001505",
"1000786001505", "1000786001505", "1000786001505", "1000786001505",
"1000786001505", "1000786001505", "1000786001505", "1000786001505",
"1000786001505"), empreg = c(10L, 12L, 16L, 19L, 7L, 7L, 7L,
7L, 8L, 9L)), row.names = c("1", "2", "3", "4", "5", "6", "7",
"8", "9", "10"), class = "data.frame")
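If you would rather not depend on dplyr, the same per-firm difference can be sketched in base R with ave, which applies a function within each group and returns a vector aligned with the original rows. This is only an illustrative alternative, not part of the answer above; the data frame name firm_df is hypothetical, and the values are copied from the sample data.

```r
# Same sample values as above, rebuilt compactly so the snippet runs on its own
firm_df <- data.frame(
  ano     = 2006:2015,
  cnpjcei = "1000786001505",
  empreg  = c(10L, 12L, 16L, 19L, 7L, 7L, 7L, 7L, 8L, 9L)
)

# Sort by firm and year so diff() compares consecutive years
firm_df <- firm_df[order(firm_df$cnpjcei, firm_df$ano), ]

# Within each firm: NA for the first year, then empreg(t) - empreg(t-1)
firm_df$variation_empreg <- ave(
  firm_df$empreg, firm_df$cnpjcei,
  FUN = function(x) c(NA, diff(x))
)

firm_df
```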
