R data.table/data.frame: ratio of sums of previous-row and current-row values across different columns

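My question is hopefully easy to understand, though not so easy to express in code. Once we figure out what other people would search for, I will adjust the title to use more accurate/general terms. I want to calculate an HHI (Herfindahl index) for each day, that is, the sum of squared industry shares within each country. value_country is the number of ids per country over the previous year, and value_country_industry is the number of ids per country and industry over the previous year.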

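Worked example: the HHI on 2017/01/03 is calculated as (3/5)^2 (industry A) + (1/5)^2 (industry B) + (1/5)^2 (industry C) = 0.44. The denominator is the value_country for the respective date, and the numerator is the most recent value of value_country_industry for each industry (that is, a share, but evaluated as of the current date, which means I cannot simply use the share column).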
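The arithmetic in the worked example can be reproduced in a few lines of R. This is only a sketch of the calculation for UK on 2017-01-03; the names latest, value_country, hhi and naive are illustrative and the numbers are typed in by hand from the sample data.

```r
# Most recent value_country_industry per industry for UK as of 2017-01-03:
# A was updated on 2017-01-03; B and C were last observed on 2017-01-02.
latest <- c(A = 3, B = 1, C = 1)

# value_country for UK on 2017-01-03
value_country <- 5

# HHI = sum of squared shares, with shares evaluated as of the current date
hhi <- sum((latest / value_country)^2)
hhi  # approximately 0.44

# Using only the share recorded on 2017-01-03 (industry A, 0.6) would miss B and C
naive <- 0.6^2
naive  # approximately 0.36
```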

A solution that scales to larger data (and possibly copes with NAs) would be ideal (hence the data.table tag).

Sample data

library(data.table)
library(dplyr)
ID    <- c("1","2","3","4","5","6")
Date <- c("2017-01-01","2017-01-02", "2017-01-02", "2017-01-02", "2017-01-03","2017-01-02")
Industry <- c("A","A","B","C","A","A")
Country <- c("UK","UK","UK","UK","UK","US")
Value_country<- c(1,4,4,4,5,1)
Value_country_industry<- c(1,2,1,1,3,1)
Share <- c(1,0.5,0.25,0.25,0.6,1)
Desired <- c(1,0.375,0.375,0.375,0.44,1)

dt <- data.frame(id=ID, date=Date, industry=Industry, country=Country, value_country=Value_country, value_country_industry=Value_country_industry, desired_output=Desired)
setDT(dt)[, date := as.Date(date)]

Reply from a uj5u.com user:

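One approach is to reshape the data to wide format, so that the "most recent value_country_industry" rule is easier to apply.
Then fill forward and replace any remaining NAs with 0.
The HHI can then be computed across the columns, which ensures it works however many industries there are.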
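To illustrate the fill step in isolation, here is a minimal sketch on a plain numeric vector using data.table's nafill. The vector vci_B is made up for illustration; it stands for one industry's values over consecutive dates, with NA where no row exists for that date.

```r
library(data.table)

# Hypothetical series for one industry: missing on the first and last date
vci_B <- c(NA, 1, NA)

# Last observation carried forward: a leading NA has nothing to carry, so it stays NA
carried <- nafill(vci_B, type = "locf")

# Replace what is still NA with 0, so the industry contributes nothing before it first appears
filled <- nafill(carried, fill = 0)
filled  # 0 1 1
```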

library(magrittr)
# Reshape to one row per country/date, with one vc_* and one vci_* column per industry
dt_wide <- dt[, -c('desired_output')] %>%
  setnames(c('value_country', 'value_country_industry'), c('vc', 'vci')) %>%
  dcast(country + date ~ industry, fun.aggregate = last, fill = NA,
        value.var = c('vc', 'vci'))
vci_cols <- names(dt_wide) %>% .[grepl('vci', .)]
# Carry the latest vci forward within each country, then set remaining NAs to 0
dt_wide[, (vci_cols) := lapply(.SD, nafill, type = 'locf'), by = 'country',
        .SDcols = vci_cols] %>%
  setnafill(fill = 0L, cols = 3:length(.))
# Numerator: sum of squared vci; denominator: squared vc for that date
dt_wide[, num := Reduce('+', .SD ^ 2), .SDcols = patterns('vci_')]
dt_wide[, den := Reduce('pmax', .SD ^ 2), .SDcols = patterns('vc_')]
dt_wide[, hhi := num / den]
dt_wide[, c('num', 'den') := NULL]
# Join the result back onto the original rows
dt[dt_wide, hhi := i.hhi, on = c('country', 'date')]
dt
   id       date industry country value_country value_country_industry desired_output   hhi
1:  1 2017-01-01        A      UK             1                      1          1.000 1.000
2:  2 2017-01-02        A      UK             4                      2          0.375 0.375
3:  3 2017-01-02        B      UK             4                      1          0.375 0.375
4:  4 2017-01-02        C      UK             4                      1          0.375 0.375
5:  5 2017-01-03        A      UK             5                      3          0.440 0.440
6:  6 2017-01-02        A      US             1                      1          1.000 1.000
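As a cross-check, the same rule can be sketched without reshaping to wide format, using a rolling join to pick up the most recent value_country_industry per industry. This is an alternative under stated assumptions, not the method used in the answer above: it assumes each (country, industry, date) combination appears at most once and that value_country is constant within a country and date. The names grid, latest, num, den and hhi_tbl are illustrative.

```r
library(data.table)

# Sample data from the question (desired_output omitted)
dt <- data.table(
  id       = as.character(1:6),
  date     = as.Date(c("2017-01-01", "2017-01-02", "2017-01-02",
                       "2017-01-02", "2017-01-03", "2017-01-02")),
  industry = c("A", "A", "B", "C", "A", "A"),
  country  = c("UK", "UK", "UK", "UK", "UK", "US"),
  value_country          = c(1, 4, 4, 4, 5, 1),
  value_country_industry = c(1, 2, 1, 1, 3, 1)
)

# Every industry seen in a country, crossed with every date seen in that country
grid <- dt[, CJ(industry = unique(industry), date = unique(date)), by = country]

# Rolling join: for each grid row, take the latest observation on or before that date
# (the rolling column, date, must come last in `on`)
latest <- dt[grid, on = .(country, industry, date), roll = TRUE]

# Numerator: sum of squared latest values; industries not yet observed count as 0
num <- latest[, .(num = sum(value_country_industry^2, na.rm = TRUE)),
              by = .(country, date)]

# Denominator: squared value_country (assumed constant within country and date)
den <- dt[, .(den = max(value_country)^2), by = .(country, date)]

hhi_tbl <- num[den, on = .(country, date)][, hhi := num / den]
hhi_tbl
```

On the sample data this should reproduce the desired_output values per country and date. For larger data, the grid grows with the number of dates times the number of industries per country, so the wide approach above may be preferable when there are many dates.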

Tags: r data.table
