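My question is hopefully easy to understand, though not easy to express in code. Once we work out what others would search for, I will adjust the title to use more accurate and general terminology. I want to calculate an HHI (Herfindahl index) on a daily basis, that is, the sum of squared shares of each industry per country. value_country is the number of ids per country over the last year, and value_country_industry is the number of ids per country and industry over the last year.
Example calculation: the HHI for 2017/01/03 is (3/5)^2 (industry A) + (1/5)^2 (industry B) + (1/5)^2 (industry C) = 0.44. The denominator is the value_country of the respective date, and the numerator is the most recent value of value_country_industry for each industry (that is, the share, but as of the current date). This means I cannot simply use the share column.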
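Written out as a formula, the calculation could be stated as below. The symbols are notation introduced here for illustration and are not from the original post: N_{c,t} stands for value_country of country c on date t, and n_{c,i,t} for the most recent value_country_industry of industry i at or before t (taken as 0 if the industry has not appeared yet).

```latex
\mathrm{HHI}_{c,t} = \sum_{i} \left( \frac{n_{c,i,t}}{N_{c,t}} \right)^{2}

% UK, 2017-01-03: A = 3, B = 1 (carried from 2017-01-02), C = 1 (carried), N = 5
\mathrm{HHI}_{\mathrm{UK},\,2017\text{-}01\text{-}03}
  = \left(\tfrac{3}{5}\right)^{2} + \left(\tfrac{1}{5}\right)^{2} + \left(\tfrac{1}{5}\right)^{2}
  = \tfrac{11}{25} = 0.44

% UK, 2017-01-02: A = 2, B = 1, C = 1, N = 4
\mathrm{HHI}_{\mathrm{UK},\,2017\text{-}01\text{-}02}
  = \left(\tfrac{2}{4}\right)^{2} + \left(\tfrac{1}{4}\right)^{2} + \left(\tfrac{1}{4}\right)^{2}
  = \tfrac{6}{16} = 0.375
```

Both values agree with the desired_output column in the example data.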
A solution that scales to larger data (and ideally copes with NAs) would be ideal, hence the data.table tag.
Example data
library(data.table)
library(dplyr)
ID <- c("1","2","3","4","5","6")
Date <- c("2017-01-01","2017-01-02", "2017-01-02", "2017-01-02", "2017-01-03","2017-01-02")
Industry <- c("A","A","B","C","A","A")
Country <- c("UK","UK","UK","UK","UK","US")
Value_country <- c(1,4,4,4,5,1)
Value_country_industry <- c(1,2,1,1,3,1)
Share <- c(1,0.5,0.25,0.25,0.6,1)
Desired <- c(1,0.375,0.375,0.375,0.44,1)
dt <- data.frame(id=ID, date=Date, industry=Industry, country=Country, value_country=Value_country, value_country_industry=Value_country_industry, desired_output=Desired)
setDT(dt)[, date := as.Date(date)]
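Answer:
One approach is to reshape the data to wide format, which makes the "most recent value_country_industry" rule easier to apply.
Then carry the last observation forward and replace any remaining NAs with 0.
The HHI can then be computed across columns, so it works regardless of how many industries there are.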
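To make the fill step concrete, here is a minimal sketch on a made-up vector (x is hypothetical and stands for one industry's counts over four dates, with NA where that industry has no row). It uses data.table's nafill() in two passes:

```r
library(data.table)

# Hypothetical toy column: NA means the industry has no row on that date.
x <- c(NA, 2, NA, 3)

# Pass 1: carry the last observed value forward.
# The leading NA stays NA because there is nothing before it to carry.
x_locf <- nafill(x, type = "locf")

# Pass 2: treat "not observed yet" as a count of zero.
x_filled <- nafill(x_locf, type = "const", fill = 0)

print(x_filled)
```

The order matters: filling with 0 first would overwrite the gaps that the forward fill is meant to bridge.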
library(magrittr)

# Reshape to one row per country and date, with one vc_* and one vci_* column per industry
dt_wide <- dt[, -c('desired_output')] %>%
  setnames(c('value_country', 'value_country_industry'), c('vc', 'vci')) %>%
  dcast(country + date ~ industry, fun.aggregate = last, fill = NA,
        value.var = c('vc', 'vci'))

vci_cols <- names(dt_wide) %>% .[grepl('vci', .)]

# Carry the latest industry count forward within each country, then set remaining NAs to 0
dt_wide[, (vci_cols) := lapply(.SD, nafill, type = 'locf'), by = 'country',
        .SDcols = vci_cols] %>%
  setnafill(fill = 0L, cols = 3:length(.))

# Numerator: sum of squared industry counts; denominator: squared country count
dt_wide[, num := Reduce('+', .SD ^ 2), .SDcols = patterns('vci_')]
dt_wide[, den := Reduce('pmax', .SD ^ 2), .SDcols = patterns('vc_')]
dt_wide[, hhi := num / den]
dt_wide[, c('num', 'den') := NULL]

# Join the result back onto the original table
dt[dt_wide, hhi := hhi, on = c('country', 'date')]
dt
   id       date industry country value_country value_country_industry desired_output   hhi
1:  1 2017-01-01        A      UK             1                      1          1.000 1.000
2:  2 2017-01-02        A      UK             4                      2          0.375 0.375
3:  3 2017-01-02        B      UK             4                      1          0.375 0.375
4:  4 2017-01-02        C      UK             4                      1          0.375 0.375
5:  5 2017-01-03        A      UK             5                      3          0.440 0.440
6:  6 2017-01-02        A      US             1                      1          1.000 1.000
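For comparison, the same rule can probably also be expressed without reshaping, using a rolling join in long format. This is a different technique from the wide-format approach in the answer, offered only as a sketch. It assumes at most one row per country, industry and date, and the names grid, latest, den and hhi are introduced here for illustration.

```r
library(data.table)

# Same example data as in the question (id and desired_output omitted).
dt <- data.table(
  date     = as.Date(c("2017-01-01", "2017-01-02", "2017-01-02",
                       "2017-01-02", "2017-01-03", "2017-01-02")),
  industry = c("A", "A", "B", "C", "A", "A"),
  country  = c("UK", "UK", "UK", "UK", "UK", "US"),
  value_country          = c(1, 4, 4, 4, 5, 1),
  value_country_industry = c(1, 2, 1, 1, 3, 1)
)

# Every (date, industry) combination observed within each country.
grid <- dt[, CJ(date = unique(date), industry = unique(industry)), by = country]

# Rolling join: for each grid row, take the latest count at or before that date.
# roll = TRUE applies to the last column in `on` (date); no match gives NA.
latest <- dt[grid, on = .(country, industry, date), roll = TRUE,
             x.value_country_industry]

# Industries not observed yet count as zero.
grid[, vci := nafill(latest, fill = 0)]

# Denominator: the country total on each date.
den <- dt[, .(vc = max(value_country)), by = .(country, date)]
grid[den, vc := i.vc, on = .(country, date)]

# Sum of squared shares per country and date.
hhi <- grid[, .(hhi = sum((vci / vc)^2)), by = .(country, date)]
print(hhi)
```

On the example data this should reproduce the desired values (1, 0.375 and 0.44 for the UK, 1 for the US). Whether it scales better than the wide approach would need to be measured on real data.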
