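For each row, I want to calculate the sum of squares of the occurrence counts of each unique value of group A (`industry`) within group B (`country`), counting only rows from the previous year.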
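Example calculation for row 5: 2x A, 1x B, 1x C gives 2^2 + 1^2 + 1^2 = 6. The A in row 1 is excluded because it is more than one year older, and the A in row 6 is excluded because it belongs to another country.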
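A minimal sketch of the metric for a single row, assuming the industries of the same-country rows have already been filtered to that row's one-year window. The helper name `sum_sq_counts` is hypothetical and not part of the question:

```r
# Hypothetical helper: sum of squared occurrence counts of each unique value.
sum_sq_counts <- function(industries) sum(table(industries)^2)

# Row 5 (UK, 2017-01-04): rows 2-5 fall inside [2016-01-04, 2017-01-04]
sum_sq_counts(c("A", "B", "C", "A"))  # 2^2 + 1^2 + 1^2 = 6
```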
I managed to compute the counts per row, but could not carry the calculation up to the aggregated date level:
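# Attempt: for each row, count the rows of the same country and industry dated within the past year
dt[, count_by_industry:= sapply(date, function(x) length(industry[between(date, x - lubridate::years(1), x)])),
by = c("country", "industry")]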
Ideally, the solution should scale to the real data, which has about 2 million rows and roughly 10,000 dates and group elements (hence the data.table tag).
Example data
ID <- c("1","2","3","4","5","6")
Date <- c("2016-01-02","2017-01-01", "2017-01-03", "2017-01-03", "2017-01-04","2017-01-03")
Industry <- c("A","A","B","C","A","A")
Country <- c("UK","UK","UK","UK","UK","US")
Desired <- c(1,4,3,3,6,1)
library(data.table)
dt <- data.frame(id=ID, date=Date, industry=Industry, country=Country, desired_output=Desired)
setDT(dt)[, date := as.Date(date)]
Answer:
Adapting your starting code:
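# For each row: keep the industries of same-country rows dated within the past year,
# tabulate them, square the counts and sum
dt[, output:= sapply(date, function(x) sum(table(industry[between(date, x - lubridate::years(1), x)]) ^ 2)),
by = c("country")]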
dt
id date industry country desired_output output
1: 1 2016-01-02 A UK 1 1
2: 2 2017-01-01 A UK 4 4
3: 3 2017-01-03 B UK 3 3
4: 4 2017-01-03 C UK 3 3
5: 5 2017-01-04 A UK 6 6
6: 6 2017-01-03 A US 1 1
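The `sapply` above re-filters every row of the country for each target row, so its cost grows roughly with the square of the group size, which may be slow at around 2 million rows. One possible alternative, sketched here and not taken from the answer, is a sliding (two-pointer) window: sort each country's rows by date, keep a count per industry plus a running sum of squares, and update that sum by 2c + 1 when a row of an industry with count c enters the window and by -(2c - 1) when one leaves. The helper `sum_sq_window` is hypothetical. It assumes, like the question's code, that the window is `[date - lubridate::years(1), date]` and that no date is 29 February (lubridate returns NA for that subtraction, which the `stopifnot` rejects):

```r
library(data.table)

# Hypothetical helper: for the rows of ONE country, return for each row the sum
# over industries of (rows of that industry dated in [date - 1 year, date])^2.
sum_sq_window <- function(date, industry) {
  n   <- length(date)
  ord <- order(date)                        # process rows in date order
  d   <- date[ord]
  st  <- d - lubridate::years(1)            # window start, as in the question
  stopifnot(!anyNA(st))                     # assumes no 29 February dates
  k   <- as.integer(factor(industry))[ord]  # integer code per industry
  cnt <- integer(max(k))                    # current count per industry
  s   <- 0                                  # current sum of squared counts
  lo  <- 1L                                 # oldest row still in the window
  hi  <- 0L                                 # newest row added to the window
  out <- numeric(n)
  for (q in seq_len(n)) {
    # add every row dated on or before the current date (ties included)
    while (hi < n && d[hi + 1L] <= d[q]) {
      hi <- hi + 1L
      s  <- s + 2 * cnt[k[hi]] + 1          # (c + 1)^2 - c^2
      cnt[k[hi]] <- cnt[k[hi]] + 1L
    }
    # drop rows dated before the window start (start is non-decreasing in date)
    while (lo <= hi && d[lo] < st[q]) {
      s  <- s - 2 * cnt[k[lo]] + 1          # (c - 1)^2 - c^2
      cnt[k[lo]] <- cnt[k[lo]] - 1L
      lo <- lo + 1L
    }
    out[ord[q]] <- s                        # write back in original row order
  }
  out
}

dt <- data.table(
  id       = as.character(1:6),
  date     = as.Date(c("2016-01-02", "2017-01-01", "2017-01-03",
                       "2017-01-03", "2017-01-04", "2017-01-03")),
  industry = c("A", "A", "B", "C", "A", "A"),
  country  = c("UK", "UK", "UK", "UK", "UK", "US"),
  desired_output = c(1, 4, 3, 3, 6, 1)
)
dt[, output_window := sum_sq_window(date, industry), by = country]
dt[, all.equal(output_window, desired_output)]  # TRUE for the sample data
```

Each row enters and leaves the window once per country, so apart from the sort the work is linear in the number of rows. Because the loop is plain R, the constant factor is still high.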
Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/356537.html
