Based on a previous post, I can add a column containing the number of occurrences in the past year as follows:
df[, boundary := date - 365]
df[, counts := df[df, .N, on = .(id, date < date, date > boundary), by = .EACHI]$N]
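This works well for me. However, I would like to count only the occurrences where another column has a specific value. For example, given a dataset like this: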
id type date
ny 0 2021-09-27
ny 0 2021-09-09
ny 1 2021-08-01
ny 1 2021-07-07
ch 0 2020-04-01
ch 1 2020-03-01
ch 0 2020-02-01
I only want to count the rows where type = 1. How can I modify the code above to do this? I tried something like this, but it does not work:
df[, counts := df[df, .N(type = 1), on = .(id, date < date, date > boundary), by = .EACHI]$N]
Edit: the expected output for the dataset above is:
id type date counts
ny 0 2021-09-27 2
ny 0 2021-09-09 2
ny 1 2021-08-01 1
ny 1 2021-07-07 0
ch 0 2020-04-01 1
ch 1 2020-03-01 0
ch 0 2020-02-01 0
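To make the question reproducible, here is a minimal sketch that builds the sample data as a data.table. The column classes are an assumption, since the question does not state them: type as integer and date as Date, which date - 365 and the < / > join conditions need in order to compare calendar dates.

```r
library(data.table)

# Sample data from the question. Assumed classes: type is integer and
# date is Date, so that date - 365 and the < / > join conditions compare
# calendar dates rather than strings.
df <- data.table(
  id   = c("ny", "ny", "ny", "ny", "ch", "ch", "ch"),
  type = c(0L, 0L, 1L, 1L, 0L, 1L, 0L),
  date = as.Date(c("2021-09-27", "2021-09-09", "2021-08-01", "2021-07-07",
                   "2020-04-01", "2020-03-01", "2020-02-01"))
)
str(df)
```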
Answer:
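You can compute sum(type == 1) instead of .N. .N is a symbol holding the number of rows in the group, not a function, so it cannot take a condition; type == 1, on the other hand, is a logical vector, so its sum is the number of matching rows in each window. Because this j expression is unnamed, the result column is V1 rather than N, and rows with no earlier match at all come back as NA, so they are set to 0 afterwards.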
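library(data.table)
setDT(df)  # convert to data.table by reference; date must be of class Date
df[, boundary := date - 365]
# for each row, sum(type == 1) over rows with the same id whose date lies
# strictly between that row's boundary and date
df[, counts := df[df, sum(type == 1),
                  on = .(id, date < date, date > boundary), by = .EACHI]$V1]
# rows with no match in the window get NA; treat them as 0 (integer, like counts)
df[is.na(counts), counts := 0L]
df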
# id type date boundary counts
#1: ny 0 2021-09-27 2020-09-27 2
#2: ny 0 2021-09-09 2020-09-09 2
#3: ny 1 2021-08-01 2020-08-01 1
#4: ny 1 2021-07-07 2020-07-07 0
#5: ch 0 2020-04-01 2019-04-02 1
#6: ch 1 2020-03-01 2019-03-02 0
#7: ch 0 2020-02-01 2019-02-01 0
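Since the question asks about counting rows where another column has "a specific value", the same join can be wrapped so that the value and the window length are parameters. This is a sketch under assumptions: count_type_in_window is a hypothetical helper name (not part of data.table), and it works on a copy so the caller's table is not modified by reference. It uses exactly the answer's non-equi join and NA handling.

```r
library(data.table)

# Hypothetical helper (not part of data.table): for each row, count earlier rows
# with the same id whose type equals `value` and whose date lies strictly
# inside the preceding `days`-day window.
count_type_in_window <- function(dt, value, days = 365) {
  dt <- copy(dt)                      # avoid modifying the caller's table by reference
  dt[, boundary := date - days]
  # non-equi self-join: x.date < i.date and x.date > i.boundary, per id
  res <- dt[dt, sum(type == value),
            on = .(id, date < date, date > boundary), by = .EACHI]$V1
  res[is.na(res)] <- 0L               # rows with no match in the window come back as NA
  dt[, counts := res]
  dt[]
}

# Sample data from the question (assumed classes: integer type, Date date)
df <- data.table(
  id   = c("ny", "ny", "ny", "ny", "ch", "ch", "ch"),
  type = c(0L, 0L, 1L, 1L, 0L, 1L, 0L),
  date = as.Date(c("2021-09-27", "2021-09-09", "2021-08-01", "2021-07-07",
                   "2020-04-01", "2020-03-01", "2020-02-01"))
)

out1 <- count_type_in_window(df, value = 1L)
print(out1)
```

Calling it with value = 0L instead counts the earlier type 0 rows in each window, without touching df itself.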
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/324573.html
