How can I use a vectorized function in R to multiply all values for a given ID from a different data frame?

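I have a huge dataset with 750,000 IDs, and I want to aggregate monthly values into yearly values by multiplying together all values that share an ID. Each ID is a combination of an identification number and a year.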

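The data I want to extract:

ID	Monthly Value
1 - 1997	Product of Monthly Values in Year 1997
1 - 1998	Product of Monthly Values in Year 1998
1 - 1999	Product of Monthly Values in Year 1999
...	...
2 - 1997	Product of Monthly Values in Year 1997
2 - 1998	Product of Monthly Values in Year 1998
2 - 1999	Product of Monthly Values in Year 1999
...	...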

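The dataset used as the source:

ID	Monthly Value
1 - 1997	Monthly Value 1 in Year 1997
1 - 1997	Monthly Value 2 in Year 1997
1 - 1997	Monthly Value 3 in Year 1997
...	...
2 - 1997	Monthly Value 1 in Year 1997
2 - 1997	Monthly Value 2 in Year 1997
2 - 1997	Monthly Value 3 in Year 1997
...	...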

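I wrote a for loop that takes about 0.74 seconds for 10 IDs, which is far too slow: a run over the whole dataset would take about 15 hours. For each ID, the loop multiplies all of its monthly values and stores the result in a separate data frame.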

for (i in 1:nrow(yearlyreturns)){
  
  yearlyreturns[i, "yret"] <- prod(monthlyreturns[monthlyreturns$ID == yearlyreturns[i,"ID"],"change"]) - 1
  yearlyreturns[i, "monthcount"] <- length(monthlyreturns[monthlyreturns$ID == yearlyreturns[i,"ID"],"change"])
  
}

I don't know how to get from here to a vectorized version that takes less time.

Can this be done in R?

A helpful uj5u.com user replied:

Something like this:

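library(dplyr)
library(stringr)  # provides str_replace()

df %>% 
  # turn "Monthly Value 1 in Year 1997" into "Product of Monthly Values in Year 1997"
  mutate(monthly_value = paste("Product of", str_replace(monthly_value, 'Value\\s\\d+', 'Values'))) %>% 
  # keep one row per distinct ID and label
  group_by(ID, monthly_value) %>% 
  summarise()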

  ID       monthly_value                         
  <chr>    <chr>                                 
1 1 - 1997 Product of Monthly Values in Year 1997
2 2 - 1997 Product of Monthly Values in Year 1997

Data:

df <- structure(list(ID = c("1 - 1997", "1 - 1997", "1 - 1997", "2 - 1997", 
"2 - 1997", "2 - 1997"), monthly_value = c("Monthly Value 1 in Year 1997", 
"Monthly Value 2 in Year 1997", "Monthly Value 3 in Year 1997", 
"Monthly Value 1 in Year 1997", "Monthly Value 2 in Year 1997", 
"Monthly Value 3 in Year 1997")), class = "data.frame", row.names = c(NA, 
-6L))
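
Because the sample data above holds placeholder strings rather than numbers, here is a minimal sketch of the same grouped approach on numeric data. It assumes the column names `ID` and `change` from the for loop in the question; the toy values are made up purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: monthly growth factors (e.g. 1.10 means +10%)
monthlyreturns <- data.frame(
  ID     = c("1 - 1997", "1 - 1997", "1 - 1997", "2 - 1997", "2 - 1997"),
  change = c(1.10, 0.90, 1.00, 1.20, 1.50)
)

# One row per ID: product of the monthly factors minus 1, plus the number of months
yearlyreturns <- monthlyreturns %>%
  group_by(ID) %>%
  summarise(yret = prod(change) - 1, monthcount = n(), .groups = "drop")
```

Each ID is processed once by `summarise()`, instead of rescanning the whole monthly table for every row as the loop does.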

A helpful uj5u.com user replied:

Judging by the for-loop code, this could probably be done with a join:

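library(data.table)
setDT(monthlyreturns); setDT(yearlyreturns)
# Aggregate once per ID, then copy the results into yearlyreturns with an update join
agg <- monthlyreturns[, .(yret = prod(change) - 1, monthcount = .N), by = ID]
yearlyreturns[agg, c("yret", "monthcount") := .(i.yret, i.monthcount), on = .(ID)]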
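
As a quick sanity check of the join idea, the sketch below runs a per-ID join on toy data and compares it with the loop logic from the question. The values are invented for illustration, the column names `ID` and `change` come from the question's code, and `res` and `loop_yret` are hypothetical names.

```r
library(data.table)

# Hypothetical toy data (values invented for illustration)
monthlyreturns <- data.table(
  ID     = c("1 - 1997", "1 - 1997", "2 - 1997", "2 - 1997", "2 - 1997"),
  change = c(1.10, 0.90, 1.20, 1.50, 0.50)
)
yearlyreturns <- data.table(ID = c("1 - 1997", "2 - 1997"))

# For each row of yearlyreturns, evaluate j on the matching rows of monthlyreturns
res <- monthlyreturns[yearlyreturns,
                      .(yret = prod(change) - 1, monthcount = .N),
                      on = .(ID), by = .EACHI]

# Reference values computed the same way as the original loop
loop_yret <- sapply(yearlyreturns$ID,
                    function(id) prod(monthlyreturns[ID == id, change]) - 1)
stopifnot(isTRUE(all.equal(res$yret, unname(loop_yret))))
```

Here `monthlyreturns` is the table being subset and `yearlyreturns` supplies the keys, so `change` inside `j` refers to all monthly rows that match the current ID.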

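A helpful uj5u.com user replied:

In addition to the excellent previous answers: here is a link to an earlier post that compares 10 common ways of computing a mean by group. A data.table-based solution is definitely the way to go, especially for datasets with millions of rows. Unless you are writing individual output files, I'm not sure why this would need to take hours rather than minutes.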
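
To get a rough feel for the scale involved, a sketch along these lines times one grouped aggregation on synthetic data. The sizes and random values are arbitrary assumptions, not the asker's data, and timings will differ from machine to machine.

```r
library(data.table)

set.seed(1)
n_ids <- 1e5   # assumed size; scale up toward 750,000 if memory allows
synthetic <- data.table(
  ID     = rep(seq_len(n_ids), each = 12),          # 12 monthly rows per ID
  change = runif(n_ids * 12, min = 0.9, max = 1.1)  # made-up growth factors
)

# Time a single grouped pass over all IDs
timing <- system.time(
  yearly <- synthetic[, .(yret = prod(change) - 1, monthcount = .N), by = ID]
)
print(timing)
```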
