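I'm trying to use a for loop in R to convert accumulated figures into monthly ones, because the financial information my company provides is accumulated for different concepts. That is, January's figure covers only January, February's is the sum of January and February, March's is the sum of January, February and March, and so on.
For example, suppose I have the following data frame: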
Concepts <- c("Concept1", "Concept2", "Concept3")
January <- c(5,10,16)
February <- c(9,14,20)
March <- c(16,20,23)
df <- data.frame(Concepts, January, February, March)
This gives me the following data frame:
Concepts January February March
Concept1 5 9 16
Concept2 10 14 20
Concept3 16 20 23
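What I need to produce is the following data frame (note that February is the difference between February and January, and March is the difference between March and February):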
Concepts January February March
Concept1 5 4 7
Concept2 10 4 6
Concept3 16 4 3
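In formula form (the symbols are notation introduced here for illustration): if C_{c,m} is the cumulative value reported for concept c in month m, and x_{c,m} is the monthly value wanted, then the target table is

$$
x_{c,1} = C_{c,1}, \qquad x_{c,m} = C_{c,m} - C_{c,m-1} \quad (m = 2, \dots, M)
$$

For example, Concept1 in March gives x = 16 - 9 = 7.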
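To build this second data frame, I first created an empty data frame with the same number of rows as df. Then I used a for loop to cbind the first two columns of df (since they need no computation) and, using indexing, to append each following column after computing the difference. The code is: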
df <- data.frame(Concepts, January, February, March)
df2 <- data.frame(matrix(nrow=nrow(df),ncol=ncol(df))) # empty data frame with the same number of rows (its all-NA columns stay in the result)
for(i in 1:ncol(df)) {
  if(i == 1){
    df2 <- cbind(df2, df[ , i])     # Concepts: copied as-is
  } else if (i == 2){
    df2 <- cbind(df2, df[, i])      # January: needs no computation
  } else {
    diference <- df[,i] - df[,i-1]  # month i minus month i-1
    df2 <- cbind(df2,diference)
  }
}
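I get the following error:
Error in `[.data.table`(df, , i) : j (the 2nd argument inside [...]) is a single symbol but column name 'i' is not found. Perhaps you intended DT[, ..i]. This difference to data.frame is deliberate and explained in FAQ 1.1.
I would appreciate either a correction to my code or an alternative that lets me do this calculation for data frames spanning several years.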
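The error message suggests that df is actually a data.table (for example, one read with data.table::fread()), where df[, i] looks for a column literally named i. Assuming that is the cause, here is a minimal corrected sketch of the loop. It converts to a plain data.frame first, extracts columns with [[ ]], and starts from the two columns that need no computation instead of an all-NA placeholder:

```r
# The question's sample data. If df came from data.table::fread(),
# as.data.frame() below turns it into a plain data.frame first.
Concepts <- c("Concept1", "Concept2", "Concept3")
January  <- c(5, 10, 16)
February <- c(9, 14, 20)
March    <- c(16, 20, 23)
df <- data.frame(Concepts, January, February, March)
df <- as.data.frame(df)   # no-op for a data.frame; drops data.table semantics

# Concepts and January need no computation, so start from them
df2 <- df[1:2]

# Every later month: this month's cumulative value minus the previous one.
# [[ ]] extracts a column by position (it also works on a data.table).
for (i in seq_len(ncol(df))[-(1:2)]) {
  df2[[names(df)[i]]] <- df[[i]] - df[[i - 1]]
}

df2
```

Using seq_len(ncol(df))[-(1:2)] rather than 3:ncol(df) means the loop simply does nothing if there is only one month column, instead of counting backwards.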
Answer:
First note that if you apply diff row by row to the month columns, you get one column fewer and the result comes out transposed.
apply(df[-1], 1, diff)
# [,1] [,2] [,3]
#February 4 4 4
#March 7 6 3
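The rows are labelled February and March because diff on a named vector keeps the name of the later element in each pair. Here is a minimal sketch on a single row (Concept1 from the question). Prepending the first value gives that concept's full monthly row, and cumsum undoes the differencing:

```r
# Concept1's cumulative figures from the question, as a named vector
x <- c(January = 5, February = 9, March = 16)

# diff() returns x[2] - x[1], x[3] - x[2]: one element fewer than x,
# named after the later month of each pair
diff(x)                  # returns c(February = 4, March = 7)

# Monthly values: keep January as-is, then the differences
monthly <- c(x[1], diff(x))
monthly                  # c(January = 5, February = 4, March = 7)

# cumsum() reverses the differencing and recovers the cumulative figures
cumsum(monthly)          # c(January = 5, February = 9, March = 16)
```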
So transpose it to get the right orientation.
t(apply(df[-1], 1, diff))
# February March
#[1,] 4 7
#[2,] 4 6
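#[3,] 4 3
Then cbind it with the first two columns. Because the first argument is a subset of a data.frame, the method dispatched is cbind.data.frame, so the result is also a data frame.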
cbind(df[1:2], t(apply(df[-1], 1, diff)))
# Concepts January February March
#1 Concept1 5 4 7
#2 Concept2 10 4 6
#3 Concept3 16 4 3
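One way to sanity-check the result is to accumulate it again. Applying cumsum to each row of the monthly columns should return the original cumulative figures. Here is a small sketch using the question's df:

```r
df <- data.frame(Concepts = c("Concept1", "Concept2", "Concept3"),
                 January  = c(5, 10, 16),
                 February = c(9, 14, 20),
                 March    = c(16, 20, 23))

# Monthly figures, computed as in the answer above
monthly <- cbind(df[1:2], t(apply(df[-1], 1, diff)))

# Re-accumulate each row; apply() again returns months x concepts,
# so transpose back to concepts x months
recumulated <- t(apply(monthly[-1], 1, cumsum))

# TRUE if every re-accumulated value matches the original table
all(recumulated == as.matrix(df[-1]))
```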
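Answer:
This may not be the most elegant approach, but it should work. The trick is to extract the numeric part of the data frame, apply diff to each row, transpose the result, and then bind it back to the initial values (the label column and January).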
df <- data.frame(Concepts = c("Concept1", "Concept2", "Concept3"),
January = c(5,10,16),
February = c(9,14,20),
March = c(16,20,23),
April = c(20, 27, 33))
dfdiff <- apply(df[, -1L], 1L, diff)
df2 <- data.frame(Concepts = c("Concept1", "Concept2", "Concept3"),
January = c(5,10,16))
df2 <- cbind(df2, t(dfdiff))
df2
Concepts January February March April
1 Concept1 5 4 7 4
2 Concept2 10 4 6 7
3 Concept3 16 4 3 10
Now that you know how it works, you can do the whole thing in a single, more compact call:
df2 <- cbind(df[, 1:2], t(apply(df[, -1L], 1L, diff)))
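This should work for a data frame of any size that has the structure above (one label column, with the remaining columns holding the cumulative data), as long as there are at least three month columns. With only two, diff returns a single value per row, apply simplifies the result to a plain vector, and t() no longer produces one row per concept.
Speed comparison with the tidyverse approach: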
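If you also need to handle just two month columns, one loop-free alternative is to subtract the shifted month columns of a numeric matrix in a single step, which avoids apply() and t() entirely. This is a sketch and is not taken from the answers; decumulate() is a hypothetical helper name, and it assumes column 1 holds the labels and the remaining columns are in chronological order:

```r
# Hypothetical helper: turn cumulative month columns into monthly values.
# Assumes column 1 holds the labels and the remaining columns are
# cumulative figures in chronological order.
decumulate <- function(df) {
  m <- as.matrix(df[-1])   # numeric matrix: concepts x months
  out <- m
  if (ncol(m) > 1L) {
    # Month j minus month j - 1 for all rows and months at once;
    # drop = FALSE keeps matrices even when only two months exist
    out[, -1L] <- m[, -1L, drop = FALSE] - m[, -ncol(m), drop = FALSE]
  }
  cbind(df[1L], out)       # re-attach the label column
}

df <- data.frame(Concepts = c("Concept1", "Concept2", "Concept3"),
                 January  = c(5, 10, 16),
                 February = c(9, 14, 20),
                 March    = c(16, 20, 23),
                 April    = c(20, 27, 33))

decumulate(df)        # all four months
decumulate(df[1:3])   # only January and February
```

Working on the two shifted column blocks directly means the result already has one row per concept, so there is nothing to transpose.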
library(microbenchmark)
library(tidyverse)
microbenchmark(TV = df2 <- df %>% pivot_longer(!Concepts) %>% group_by(Concepts) %>%
dplyr::mutate(value2 = value - lag(value, default = first(value))) %>%
rowwise %>%mutate(value2 = ifelse(value2 == 0, value, value2)) %>%
select(-value) %>%pivot_wider(names_from = "name", values_from = "value2"),
BASE = df2 <- cbind(df[, 1:2], t(apply(df[, -1L], 1L, diff))),
times = 1000L, control = list(order = 'block'))
Unit: microseconds
expr min lq mean median uq max neval cld
TV 11141.7 11554.05 12245.1253 11803.75 12300.25 22903.5 1000 b
BASE 160.4 164.35 176.8356 165.70 168.70 3833.2 1000 a
Answer:
A tidyverse solution:
library(tidyverse)
df %>%
pivot_longer(!Concepts) %>%
group_by(Concepts) %>%
mutate(value2 = value - lag(value, default = 0)) %>% # default = 0 keeps January as-is and leaves genuine zero-change months at 0
ungroup() %>%
select(-value) %>%
pivot_wider(names_from = "name", values_from = "value2")
Output
# A tibble: 3 × 4
Concepts January February March
<chr> <dbl> <dbl> <dbl>
1 Concept1 5 4 7
2 Concept2 10 4 6
3 Concept3 16 4 3
