I'm working on a March Madness project. I have a data frame, df.A, with every team for every season. For example:
Season  Team Name   Code
2003    Creighton   2003-1166
2003    Notre Dame  2003-1323
2003    Arizona     2003-1112
There is also a second data frame, df.B, with the result of every game in every season:
WTeamScore LTeamScore WTeamCode LTeamCode
15 10 2003-1166 2003-1323
20 15 2003-1323 2003-1112
10 5 2003-1112 2003-1166
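I'm trying to add a column to df.A with each team's total points scored, summed over both its wins and its losses (e.g. Creighton scored 15 as a winner and 5 as a loser, for 20). Basically: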
Season  Team Name   Code       Points
2003    Creighton   2003-1166  20
2003    Notre Dame  2003-1323  30
2003    Arizona     2003-1112  25
Each data frame obviously has thousands more rows, but this is the general idea. What's the best way to do this?
Answer:
Here is another option using tidyverse: reshape df.B to long format, sum the scores for each team, and then join the totals back onto df.A.
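library(tidyverse)

df.B %>%
  # split each column name into a role ("WTeam"/"LTeam") and a value name ("Score"/"Code"),
  # giving one row per team per game with columns rep, Score, Code
  pivot_longer(everything(), names_pattern = "(WTeam|LTeam)(.*)",
               names_to = c("rep", ".value")) %>%
  # total points per team code, across both wins and losses
  group_by(Code) %>%
  summarise(Points = sum(Score)) %>%
  # attach the totals to df.A; teams with no games get NA
  left_join(df.A, ., by = "Code")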
Output
Season Team.Name Code Points
1 2003 Creighton 2003-1166 20
2 2003 Notre Dame 2003-1323 30
3 2003 Arizona 2003-1112 25
Data
df.A <- structure(list(Season = c(2003L, 2003L, 2003L), Team.Name = c("Creighton",
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323",
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))
df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L,
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")), class = "data.frame", row.names = c(NA,
-3L))
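For readers without tidyverse, the same three steps (reshape to long, sum per team, join back) can be sketched in base R. This is a minimal illustration, not the answerer's code: it rebuilds the example data inline, stacks the winner and loser columns with rbind() in place of pivot_longer(), and uses aggregate() and merge() in place of summarise() and left_join().

```r
# Example data from the question (rebuilt here so the sketch is self-contained)
df.A <- data.frame(Season = c(2003L, 2003L, 2003L),
                   Team.Name = c("Creighton", "Notre Dame", "Arizona"),
                   Code = c("2003-1166", "2003-1323", "2003-1112"))
df.B <- data.frame(WTeamScore = c(15L, 20L, 10L),
                   LTeamScore = c(10L, 15L, 5L),
                   WTeamCode = c("2003-1166", "2003-1323", "2003-1112"),
                   LTeamCode = c("2003-1323", "2003-1112", "2003-1166"))

# Step 1: stack winner and loser columns into one long table (one row per team per game)
long <- rbind(
  data.frame(Code = df.B$WTeamCode, Score = df.B$WTeamScore),
  data.frame(Code = df.B$LTeamCode, Score = df.B$LTeamScore)
)

# Step 2: total points per team code
totals <- aggregate(Score ~ Code, data = long, FUN = sum)
names(totals)[names(totals) == "Score"] <- "Points"

# Step 3: join the totals back onto df.A, keeping every team (left join)
res <- merge(df.A, totals, by = "Code", all.x = TRUE, sort = FALSE)
res
```

Because every game contributes two rows to long, this sums all of a team's games, not just the first one it appears in.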
Answer:
Using base R, we can match the 'Code' column of df.A against 'WTeamCode' and 'LTeamCode' in df.B to get the matching row indices, extract the corresponding score columns, and add them together (+):
# note: match() returns only the first matching row, i.e. each team's first win and first loss
df.A$Points <- with(df.A, df.B$WTeamScore[match(Code, df.B$WTeamCode)] +
                      df.B$LTeamScore[match(Code, df.B$LTeamCode)])
Output
> df.A
Season TeamName Code Points
1 2003 Creighton 2003-1166 20
2 2003 Notre Dame 2003-1323 30
3 2003 Arizona 2003-1112 25
If some codes have no match, so that match() returns missing values (NA), cbind the two vectors into a matrix and use rowSums with na.rm = TRUE:
df.A$Points <- with(df.A, rowSums(cbind(df.B$WTeamScore[match(Code, df.B$WTeamCode)],
                                        df.B$LTeamScore[match(Code, df.B$LTeamCode)]),
                                  na.rm = TRUE))
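To see why the rowSums() variant matters, here is a small sketch on hypothetical toy data: the team code "2003-9999" and the names A_demo and B_demo are invented for illustration. That team has a win but no loss, so its loser-side lookup is NA, which poisons a plain `+` but is skipped by rowSums(na.rm = TRUE).

```r
# Hypothetical toy data: team "2003-9999" (made up) wins a game but never loses
A_demo <- data.frame(Code = c("2003-1166", "2003-9999"))
B_demo <- data.frame(WTeamScore = c(15L, 12L), LTeamScore = c(10L, 5L),
                     WTeamCode = c("2003-1166", "2003-9999"),
                     LTeamCode = c("2003-1323", "2003-1166"))

# match() finds no "2003-9999" in LTeamCode, so that team's loser score is NA
win  <- with(A_demo, B_demo$WTeamScore[match(Code, B_demo$WTeamCode)])
loss <- with(A_demo, B_demo$LTeamScore[match(Code, B_demo$LTeamCode)])

plus_version    <- win + loss                              # NA propagates: c(20, NA)
rowsums_version <- rowSums(cbind(win, loss), na.rm = TRUE) # NA skipped:    c(20, 12)
```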
Data
df.A <- structure(list(Season = c(2003L, 2003L, 2003L), TeamName = c("Creighton",
"Notre Dame", "Arizona"), Code = c("2003-1166", "2003-1323",
"2003-1112")), class = "data.frame", row.names = c(NA, -3L))
df.B <- structure(list(WTeamScore = c(15L, 20L, 10L), LTeamScore = c(10L,
15L, 5L), WTeamCode = c("2003-1166", "2003-1323", "2003-1112"
), LTeamCode = c("2003-1323", "2003-1112", "2003-1166")),
class = "data.frame", row.names = c(NA,
-3L))
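One caveat worth checking against the real data: match() returns only the position of the first match, so when a team wins or loses several games in a season, the match() lines above pick up just one win and one loss per team. A minimal base R sketch that sums every game instead is below. The fourth game (scores 30 and 12) is hypothetical toy data added to expose the difference, and filling teams with no games with 0 is an assumed choice; rowsum() does the per-code summing.

```r
# Toy season: the fourth game is made up, so Creighton (2003-1166) wins twice
df.A <- data.frame(Season = c(2003L, 2003L, 2003L),
                   TeamName = c("Creighton", "Notre Dame", "Arizona"),
                   Code = c("2003-1166", "2003-1323", "2003-1112"))
df.B <- data.frame(WTeamScore = c(15L, 20L, 10L, 30L),
                   LTeamScore = c(10L, 15L, 5L, 12L),
                   WTeamCode = c("2003-1166", "2003-1323", "2003-1112", "2003-1166"),
                   LTeamCode = c("2003-1323", "2003-1112", "2003-1166", "2003-1112"))

# The match() approach only sees each team's first win and first loss
first_only <- with(df.A, df.B$WTeamScore[match(Code, df.B$WTeamCode)] +
                     df.B$LTeamScore[match(Code, df.B$LTeamCode)])

# Every score a team recorded, win or lose, paired with that team's code
scores <- c(df.B$WTeamScore, df.B$LTeamScore)
codes  <- c(df.B$WTeamCode, df.B$LTeamCode)

# rowsum() adds up scores within each code (one row per distinct code)
totals <- rowsum(scores, codes)

# Look up each team's total; teams with no games would get NA, replaced by 0 here
df.A$Points <- totals[match(df.A$Code, rownames(totals)), 1]
df.A$Points[is.na(df.A$Points)] <- 0
df.A
```

On this toy data first_only gives 20, 30, 25 while the full sums are 50, 30, 37, because the 30-point win and the 12-point loss are never reached by match().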
