I have a table like this:
country  continent  date        n_case  Ex  TD  TC
--------------------------------------------------
Italy    Europe     2022-02-24  6       NA  2   90
Italy    Europe     2022-01-17  12      87  2   86
USA      America    2022-02-23  NA      NA  3   65
USA      America    2022-01-08  6       NA  5   67
USA      America    2022-01-04  6       7   7   87
etc etc...
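What I want is a new data frame with one row per country. Each row should keep the country name (column = country) and the continent (column = continent), and for every other column (date, n_case, Ex, TD, TC) it should hold the most recent value that was actually reported:
Desired data frame: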
country  continent  date        n_case  Ex  TD  TC
--------------------------------------------------
Italy    Europe     2022-02-24  6       87  2   90
USA      America    2022-02-23  6       7   3   65
etc etc..
Values to ignore are NA or "" (blank).
Thank you!
A uj5u.com user replied:
With dplyr, you can sort the data by date in descending order within each country, then take the first non-NA value in each column.
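library(dplyr)

df %>%
  group_by(country, continent) %>%
  # newest rows first within each country
  arrange(desc(date), .by_group = TRUE) %>%
  # per column: drop NAs, then keep the first (most recent) remaining value
  summarise(across(everything(), ~ .x[!is.na(.x)][1])) %>%
  ungroup()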
# # A tibble: 2 × 7
# country continent date n_case Ex TD TC
# <chr> <chr> <date> <int> <int> <int> <int>
# 1 Italy Europe 2022-02-24 6 87 2 90
# 2 USA America 2022-02-23 6 7 3 65
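The question also asks to skip blank strings (""), which the code above does not do because it only tests for NA. A minimal sketch of one way to cover that case, assuming the blanks occur in character columns: convert "" to NA with `na_if()` first, then apply the same steps. The `df_blank` data frame and its `note` column are invented here purely for illustration and are not part of the original data.

```r
library(dplyr)

# Hypothetical sample: `note` is an invented character column containing a blank.
df_blank <- data.frame(
  country   = c("Italy", "Italy", "USA", "USA"),
  continent = c("Europe", "Europe", "America", "America"),
  date      = as.Date(c("2022-02-24", "2022-01-17", "2022-02-23", "2022-01-08")),
  note      = c("", "a", NA, "b"),
  stringsAsFactors = FALSE
)

result <- df_blank %>%
  # turn blank strings into NA so they are skipped like any other missing value
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  group_by(country, continent) %>%
  arrange(desc(date), .by_group = TRUE) %>%
  summarise(across(everything(), ~ .x[!is.na(.x)][1]), .groups = "drop")

result
```

Here Italy's newest `note` is blank, so the older value "a" is kept, while `date` still reports the newest date. `.groups = "drop"` is used in place of a trailing `ungroup()`; either should give an ungrouped result.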
Data
df <- structure(list(
  country = c("Italy", "Italy", "USA", "USA", "USA"),
  continent = c("Europe", "Europe", "America", "America", "America"),
  date = structure(c(19047, 19009, 19046, 19000, 18996), class = "Date"),
  n_case = c(6L, 12L, NA, 6L, 6L), Ex = c(NA, 87L, NA, NA, 7L),
  TD = c(2L, 2L, 3L, 5L, 7L), TC = c(90L, 86L, 65L, 67L, 87L)),
  row.names = c(NA, -5L), class = "data.frame")
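To see what the `.x[!is.na(.x)][1]` expression from the answer does to a single column, here is a small sketch using the USA values from the sample data, written out by hand in newest-to-oldest order. The helper name `first_non_na` is made up for this illustration.

```r
# USA values ordered from newest to oldest, as they are after arrange(desc(date)).
usa_n_case  <- c(NA, 6L, 6L)
usa_ex      <- c(NA, NA, 7L)
all_missing <- c(NA_integer_, NA_integer_)

# Hypothetical helper wrapping the idiom: drop NAs, then take the first element.
first_non_na <- function(x) x[!is.na(x)][1]

first_non_na(usa_n_case)   # 6
first_non_na(usa_ex)       # 7
first_non_na(all_missing)  # NA, because indexing an empty vector with [1] yields NA
```

The last call shows that a column with no reported value at all for a country comes back as NA instead of raising an error.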
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/474491.html
