How do I do group-wise division in a data frame in R?

I have the following data frame:

library(dplyr)  # re-exports tibble()

df <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
             website = c("google","facebook","twitter","google","facebook","twitter"),
             category = c("big","big","small","big","big","small"),
             value = c(10,20,30,40,50,60))

How can I calculate the change between different years?

For example, I want to compare 2021 with 2020. How can I do that in R?

The output should look like this:

year	website	category	comparison
2021 vs 2020	google	big	4
2021 vs 2020	facebook	big	2.5
2021 vs 2020	twitter	small	2

The comparison column is the current year's value divided by the previous year's value.

I am not sure how to do this in dplyr.

Answer:

I would pivot the year column into two separate columns, then use mutate() to calculate the comparison.

Here is an example:

library(dplyr)

df %>% 
  tidyr::pivot_wider(names_from = year) %>% 
  mutate(
    comparison = `2021`/`2020`
  )
#> # A tibble: 3 × 5
#>   website  category `2020` `2021` comparison
#>   <chr>    <chr>     <dbl>  <dbl>      <dbl>
#> 1 google   big          10     40        4  
#> 2 facebook big          20     50        2.5
#> 3 twitter  small        30     60        2

Created on 2022-04-04 by the reprex package (v2.0.1)

Afterwards you can drop the extra columns with select().
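A minimal sketch of that last step, assuming the goal is to keep only website, category and comparison (which columns to drop is an assumption here, the question does not say):

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
             website = c("google","facebook","twitter","google","facebook","twitter"),
             category = c("big","big","small","big","big","small"),
             value = c(10,20,30,40,50,60))

result <- df %>%
  pivot_wider(names_from = year, values_from = value) %>%
  mutate(comparison = `2021` / `2020`) %>%
  # Drop the per-year columns once the ratio has been computed;
  # backticks are needed because the names start with a digit
  select(-`2020`, -`2021`)

result
```

The result has one row per website, with the two year columns removed.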

Note: if your value column is not named value, you need to add this argument to pivot_wider(): values_from = <your column name with values>.
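To illustrate the note, here is a sketch in which the value column is called visits. That name is hypothetical and chosen only for this example; the rest of the data matches the question.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, but the value column is renamed to
# `visits` (a hypothetical name used only for illustration)
df2 <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
              website = c("google","facebook","twitter","google","facebook","twitter"),
              category = c("big","big","small","big","big","small"),
              visits = c(10,20,30,40,50,60))

wide <- df2 %>%
  # values_from must be given explicitly because the column is not named `value`
  pivot_wider(names_from = year, values_from = visits) %>%
  mutate(comparison = `2021` / `2020`)

wide
```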

Update: if you want to keep the data in long format

Just use group_by(website), arrange(year) and lag() inside mutate():

library(dplyr)

df %>% 
  group_by(website) %>% 
  arrange(year) %>% 
  mutate(
    comparison = value / lag(value)
  )
#> # A tibble: 6 × 5
#> # Groups:   website [3]
#>   year  website  category value comparison
#>   <chr> <chr>    <chr>    <dbl>      <dbl>
#> 1 2020  google   big         10       NA  
#> 2 2020  facebook big         20       NA  
#> 3 2020  twitter  small       30       NA  
#> 4 2021  google   big         40        4  
#> 5 2021  facebook big         50        2.5
#> 6 2021  twitter  small       60        2

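If the long-format result should also match the layout requested in the question, one possible follow-up is to build the "2021 vs 2020" label with lag(year) and drop the rows that have no previous year. This is a sketch under the assumption that each website has one row per year. The name labelled is introduced only for this example.

```r
library(dplyr)

# Data from the question
df <- tibble(year = c("2020","2020","2020","2021","2021","2021"),
             website = c("google","facebook","twitter","google","facebook","twitter"),
             category = c("big","big","small","big","big","small"),
             value = c(10,20,30,40,50,60))

labelled <- df %>%
  group_by(website) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    comparison = value / lag(value),              # current year / previous year
    year = paste(year, lag(year), sep = " vs ")   # e.g. "2021 vs 2020"
  ) %>%
  ungroup() %>%
  filter(!is.na(comparison)) %>%                  # first year has nothing to compare with
  select(year, website, category, comparison)

labelled
```

Unlike indexing with value[2] / value[1], this keeps working when a website has more than two years, since each row is compared with the row before it.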

Answer:

You could do

library(tidyverse)

df %>% 
  group_by(website) %>%
  arrange(year) %>%
  summarize(year = paste(year[2], year[1], sep = ' vs '),
            category = category[1],
            comparison = value[2] / value[1]) 
#> # A tibble: 3 x 4
#>   website  year         category comparison
#>   <chr>    <chr>        <chr>         <dbl>
#> 1 facebook 2021 vs 2020 big             2.5
#> 2 google   2021 vs 2020 big             4  
#> 3 twitter  2021 vs 2020 small           2
