I have this data in a spreadsheet:
Country     Sales
Spain 1     1000
Spain 2     200
France      300
Nigeria 1   500
Nigeria 2   700
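I want the total sales for each country stored in a separate data frame.
I tried using dplyr functions, but the result was not what I wanted.
This is the output I want: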
Country Sum_of_sales
Spain 1200
France 300
Nigeria 1200
Is there a way to do this in R that gives me the output above, stored in a separate data frame?
Answer:
Remove the trailing numbers (and spaces) from the country labels, then do a normal grouped sum:
library(dplyr)
df %>%
  # drop any spaces plus the number that follows, e.g. "Spain 1" -> "Spain"
  mutate(Country = gsub(pattern = " *[0-9]+", replacement = "", x = Country)) %>%
  group_by(Country) %>%
  summarize(Sum_of_Sales = sum(Sales))
# # A tibble: 3 × 2
# Country Sum_of_Sales
# <chr> <int>
# 1 France 300
# 2 Nigeria 1200
# 3 Spain 1200
Using this sample input:
df <- read.table(text = "Country Sales
'Spain 1' 1000
'Spain 2' 200
'France' 300
'Nigeria 1' 500
'Nigeria 2' 700", header = TRUE)
Answer:
If you want to keep it in the tidyverse, you can use extract():
library(tidyr)
library(dplyr)
df %>%
  # extract()'s default regex "([[:alnum:]]+)" keeps the first word, e.g. "Spain 1" -> "Spain"
  extract(Country, "Country") %>%
  group_by(Country) %>%
  summarise(Sum_of_sales = sum(Sales))
# A tibble: 3 × 2
Country Sum_of_sales
<chr> <int>
1 France 300
2 Nigeria 1200
3 Spain 1200
For more complex cases, such as country names that contain spaces or hyphens, you can use:
extract(Country, "Country", "([A-Za-z -]*?)\\s*[0-9]*$")
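A runnable sketch of the more complex case, using hypothetical labels that are not from the question (multi-word and hyphenated names, one with no number). It uses the pattern ([A-Za-z -]*?)\\s*[0-9]*$: the lazy *? together with the end anchor $ keeps the space before the number out of the captured Country. The final as.data.frame() is an assumption that you want a plain data frame rather than a tibble.

```r
library(dplyr)
library(tidyr)

# Hypothetical labels (not from the question): multi-word, hyphenated, unnumbered
df2 <- data.frame(
  Country = c("United States 1", "United States 2", "Guinea-Bissau", "Spain 1", "Spain 2"),
  Sales   = c(100L, 50L, 30L, 1000L, 200L)
)

sales_by_country <- df2 %>%
  # lazy group + anchored suffix: capture the name, drop an optional " <digits>" at the end
  extract(Country, "Country", "([A-Za-z -]*?)\\s*[0-9]*$") %>%
  group_by(Country) %>%
  summarise(Sum_of_sales = sum(Sales)) %>%
  # store the result as an ordinary data.frame
  as.data.frame()

sales_by_country
```

Here Spain 1 and Spain 2 collapse to Spain (1200), and both United States rows collapse to United States (150).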
Answer:
One option using str_remove():
library(dplyr)
library(stringr)
df %>%
  # remove whitespace followed by digits, e.g. "Nigeria 2" -> "Nigeria"
  group_by(Country = str_remove(Country, "\\s+\\d+")) %>%
  summarise(Sum_of_sales = sum(Sales))
-output
# A tibble: 3 × 2
Country Sum_of_sales
<chr> <int>
1 France 300
2 Nigeria 1200
3 Spain 1200
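If you would rather not depend on dplyr, the same result can be sketched in base R. This swaps in sub() and aggregate(), which none of the answers here use; the object name sum_by_country is a hypothetical choice.

```r
# The question's sample data, with the numbered labels kept as-is
df <- data.frame(
  Country = c("Spain 1", "Spain 2", "France", "Nigeria 1", "Nigeria 2"),
  Sales   = c(1000L, 200L, 300L, 500L, 700L)
)

# Strip an optional trailing " <digits>" suffix from each label
df$Country <- sub("\\s+[0-9]+$", "", df$Country)

# Group-and-sum into a separate data frame, then rename the summed column
sum_by_country <- aggregate(Sales ~ Country, data = df, FUN = sum)
names(sum_by_country)[2] <- "Sum_of_sales"

sum_by_country
```

With the question's data this gives France 300, Nigeria 1200 and Spain 1200, sorted by country, as an ordinary data.frame.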
Answer:
library(dplyr)
df <- data.frame(
  Country = c("Spain", "Spain", "France", "Nigeria", "Nigeria"),
  Sales = c(1000, 200, 300, 500, 700)
)
df_by_country <-
  df %>%
  group_by(Country) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE))
df_by_country
# A tibble: 3 x 2
Country Sales
<chr> <dbl>
1 France 300
2 Nigeria 1200
3 Spain 1200
