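I'm new to R and am trying to split the following character variable "Tax.Rate...." into four separate columns (CGST, SGST, UTGST and IGST), with the applicable tax rate under each heading. A sample of the dataset: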
df
# A tibble: 3 x 1
  Tax.Rate....
1 "CGST 2.5% + SGST 2.5%"
2 "CGST 6% + UTGST 6%"
3 "IGST 12% "
I have tried the 'separate' and 'mutate' functions, but with little success.
Any guidance would be much appreciated.
uj5u.com user reply:
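I'm sure this can also be done concisely in base R, but here is a tidyverse approach: first split the data into a new row at each plus sign, then trim the extra whitespace, then split into two columns.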
library(tidyverse)
df <- data.frame(Tax.Rate = c("CGST 2.5% + SGST 2.5%", "CGST 6% + UTGST 6%", "IGST 12% "))
df %>%
  mutate(orig_row = row_number()) %>%  # optional, for later tracking
  separate_rows(Tax.Rate, sep = "\\+") %>%  # "+" is a regex metacharacter, so escape it
  mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
  separate(Tax.Rate, c("group", "rate"), extra = "merge", remove = FALSE)
# A tibble: 5 × 4
  Tax.Rate  group rate  orig_row
  <chr>     <chr> <chr>    <int>
1 CGST 2.5% CGST  2.5%         1
2 SGST 2.5% SGST  2.5%         1
3 CGST 6%   CGST  6%           2
4 UTGST 6%  UTGST 6%           2
5 IGST 12%  IGST  12%          3
This gives a "long" table. If you want it "wide", with a separate column for each group (jurisdiction?), you can add the following:
[from the end of the "separate()" line] %>%
  select(-Tax.Rate) %>%
  pivot_wider(names_from = group, values_from = rate)
which gives this result:
# A tibble: 3 × 5
  orig_row CGST  SGST  UTGST IGST
     <int> <chr> <chr> <chr> <chr>
1        1 2.5%  2.5%  NA    NA
2        2 6%    NA    6%    NA
3        3 NA    NA    NA    12%
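The answer above mentions that base R could do this too. Here is a minimal sketch of one way, assuming every entry is an upper-case tax name followed by a percentage. The names `taxes`, `to_row` and `wide` are made up for illustration. It pulls out the name/rate pairs with a regular expression, so whatever sits between the pairs is ignored.

```r
# Sample data, one string per original row
df <- data.frame(
  Tax.Rate = c("CGST 2.5% + SGST 2.5%", "CGST 6% + UTGST 6%", "IGST 12% "),
  stringsAsFactors = FALSE
)

# Assumed set of target columns (hypothetical helper vector)
taxes <- c("CGST", "SGST", "UTGST", "IGST")

# One character vector of "NAME rate%" matches per row
pairs <- regmatches(df$Tax.Rate, gregexpr("[A-Z]+ +[0-9.]+%", df$Tax.Rate))

# Turn one row's pairs into a length-4 vector ordered like `taxes`;
# taxes that do not appear in the row come back as NA
to_row <- function(p) {
  parts <- strsplit(p, " +")
  name <- vapply(parts, `[`, character(1), 1)
  rate <- vapply(parts, `[`, character(1), 2)
  unname(setNames(rate, name)[taxes])
}

# sapply() returns one column per input row, so transpose it
wide <- as.data.frame(t(sapply(pairs, to_row)), stringsAsFactors = FALSE)
names(wide) <- taxes
wide
```

The rates stay as character strings here, same as in the tidyverse version. Rows keep their original order, so `wide` can be bound back onto `df` with `cbind()` if needed.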
uj5u.com user reply:
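We could:
1. Use separate_rows to split on the plus sign, using \\ to escape the special character
2. Then str_trim to remove leading whitespace etc.
3. separate this column by " "
4. group_by and add an id to avoid nested output
5. pivot_wider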
library(dplyr)
library(tidyr)
library(stringr)
df %>%
  separate_rows(Tax.Rate, sep = "\\+") %>%  # split on the literal "+"
  mutate(Tax.Rate = str_trim(Tax.Rate)) %>%
  separate(Tax.Rate, c("name", "value"), sep = " ") %>%
  group_by(name) %>%
  mutate(id = row_number()) %>%  # running count within each tax name
  pivot_wider(
    names_from = name,
    values_from = value
  ) %>%
  select(-id)
  CGST  SGST  UTGST IGST
  <chr> <chr> <chr> <chr>
1 2.5%  2.5%  6%    12%
2 6%    NA    NA    NA
Data:
df <- structure(list(Tax.Rate = c("CGST 2.5% + SGST 2.5%", "CGST 6% + UTGST 6%",
"IGST 12%")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-3L))
