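My task is to reproduce a SAS process in R. I have one table covering the past 71 months, with 1.4 million rows and 156 columns. The columns contain only IDs, which need to be replaced with text.
For this there are 60 lookup tables. Some of them are used several times, others only once.
I can't show the real data, but here is a small example of what the table looks like: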
library(dplyr)  # also provides tibble()

df <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
             feature_a = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
             feature_b = c(3, 2, 1, 3, 2, 1, 3, 2, 1, 3),
             feature_c = c(2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
             feature_d = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
             feature_e = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 1),
             feature_f = c(2, 2, 1, 1, 2, 2, 1, 1, 2, 2))
contract_id feature_a feature_b feature_c feature_d feature_e feature_f
      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
       1001         1         3         2         1         2         2
       1002         2         2         3         2         1         2
       1003         3         1         1         1         2         1
       1004         1         3         2         2         1         1
       1005         2         2         3         1         2         2
       1006         3         1         1         2         1         2
       1007         1         3         2         1         2         1
       1008         2         2         3         2         1         1
       1009         3         1         1         1         2         2
       1010         1         3         2         2         1         2
Here are 2 of the 60 lookup tables. Both are used several times; for example, lookup_a is used 8 times and lookup_b is used 15 times:
lookup_a <- tibble(id = c(1, 2, 3),
                   value = c("yes", "no", "yes, mandatory"))
lookup_b <- tibble(id = c(1, 2),
                   value = c("yes", "no"))
This is what the desired result looks like (feature_a to feature_c use lookup_a, and feature_d to feature_f use lookup_b):
df_expected <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
                      feature_a = c("yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes"),
                      feature_b = c("yes, mandatory", "no", "yes", "yes, mandatory", "no", "yes", "yes, mandatory", "no", "yes", "yes, mandatory"),
                      feature_c = c("no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no"),
                      feature_d = c("yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                      feature_e = c("no", "yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes"),
                      feature_f = c("no", "no", "yes", "yes", "no", "no", "yes", "yes", "no", "no"))
contract_id feature_a      feature_b      feature_c      feature_d feature_e feature_f
      <dbl> <chr>          <chr>          <chr>          <chr>     <chr>     <chr>
       1001 yes            yes, mandatory no             yes       no        no
       1002 no             no             yes, mandatory no        yes       no
       1003 yes, mandatory yes            yes            yes       no        yes
       1004 yes            yes, mandatory no             no        yes       yes
       1005 no             no             yes, mandatory yes       no        no
       1006 yes, mandatory yes            yes            no        yes       no
       1007 yes            yes, mandatory no             yes       no        yes
       1008 no             no             yes, mandatory no        yes       yes
       1009 yes, mandatory yes            yes            yes       no        no
       1010 yes            yes, mandatory no             no        yes       no
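I could of course write a separate join for every column, but that is not satisfactory. I would like to keep the number of joins as small as possible: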
df %>%
left_join(lookup_a, by = c("feature_a" = "id")) %>%
select(-feature_a) %>%
rename(feature_a = value)
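I have also tried different approaches with data.table or match, but I have not found a way to join several columns at once. My problem was that all columns were changed instead of only the selected ones.
Here are my questions:
- Is there a way to do a join/match (e.g. left_join) against a lookup table for several columns at once, and to rename the results using the column names?
- Or is it possible to replace the values of several columns at once?
Maybe I am overthinking this and the solution is simpler than I expect.
Thanks in advance!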
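Answer from a uj5u.com user:
Welcome to SO! You can replace the values of several columns at once by using across() inside the mutate() verb, selecting the columns to change by index (2 to 4 for columns a to c, and 5 to 7 for columns d to f):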
library(dplyr)
df %>%
  mutate(across(2:4,
                ~case_when(. == 1 ~ "Yes",
                           . == 2 ~ "No",
                           . == 3 ~ "Yes, mandatory",
                           TRUE ~ "Error"))) %>%
  mutate(across(5:7,
                ~case_when(. == 1 ~ "Yes",
                           . == 2 ~ "No",
                           TRUE ~ "Error")))
Output:
# A tibble: 10 x 7
   contract_id feature_a      feature_b      feature_c      feature_d feature_e feature_f
         <dbl> <chr>          <chr>          <chr>          <chr>     <chr>     <chr>
 1        1001 Yes            Yes, mandatory No             Yes       No        No
 2        1002 No             No             Yes, mandatory No        Yes       No
 3        1003 Yes, mandatory Yes            Yes            Yes       No        Yes
 4        1004 Yes            Yes, mandatory No             No        Yes       Yes
 5        1005 No             No             Yes, mandatory Yes       No        No
 6        1006 Yes, mandatory Yes            Yes            No        Yes       No
 7        1007 Yes            Yes, mandatory No             Yes       No        Yes
 8        1008 No             No             Yes, mandatory No        Yes       Yes
 9        1009 Yes, mandatory Yes            Yes            Yes       No        No
10        1010 Yes            Yes, mandatory No             No        Yes       No
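The case_when() version above hard-codes the labels (and capitalises them, unlike the lookup values in the question). With 60 lookup tables it may be more practical to read the labels from the lookup tables themselves. The following is only a sketch of that idea, not part of the original answer: it combines across() with base R's match(), and the `col_map` list that pairs columns with lookup tables is a hypothetical helper introduced here for illustration.

```r
library(dplyr)

# Example data and lookup tables from the question
df <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
             feature_a = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
             feature_b = c(3, 2, 1, 3, 2, 1, 3, 2, 1, 3),
             feature_c = c(2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
             feature_d = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
             feature_e = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 1),
             feature_f = c(2, 2, 1, 1, 2, 2, 1, 1, 2, 2))
lookup_a <- tibble(id = c(1, 2, 3), value = c("yes", "no", "yes, mandatory"))
lookup_b <- tibble(id = c(1, 2), value = c("yes", "no"))

# Hypothetical mapping (an assumption, not from the original post):
# each entry names the columns that share one lookup table.
col_map <- list(
  list(cols = c("feature_a", "feature_b", "feature_c"), lookup = lookup_a),
  list(cols = c("feature_d", "feature_e", "feature_f"), lookup = lookup_b)
)

result <- df
for (m in col_map) {
  # match() returns the position of each ID in the lookup's id column;
  # indexing value with those positions yields the text label.
  # IDs that are missing from the lookup become NA.
  result <- result %>%
    mutate(across(all_of(m$cols), ~ m$lookup$value[match(.x, m$lookup$id)]))
}

result
```

Only the columns listed in `col_map` are touched, so contract_id stays numeric. Because match() is vectorised and no join is performed, this needs one pass per lookup table rather than one join per column; whether that is fast enough for 1.4 million rows and 156 columns would have to be measured on the real data.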
