Joining multiple columns with multiple lookup tables

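My task is to replicate a process from SAS in R. I have one table covering the last 71 months, with 1.4 million rows and 156 columns. The columns contain only IDs, and these are to be replaced with text.

There are 60 lookup tables for this. Some of them are used several times, others only once.

I can't show the real data, but here is a small example of what the tables look like: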

library(dplyr)  # attaches tibble(), %>% and left_join(), which are used below

df <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
             feature_a = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
             feature_b = c(3, 2, 1, 3, 2, 1, 3, 2, 1, 3),
             feature_c = c(2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
             feature_d = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
             feature_e = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 1),
             feature_f = c(2, 2, 1, 1, 2, 2, 1, 1, 2, 2))

   contract_id feature_a feature_b feature_c feature_d feature_e feature_f
         <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
         1001         1         3         2         1         2         2
         1002         2         2         3         2         1         2
         1003         3         1         1         1         2         1
         1004         1         3         2         2         1         1
         1005         2         2         3         1         2         2
         1006         3         1         1         2         1         2
         1007         1         3         2         1         2         1
         1008         2         2         3         2         1         1
         1009         3         1         1         1         2         2
         1010         1         3         2         2         1         2

These are 2 of the 60 lookup tables. Both are used several times: for example, lookup_a is used 8 times and lookup_b 15 times:

lookup_a = tibble(id = c(1, 2, 3),
                 value = c("yes", "no", "yes, mandatory"))
                 
lookup_b = tibble(id = c(1, 2),
                  value = c("yes", "no"))

This is what the desired result looks like (feature_a to feature_c use lookup_a, and feature_d to feature_f use lookup_b):

df_expected <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
                      feature_a = c("yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes"),
                      feature_b = c("yes, mandatory", "no", "yes", "yes, mandatory", "no", "yes", "yes, mandatory", "no", "yes", "yes, mandatory"),
                      feature_c = c("no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no", "yes, mandatory", "yes", "no"),
                      feature_d = c("yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes", "no"),
                      feature_e = c("no", "yes", "no", "yes", "no", "yes", "no", "yes", "no", "yes"),
                      feature_f = c("no", "no", "yes", "yes", "no", "no", "yes", "yes", "no", "no"))

   contract_id feature_a      feature_b      feature_c      feature_d feature_e feature_f
         <dbl> <chr>          <chr>          <chr>          <chr>     <chr>     <chr>    
         1001 yes            yes, mandatory no             yes       no        no       
         1002 no             no             yes, mandatory no        yes       no       
         1003 yes, mandatory yes            yes            yes       no        yes      
         1004 yes            yes, mandatory no             no        yes       yes      
         1005 no             no             yes, mandatory yes       no        no       
         1006 yes, mandatory yes            yes            no        yes       no       
         1007 yes            yes, mandatory no             yes       no        yes      
         1008 no             no             yes, mandatory no        yes       yes      
         1009 yes, mandatory yes            yes            yes       no        no       
         1010 yes            yes, mandatory no             no        yes       no

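I could of course write a separate join for each column, but that is not satisfactory. I would like to keep the number of joins as small as possible: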

df %>% 
      left_join(lookup_a, by = c("feature_a" = "id")) %>% 
      select(-feature_a) %>% 
      rename(feature_a = value)

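I have also tried different approaches using data.table or match, but I have not found a way to join on several columns at once. My problem was that all columns were changed, not just the selected ones.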

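Here are my questions:

Is there a way to do a join/match (e.g. left_join) against a lookup table for several columns at once, and rename the result using the column names?
Or is it possible to replace the values of several columns at once?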

Maybe I am overthinking this and the solution is quite simple.

Thanks in advance!

Answer from a uj5u.com user:

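Welcome to SO! You can replace the values of several columns by using across inside the mutate verb, selecting the columns to change by index (2 to 4 for columns a to c, and 5 to 7 for columns d to f):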

library(dplyr)
df %>% 
  mutate(across(2:4,
         ~case_when(. == 1 ~ "Yes",
                    . == 2 ~ "No",
                    . == 3 ~ "Yes, mandatory",
                    TRUE ~ "Error"))) %>%
  mutate(across(5:7,
                ~case_when(. == 1 ~ "Yes",
                           . == 2 ~ "No",
                           TRUE ~ "Error")))

Output:

# A tibble: 10 x 7
   contract_id feature_a      feature_b      feature_c      feature_d feature_e feature_f
         <dbl> <chr>          <chr>          <chr>          <chr>     <chr>     <chr>    
 1        1001 Yes            Yes, mandatory No             Yes       No        No       
 2        1002 No             No             Yes, mandatory No        Yes       No       
 3        1003 Yes, mandatory Yes            Yes            Yes       No        Yes      
 4        1004 Yes            Yes, mandatory No             No        Yes       Yes      
 5        1005 No             No             Yes, mandatory Yes       No        No       
 6        1006 Yes, mandatory Yes            Yes            No        Yes       No       
 7        1007 Yes            Yes, mandatory No             Yes       No        Yes      
 8        1008 No             No             Yes, mandatory No        Yes       Yes      
 9        1009 Yes, mandatory Yes            Yes            Yes       No        No       
10        1010 Yes            Yes, mandatory No             No        Yes       No
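
With 60 lookup tables, hard-coding every label in case_when may become tedious, and the labels above are capitalized differently from the values in lookup_a and lookup_b. As a rough sketch of an alternative (not part of the original answer), the same across call can read the labels directly from the lookup tables through match. The names lookups, column_map and df_decoded are made up for this illustration, and cur_column() assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Example data and lookup tables copied from the question
df <- tibble(contract_id = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010),
             feature_a = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 1),
             feature_b = c(3, 2, 1, 3, 2, 1, 3, 2, 1, 3),
             feature_c = c(2, 3, 1, 2, 3, 1, 2, 3, 1, 2),
             feature_d = c(1, 2, 1, 2, 1, 2, 1, 2, 1, 2),
             feature_e = c(2, 1, 2, 1, 2, 1, 2, 1, 2, 1),
             feature_f = c(2, 2, 1, 1, 2, 2, 1, 1, 2, 2))

lookup_a <- tibble(id = c(1, 2, 3),
                   value = c("yes", "no", "yes, mandatory"))
lookup_b <- tibble(id = c(1, 2),
                   value = c("yes", "no"))

# Hypothetical helper objects: all lookup tables in one named list, plus a
# named vector saying which lookup table decodes which column
lookups <- list(lookup_a = lookup_a, lookup_b = lookup_b)
column_map <- c(feature_a = "lookup_a", feature_b = "lookup_a", feature_c = "lookup_a",
                feature_d = "lookup_b", feature_e = "lookup_b", feature_f = "lookup_b")

# For each mapped column, pick its lookup table and replace every ID with the
# matching text; IDs missing from the lookup table become NA
df_decoded <- df %>%
  mutate(across(all_of(names(column_map)), ~ {
    lk <- lookups[[column_map[[cur_column()]]]]
    lk$value[match(.x, lk$id)]
  }))

print(df_decoded)
```

Only the columns named in column_map are touched, so contract_id keeps its numeric values and the column order stays the same. For the real table, column_map would need one entry per ID column, which could be kept in a small mapping table instead of being typed by hand.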
