I am working with two datasets. The first contains pairs of items:
original <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"))
original
label1 label2
1 cat dog
2 cat dog
3 dog cat
4 dog cat
5 cat dog
6 tiger cow
7 tiger cow
8 cow tiger
The second dataset contains an index code for each item in the first:
index <- data.frame(item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0))
index
item code
1 cat 1
2 dog 0
3 tiger 1
4 cow 0
I am looking for a way to create two new columns, tag0 and tag1, so that the result looks like this:
new <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"),
tag1 = c("cat", "cat", "cat", "cat", "cat", "tiger", "tiger", "tiger"),
tag0 = c("dog", "dog", "dog", "dog", "dog", "cow", "cow", "cow"))
new
label1 label2 tag1 tag0
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
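tag0 is the label whose code is 0 in the index data frame, and tag1 is the label whose code is 1.
Can anyone help me with a tidyverse-based solution?
A uj5u.com user replied:
Here are two tidyverse solutions. The first works for this particular case, but I prefer the second, which is more elegant and scalable.
Solution 1: a JOIN for each label* column
First, load the tidyverse and generate your original and index datasets.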
library(tidyverse)
# ...
# Code to generate 'original' and 'index' datasets.
# ...
Then apply this workflow.
original %>%
# Uniquely identify each row (for pivoting later).
mutate(row_id = row_number()) %>%
# Match 'label1' to the tags.
left_join(
index,
by = c("label1" = "item"),
keep = TRUE
) %>%
# Match 'label2' to the tags.
left_join(
index,
by = c("label2" = "item"),
keep = TRUE,
suffix = c(".1", ".2")
) %>%
# Pivot 'item.1 | ... | item.n | code.1 | ... | code.n' into a consolidated
# 'item | code' form.
pivot_longer(
cols = matches("^(item|code)\\.(\\d+)?$"),
names_pattern = "^(item|code)\\.(\\d+)?$",
names_to = c(".value", NA)
) %>%
# Pivot back into a 'tag1 | tag0' form.
pivot_wider(
values_from = item,
names_from = code,
names_glue = "tag{code}"
) %>%
# Omit unique identifier.
select(!row_id)
Result
Given the original and index datasets reproduced here
original <- data.frame(
label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0)
)
the solution should produce the following result:
# A tibble: 8 x 4
label1 label2 tag1 tag0
<chr> <chr> <chr> <chr>
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
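Note
If your original dataset has any additional label* columns, you will need to perform an extra JOIN for each of them.
Solution 2: a single CROSS JOIN
Here is a more elegant workflow that is also more flexible: it works for any number of label* columns in original and any set of codes in index.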
original %>%
# Uniquely identify each row (for pivoting later).
mutate(row_id = row_number()) %>%
# Perform a cross join to compare every 'item' to every 'label*'.
full_join(
index,
by = character()
) %>%
# Keep only those rows where 'item' matches a 'label*'.
rowwise() %>%
filter(item %in% c_across(matches("^label\\d+"))) %>%
# Pivot into a 'tag1 | tag0' form.
pivot_wider(
values_from = item,
names_from = code,
names_glue = "tag{code}"
) %>%
# Omit unique identifier.
select(!row_id)
Result
The result is unchanged.
# A tibble: 8 x 4
label1 label2 tag1 tag0
<chr> <chr> <chr> <chr>
1 cat dog cat dog
2 cat dog cat dog
3 dog cat cat dog
4 dog cat cat dog
5 cat dog cat dog
6 tiger cow tiger cow
7 tiger cow tiger cow
8 cow tiger tiger cow
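One caveat about the cross join itself: in dplyr 1.1.0 and later, by = character() is deprecated in favor of cross_join(). A minimal sketch of the same workflow under that assumption (dplyr 1.1.0 or newer installed) might look like this; the ungroup() call is an addition of mine to drop the rowwise grouping before pivoting, not part of the answer above.

```r
library(dplyr)
library(tidyr)

original <- data.frame(
  label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
  label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
  item = c("cat", "dog", "tiger", "cow"),
  code = c(1, 0, 1, 0)
)

result <- original %>%
  # Uniquely identify each row (for pivoting later).
  mutate(row_id = row_number()) %>%
  # Assumes dplyr >= 1.1.0: cross_join() replaces full_join(by = character()).
  cross_join(index) %>%
  # Keep only those rows where 'item' matches a 'label*'.
  rowwise() %>%
  filter(item %in% c_across(matches("^label\\d+$"))) %>%
  # Drop the rowwise grouping before reshaping (my addition).
  ungroup() %>%
  # Pivot into a 'tag1 | tag0' form.
  pivot_wider(
    values_from = item,
    names_from = code,
    names_glue = "tag{code}"
  ) %>%
  # Omit unique identifier.
  select(!row_id)
```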
Note
The only drawback is that it must perform a CROSS JOIN, which may hurt performance on larger datasets.
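If the cross join becomes a bottleneck, one possible alternative (a sketch of a different technique, not part of the answer above) is to pivot the label* columns into long form first and then do an ordinary left join against index. It assumes that every row contains exactly one label per code; otherwise pivot_wider() would produce list columns. The intermediate names tags and result are my own.

```r
library(dplyr)
library(tidyr)

original <- data.frame(
  label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
  label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger")
)
index <- data.frame(
  item = c("cat", "dog", "tiger", "cow"),
  code = c(1, 0, 1, 0)
)

# Build a per-row lookup of tags without a cross join.
tags <- original %>%
  mutate(row_id = row_number()) %>%
  # One row per (row_id, label column); the label value goes into 'item'.
  pivot_longer(
    cols = matches("^label\\d+$"),
    names_to = "label",
    values_to = "item"
  ) %>%
  # Look up the code for each item.
  left_join(index, by = "item") %>%
  select(row_id, item, code) %>%
  # Spread into 'tag1 | tag0' columns, one row per row_id.
  pivot_wider(
    names_from = code,
    values_from = item,
    names_glue = "tag{code}"
  )

# Attach the tags back onto the original rows.
result <- original %>%
  mutate(row_id = row_number()) %>%
  left_join(tags, by = "row_id") %>%
  select(label1, label2, tag1, tag0)
```

The intermediate table grows with the number of label* columns rather than with the number of rows in index, which is the main reason to consider it for larger lookups.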
A uj5u.com user replied:
Another possible solution:
library(tidyverse)
original <- data.frame(label1 = c("cat", "cat", "dog", "dog", "cat", "tiger", "tiger", "cow"),
label2 = c("dog", "dog", "cat", "cat", "dog", "cow", "cow", "tiger"))
index <- data.frame(item = c("cat", "dog", "tiger", "cow"),
code = c(1, 0, 1, 0))
original %>%
full_join(index, by=c("label1" = "item")) %>%
full_join(index, by=c("label2" = "item")) %>%
mutate(tag1 = if_else(code.x == 1, label1, label2)) %>%
mutate(tag0 = if_else(code.y == 1, label1, label2)) %>%
select(!starts_with("code"))
#> label1 label2 tag1 tag0
#> 1 cat dog cat dog
#> 2 cat dog cat dog
#> 3 dog cat cat dog
#> 4 dog cat cat dog
#> 5 cat dog cat dog
#> 6 tiger cow tiger cow
#> 7 tiger cow tiger cow
#> 8 cow tiger tiger cow
