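I came up with the following example to illustrate my problem.
Suppose there are 5 balls:
- Red
- Blue
- Green
- Yellow
- Orange

In general there are 5! = 120 ways to arrange these balls (n!). I can enumerate all of these permutations as follows: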
library(combinat)
library(dplyr)
my_list = c("Red", "Blue", "Green", "Yellow", "Orange")
d = permn(my_list)
all_combinations = as.data.frame(matrix(unlist(d), ncol = 120)) %>%
  setNames(paste0("col", 1:120))
all_combinations[,1:5]
col1 col2 col3 col4 col5
1 Red Red Red Red Orange
2 Blue Blue Blue Orange Red
3 Green Green Orange Blue Blue
4 Yellow Orange Green Green Green
5 Orange Yellow Yellow Yellow Yellow
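My question:
Suppose I want to filter this list by the following conditions:
- The "Red" ball must be in the first or second position (from left to right)
- There must be at least 2 positions between the "Blue" ball and the "Green" ball
- The "Yellow" ball cannot be in the last position
I then tried to filter the data above according to these 3 conditions: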
# attempt to write first condition
cond_1 <- all_combinations[which(all_combinations[1,]== "Red" || all_combinations[2,] == "Red"), ]
#not sure how to write the second condition
# attempt to write the third condition
cond_3 <- data_frame_version[which(data_frame_version[5,] != "Yellow" ), ]
# if everything worked, an "anti join" style statement could be written to remove "cond_1, cond_2, cond_3" from the original data?
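But this does not work at all: the first and third conditions each return a data frame in which every column contains only 4 rows.
Can someone show me how to correctly filter "all_combinations" using the 3 filters above?
Note: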
The following code can transpose the original data:
library(data.table)
tpose = transpose(all_combinations)
df = tpose
#group every 5 rows by the same id to identify unique combinations
bloc_len <- 5
df$bloc <-
  rep(seq(1, 1 + nrow(df) %/% bloc_len), each = bloc_len, length.out = nrow(df))
head(df)
V1 V2 V3 V4 V5 bloc
1 Red Blue Green Yellow Orange 1
2 Red Blue Green Orange Yellow 1
3 Red Blue Orange Green Yellow 1
4 Red Orange Blue Green Yellow 1
5 Orange Red Blue Green Yellow 1
6 Orange Red Blue Yellow Green 2
A uj5u.com user replied:
You can do:
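library(tidyverse)
# Each colour occurs exactly once per row, so if two columns both hold
# a value from bg, one of them is Blue and the other is Green
bg <- c("Blue", "Green")
tpose %>%
  # flag rows where Blue and Green are fewer than 3 positions apart, in either order
  mutate(blue_delete = case_when(V1 %in% bg & V2 %in% bg ~ TRUE,
                                 V1 %in% bg & V3 %in% bg ~ TRUE,
                                 V2 %in% bg & V3 %in% bg ~ TRUE,
                                 V2 %in% bg & V4 %in% bg ~ TRUE,
                                 V3 %in% bg & V4 %in% bg ~ TRUE,
                                 V3 %in% bg & V5 %in% bg ~ TRUE,
                                 V4 %in% bg & V5 %in% bg ~ TRUE,
                                 TRUE ~ FALSE)) %>%
  filter(V3 != "Red" & V4 != "Red" & V5 != "Red",
         V5 != "Yellow",
         blue_delete == FALSE) %>%
  select(-blue_delete)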
A uj5u.com user replied:
Here is a scalable tidyverse solution.
First, let's shape the data as a tibble of 120 rows, one for each permutation of the balls.
library(tidyverse)
library(combinat)
data = my_list %>%
  permn() %>%
  map(~ set_names(.x, paste0("ball", 1:5))) %>%
  do.call(bind_rows, args = .) %>%
  mutate(id = row_number())
Our data:
# A tibble: 120 x 6
ball1 ball2 ball3 ball4 ball5 id
<chr> <chr> <chr> <chr> <chr> <int>
1 Red Blue Green Yellow Orange 1
2 Red Blue Green Orange Yellow 2
3 Red Blue Orange Green Yellow 3
4 Red Orange Blue Green Yellow 4
5 Orange Red Blue Green Yellow 5
6 Orange Red Blue Yellow Green 6
7 Red Orange Blue Yellow Green 7
8 Red Blue Orange Yellow Green 8
9 Red Blue Yellow Orange Green 9
10 Red Blue Yellow Green Orange 10
# ... with 110 more rows
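The key idea of this solution is to pivot the data into long format, which makes checking each condition trivial. Afterwards we can pivot it back to wide format.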
data %>%
  pivot_longer(-id) %>%
  mutate(ball_number = as.numeric(str_extract(name, "[1-5]"))) %>%
  group_by(id) %>%
  filter(
    # Condition 1
    ball_number[value == "Red"] %in% c(1, 2),
    # Condition 2 (read here as at least two balls in between, i.e. positions differ by 3 or more)
    abs(ball_number[value == "Blue"] - ball_number[value == "Green"]) >= 3,
    # Condition 3
    ball_number[value == "Yellow"] != 5
  ) %>%
  select(-ball_number) %>%
  pivot_wider(values_from = "value", names_from = "name")
The output shows that there are 10 permutations:
# A tibble: 10 x 6
# Groups: id [10]
id ball1 ball2 ball3 ball4 ball5
<int> <chr> <chr> <chr> <chr> <chr>
1 8 Red Blue Orange Yellow Green
2 9 Red Blue Yellow Orange Green
3 32 Red Green Yellow Orange Blue
4 33 Red Green Orange Yellow Blue
5 48 Green Red Orange Yellow Blue
6 49 Green Red Yellow Orange Blue
7 50 Green Red Yellow Blue Orange
8 111 Blue Red Yellow Green Orange
9 112 Blue Red Yellow Orange Green
10 113 Blue Red Orange Yellow Green
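The improvement this solution offers is that, thanks to our ball_number variable, every condition you want to check becomes very simple. With more balls, you could easily extend this solution to more complex conditions, for example Red being among the first 5 balls, or the positions of the Blue ball and the Green ball summing to 7.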
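As a rough sketch of that kind of extension (not part of the original answer, and written in base R so it runs without extra packages), the example below uses six balls and checks two of the conditions mentioned above: Red among the first 5 positions, and the positions of Blue and Green summing to 7. The sixth colour "Purple" and the helper all_perms() are assumptions made for illustration; all_perms() is only a stand-in for combinat::permn() and returns the permutations in a different order.

```r
# Hypothetical helper: every permutation of x as a list (base R stand-in for permn)
all_perms <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in all_perms(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], p)
    }
  }
  out
}

# "Purple" is an assumed sixth ball, added only to make the example larger
balls <- c("Red", "Blue", "Green", "Yellow", "Orange", "Purple")
perms <- all_perms(balls)

# match() returns the position of each colour within one permutation
check <- function(perm) {
  pos <- match(c("Red", "Blue", "Green"), perm)
  pos[1] <= 5 && (pos[2] + pos[3]) == 7
}

keep <- vapply(perms, check, logical(1))
extended_matches <- perms[keep]
length(extended_matches)  # 120 of the 720 permutations
```

Each condition stays a one-line expression on the positions, which is the same property the ball_number column gives the long-format version.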
A uj5u.com user replied:
Here is what you can do. I know it is not the prettiest or most optimized solution you can find, but it works!
# byrow = TRUE so that each row holds exactly one permutation
all_combinations = as.data.frame(matrix(unlist(d), ncol = 5, byrow = TRUE)) %>%
  setNames(paste0("col", 1:5))
# Condition 1: Red in the first or second position
cond_1 <- all_combinations %>%
  filter(col1 == "Red" | col2 == "Red")
# Condition 2: at least 2 balls between Blue and Green, in either order
cond_2 <- cond_1 %>%
  mutate(cond = (col1 == "Blue" & col4 == "Green") |
           (col1 == "Blue" & col5 == "Green") |
           (col2 == "Blue" & col5 == "Green") |
           (col1 == "Green" & col4 == "Blue") |
           (col1 == "Green" & col5 == "Blue") |
           (col2 == "Green" & col5 == "Blue")) %>%
  filter(cond)
# Condition 3: Yellow not in the last position
cond_3 <- cond_2 %>%
  filter(col5 != "Yellow")
cond_3
Output:
col1 col2 col3 col4 col5 cond
1 Red Blue Orange Yellow Green TRUE
2 Red Blue Yellow Orange Green TRUE
3 Red Green Yellow Orange Blue TRUE
4 Red Green Orange Yellow Blue TRUE
5 Green Red Orange Yellow Blue TRUE
6 Green Red Yellow Orange Blue TRUE
7 Green Red Yellow Blue Orange TRUE
8 Blue Red Yellow Green Orange TRUE
9 Blue Red Yellow Orange Green TRUE
10 Blue Red Orange Yellow Green TRUE
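One thing to watch when building a data frame from unlist(d) this way: matrix() fills column by column by default, so with ncol = 5 a row only holds a single permutation if byrow = TRUE is set. A small sketch with two made-up permutations of three colours (toy data, not taken from the question) shows the difference:

```r
# Two toy permutations of three colours (illustrative data only)
perm_list <- list(c("Red", "Blue", "Green"), c("Blue", "Green", "Red"))

# Default column-wise fill: values from different permutations get interleaved
by_col <- matrix(unlist(perm_list), ncol = 3)

# Row-wise fill: each row is exactly one permutation
by_row <- matrix(unlist(perm_list), ncol = 3, byrow = TRUE)

by_col[1, ]  # "Red" "Green" "Green", not a valid permutation
by_row[1, ]  # "Red" "Blue" "Green"
```

A row that contains the same colour twice is a quick sign that the matrix was filled in the wrong direction.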
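A uj5u.com user replied:
If you don't care much about the data.frame structure, my preferred approach is to keep each outcome as a member of the list (i.e. your d variable) and use sapply() with a function that checks whether that outcome satisfies all the conditions.
Observe: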
library(combinat)
my_list <- c("Red", "Blue", "Green", "Yellow", "Orange")
my_list_perm <- combinat::permn(my_list)
# This function examines one particular outcome of the trial, e.g. outcome = ["Blue", "Orange", "Red", "Green", "Yellow"]
test_conditions <- function(outcome) {
  # Condition 1
  condition_1 <- "Red" %in% outcome[c(1,2)]
  # Condition 2
  condition_2 <- base::abs(base::which(outcome == "Blue") - base::which(outcome == "Green")) >= 2
  # Condition 3
  condition_3 <- base::which(outcome == "Yellow") != base::length(outcome)
  all <- condition_1 && condition_2 && condition_3
  return(all)
}
my_list_matches <- base::which(base::sapply(my_list_perm, test_conditions)) # applies the function to each list element (which itself is an outcome)
print(my_list_matches) # displays which trials / outcomes satisfied all conditions
#> [1] 6 7 8 9 10 12 19 22 29 31 32 33 34 35 41 48 49 50 111 112 113 120
Created on 2022-01-04 by the reprex package (v1.0.0)
You can then use the matching indices to filter the original list.
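A minimal sketch of that last step, assuming base R only: the matching indices subset the list directly, and rbind() can stack the surviving permutations into a data frame if one is wanted. The expand.grid() construction is an assumed stand-in for combinat::permn() so the example runs without extra packages; it returns the permutations in a different order, so the indices will not match the ones printed above, although the number of matches does.

```r
my_list <- c("Red", "Blue", "Green", "Yellow", "Orange")

# Assumed stand-in for permn(): take all 5^5 colour tuples, keep those without repeats
grid <- expand.grid(rep(list(my_list), 5), stringsAsFactors = FALSE)
perm_mat <- as.matrix(grid[apply(grid, 1, anyDuplicated) == 0, ])
my_list_perm <- lapply(seq_len(nrow(perm_mat)), function(i) unname(perm_mat[i, ]))

# Same three conditions as test_conditions() above
test_conditions <- function(outcome) {
  "Red" %in% outcome[1:2] &&
    abs(which(outcome == "Blue") - which(outcome == "Green")) >= 2 &&
    which(outcome == "Yellow") != length(outcome)
}

my_list_matches <- which(sapply(my_list_perm, test_conditions))

# Keep only the permutations at the matching indices
filtered_list <- my_list_perm[my_list_matches]

# Optional: stack them into a data frame, one permutation per row
filtered_df <- as.data.frame(do.call(rbind, filtered_list))
names(filtered_df) <- paste0("col", seq_along(my_list))
nrow(filtered_df)  # 22, the same count as in the output above
```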
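A uj5u.com user replied:
Maybe I misread the question, but as far as I can see none of the answers shows a solution in which there are 2 columns between the colours from step 2 of the question.
I took the liberty of testing the data and found that only when you use "Yellow" and "Orange" can you find filter conditions that meet your requirements (as far as I can tell).
This is not a general answer, and it is not actually correct, because "Yellow" ends up in the last row, which violates the rule, but:
With the last row already accounted for, a distance of 2 between the colours reduces the problem to a 4-column problem. A distance of 2 can therefore only be achieved between column 1 and column 4. This leads to 4 assumptions:
Column 1 needs to be "Green" or "Blue"
Column 2 needs to be "Red"
Column 3 should not be "Green" or "Blue"
Column 4 should again be "Green" or "Blue", but not the same as column 1
Here is the code I came up with. It isn't pretty and, as explained, "Green" and "Blue" are swapped for "Yellow" and "Orange", but I think it works.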
library(combinat)
library(tidyverse)
my_list = c("Red", "Blue", "Green", "Yellow", "Orange")
d = permn(my_list)
all_combinations = as.data.frame(matrix(unlist(d), ncol = 5)) %>%
  setNames(paste0("col", 1:5))
`%!in%` <- Negate(`%in%`)
combis <- all_combinations %>%
  filter(col1 %in% c("Yellow", "Orange"),
         col2 == "Red",
         !col3 %in% c("Yellow", "Orange"),
         col5 == "Yellow")
results <- vector()
for(i in seq_along(combis[,1])){
  if(combis[i,][1] %!in% c(combis[i,][4], "Red", "Green", "Blue")){
    results <- combis[i,]
  }
}
results
col1 col2 col3 col4 col5
3 Yellow Red Green Orange Yellow
