Algorithm for computing all boolean combinations of DataFrame columns, including | and & expressions

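I have a DataFrame with a few thousand columns and tens of thousands of rows (I want to increase the number of rows later). All values in the DataFrame are booleans.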

import numpy as np
import pandas as pd

column_amount = 6  # arbitrary example value
row_amount = 10  # arbitrary example value
# Draw a row_amount x column_amount matrix of random booleans in one call
df = pd.DataFrame(np.random.choice([True, False], size=(row_amount, column_amount)))
df

    0       1       2       3       4       5
0   False   False   True    False   False   False
1   True    True    True    True    False   False
2   True    False   True    False   False   False
3   True    True    True    True    True    True
4   True    False   True    False   False   False
5   True    False   False   True    False   False
6   True    True    True    True    False   True
7   False   True    True    True    True    True
8   True    False   True    True    False   False
9   True    True    False   False   True    True

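Each column represents a signal over all available timestamps, so each column has a reward value between 0 and 1000 (for example 453). The reward value is computed by a function that is not relevant to this part of the problem.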

def get_reward(column):
    reward = ...(column)
    return reward

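This reward value is what I want to optimize by combining columns. Columns can be merged using AND or OR expressions with a specific order of operations, which yields a new reward value.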

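Suppose columns 0, 1 and 2 have reward values of 100, 200 and 300 respectively. Combining column 0 and column 1 with OR gives some arbitrary reward value, for example 250. But combining column 0 OR column 2 gives a worse combined reward value of 80. The columns have no relationship to one another and are completely independent (for the sake of argument, the columns could be shuffled).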

# An example solution of combinations
combination_column = (df[5] | df[1]) & ((df[4] | df[2]) & df[3])
reward = get_reward(combination_column) # example reward value of 3000

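I need an efficient algorithm to find the best combination column (the one with the highest reward metric), because the search space can otherwise grow very large very quickly. The solution may well be some combination of dozens of different columns joined with & or |. The problem is that a greedy algorithm that only explores the columns with the highest reward metric may not find the best solution, because there may be two columns that each have a reward value of only 100 but together reach a reward value in the thousands. Still, I would expect that the higher the individual reward values, the higher the chance of a good combined reward value. With that in mind, I would like every possible combination to be computable, but visited in an efficient order. That way, if the search space gets out of hand and no longer looks promising, I can implement an early-stopping function myself to abort the search (hopefully having found the best solution by then).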
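One way to visit combinations "most promising first" with a hard stop is a best-first search over a priority queue. The following is only a sketch under stated assumptions, not a definitive solution: `best_first_search`, `max_evaluations` and `agreement_reward` are hypothetical names, and `agreement_reward` is a made-up stand-in for the real `get_reward`, which the question does not show. Each expression taken from the queue is combined with every expression taken out before it, so any bracketed &/| expression is reachable given a large enough budget.

```python
import heapq
import itertools

import numpy as np
import pandas as pd


def best_first_search(df, reward_fn, max_evaluations=200):
    """Expand high-reward expressions first; stop once the budget is used up."""
    tie = itertools.count()   # tie-breaker so the heap never compares Series
    seen = set()              # fingerprints of truth columns already scored
    heap = []                 # entries: (-reward, tie, expression, values)
    expanded = []             # expressions already taken off the heap
    best = (float("-inf"), None)
    evaluations = 0

    def consider(expr, values):
        nonlocal best, evaluations
        key = values.to_numpy().tobytes()
        if key in seen or evaluations >= max_evaluations:
            return            # duplicate truth column, or budget exhausted
        seen.add(key)
        reward = reward_fn(values)
        evaluations += 1
        if reward > best[0]:
            best = (reward, expr)
        heapq.heappush(heap, (-reward, next(tie), expr, values))

    # Seed the queue with the original columns.
    for col in df.columns:
        consider(f"df[{col!r}]", df[col])

    # Early stopping: the loop ends when the budget or the queue runs out.
    while heap and evaluations < max_evaluations:
        _, _, expr, values = heapq.heappop(heap)
        for other_expr, other_values in expanded:
            consider(f"({expr} | {other_expr})", values | other_values)
            consider(f"({expr} & {other_expr})", values & other_values)
        expanded.append((expr, values))
    return best


# Demo with a hypothetical reward: number of rows agreeing with a hidden target.
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.integers(0, 2, size=(50, 5)).astype(bool))
target = (demo_df[0] | demo_df[1]) & demo_df[2]


def agreement_reward(column):
    return int((column == target).sum())


best_reward, best_expr = best_first_search(demo_df, agreement_reward, max_evaluations=500)
print(best_reward, best_expr)
```

Skipping columns whose truth values have already been scored keeps logically equivalent expressions, such as `(a | b)` and `(b | a)`, from consuming the budget twice. The ordering relies on the assumption stated in the question that high-reward parts tend to give high-reward combinations; where that does not hold, the search still terminates but may spend its budget in the wrong region.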

from itertools import permutations

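Permutations might help to generate all possible combinations of the columns. But I am stuck on how to walk the search space efficiently while also covering the AND/OR choices and the brackets that express the order of operations. So I am hoping the geniuses of the internet will see this and help me solve it. I can elaborate further if anything is unclear, but I wanted to keep the question as concise as possible.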
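Because & and | are commutative, the order of the operands does not matter, so `itertools.combinations` is a closer fit than `permutations`. The brackets can then be handled by recursively splitting a set of columns into a left and a right part. The sketch below (with the hypothetical helper name `expressions`) lists every syntactically distinct bracketed expression that uses each given column exactly once. It makes no attempt to merge logically equivalent forms such as `((a | b) | c)` and `(a | (b | c))`.

```python
from itertools import combinations


def expressions(columns):
    """Yield every bracketed &/| expression that uses each column exactly once."""
    columns = tuple(columns)
    if len(columns) == 1:
        yield f"df[{columns[0]!r}]"
        return
    first, rest = columns[0], columns[1:]
    # Always keep `first` in the left part, so a split and its mirror image
    # are not both generated (& and | are commutative).
    for k in range(len(rest)):  # k < len(rest) keeps the right part non-empty
        for left_rest in combinations(rest, k):
            left = (first,) + left_rest
            right = tuple(c for c in rest if c not in left_rest)
            for left_expr in expressions(left):
                for right_expr in expressions(right):
                    yield f"({left_expr} | {right_expr})"
                    yield f"({left_expr} & {right_expr})"


for n in range(1, 6):
    print(n, "columns:", sum(1 for _ in expressions(range(n))), "expressions")
```

The count grows as (2n-3)!! · 2^(n-1), that is 2, 12, 120 and 1680 expressions for 2 to 5 columns, which is why exhaustive enumeration is only practical for very small subsets and some ordering or pruning is needed beyond that.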

Reply from a uj5u.com user:

I came up with a greedy solution that is acceptable for my use case. In case anyone runs into a similar problem, maybe this will help you:

"""
Sort all columns by reward_values
Make all combinations of 2 columns and add to dataframe, if satisfies given rules
Repeat until a full trial doesn't add a new column

RULES: Don't try combinations in which a starting column exists on both sides
RULES: Don't include columns with a negative training reward_value
RULES: Don't include equal columns
RULES: Don't include a new column if the new training reward_value is not better than the max of the two originals
"""

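The rules in the docstring above could be sketched roughly as follows. This is one interpretation, not the answerer's actual code: `greedy_combine`, `max_rounds` and `agreement_reward` are hypothetical names, the reward used in the demo is a made-up stand-in for `get_reward`, and "equal columns" is read here as "a combined column whose values already exist in the pool".

```python
import itertools

import numpy as np
import pandas as pd


def greedy_combine(df, reward_fn, max_rounds=10):
    """Repeatedly pair up columns, keeping only combinations that improve."""
    # pool: expression -> (values, reward, starting columns used)
    pool = {}
    for col in df.columns:
        reward = reward_fn(df[col])
        if reward >= 0:  # rule: no columns with a negative reward
            pool[f"df[{col!r}]"] = (df[col], reward, frozenset([col]))

    for _ in range(max_rounds):
        # Sort by reward so the most promising pairs are tried first.
        ranked = sorted(pool.items(), key=lambda kv: kv[1][1], reverse=True)
        seen = {values.to_numpy().tobytes() for values, _, _ in pool.values()}
        added = False
        for (ea, (va, ra, sa)), (eb, (vb, rb, sb)) in itertools.combinations(ranked, 2):
            if sa & sb:  # rule: no starting column on both sides
                continue
            for op, combined in (("|", va | vb), ("&", va & vb)):
                key = combined.to_numpy().tobytes()
                if key in seen:  # rule: no equal columns
                    continue
                reward = reward_fn(combined)
                if reward > max(ra, rb):  # rule: must beat both originals
                    pool[f"({ea} {op} {eb})"] = (combined, reward, sa | sb)
                    seen.add(key)
                    added = True
        if not added:  # a full trial added nothing new: stop
            break

    best_expr, (_, best_reward, _) = max(pool.items(), key=lambda kv: kv[1][1])
    return best_expr, best_reward


# Demo with a hypothetical reward: number of rows agreeing with a hidden target.
rng = np.random.default_rng(1)
demo_df = pd.DataFrame(rng.integers(0, 2, size=(40, 5)).astype(bool))
target = (demo_df[0] & demo_df[3]) | demo_df[4]


def agreement_reward(column):
    return int((column == target).sum())


expr, reward = greedy_combine(demo_df, agreement_reward)
print(reward, expr)
```

Requiring every new column to beat both of its parents keeps the pool small, but, as the question itself points out, it can miss combinations whose parts only pay off together.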
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/520462.html
