Using a MultiIndex DataFrame, get the sum of a boolean column based on a condition in another column

We have a MultiIndex DataFrame that looks like this:

                                  date   condition_1    condition_2
item1   0    2021-06-10 06:30:00+00:00          True          False
        1    2021-06-10 07:00:00+00:00         False           True
        2    2021-06-10 07:30:00+00:00          True           True
item2   3    2021-06-10 06:30:00+00:00          True          False
        4    2021-06-10 07:00:00+00:00          True           True
        5    2021-06-10 07:30:00+00:00          True           True
item3   6    2021-06-10 06:30:00+00:00          True           True
        7    2021-06-10 07:00:00+00:00         False           True
        8    2021-06-10 07:30:00+00:00          True           True

The date values repeat across items (because df is the result of a default concat over a dictionary of DataFrames).
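For anyone who wants to reproduce the frame, here is a minimal sketch of how it might have been built. The dictionary name `frames`, the helper `rows`, and the outer level name `item` are assumptions for illustration; only the values come from the table above.

```python
import pandas as pd

# Timestamps copied from the sample table, assumed to be in UTC
dates = pd.to_datetime(
    ["2021-06-10 06:30", "2021-06-10 07:00", "2021-06-10 07:30"], utc=True
)

# (condition_1, condition_2) values per item, copied from the sample table
rows = {
    "item1": ([True, False, True], [False, True, True]),
    "item2": ([True, True, True], [False, True, True]),
    "item3": ([True, False, True], [True, True, True]),
}

frames = {}
for i, (name, (c1, c2)) in enumerate(rows.items()):
    frames[name] = pd.DataFrame(
        {"date": dates, "condition_1": c1, "condition_2": c2},
        index=range(3 * i, 3 * i + 3),  # running row labels 0..8, as in the table
    )

# Passing a dict to concat uses its keys as the outer index level;
# naming that level "item" is an assumption, not something the question states
df = pd.concat(frames, names=["item"])
print(df)
```

Each date appears once per item, which is why the same three timestamps show up three times.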

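The logic we want to vectorize is essentially: "for every date on which condition_1 is True for all items, sum the occurrences where condition_2 is True across all items, and put that sum in a new result column".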

Based on the example above, the result would look like this (comments on how each value is derived are next to the result column):

                                  date   condition_1    condition_2    result
item1   0    2021-06-10 06:30:00+00:00          True          False         1 [because condition_1 is True for all items and condition_2 is True once]
        1    2021-06-10 07:00:00+00:00         False           True         0 [condition_1 is not True for all items so condition_2 is irrelevant]
        2    2021-06-10 07:30:00+00:00          True           True         3 [both conditions are True for all 3 items]
item2   3    2021-06-10 06:30:00+00:00          True          False         1 [a repeat for the same reasons]
        4    2021-06-10 07:00:00+00:00          True           True         0 [a repeat for the same reasons]
        5    2021-06-10 07:30:00+00:00          True           True         3 [a repeat for the same reasons]
item3   6    2021-06-10 06:30:00+00:00          True           True         1 [a repeat for the same reasons]
        7    2021-06-10 07:00:00+00:00         False           True         0 [a repeat for the same reasons]
        8    2021-06-10 07:30:00+00:00          True           True         3 [a repeat for the same reasons]

Reply from a uj5u.com user:

Here is what I came up with.

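def cond_sum(s):
    # Count of True condition_2 values, or 0 unless condition_1 is True for every item on this date
    return s.condition_1.all() * s.condition_2.sum()

# Move the outer index level into a column (assumes that level is named 'item')
df.reset_index(level=0, inplace=True)
# apply returns one value per date, so map it back onto each row through its date
df['result'] = df['date'].map(df.groupby('date').apply(cond_sum))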

Then, if you want the original index back, you can add it again.

df.set_index('item', append=True).swaplevel()

Note that you mentioned vectorization, so you could swap the apply for:

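# Aggregate once per date: is condition_1 True for every item, and how many condition_2 values are True
dfg = df.groupby('date').agg({'condition_1': 'all', 'condition_2': 'sum'})
# Map the per-date product back onto each row through its date
df['result'] = df['date'].map(dfg.condition_1 * dfg.condition_2)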
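As an alternative sketch (not part of the original answer), `groupby(...).transform` returns a result aligned with the original rows, so the MultiIndex can stay untouched and no mapping step is needed. The frame below is rebuilt from the sample table; the level name `item` and the variable names `all_c1` and `sum_c2` are assumptions for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (timestamps assumed to be UTC)
dates = pd.to_datetime(
    ["2021-06-10 06:30", "2021-06-10 07:00", "2021-06-10 07:30"], utc=True
)
index = pd.MultiIndex.from_arrays(
    [["item1"] * 3 + ["item2"] * 3 + ["item3"] * 3, list(range(9))],
    names=["item", None],
)
df = pd.DataFrame(
    {
        "date": list(dates) * 3,
        "condition_1": [True, False, True, True, True, True, True, False, True],
        "condition_2": [False, True, True, False, True, True, True, True, True],
    },
    index=index,
)

# Per row: is condition_1 True for every item sharing this row's date?
all_c1 = df["condition_1"].groupby(df["date"]).transform("all")
# Per row: how many items have condition_2 True on this row's date?
sum_c2 = df["condition_2"].astype(int).groupby(df["date"]).transform("sum")

# Keep the count where condition_1 holds for all items, otherwise 0
df["result"] = sum_c2.where(all_c1, 0)
print(df)
```

Because both transformed Series share the frame's index, the assignment lines up row by row, and the values match the expected result column from the question.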

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/474765.html

Tags: pandas, dataframe, numpy
