Pandas groupby and apply

I am running a groupby and apply on a dataframe, and it returns some strange results. I am using pandas 1.3.1.

Here is the code:

import pandas as pd

ddf = pd.DataFrame({
    "id": [1,1,1,1,2]
})

def do_something(df):
    return "x"

ddf["title"] = ddf.groupby("id").apply(do_something)
ddf

I expected every row of the title column to be assigned the value "x", but instead I get this data:

        id title
0        1   NaN
1        1     x
2        1     x
3        1   NaN
4        2   NaN

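Is this expected?

Answer from a uj5u.com user:

The result is not strange; it is the correct behavior. apply returns one value per group, and here the group labels 1 and 2 become the index of the aggregated result: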

>>> list(ddf.groupby("id"))
[(1,        # the group name (the future index of the grouped df)
     id     # the subset dataframe of the group 1
  0   1
  1   1
  2   1
  3   1),
 (2,        # the group name (the future index of the grouped df)
     id     # the subset dataframe of the group 2
  4   2)]

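Why do you get any result at all? Because the group labels happen to match labels in your dataframe's index, so the assignment aligns on them: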

>>> ddf.groupby("id").apply(do_something)
id
1    x
2    x
dtype: object
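
The alignment described above can be reproduced without the column assignment. The following is a minimal sketch, assuming the question's dataframe; `per_group` and `aligned` are names introduced here for illustration. Reindexing the per-group result against the row labels 0 to 4 gives the same pattern of values and NaN that the question shows.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

def do_something(df):
    return "x"

# One value per group; the index of this Series holds the group labels 1 and 2.
per_group = ddf.groupby("id").apply(do_something)

# Assigning per_group to a column aligns it on index labels, which is
# comparable to reindexing it against the dataframe's row index 0..4.
aligned = per_group.reindex(ddf.index)

# Only row labels 1 and 2 exist in per_group, so the other rows are missing.
print(aligned.isna().tolist())
```

Rows 1 and 2 receive a value only because those row labels coincide with the group labels, not because of which group the rows belong to.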

Now change id like this:

ddf['id'] += 10
#    id
# 0  11
# 1  11
# 2  11
# 3  11
# 4  12

ddf["title"] = ddf.groupby("id").apply(do_something)
#    id title
# 0  11   NaN
# 1  11   NaN
# 2  11   NaN
# 3  11   NaN
# 4  12   NaN

Or change the index:

ddf.index += 10
#    id
# 10  1
# 11  1
# 12  1
# 13  1
# 14  2

ddf["title"] = ddf.groupby("id").apply(do_something)
#     id title
# 10   1   NaN
# 11   1   NaN
# 12   1   NaN
# 13   1   NaN
# 14   2   NaN
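
If the goal is to give every row its group's value regardless of the index labels, one possible approach is to look the value up by each row's id instead of relying on index alignment. This is a sketch under that assumption, not the only way to do it; `per_group` is a name introduced here, and the shifted index 10 to 14 mirrors the example above.

```python
import pandas as pd

# Row labels 10..14 deliberately do not overlap with the group labels 1 and 2.
ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]}, index=[10, 11, 12, 13, 14])

def do_something(df):
    return "x"

# Series indexed by group label: 1 -> "x", 2 -> "x".
per_group = ddf.groupby("id").apply(do_something)

# Series.map looks up each row's id in per_group's index,
# so the result keeps ddf's own row labels.
ddf["title"] = ddf["id"].map(per_group)
print(ddf)
```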

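Answer from a uj5u.com user:

Yes, this is expected.

First of all, the apply(do_something) part works like a charm; it is the groupby before it that causes the problem. groupby returns a groupby object, which is a bit different from a normal dataframe. If you debug and inspect what groupby returns, you will see that you need some kind of aggregation function to use it (mean, max, or sum). If you run one of them as an example, like this: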

df = ddf.groupby("id")
df.mean()

it produces this result:

Empty DataFrame
Columns: []
Index: [1, 2]

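After that, do_something is applied once per group, so its result is indexed by 1 and 2; that result is then aligned into your original df by index. That is why x appears only at index 1 and 2. For now I would suggest not using groupby, since it is unclear why you would use it here, and reading up on the groupby object.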
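
To see what groupby returns, the object can be inspected directly. This is a small sketch assuming the question's dataframe; `grouped` and `sizes` are names introduced here for illustration.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

# groupby does not return a dataframe, but a lazy grouping object.
grouped = ddf.groupby("id")
print(type(grouped).__name__)

# Any aggregation collapses each group to one row, indexed by the group label.
sizes = grouped.size()
print(sizes.index.tolist())
print(sizes.tolist())
```

The aggregated index contains only the group labels 1 and 2, which is why the later column assignment can fill only the rows carrying those labels.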

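Answer from a uj5u.com user:

If you need a new column filled by an aggregate function, use GroupBy.transform. You need to specify the column to process after groupby, here id: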

ddf["title"] = ddf.groupby("id")['id'].transform(do_something)

Or assign the new column inside the function:

def do_something(x):
    x['title'] = 'x'
    return x

ddf = ddf.groupby("id").apply(do_something)

Why the original code does not work is explained in the other answer.
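
The difference between transform and apply can be checked on a dataframe whose row labels do not overlap with the group labels. This is a minimal sketch; the index 10 to 14 and the names `title` and `group_size` are assumptions made for illustration. transform broadcasts each group's value back to the rows of that group, so the result carries the original row index.

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]}, index=[10, 11, 12, 13, 14])

def do_something(s):
    return "x"

# transform returns a Series with the same index as ddf,
# so the assignment fills every row.
title = ddf.groupby("id")["id"].transform(do_something)
ddf["title"] = title

# The same broadcasting applies to built-in aggregations such as "size".
ddf["group_size"] = ddf.groupby("id")["id"].transform("size")
print(ddf)
```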
