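After running groupby on the columns first_register and second_register, I am trying to select the class value from each group of my DataFrame, but it does not seem to work.
Suppose I have a DataFrame like this: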
import numpy as np
import pandas as pd
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
Here is what I tried, which did not work at all:
group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)
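How can I select/access the class label of each group in the DataFrame?
The desired output could be an ordered list like this, giving the class of each group from the first group to the last: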
label_class = [1, 2, 0, 1]
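Reply from a uj5u.com user:
Use dropna=False (available since pandas 1.1). By default, groupby drops every row whose group key contains NaN, and in this DataFrame every row has NaN in one of the two key columns, so the original grouping comes back empty: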
group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()
print(label_class)
first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object
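A minimal sketch, assuming pandas 1.1 or later, that shows why the original attempt returns nothing and how to flatten the per-group unique() arrays into the plain list the question asks for. The names keys and flat are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question (np.nan marks an empty register).
df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 0, 0, 1],
    "first_register": ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20",
                       np.nan, np.nan, np.nan],
    "second_register": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
                        "72/20", "72/20", "73/20"],
})
keys = ["first_register", "second_register"]

# Default dropna=True: rows whose key contains NaN are excluded,
# and every row here has NaN in one of the two key columns.
print(df.groupby(keys).ngroups)  # 0

# dropna=False keeps NaN as an ordinary key value.
group_by_df = df.groupby(keys, dropna=False)
print(group_by_df.ngroups)  # 4

# unique() yields one array per group; flatten them into plain ints.
label_class = group_by_df["class"].unique()
flat = [int(v) for arr in label_class for v in arr]
print(flat)  # [1, 2, 0, 1]
```

Flattening this way keeps every distinct class of a group, so a group with two classes would contribute two entries rather than silently losing one.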
If you know that each group has exactly one unique class, or you simply want the first or last value:
label_class = group_by_df["class"].first()
Or:
label_class = group_by_df["class"].last()
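Because first() and last() silently pick one value per group, a sketch like the following (an assumption about how you might guard it, not part of the original answer) checks with nunique() that each group really holds a single class before relying on either one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 0, 0, 1],
    "first_register": ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20",
                       np.nan, np.nan, np.nan],
    "second_register": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
                        "72/20", "72/20", "73/20"],
})
group_by_df = df.groupby(["first_register", "second_register"], dropna=False)

# True only if every group contains exactly one distinct class.
single_class = group_by_df["class"].nunique().eq(1).all()
print(single_class)  # True

# With one class per group, first() and last() give the same labels.
first_labels = group_by_df["class"].first().tolist()
last_labels = group_by_df["class"].last().tolist()
print(first_labels)  # [1, 2, 0, 1]
print(last_labels)   # [1, 2, 0, 1]
```

If single_class comes back False, first() and last() can disagree, and the unique() result above is the safer choice.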
Reply from a uj5u.com user:
Use GroupBy.first:
out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)
first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64
label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
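An alternative sketch that avoids NaN keys entirely. It assumes, beyond what the question states, that each row has exactly one register filled in and that a register value never appears in both columns, so the two columns can be merged into one key with fillna. The name register is hypothetical. Under that assumption it also works on pandas versions older than 1.1, which lack the dropna parameter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 0, 0, 1],
    "first_register": ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20",
                       np.nan, np.nan, np.nan],
    "second_register": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan,
                        "72/20", "72/20", "73/20"],
})

# Assumption: exactly one register per row, so filling the gaps in
# first_register from second_register yields a key with no NaN.
register = df["first_register"].fillna(df["second_register"])

label_class = df.groupby(register)["class"].first().tolist()
print(label_class)  # [1, 2, 0, 1]
```

If the same register string could occur in both columns, this merged key would combine groups that the two-column groupby keeps apart, so the dropna=False version is the more faithful one.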
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/447951.html
Tags: python pandas dataframe pandas-groupby
