Using a pandas.DataFrame, how should I drop duplicate rows (based on multiple columns) using the mode of another column?
import pandas as pd
df = pd.DataFrame(
data={
"col_1": [0, 0, 0, 0, 1, 1, 1, 1],
"col_2": [1, 1, 1, 1, 2, 2, 2, 2],
"col_3": [5, 5, 0, 1, 8, 8, 0, 1],
"another_column": [0, 0, 0, 0, 0, 0, 0, 0],
}
)
# the following line computes the correct mode per group, but it doesn't return the
# original dataframe reduced to the two unique rows
print(df.groupby(by=["col_1", "col_2"])["col_3"].agg(lambda x: x.mode()[0]))
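A minimal sketch of why the line above loses the other columns, assuming the `df` defined above: `agg` reduces each group to a single value, so the result is a Series indexed by the group keys (a two-level MultiIndex on `col_1` and `col_2`) rather than by the original row labels. There is nothing left to line up with the rows of `df`.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "col_1": [0, 0, 0, 0, 1, 1, 1, 1],
        "col_2": [1, 1, 1, 1, 2, 2, 2, 2],
        "col_3": [5, 5, 0, 1, 8, 8, 0, 1],
        "another_column": [0, 0, 0, 0, 0, 0, 0, 0],
    }
)

# agg collapses each (col_1, col_2) group to one scalar: the first mode of col_3
result = df.groupby(by=["col_1", "col_2"])["col_3"].agg(lambda x: x.mode()[0])

# one entry per group, indexed by the group keys instead of the original row labels
print(list(result.index.names))
print(result.tolist())  # [5, 8]
```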
Answer:
Use GroupBy.transform, then compare the result with the original col_3 column and filter with boolean indexing:
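# broadcast each group's mode of col_3 back onto every row of that group
s = df.groupby(by=["col_1", "col_2"])["col_3"].transform(lambda x: x.mode()[0])
# keep only the rows whose col_3 equals their group's mode
df1 = df[df['col_3'].eq(s)]
print(df1)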
   col_1  col_2  col_3  another_column
0      0      1      5               0
1      0      1      5               0
4      1      2      8               0
5      1      2      8               0
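A small sketch of why the comparison works: `transform` returns a Series with the same index as `df`, so `eq` lines up row by row. It also uses a hypothetical frame `tie` (not from the question) to show tie handling: `Series.mode` returns the modes in sorted order, so when a group has several modes, `x.mode()[0]` picks the smallest one, and every row holding that value is kept.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "col_1": [0, 0, 0, 0, 1, 1, 1, 1],
        "col_2": [1, 1, 1, 1, 2, 2, 2, 2],
        "col_3": [5, 5, 0, 1, 8, 8, 0, 1],
        "another_column": [0, 0, 0, 0, 0, 0, 0, 0],
    }
)

# transform returns one value per original row, aligned on df's index
s = df.groupby(by=["col_1", "col_2"])["col_3"].transform(lambda x: x.mode()[0])
df1 = df[df["col_3"].eq(s)]
print(s.tolist())           # [5, 5, 5, 5, 8, 8, 8, 8]
print(df1.index.tolist())   # [0, 1, 4, 5]

# hypothetical tie: 2 and 1 each appear twice in the single group g == 0
tie = pd.DataFrame({"g": [0, 0, 0, 0], "v": [2, 2, 1, 1]})
tie_mode = tie.groupby("g")["v"].transform(lambda x: x.mode()[0])
# mode() is sorted, so [0] selects 1, the smaller of the two modes
kept = tie[tie["v"].eq(tie_mode)]
print(kept.index.tolist())  # [2, 3]
```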
If you need only the first row of each group:
s = df.groupby(by=["col_1", "col_2"])["col_3"].transform(lambda x: x.mode()[0])
# filter to the mode rows, then keep the first row for each (col_1, col_2) pair
df1 = df[df['col_3'].eq(s)].drop_duplicates(["col_1", "col_2"])
print(df1)
   col_1  col_2  col_3  another_column
0      0      1      5               0
4      1      2      8               0
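An alternative sketch, not the answer's method: start from the `agg` result in the question, turn its group keys back into columns with `reset_index`, and inner-merge it with `df` on the keys plus `col_3`. Only rows whose `col_3` equals their group's mode survive the join. Unlike boolean indexing, `merge` does not keep the original row labels, so the result here is renumbered from 0.

```python
import pandas as pd

df = pd.DataFrame(
    data={
        "col_1": [0, 0, 0, 0, 1, 1, 1, 1],
        "col_2": [1, 1, 1, 1, 2, 2, 2, 2],
        "col_3": [5, 5, 0, 1, 8, 8, 0, 1],
        "another_column": [0, 0, 0, 0, 0, 0, 0, 0],
    }
)
keys = ["col_1", "col_2"]

# one row per group: the group keys plus the first mode of col_3
modes = df.groupby(by=keys)["col_3"].agg(lambda x: x.mode()[0]).reset_index()

# inner join on keys + col_3 keeps only rows whose col_3 equals the group mode;
# drop_duplicates then keeps the first such row per group
df2 = (
    df.merge(modes, on=keys + ["col_3"], how="inner")
    .drop_duplicates(keys)
    .reset_index(drop=True)
)
print(df2)
```

This avoids the per-row `transform`, at the cost of losing the original index; use the `transform` version if you need to know which original rows were kept.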
