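I have a DataFrame with the following rows:
                      Name  Measurement
0   Blue_Water_Final_Rev_0            3
1   Blue_Water_Final_Rev_1            4
2   Blue_Water_Final_Rev_2            5
3    Red_Water_Final_Rev_0            7
4  Red_Water_Initial_Rev_0            6
For each name, I only want to keep the row with the latest revision, or the row marked "Final" when the other one is marked "Initial". For the data above, my output would look like this:
                     Name  Measurement
2  Blue_Water_Final_Rev_2            5
3   Red_Water_Final_Rev_0            7
How can I do this with my pandas DataFrame in Python? Thanks.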
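For anyone who wants to try the answers, the sample frame can be rebuilt as in the sketch below. The default integer index is an assumption, inferred from the row labels shown in the question.

```python
import pandas as pd

# Sample data copied from the question; the default RangeIndex (0..4)
# is assumed because it matches the row labels shown above.
df = pd.DataFrame({
    "Name": [
        "Blue_Water_Final_Rev_0",
        "Blue_Water_Final_Rev_1",
        "Blue_Water_Final_Rev_2",
        "Red_Water_Final_Rev_0",
        "Red_Water_Initial_Rev_0",
    ],
    "Measurement": [3, 4, 5, 7, 6],
})
print(df)
```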
Answer from a uj5u.com user:
You can extract the part of the name before "Final" and use drop_duplicates with keep='last':
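# Extract the part of each name before "_Final" (NaN for non-Final rows),
# keep the last row of each group, then drop the NaN entry.
keep = (df['Name']
        .str.extract(r'^(.*)_Final', expand=False)
        .drop_duplicates(keep='last')
        .dropna()
       )
out = df.loc[keep.index]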
Note: this assumes the data is already sorted by revision.
Output:
Name Measurement
2 Blue_Water_Final_Rev_2 5
3 Red_Water_Final_Rev_0 7
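If the rows are not already sorted by revision, one option is to order them by the trailing revision number first. The following is only a sketch: the shuffled sample frame and the names rev and ordered are made up for illustration, and it assumes every name ends in _Rev_ followed by digits.

```python
import pandas as pd

# Hypothetical, deliberately unsorted copy of the question's data.
df = pd.DataFrame({
    "Name": [
        "Blue_Water_Final_Rev_2",
        "Red_Water_Initial_Rev_0",
        "Blue_Water_Final_Rev_0",
        "Red_Water_Final_Rev_0",
        "Blue_Water_Final_Rev_1",
    ],
    "Measurement": [5, 6, 3, 7, 4],
})

# Pull the trailing revision number out of each name and order rows by it.
rev = df["Name"].str.extract(r"_Rev_(\d+)$", expand=False).astype(int)
ordered = df.loc[rev.sort_values(kind="stable").index]

# Same idea as the answer above, applied to the revision-ordered rows.
keep = (ordered["Name"]
        .str.extract(r"^(.*)_Final", expand=False)
        .drop_duplicates(keep="last")
        .dropna())

# Restore the original row order for the result.
out = df.loc[keep.index].sort_index()
print(out)
```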
If you want to keep all duplicate rows of the last revision:
out = df[df['Name'].isin(df.loc[keep.index, 'Name'])]
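The difference between the two selections only shows up when a name occurs more than once. Here is a small sketch with a made-up frame in which the latest Blue_Water revision is duplicated; the names one_per_group and all_copies are illustrative.

```python
import pandas as pd

# Hypothetical data: the latest Blue_Water revision appears twice.
df = pd.DataFrame({
    "Name": [
        "Blue_Water_Final_Rev_0",
        "Blue_Water_Final_Rev_1",
        "Blue_Water_Final_Rev_1",
        "Red_Water_Final_Rev_0",
    ],
    "Measurement": [3, 4, 5, 7],
})

keep = (df["Name"]
        .str.extract(r"^(.*)_Final", expand=False)
        .drop_duplicates(keep="last")
        .dropna())

# One row per group: only the last occurrence survives.
one_per_group = df.loc[keep.index]

# Every row whose full name matches a kept name survives.
all_copies = df[df["Name"].isin(df.loc[keep.index, "Name"])]

print(one_per_group)
print(all_copies)
```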
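Answer from a uj5u.com user:
If a group may have only an Initial row and no Final row, and that row needs to be kept, use Series.str.extract to get 3 columns: the group, Final or Initial, and the revision number. Convert the last column to integers, sort by all columns with DataFrame.sort_values, and then find the last duplicate of each group with DataFrame.duplicated: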
print(df)
                        Name  Measurement
0     Blue_Water_Final_Rev_0            3
1     Blue_Water_Final_Rev_1            4
2     Blue_Water_Final_Rev_2            5
3      Red_Water_Final_Rev_0            7
4    Red_Water_Initial_Rev_0            6
5  Green_Water_Initial_Rev_0            6
# Split each name into group (a), Final/Initial (b) and revision number (c)
df1 = (df['Name'].str.extract(r'(?P<a>\w+)_(?P<b>Final|Initial)_Rev_(?P<c>\d+)$')
                 .assign(c=lambda x: x.c.astype(int)))
# Sort so the row to keep comes last in each group, flag the others as
# duplicates, and realign the mask with df's row order before filtering
df = df[~df1.sort_values(['a','c','b'], ascending=[True, True, False])
            .duplicated('a', keep='last')
            .sort_index()]
print(df)
                        Name  Measurement
2     Blue_Water_Final_Rev_2            5
3      Red_Water_Final_Rev_0            7
5  Green_Water_Initial_Rev_0            6
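The same rule can also be written with drop_duplicates and a boolean tie-breaker, which some readers may find easier to follow. This is a sketch rather than the answer's own code: the column names group, stage, rev and is_final are invented here, and it assumes every name matches the pattern.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": [
        "Blue_Water_Final_Rev_0",
        "Blue_Water_Final_Rev_1",
        "Blue_Water_Final_Rev_2",
        "Red_Water_Final_Rev_0",
        "Red_Water_Initial_Rev_0",
        "Green_Water_Initial_Rev_0",
    ],
    "Measurement": [3, 4, 5, 7, 6, 6],
})

# Split each name into its group, stage and revision number.
parts = df["Name"].str.extract(
    r"(?P<group>\w+)_(?P<stage>Final|Initial)_Rev_(?P<rev>\d+)$"
)
parts["rev"] = parts["rev"].astype(int)

# False sorts before True, so Final wins a tie on the revision number.
parts["is_final"] = parts["stage"].eq("Final")

# The last row of each group after sorting is the one to keep.
last = (parts.sort_values(["group", "rev", "is_final"])
             .drop_duplicates("group", keep="last"))
out = df.loc[last.index].sort_index()
print(out)
```

Selecting by index labels with df.loc, instead of passing a reordered boolean mask, sidesteps any index alignment between the sorted helper frame and df.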
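However, if you need to drop all Initial rows and work only with the Final rows, use the same first step as above, then filter out the Initial rows and select the row with the highest revision in each group using DataFrameGroupBy.idxmax with DataFrame.loc: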
df1 = (df['Name'].str.extract(r'(?P<a>\w+)_(?P<b>Final|Initial)_Rev_(?P<c>\d+)$')
                 .assign(c=lambda x: x.c.astype(int)))
# Drop Initial rows, then take the index label of the highest revision per group
df = df.loc[df1[df1.b.ne('Initial')].groupby('a')['c'].idxmax()]
print(df)
                     Name  Measurement
2  Blue_Water_Final_Rev_2            5
3   Red_Water_Final_Rev_0            7
Answer from a uj5u.com user:
You can use df.iloc[2:4, :] for this.
