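In my dataset, I am trying to get the margin between two values. The code below works fine as long as the fourth race is excluded. After grouping by one column, some groups apparently contain only one value, so there is no second value to compute a margin from. For race 4, winning_margin removes the only time and then calls min() on an empty list, which raises a ValueError. I want to ignore such groups. Here is my current code: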
import pandas as pd

data = {'Name': ['A', 'B', 'B', 'C', 'A', 'C', 'A'],
        'RaceNumber': [1, 1, 2, 2, 3, 3, 4],
        'PlaceWon': ['First', 'Second', 'First', 'Second', 'First', 'Second', 'First'],
        'TimeRanInSec': [100, 98, 66, 60, 75, 70, 75]}
df = pd.DataFrame(data)
print(df)
def winning_margin(times):
    times = list(times)
    winner = min(times)          # fastest time in the race
    times.remove(winner)
    return min(times) - winner   # gap between the two fastest times
winning_margins = df[['RaceNumber', 'TimeRanInSec']] \
.groupby('RaceNumber').agg(winning_margin)
winning_margins.columns = ['margin']
winners = df.loc[df.PlaceWon == 'First', :]
winners = winners.join(winning_margins, on='RaceNumber')
avg_margins = winners[['Name', 'margin']].groupby('Name').mean()
avg_margins
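To see which races trigger the problem, it can help to count the recorded times per race before aggregating. A minimal sketch using the sample data above; the names runners and single_entry_races are illustrative, not part of the original code:

```python
import pandas as pd

# Same sample data as in the question
data = {'Name': ['A', 'B', 'B', 'C', 'A', 'C', 'A'],
        'RaceNumber': [1, 1, 2, 2, 3, 3, 4],
        'PlaceWon': ['First', 'Second', 'First', 'Second', 'First', 'Second', 'First'],
        'TimeRanInSec': [100, 98, 66, 60, 75, 70, 75]}
df = pd.DataFrame(data)

# Number of recorded times per race
runners = df.groupby('RaceNumber')['TimeRanInSec'].size()

# Races with fewer than two times have no runner-up to compare against
single_entry_races = runners[runners < 2].index.tolist()

print(runners)
print(single_entry_races)
```

With this data only race 4 has a single entry, which matches the observation that the code works once race 4 is left out.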
Answer:
How about returning NaN when times does not have enough elements:
import numpy as np

def winning_margin(times):
    if len(times) <= 1:  # New code: a single time has no runner-up
        return np.nan    # New code
    times = list(times)
    winner = min(times)
    times.remove(winner)
    return min(times) - winner
With this change your code runs and appears to produce sensible results. If you also want to drop the NaN values, you can do so on this line:
winning_margins = df[['RaceNumber', 'TimeRanInSec']] \
.groupby('RaceNumber').agg(winning_margin).dropna() # note the addition of .dropna()
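Putting the guarded function and the .dropna() call together, the full pipeline can be sketched as follows. It uses np.nan, which works across NumPy versions (the np.NaN alias is not available in NumPy 2.0). With the sample data, race 4 is the only race dropped from winning_margins.

```python
import numpy as np
import pandas as pd

data = {'Name': ['A', 'B', 'B', 'C', 'A', 'C', 'A'],
        'RaceNumber': [1, 1, 2, 2, 3, 3, 4],
        'PlaceWon': ['First', 'Second', 'First', 'Second', 'First', 'Second', 'First'],
        'TimeRanInSec': [100, 98, 66, 60, 75, 70, 75]}
df = pd.DataFrame(data)

def winning_margin(times):
    # A race with a single time has no runner-up, so no margin
    if len(times) <= 1:
        return np.nan
    times = list(times)
    winner = min(times)
    times.remove(winner)
    return min(times) - winner

# Margin per race; races that produced NaN are dropped
winning_margins = (df[['RaceNumber', 'TimeRanInSec']]
                   .groupby('RaceNumber')
                   .agg(winning_margin)
                   .dropna())
winning_margins.columns = ['margin']

# Attach each race's margin to the runner marked 'First'
winners = df.loc[df.PlaceWon == 'First', :]
winners = winners.join(winning_margins, on='RaceNumber')

# Average margin per runner
avg_margins = winners[['Name', 'margin']].groupby('Name').mean()
print(avg_margins)
```

The .dropna() call does not change the averages here: the race-4 winner still joins with a NaN margin (join is a left join), and the grouped mean skips NaN by default, so A averages (2 + 5) / 2 = 3.5 and B gets 6.0.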
Answer:
You can get the winner and the margin in a single step:
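import numpy as np

def get_margin(x):
    if len(x) < 2:
        return np.nan  # single runner: no margin
    i = x['TimeRanInSec'].idxmin()        # row label of the fastest time
    nl = x['TimeRanInSec'].nsmallest(2)   # the two fastest times
    margin = nl.max() - nl.min()
    return [x['Name'].loc[i], margin]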
Then:
df.groupby('RaceNumber').apply(get_margin).dropna()
RaceNumber
1 [B, 2]
2 [C, 6]
3 [C, 5]
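(Note that in your data the 'First' label corresponds to the slower time, so the winners above, i.e. the fastest runners, differ from the runners marked 'First'.)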
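If the list-valued result is awkward to work with, a similar winner/margin table can be built without apply: drop single-entry races with GroupBy.filter, sort each race by time, and take the first and second rows. This is an alternative sketch rather than part of either answer; margins_by_race is a hypothetical helper name, and ties in time are resolved by sort order.

```python
import pandas as pd

data = {'Name': ['A', 'B', 'B', 'C', 'A', 'C', 'A'],
        'RaceNumber': [1, 1, 2, 2, 3, 3, 4],
        'PlaceWon': ['First', 'Second', 'First', 'Second', 'First', 'Second', 'First'],
        'TimeRanInSec': [100, 98, 66, 60, 75, 70, 75]}
df = pd.DataFrame(data)

def margins_by_race(frame):
    """Fastest runner and gap to the runner-up for each race (hypothetical helper).

    Races with fewer than two rows are ignored.
    """
    # Keep only races that have at least two recorded times
    multi = frame.groupby('RaceNumber').filter(lambda g: len(g) >= 2)
    # Within each race, order rows from fastest to slowest
    ordered = multi.sort_values(['RaceNumber', 'TimeRanInSec'])
    # First row of each race is the fastest runner
    fastest = (ordered.drop_duplicates('RaceNumber', keep='first')
                      .set_index('RaceNumber'))
    # Second row of each race is the runner-up
    runner_up = (ordered.groupby('RaceNumber')['TimeRanInSec']
                        .apply(lambda s: s.iloc[1]))
    return pd.DataFrame({
        'winner': fastest['Name'],
        'margin': runner_up - fastest['TimeRanInSec'],
    })

result = margins_by_race(df)
print(result)
```

With the sample data this gives winners B, C, C with margins 2, 6, 5, the same values as the apply output above, but as two ordinary columns that can be filtered or averaged directly.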