我無法從我的資料框中獲取不同的值。下面是我當前使用的代碼,在第 25 行(vier() 的第 3 行)是問題:我想根據他們的平均熱量顯示前 10 名最快的車手(卡丁車熱)時間。
輸入:
HeatNumber,NumberOfKarts,KartNumber,DriverName,Laptime
334,11,5,Monique,00:53.862
334,11,5,Monique,00:59.070
334,11,5,Monique,00:47.832
334,11,5,Monique,00:47.213
334,11,5,Monique,00:51.975
334,11,5,Monique,00:46.423
334,11,5,Monique,00:49.539
334,11,5,Monique,00:49.935
334,11,5,Monique,00:45.267
334,11,12,Robert-Jan,00:55.606
334,11,12,Robert-Jan,00:52.249
334,11,12,Robert-Jan,00:50.965
334,11,12,Robert-Jan,00:53.878
334,11,12,Robert-Jan,00:48.802
334,11,12,Robert-Jan,00:48.766
334,11,12,Robert-Jan,00:46.003
334,11,12,Robert-Jan,00:46.257
334,11,12,Robert-Jan,00:47.334
334,11,20,Katja,00:56.222
334,11,20,Katja,01:01.005
334,11,20,Katja,00:50.296
334,11,20,Katja,00:48.004
334,11,20,Katja,00:51.203
334,11,20,Katja,00:47.672
334,11,20,Katja,00:50.243
334,11,20,Katja,00:50.453
334,11,20,Katja,01:06.192
334,11,13,Bensu,00:56.332
334,11,13,Bensu,00:54.550
334,11,13,Bensu,00:52.023
334,11,13,Bensu,00:52.518
334,11,13,Bensu,00:50.738
334,11,13,Bensu,00:50.359
334,11,13,Bensu,00:49.307
334,11,13,Bensu,00:49.595
334,11,13,Bensu,00:50.504
334,11,17,Marit,00:56.740
334,11,17,Marit,00:52.534
334,11,17,Marit,00:48.331
334,11,17,Marit,00:56.204
334,11,17,Marit,00:49.066
334,11,17,Marit,00:49.210
334,11,17,Marit,00:45.655
334,11,17,Marit,00:46.261
334,11,17,Marit,00:46.837
334,11,11,Niels,00:58.518
334,11,11,Niels,01:01.562
334,11,11,Niels,00:51.238
334,11,11,Niels,00:48.808
代碼:
import pandas as pd
import matplotlib.pyplot as plt
#Data
df = pd.read_csv('dataset_kartanalyser.csv')
df = df.dropna(axis=0, how='any')
df = df.join(df['Laptime'].str.split(':', 1, expand=True).rename(columns={0:'M', 1:'S'}))
df['M'] = df['M'].astype(int)
df['S'] = df['S'].astype(float)
df['Laptime'] = (df['M'] * 60) df['S']
df.drop(['M', 'S'], axis=1, inplace=True)
#Funties
def twee():
print("Het totaal aantal karts = " str(df['KartNumber'].nunique()))
print("Het aantal unique drivers = " str(df['DriverName'].nunique()))
print("Het totaal aantal heats = " str(df['HeatNumber'].nunique()))
def drie():
print("De 10 snelste Drivers obv individuele tijd zijn: ")
print((df.groupby('DriverName')['Laptime'].nsmallest(1)).nsmallest(10))
def vier():
print('De 10 snelste Drivers obv snelste heat gemiddelde:')
print((df.groupby(['DriverName', 'HeatNumber'])['Laptime'].mean().round(3)).nsmallest(10))
print(df)
HeatNumber NumberOfKarts KartNumber DriverName Laptime
0 334 11 5 Monique 53.862
1 334 11 5 Monique 59.070
2 334 11 5 Monique 47.832
3 334 11 5 Monique 47.213
4 334 11 5 Monique 51.975
... ... ... ... ... ...
4053 437 2 20 luuk 39.678
4054 437 2 20 luuk 39.872
4055 437 2 20 luuk 39.454
4056 437 2 20 luuk 39.575
4057 437 2 20 luuk 39.648
輸出:
DriverName HeatNumber
giovanni 411 26.233
ryan 411 27.747
giovanni 408 27.938
papa 394 28.075
guus 406 28.998
Rob 427 29.371
Suus 427 29.416
Jan-jullius 394 29.428
Joep 427 29.934
Indy 423 29.991
我得到的輸出幾乎是正確的,預計驅動程式“giovanni”會出現兩次。我只想顯示每個驅動程式的最快平均加熱時間。有誰知道如何做到這一點?
uj5u.com熱心網友回復:
您可以按 Drivername 和 HeatNumber 對資料進行分組。查看 HeatNumbers,其中一個是 411,另一個是 408。因此,熊貓知道它們是完全不同的。如果你等于他們,他們將是一。
uj5u.com熱心網友回復:
好的,所以在這樣的列上添加 drop_duplication 只需要添加排序以及 df.sort_values('B', ascending=True)
.drop_duplicates('A', keep='first')
(df.groupby(['DriverName', 'HeatNumber'])['Laptime'].mean().round(3)sort_values('Laptime', ascending=True).drop_duplicates('DriverName', keep='first')).nsmallest(10))
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/479706.html
