I have a DataFrame like the example below:
Timestamp ComponentName Utilization
18.10.2020-19:07.10 A Available
19.10.2020-21:07.10 A Available
19.10.2020-19:07.10 A In use
22.10.2020-19:07.10 A In use
25.10.2020-19:07.10 A In use
The desired output is:
ComponentName Total_Inuse_time Total_Available_time
A 6 days 1 day 2 hours
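Basically, I want the total in-use time and the total available time for each component. I tried grouping by component name and summing the time differences, but I could not get the result I need.
Answer from a uj5u.com community member: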
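import pandas as pd
# Parse strings such as "18.10.2020-19:07.10" (day.month.year-hour:minute.second)
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='%d.%m.%Y-%H:%M.%S')
# Overwrite each timestamp with the time elapsed since the previous row of the same component and state
df['Timestamp'] = df.groupby(['ComponentName', 'Utilization'])['Timestamp'].diff().fillna(pd.Timedelta(0))
sums = df.groupby(['ComponentName', 'Utilization'])['Timestamp'].sum()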
Output:
>>> sums
ComponentName  Utilization
A              Available     1 days 02:00:00
               In use        6 days 00:00:00
Name: Timestamp, dtype: timedelta64[ns]
>>> sums['A']
Utilization
Available   1 days 02:00:00
In use      6 days 00:00:00
Name: Timestamp, dtype: timedelta64[ns]
>>> sums['A']['Available']
Timedelta('1 days 02:00:00')
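The question asks for one row per component with a column per state, while the result above is a Series with a two-level index. A minimal sketch of one way to reshape it follows, assuming the timestamp format is day.month.year-hour:minute.second. The `Elapsed` column name and the `wide` variable are illustrative and do not appear in the original answer; the sort step is an added precaution so that each difference is taken between consecutive readings in time order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": [
        "18.10.2020-19:07.10",
        "19.10.2020-21:07.10",
        "19.10.2020-19:07.10",
        "22.10.2020-19:07.10",
        "25.10.2020-19:07.10",
    ],
    "ComponentName": ["A"] * 5,
    "Utilization": ["Available", "Available", "In use", "In use", "In use"],
})

# Assumed format: day.month.year-hour:minute.second
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%d.%m.%Y-%H:%M.%S")

# Sort so that diff() always subtracts an earlier reading from a later one.
df = df.sort_values(["ComponentName", "Utilization", "Timestamp"])

# Elapsed time between consecutive readings within each (component, state) group.
# "Elapsed" is a hypothetical column name; the original timestamps are kept.
df["Elapsed"] = (
    df.groupby(["ComponentName", "Utilization"])["Timestamp"]
    .diff()
    .fillna(pd.Timedelta(0))
)

# Sum per group, then move the Utilization level into columns.
wide = (
    df.groupby(["ComponentName", "Utilization"])["Elapsed"]
    .sum()
    .unstack("Utilization")
    .rename(columns={
        "In use": "Total_Inuse_time",
        "Available": "Total_Available_time",
    })
)
print(wide)
```

Note that summing consecutive differences within a group is the same as subtracting the first reading from the last one in that group, so this measures the span covered by each state's readings rather than the time spent between state changes.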
