我有一個這樣的資料框:
Folder1 Folder2
0 2021-11-22 12:00:00 2021-11-24 10:00:00
1 2021-11-23 10:30:00 2021-11-25 18:30:00
2 2021-11-12 10:30:00 2021-11-15 18:30:00
3 2021-11-23 10:00:00 NaN
使用此代碼:
def strfdelta(td: pd.Timestamp):
seconds = td.total_seconds()
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
seconds = int(seconds % 60)
return f"{hours:02}:{minutes:02}:{seconds:02}"
df["Folder1"] = pd.to_datetime(df["Folder1"])
df["Folder2"] = pd.to_datetime(df["Folder2"])
bm1 = df["Folder1"].notna() & df["Folder2"].notna()
bm2 = df["Folder1"].notna() & df["Folder2"].isna()
df["Time1"] = (df.loc[bm1, "Folder2"] - df.loc[bm1, "Folder1"]).apply(strfdelta)
df["Time2"] = (datetime.now() - df.loc[bm2, "Folder1"]).apply(strfdelta)
我有這個 df:
Folder1 Folder2 Time1 Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46:00:00 NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56:00:00 NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00 80:00:00 NaN
3 2021-11-23 10:00:00 NaN NaN 03:00:00
基本上,這就是我想要的,但是,在計算來自 Folder1 和 Folder2 的時間戳之間的差異時,我如何排除周末時間?我應該改變什么才能擁有這樣的 df:
Folder1 Folder2 Time1 Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46:00:00 NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56:00:00 NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32:00:00 NaN
3 2021-11-23 10:00:00 NaN NaN 03:00:00
因此,在索引 2 的行中,13.11 和 14.11 是周末,因此,在時間 1 中,差異應該是32而不是 80
uj5u.com熱心網友回復:
我認為您可以像這樣將pandas.date_range函式與pandas.tseries.offsets.CustomBusinessHour結合使用:
# import pandas and numpy
import pandas as pd
import numpy as np
# construct dataframe
df = pd.DataFrame()
df["Folder1"] = pd.to_datetime(
pd.Series(
[
"2021-11-22 12:00:00",
"2021-11-23 10:30:00",
"2021-11-12 10:30:00",
"2021-11-23 10:00:00",
]
)
)
df["Folder2"] = pd.to_datetime(
pd.Series(
[
"2021-11-24 10:00:00",
"2021-11-25 18:30:00",
"2021-11-15 18:30:00",
np.NaN
]
)
)
# define custom business hours
cbh = pd.tseries.offsets.CustomBusinessHour(start="0:00", end="23:59")
# actual calculation
df["Time1"] = df[~(df["Folder1"].isnull() | df["Folder2"].isnull())].apply(
lambda row: len(
pd.date_range(
start=row["Folder1"],
end=row["Folder2"],
freq=cbh)),
axis=1,
)
df.head()
這對我來說產生:
print(df.head())
Folder1 Folder2 Time1
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46.0
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56.0
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32.0
3 2021-11-23 10:00:00 NaT NaN
作為獎勵,您還可以使用它更有效地進行 Time2 計算:
df["Time2"] = df[df["Folder2"].isnull()].apply(
lambda row: len(
pd.date_range(
start=row["Folder1"],
end=datetime.datetime.now(),
freq=cbh)),
axis=1,
)
這對我來說產生了(歐洲中部時間 14:45):
print(df.head())
Folder1 Folder2 Time1 Time2
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46.0 NaN
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56.0 NaN
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32.0 NaN
3 2021-11-23 10:00:00 NaT NaN 5.0
uj5u.com熱心網友回復:
df = pd.DataFrame({'Folder1': ['2021-11-22 12:00:00', '2021-11-23 10:30:00', '2021-11-12 10:30:00', '2021-11-23 10:00:00'],
'Folder2': ['2021-11-24 10:00:00', '2021-11-25 18:30:00', '2021-11-15 18:30:00', None]})
df[['Folder1','Folder2']] = df[['Folder1','Folder2']].astype('datetime64')
def strfdelta(t1, t2):
hd = pd.date_range(t1, t2, freq='W-SAT').append(pd.date_range(t1, t2, freq='W-SUN'))
sec = (t2-t1).total_seconds() - len(hd)*24*3600
return f"{int(sec//3600):02d}:{int((sec%3600)//60):02d}:{int(sec%60):02d}"
now = pd.to_datetime('now')
df['Time1'] = df.fillna(now).apply(lambda x: strfdelta(x['Folder1'], x['Folder2']), axis=1)
print(df)
印刷:
Folder1 Folder2 Time1
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46:00:00
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56:00:00
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32:00:00
3 2021-11-23 10:00:00 NaT 20:58:26
uj5u.com熱心網友回復:
df['Folder1']=pd.to_datetime(df['Folder1'])
df['Folder2']=pd.to_datetime(df['Folder2']).fillna(df['Folder1'])
df['missing']=df.apply(lambda x: pd.date_range(start=x['Folder1'], end=x['Folder2'], freq='D'), axis=1)#Create column with missing date periods
df=(df.assign(time=np.where((df['missing'].apply(lambda x: x.strftime('%w'))).map(set).astype(str).str.contains('0|6'),#Where missing periods have a Saturday or Sunday
(df['Folder2']-df['Folder1']).astype('timedelta64[h]')-48,# When above condition is met, subtract two 48 hours from the two days columns difference
(df['Folder2']-df['Folder1']).astype('timedelta64[h]'))#When condition not met substract just the two date columns)
).drop(columns=['missing']) )
print(df)
Folder1 Folder2 time
0 2021-11-22 12:00:00 2021-11-24 10:00:00 46.0
1 2021-11-23 10:30:00 2021-11-25 18:30:00 56.0
2 2021-11-12 10:30:00 2021-11-15 18:30:00 32.0
3 2021-11-23 10:00:00 2021-11-23 10:00:00 0.0
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/364709.html
