I'm working on a tough problem and I don't know how to solve it.
I have a data frame like this:
Product_Name Start_Time End_Time
Product X 2021-10-20 20:32:00 2021-10-21 03:50:00
Product Y 2021-10-21 11:50:00 2021-10-21 16:00:00
Product Z 2022-01-11 20:10:00 2022-01-12 15:30:00
I have 3 time ranges, each assigned to a category:
A: 05:01 to 14:00
B: 14:01 to 22:00
C: 22:01 to 05:00
What I want to do is calculate, based on Start_Time and End_Time, how many decimal hours fall into each category (A, B and C), to get something like this:
Product_Name Start_Time End_Time A B C
Product X 2021-10-20 20:30:00 2021-10-21 03:50:00 0.00 1.50 5.82
Product Y 2021-10-21 11:50:00 2021-10-21 16:00:00 2.17 1.98 0.00
Product Z 2022-01-11 20:10:00 2022-01-12 15:30:00 8.98 3.31 6.98
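To make the expected numbers concrete, here is the arithmetic for Product Y worked out in Python. It assumes the ranges are read literally (A ends at 14:00, B starts at 14:01), so the minute between 14:00 and 14:01 belongs to no category; that is why A + B comes to about 4.15 hours rather than the full 4 h 10 min.

```python
from datetime import datetime

# Product Y runs from 11:50 to 16:00 on the same day
fmt = "%Y-%m-%d %H:%M"
start = datetime.strptime("2021-10-21 11:50", fmt)
end = datetime.strptime("2021-10-21 16:00", fmt)

a_end = datetime.strptime("2021-10-21 14:00", fmt)    # end of range A
b_start = datetime.strptime("2021-10-21 14:01", fmt)  # start of range B

hours_a = (a_end - start).total_seconds() / 3600   # 130 minutes in A
hours_b = (end - b_start).total_seconds() / 3600   # 119 minutes in B

print(round(hours_a, 2), round(hours_b, 2))  # 2.17 1.98
```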
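Can you help me figure out how to do this?
I'm a real beginner with Python, pandas, etc., and when I first wrote this post I had no idea how to start coding it. So I thought it through and came up with the code below. I'm sure it isn't right, but I think it's a start: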
start_a = 05:01:00
end_a = 14:00:00
start_b = 14:01:00
end_b = 22:00:00
start_c = 22:01:00
end_c = 05:00:00
if df['Start_Time'] > start_a and df['End_Time'] < end_a:
    df['A'] = ( df['End_Time'] - start_a ) - ( end_a - df['Start_Time'] )
else:
    df['A'] = 0
if df['Start_Time'] > start_b and df['End_Time'] < end_b:
    df['B'] = ( df['End_Time'] - start_b ) - ( end_b - df['Start_Time'] )
else:
    df['B'] = 0
if df['Start_Time'] > start_c and df['End_Time'] < end_c:
    df['C'] = ( df['End_Time'] - start_c ) - ( end_c - df['Start_Time'] )
else:
    df['C'] = 0
Answer:
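Your problem is a lot harder than I expected. One thing to note is that Start_Time and End_Time can fall on different dates. On top of that, category C spans two days. These two facts make the code a bit more complicated, but it seems to work.
First, the setup. I recreated your data frame and created the range variables. It is important that these have the correct data types (datetimes, not strings).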
import pandas as pd
from io import StringIO
from datetime import datetime, timedelta
# Create your data frame. Columns are separated by two or more spaces,
# because the values themselves contain single spaces.
data = StringIO("""Product_Name  Start_Time  End_Time
Product X  2021-10-20 20:32:00  2021-10-21 03:50:00
Product Y  2021-10-21 11:50:00  2021-10-21 16:00:00
Product Z  2022-01-11 20:10:00  2022-01-12 15:30:00""")
df = pd.read_csv(data, sep=r'\s{2,}', engine='python')
# Convert the columns to datetime format
df[["Start_Time", "End_Time"]] = df[["Start_Time", "End_Time"]].apply(pd.to_datetime)
# Create the start and end of each range as datetimes (only the time of day is used)
start_a = datetime.strptime('05:01:00', '%H:%M:%S')
end_a = datetime.strptime('14:00:00', '%H:%M:%S')
start_b = datetime.strptime('14:01:00', '%H:%M:%S')
end_b = datetime.strptime('22:00:00', '%H:%M:%S')
start_c = datetime.strptime('22:01:00', '%H:%M:%S')
end_c = datetime.strptime('05:00:00', '%H:%M:%S')
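A side note on those range variables: with only a time format, `datetime.strptime` fills in the default date 1900-01-01. Only the clock time is meaningful here, so it has to be re-attached to the right day with `.time()` and `datetime.combine`, which is what the function that follows does. A small illustration (the date used is just an example):

```python
from datetime import date, datetime

# With only a time format, strptime fills in the default date 1900-01-01
start_a = datetime.strptime('05:01:00', '%H:%M:%S')
print(start_a)  # 1900-01-01 05:01:00

# Keep just the clock time and attach it to a real day (example date)
day = date(2021, 10, 21)
combined = datetime.combine(day, start_a.time())
print(combined)  # 2021-10-21 05:01:00
```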
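Then I wrote a function that calculates the hours. start and end are the times that define one range. The function iterates over the days and checks how much of your range fits into each one. Usually a single iteration is enough, but your Product Z spans two days, so it needs two iterations.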
def calc_hours(start_time, end_time, start, end):
    # Give the range a date too => allows us to compare it with start_time and end_time
    range_start = datetime.combine(start_time.date(), start.time())
    range_end = datetime.combine(start_time.date(), end.time())
    # Special case for range C, whose end is on the next day
    if range_end < range_start:
        range_end = range_end + timedelta(days=1)
    # start_time and end_time can span one or more days => iterate over the days and sum the hours in the range
    total_hours = 0.0
    while range_start < end_time:
        # Overlap in hours, or zero if the range is not within the time frame
        hours_in_frame = max((min(range_end, end_time) - max(range_start, start_time)).total_seconds(), 0) / 3600
        total_hours += hours_in_frame
        # Move on to the next day to check whether the range falls in the time frame
        range_start = range_start + timedelta(days=1)
        range_end = range_end + timedelta(days=1)
    return total_hours
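As far as I can tell, this function has one blind spot that your sample data doesn't hit: because the loop starts on start_time's own date, an interval that begins after midnight (say 02:00 to 04:00) misses the C window that opened at 22:01 the previous evening, and the result is 0. A minimal sketch of a fix, under the hypothetical name `calc_hours_from_previous_day`, simply starts one day earlier; the extra iteration contributes 0 whenever there is no overlap.

```python
from datetime import datetime, timedelta

def calc_hours_from_previous_day(start_time, end_time, start, end):
    """Hypothetical variant of calc_hours that begins one day before
    start_time, so a range opened the previous evening is also counted."""
    day = start_time.date() - timedelta(days=1)
    range_start = datetime.combine(day, start.time())
    range_end = datetime.combine(day, end.time())
    if range_end < range_start:          # range crosses midnight (C)
        range_end += timedelta(days=1)
    total_hours = 0.0
    while range_start < end_time:
        overlap = (min(range_end, end_time) - max(range_start, start_time)).total_seconds()
        total_hours += max(overlap, 0) / 3600
        range_start += timedelta(days=1)
        range_end += timedelta(days=1)
    return total_hours

start_c = datetime.strptime('22:01:00', '%H:%M:%S')
end_c = datetime.strptime('05:00:00', '%H:%M:%S')

# Starts after midnight and lies entirely inside range C
early = calc_hours_from_previous_day(datetime(2021, 10, 21, 2, 0),
                                     datetime(2021, 10, 21, 4, 0), start_c, end_c)
print(early)  # 2.0

# Product X from the sample data still gives about 5.8167 hours of C
product_x = calc_hours_from_previous_day(datetime(2021, 10, 20, 20, 32),
                                         datetime(2021, 10, 21, 3, 50), start_c, end_c)
print(product_x)
```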
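To use the function and add the results to the data frame, I used pandas' apply(). apply() takes each row of the data frame and computes the hours in one range with the function shown above. This is done for all three ranges.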
# Use apply to calculate the hours for each row and each range
df['A'] = df.apply(lambda x: calc_hours(x['Start_Time'], x['End_Time'], start_a, end_a), axis=1)
df['B'] = df.apply(lambda x: calc_hours(x['Start_Time'], x['End_Time'], start_b, end_b), axis=1)
df['C'] = df.apply(lambda x: calc_hours(x['Start_Time'], x['End_Time'], start_c, end_c), axis=1)
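If more categories are added later, the three apply() lines can be driven by a dictionary instead of copy-paste. A sketch of that, where `ranges` is a hypothetical name; calc_hours is repeated here (with its `+` and `+=` in place) only so the block runs on its own:

```python
import pandas as pd
from datetime import datetime, timedelta

def calc_hours(start_time, end_time, start, end):
    # Same logic as calc_hours above, repeated so this block is self-contained
    range_start = datetime.combine(start_time.date(), start.time())
    range_end = datetime.combine(start_time.date(), end.time())
    if range_end < range_start:
        range_end += timedelta(days=1)
    total_hours = 0.0
    while range_start < end_time:
        overlap = (min(range_end, end_time) - max(range_start, start_time)).total_seconds()
        total_hours += max(overlap, 0) / 3600
        range_start += timedelta(days=1)
        range_end += timedelta(days=1)
    return total_hours

df = pd.DataFrame({
    "Product_Name": ["Product X", "Product Y", "Product Z"],
    "Start_Time": pd.to_datetime(["2021-10-20 20:32", "2021-10-21 11:50", "2022-01-11 20:10"]),
    "End_Time": pd.to_datetime(["2021-10-21 03:50", "2021-10-21 16:00", "2022-01-12 15:30"]),
})

# One entry per category: (start of range, end of range)
ranges = {
    "A": ("05:01:00", "14:00:00"),
    "B": ("14:01:00", "22:00:00"),
    "C": ("22:01:00", "05:00:00"),
}

for name, (s, e) in ranges.items():
    start = datetime.strptime(s, "%H:%M:%S")
    end = datetime.strptime(e, "%H:%M:%S")
    # apply() runs immediately inside this iteration, so it sees this start/end
    df[name] = df.apply(lambda row: calc_hours(row["Start_Time"], row["End_Time"], start, end), axis=1)

print(df[["Product_Name", "A", "B", "C"]])
```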
The output is almost what you want, but it is not rounded to two decimal places:
Product_Name Start_Time End_Time A B C
0 Product X 2021-10-20 20:32:00 2021-10-21 03:50:00 0.000000 1.466667 5.816667
1 Product Y 2021-10-21 11:50:00 2021-10-21 16:00:00 2.166667 1.983333 0.000000
2 Product Z 2022-01-11 20:10:00 2022-01-12 15:30:00 8.983333 3.316667 6.983333
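Rounding only the hour columns is one call to `DataFrame.round`. The sketch below rebuilds just those three columns from the printed values (as exact minute counts divided by 60) so it runs on its own. Note that it gives 1.47 for Product X's B (the sample data starts at 20:32, while the expected table uses 20:30) and 3.32 for Product Z's B, where the question lists 3.31.

```python
import pandas as pd

# The computed hour columns from the output above, as minutes / 60
df = pd.DataFrame({
    "A": [0.0, 130 / 60, 539 / 60],
    "B": [88 / 60, 119 / 60, 199 / 60],
    "C": [349 / 60, 0.0, 419 / 60],
})

# Round only the hour columns to two decimals
df[["A", "B", "C"]] = df[["A", "B", "C"]].round(2)
print(df)
```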
Answer:
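Another approach is to build a series of every consecutive minute in the relevant range, then intersect it with the minutes of each row to get the overlapping duration.
I don't have time to write a full answer, but I thought I'd throw out the idea and you can take it from there.
Create the reference series (here for range C):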
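import numpy as np
import pandas as pd
start = pd.Timestamp('22:01')
end = pd.Timestamp('05:00')
# Range C ends on the next day
if end < start:
    end += pd.Timedelta(days=1)
# Minute of the day (0-1439) for every minute in the range
drC = pd.Series(pd.date_range(start=start, end=end, freq='min')).dt.hour * 60 \
    + pd.Series(pd.date_range(start=start, end=end, freq='min')).dt.minute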
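Create a function that computes the intersection and its duration:
def intersecting_duration(x):
    # Minute of the day for every minute between Start_Time and End_Time
    min_of_day = pd.Series(pd.Series(pd.date_range(start=x['Start_Time'], end=x['End_Time'], freq='min')).dt.hour * 60 \
        + pd.Series(pd.date_range(start=x['Start_Time'], end=x['End_Time'], freq='min')).dt.minute)
    # Number of distinct minute-of-day values shared with the reference range
    dur_mins = len(np.intersect1d(min_of_day, drC))
    return 0 if (dur_mins == 0) else (dur_mins-1)/60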
Then apply it:
df.apply(intersecting_duration, axis=1)
0 5.816667
1 0.000000
2 6.983333
You'll need to take it from there.
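One way to take this idea further, sketched under my own assumptions: count minutes instead of intersecting minute stamps. `np.intersect1d` keeps only distinct values and the `- 1` assumes a single contiguous overlap, so by my arithmetic Product Z's B would come out as 200 minutes instead of 199, because its overlap is split across two days. The hypothetical `minutes_in_range` below counts each minute of [Start_Time, End_Time) whose minute of day falls in a half-open range, assuming the timestamps are on whole minutes; `RANGES` is likewise a hypothetical name.

```python
import numpy as np
import pandas as pd

# Minute-of-day boundaries [start, end). The minute starting at each range's
# end time (e.g. 14:00-14:01) is left uncounted, as in the expected output.
RANGES = {
    "A": (5 * 60 + 1, 14 * 60),    # 05:01 to 14:00
    "B": (14 * 60 + 1, 22 * 60),   # 14:01 to 22:00
    "C": (22 * 60 + 1, 5 * 60),    # 22:01 to 05:00, wraps past midnight
}

def minutes_in_range(start_time, end_time, lo, hi):
    """Count whole minutes in [start_time, end_time) whose minute of day
    lies in [lo, hi); a range with lo > hi wraps past midnight."""
    n = int((end_time - start_time).total_seconds() // 60)
    first = start_time.hour * 60 + start_time.minute
    minute_of_day = (first + np.arange(n)) % 1440
    if lo <= hi:
        mask = (minute_of_day >= lo) & (minute_of_day < hi)
    else:
        mask = (minute_of_day >= lo) | (minute_of_day < hi)
    return int(mask.sum())

df = pd.DataFrame({
    "Product_Name": ["Product X", "Product Y", "Product Z"],
    "Start_Time": pd.to_datetime(["2021-10-20 20:32", "2021-10-21 11:50", "2022-01-11 20:10"]),
    "End_Time": pd.to_datetime(["2021-10-21 03:50", "2021-10-21 16:00", "2022-01-12 15:30"]),
})

for name, (lo, hi) in RANGES.items():
    # apply() runs inside this iteration, so the lambda sees this lo/hi
    df[name] = df.apply(
        lambda r: minutes_in_range(r["Start_Time"], r["End_Time"], lo, hi) / 60, axis=1)

print(df[["Product_Name", "A", "B", "C"]].round(2))
```

Because every minute is counted once per occurrence, intervals longer than 24 hours and overlaps split across days are handled the same way as a single contiguous overlap.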
