我有一個帶有開始和結束日期時間的元素資料框。查找日期交叉點的最佳選擇是什么?我現在的幼稚方法包括兩個嵌套回圈交叉比較元素,這顯然非常慢。實作這一目標的更好方法是什么?
dict = {}
start = "start_time"
end = "end_time"
for index1, rowLoop1 in df[{start, end}].head(500).iterrows():
matches = []
dict[(index1, rowLoop1[start])] = 0
for index2, rowLoop2 in df[{start,end}].head(500).iterrows():
if index1 != index2:
if date_intersection(rowLoop1[start], rowLoop1[end], rowLoop2[start], rowLoop2[end]):
dict[(index1, rowLoop1[start])] = 1
date_intersection 的代碼:
def date_intersection(t1start, t1end, t2start, t2end):
if (t1start <= t2start <= t2end <= t1end): return True
elif (t1start <= t2start <= t1end):return True
elif (t1start <= t2end <= t1end):return True
elif (t2start <= t1start <= t1end <= t2end):return True
else: return False
樣本資料:
id,start_date,end_date
41234132,2021-01-10 10:00:05,2021-01-10 10:30:27
64564512,2021-01-10 10:10:00,2021-01-11 10:28:00
21135765,2021-01-12 12:30:00,2021-01-12 12:38:00
87643252,2021-01-12 12:17:00,2021-01-12 12:42:00
87641234,2021-01-12 12:58:00,2021-01-12 13:17:00
uj5u.com熱心網友回復:
您可以執行一些操作,例如將資料框與自身合并以獲取笛卡爾積并比較列。
df = df.merge(df, how='cross', suffixes=('','_2'))
df['date_intersection'] = (((df['start_date'].le(df['start_date_2']) & df['start_date_2'].le(df['end_date'])) | # start 2 within start/end
(df['start_date'].le(df['end_date_2']) & df['end_date_2'].le(df['end_date'])) | # end 2 within start/end
(df['start_date_2'].le(df['start_date']) & df['start_date'].le(df['end_date_2'])) | # start within start 2/end 2
(df['start_date_2'].le(df['end_date']) & df['end_date'].le(df['end_date_2']))) & # end within start 2/end 2
df['id'].ne(df['id_2'])) # id not compared to itself
然后回傳ID,如果他們有一個日期交集......
df.groupby('id')['date_intersection'].any()
id
21135765 True
41234132 True
64564512 True
87641234 False
87643252 True
或者如果您需要相交的 id
df.loc[df['date_intersection'], :].groupby(['id'])['id_2'].agg(list).to_frame('intersected_ids')
intersected_ids
id
21135765 [87643252]
41234132 [64564512]
64564512 [41234132]
87643252 [21135765]
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/422818.html
標籤:
