如何根據datetime迭代和比較2個pandas列并添加true或false值？-有解無憂

我有一個包含 2 列的 pandas 資料框：預測時間和實際時間。我想要包含真值或假值的第三列。換句話說，如果對于每個預測時間行，時間與同一行中的實際時間之一匹配，或者預測時間介于這些實際時間之一之間，則將“真”添加到第三列值。否則，在行中添加“False”。

任何想法從哪里開始？我假設這需要兩件事：用于比較和迭代每一行以產生新寫入的行值的日期時間模塊？

當前資料框：

;Predicted time;Actual times
0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]]
1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, 2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51]
2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]]

期望的輸出

;Predicted time;Actual times;True or False
0;[2017-09-09 06:53:37, 2017-09-09 06:53:46];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]];FALSE
1;[2017-09-09 06:54:19, 2017-09-09 06:54:43];[2017-09-09 06:54:11, 2017-09-09 06:54:21, 2017-09-09 06:54:29, 2017-09-09 06:55:14, 2017-09-09 06:55:30, 2017-09-09 06:55:51];TRUE
2;[2017-09-09 06:54:44, 2017-09-09 06:54:48];[2017-09-09 06:54:11, 2017-09-09 06:54:21,] [2017-09-09 06:54:29, 2017-09-09 06:55:14], [2017-09-09 06:55:30, 2017-09-09 06:55:51]];TRUE

我還附上了一張圖片，以更清楚地顯示所需的輸出。

如何根據 datetime 迭代和比較 2 個 pandas 列并添加 true 或 false 值？

uj5u.com熱心網友回復：

下面函式中的大部分作業是將資料框中的字串轉換為datetime可用于比較的物件集合 -

def pred_intersects_act(row):
    #Convert predicted times to list of datetime objects
    predicted_time = re.sub(r'\[|\]', '', row['predicted_time'])
    predicted_time = re.sub(r',\ *', ',', predicted_time)
    pt_list = predicted_time.split(',')
    pt_list = [dt.strptime(_, '%Y-%m-%d %H:%M:%S') for _ in pt_list]

    #Convert actual times to list of datetime objects
    actual_times = re.sub(r'\[|\]', '', row['actual_times'])
    actual_times = re.sub(r',\ *', ',', actual_times)
    at_list = actual_times.split(',')
    at_list = [dt.strptime(_, '%Y-%m-%d %H:%M:%S') for _ in at_list]

    #pair up actual times and check for intersection
    for pair in zip(at_list[:-1], at_list[1:]):
        exact_match = any(_ in pair for _ in pt_list)
        approx_match = any(bisect.bisect(pair, _) == 1 for _ in pt_list)
        if exact_match or approx_match:
            return True
    return False

df.apply(pred_intersects_act, axis=1)
0    False
1     True
2     True
dtype: bool

uj5u.com熱心網友回復：

使用串列推導與測驗值之間any的測驗至少一個True：

#for converting to datetimes, in Actual times was removed nested lists
f = lambda x: pd.to_datetime(x.strip('[]').split(', ')).tolist()
df[['Actual times', 'Predicted time']] = df[['Actual times', 'Predicted time']].applymap(f)

df['True or False'] = [any((s < y) & (e > y) for y in x) 
                        for (s, e), x in zip(df['Predicted time'], df['Actual times'])]
print (df)
                               Predicted time  \
0  [2017-09-09 06:53:37, 2017-09-09 06:53:46]   
1  [2017-09-09 06:54:19, 2017-09-09 06:54:43]   
2  [2017-09-09 06:54:44, 2017-09-09 06:54:48]   

                                        Actual times  True or False  
0  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...          False  
1  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...           True  
2  [2017-09-09 06:54:11, 2017-09-09 06:54:21, 201...          False

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/424461.html

標籤：Python 熊猫循环约会时间布尔值

上一篇：python定義從API檢索資料然后放入資料框的函式

下一篇：鏈表實作出錯