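I have a CSV with several columns that contain datetime data. After removing the "invalid" rows, I want to find the minimum and maximum datetime value in each row and store the results as two new columns. However, when I add the code for these 2 new columns, I get a FutureWarning: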
FutureWarning: Dropping of nuisance columns in DataFrame reductions (with 'numeric_only=None') is deprecated; in a future version this will raise TypeError. Select only valid columns before calling the reduction.
Here is my script:
import pandas as pd
import os
import glob
# Datetime columns
dt_columns = ['DT1','DT2','DT3']
df = pd.read_csv('full_list.csv', dtype=object)
# Remove "invalid" Result rows
remove_invalid = df['Result'].str.contains('Invalid')
df.drop(index=df[remove_invalid].index, inplace=True)
for eachCol in dt_columns:
    df[eachCol] = pd.to_datetime(df[eachCol]).dt.strftime('%Y-%m-%d %H:%M:%S')
# Create new columns
df['MinDatetime'] = df[dt_columns].min(axis=1)
df['MaxDatetime'] = df[dt_columns].max(axis=1)
# Save as CSV
df.to_csv('test.csv', index=False)
For context, the CSV looks like this (it usually contains about 100000 rows):
Area     Serial  Route  DT1                  DT2                  DT3                  Result
Brazil   17763   4      13/08/2021 23:46:31  16/10/2021 14:04:27  28/10/2021 08:19:59  Confirmed
China    28345   2      15/09/2021 03:09:21  24/04/2021 09:56:34  04/05/2021 22:07:13  Confirmed
Malta    13630   5      21/03/2021 11:59:27  18/09/2021 11:03:25  02/07/2021 02:32:48  Invalid
Serbia   49478   2      12/04/2021 06:38:05  19/03/2021 03:16:47  29/06/2021 06:39:30  Confirmed
France   34732   1      29/04/2021 03:03:14  24/03/2021 01:49:48  26/04/2021 06:44:21  Invalid
Mexico   21840   3      23/11/2021 12:53:33  10/01/2022 02:42:48  29/04/2021 14:22:51  Invalid
Ukraine  20468   3      04/11/2021 18:40:44  13/11/2021 03:38:39  11/03/2021 09:09:14  Invalid
China    28830   1      07/02/2021 23:50:34  03/12/2021 14:04:32  14/07/2021 22:59:10  Confirmed
India    49641   4      02/06/2021 11:17:35  09/05/2021 13:51:55  19/01/2022 06:56:07  Confirmed
Greece   43163   3      30/11/2021 09:31:29  28/01/2021 08:52:50  12/05/2021 07:49:48  Invalid
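I expect the per-row minimum and maximum to appear as new columns, but the code currently produces 2 columns with no data. What am I doing wrong?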
Answer from a uj5u.com user:
It is simpler to filter with a boolean mask than to use drop:
# Remove "invalid" Result rows
remove_invalid = df['Result'].str.contains('Invalid')
df = df[~remove_invalid]
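As a small sketch of the same mask-based filter, here is a self-contained version on a made-up three-row frame (the 'Peru' row and its missing Result are hypothetical, added only to show the edge case). The `na=False` argument and the trailing `.copy()` are my additions, not part of the answer above: the first keeps the mask strictly boolean when Result has missing values, and the second may avoid a SettingWithCopyWarning on later column assignments, depending on the pandas version.

```python
import pandas as pd

# Hypothetical sample; only the column names follow the question
df = pd.DataFrame({
    'Area': ['Brazil', 'Malta', 'Peru'],
    'Result': ['Confirmed', 'Invalid', None],
})

# na=False makes a missing Result count as "not invalid",
# so the mask contains only True/False
is_invalid = df['Result'].str.contains('Invalid', na=False)

# Keep the rows where the mask is False; .copy() detaches the result
# from the original frame before any later column assignments
df = df[~is_invalid].copy()

print(df)
```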
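You need to keep working with datetimes, so do not call Series.dt.strftime after the conversion: it turns the datetimes back into their string representation, and the row-wise min and max then have no datetime columns to work on. Since the dates are in day/month/year order, also pass dayfirst=True: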
for eachCol in dt_columns:
    df[eachCol] = pd.to_datetime(df[eachCol], dayfirst=True)
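Putting the pieces together, a minimal end-to-end sketch could look like the following. It assumes the file is comma-separated and uses an in-memory string in place of full_list.csv so it runs on its own; the three sample rows are taken from the question. Once the columns are real datetimes, the min(axis=1) and max(axis=1) calls from the question's script work unchanged.

```python
import io
import pandas as pd

# Stand-in for full_list.csv (assumed comma-separated); rows copied from the question
csv_text = """Area,Serial,Route,DT1,DT2,DT3,Result
Brazil,17763,4,13/08/2021 23:46:31,16/10/2021 14:04:27,28/10/2021 08:19:59,Confirmed
China,28345,2,15/09/2021 03:09:21,24/04/2021 09:56:34,04/05/2021 22:07:13,Confirmed
Malta,13630,5,21/03/2021 11:59:27,18/09/2021 11:03:25,02/07/2021 02:32:48,Invalid
"""

dt_columns = ['DT1', 'DT2', 'DT3']
df = pd.read_csv(io.StringIO(csv_text), dtype=object)

# Remove "invalid" Result rows with a boolean mask
df = df[~df['Result'].str.contains('Invalid', na=False)].copy()

# Convert to real datetimes and keep them as datetimes (no strftime)
for col in dt_columns:
    df[col] = pd.to_datetime(df[col], dayfirst=True)

# Row-wise minimum and maximum across the datetime columns
df['MinDatetime'] = df[dt_columns].min(axis=1)
df['MaxDatetime'] = df[dt_columns].max(axis=1)

print(df[['Area', 'MinDatetime', 'MaxDatetime']])
```

If the saved file needs a specific text format for the dates, one option is to leave the columns as datetimes and pass date_format='%Y-%m-%d %H:%M:%S' to df.to_csv, so the formatting happens only when writing.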
