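This is the workflow I need to complete:
- Validate the date format of every value in column_1 and column_2.
- Check whether each value matches one of two formats: mm/dd/yy hh:mm or mm/dd/yyyy hh:mm
- This is where I need help: print the values that match neither format.
Note: I don't know in advance what format the values are in, and some of them are not dates at all.
Sample input data (CSV):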
column_1         column_2
8/22/22 15:27    8/24/22 15:27
8/23/22 15:27    Tuesday, August 23, 2022
8/24/22 15:27    abc123
8/25/22 15:27    8/25/2022 15:27
8/26/22 15:27    8/26/2022 18:27
8/26/22 15:27    8/22/22
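The approach below always ends up in the exception handler, which is by design when to_datetime() raises a ValueError. How can I validate the dates and then capture only the values that match neither format_one nor format_two?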
import pandas as pd

df = pd.read_csv('input.csv', encoding='ISO-8859-1', dtype=str)
date_columns = ['column_1', 'column_2']
format_one = '%m/%d/%y %H:%M'
format_two = '%m/%d/%Y %H:%M'
for column in date_columns:
    for item in df[column]:
        try:
            if pd.to_datetime(df[item], format=format_one):
                print('format 1: ' + item)
            elif pd.to_datetime(df[item], format=format_two):
                print('format 2: ' + item)
            else:
                print('unknown format: ' + item)
        except Exception as e:
            print('Exception:')
            print(e)
Output:
Exception:
'8/22/22 15:27'
Exception:
'8/23/22 15:27'
Exception:
'8/24/22 15:27'
Exception:
'8/25/22 15:27'
Exception:
'8/26/22 15:27'
Exception:
'8/26/22 15:27'
Exception:
'8/24/22 15:27'
Exception:
'Tuesday, August 23, 2022'
Exception:
'abc123'
Exception:
'8/25/2022 15:27'
Exception:
'8/26/2022 18:27'
Exception:
'8/22/22'
Expected output:
Exception:
'Tuesday, August 23, 2022'
Exception:
'abc123'
Exception:
'8/22/22'
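Thank you.
A uj5u.com user replied:
You need to test each allowed format separately (in the example given in the question, they are currently all inside the same try block). A general solution can use a mask that marks the values which none of the formats can convert. That could look like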
import pandas as pd

allowed = ('%m/%d/%y %H:%M', '%m/%d/%Y %H:%M')

# dummy df
df = pd.DataFrame({"date": ["8/24/22 15:27", "Tuesday, August 23, 2022",
                            "abc123", "8/25/2022 15:27"]})

# this will be our mask, where the input format is invalid.
# initially, assume all invalid.
m = pd.Series([True]*df["date"].size)

# for each allowed format, test where the result is not NaT, i.e. valid.
# update the mask accordingly.
for fmt in allowed:
    m[pd.to_datetime(df["date"], format=fmt, errors="coerce").notna()] = False

# invalid format:
print(df["date"][m])
# 1    Tuesday, August 23, 2022
# 2                      abc123
# Name: date, dtype: object
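The masking idea above can also be wrapped in a small helper. This is only a sketch: the function name invalid_mask is made up for illustration and is not part of pandas. It builds the mask on the same index as the input, on the assumption that the column might not carry the default 0..n-1 index (for example after filtering rows), in which case a mask created with a fresh default index may fail to align.

```python
import pandas as pd

def invalid_mask(values, formats):
    """Hypothetical helper: True where a value parses under none of the formats."""
    # Start from "nothing is valid" on the same index as the input Series.
    valid = pd.Series(False, index=values.index)
    for fmt in formats:
        # errors="coerce" turns unparseable values into NaT instead of raising.
        valid |= pd.to_datetime(values, format=fmt, errors="coerce").notna()
    return ~valid

allowed = ('%m/%d/%y %H:%M', '%m/%d/%Y %H:%M')
dates = pd.Series(["8/24/22 15:27", "Tuesday, August 23, 2022",
                   "abc123", "8/25/2022 15:27"])
bad = dates[invalid_mask(dates, allowed)]
print(bad.tolist())
```

Calling it once per column, e.g. df[column][invalid_mask(df[column], allowed)], should give the same rows as the loop-based mask.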
Applied to the specific example in the question, that could look like
# for reference:
# df
#         column_1                  column_2
# 0  8/22/22 15:27             8/24/22 15:27
# 1  8/23/22 15:27  Tuesday, August 23, 2022
# 2  8/24/22 15:27                    abc123
# 3  8/25/22 15:27           8/25/2022 15:27
# 4  8/26/22 15:27           8/26/2022 18:27
# 5  8/26/22 15:27                   8/22/22

date_columns = ['column_1', 'column_2']

for column in date_columns:
    m = pd.Series([True]*df[column].size)
    for fmt in allowed:
        m[pd.to_datetime(df[column], format=fmt, errors="coerce").notna()] = False
    print(f"{column}\n", df[column][m])

# column_1
#  Series([], Name: column_1, dtype: object)
# column_2
#  1    Tuesday, August 23, 2022
# 2                      abc123
# 5                     8/22/22
# Name: column_2, dtype: object
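If you would rather keep the per-value loop from the question and label each value, here is a minimal sketch using only the standard library. Note that the question's loop passes df[item] to to_datetime(), which looks up a column named after the value; that lookup raises a KeyError, which would explain why every value shows up under "Exception:". The classify function and the FORMATS labels below are illustrative names, not anything from pandas.

```python
from datetime import datetime

# Format strings are taken from the question; the labels are illustrative.
FORMATS = {
    "format 1": "%m/%d/%y %H:%M",
    "format 2": "%m/%d/%Y %H:%M",
}

def classify(value):
    """Return the label of the first format that parses value, else 'unknown format'."""
    for label, fmt in FORMATS.items():
        try:
            # Each format gets its own try block, so one failure
            # does not prevent the next format from being tested.
            datetime.strptime(value, fmt)
        except (ValueError, TypeError):  # TypeError covers non-string cells such as NaN
            continue
        return label
    return "unknown format"

samples = ["8/24/22 15:27", "Tuesday, August 23, 2022", "abc123",
           "8/25/2022 15:27", "8/22/22"]
results = {value: classify(value) for value in samples}
for value, label in results.items():
    print(f"{label}: {value}")
```

On a DataFrame this could be applied per column with df[column].map(classify). It will be slower than the vectorized mask on large inputs, but it tells you which format each value matched.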
A uj5u.com user replied:
Just sharing an approach that should work technically. Please give it a try and let me know if it doesn't work.
import pandas as pd
df = pd.DataFrame({'date': {0: '8/24/22 15:27', 1: '24/8/22 15:27', 2: 'a,b,c', 3: 'Tuesday, August 23, 2022'}})
mask1 = df.loc[pd.to_datetime(df['date'], errors='coerce',format='%m/%d/%y %H:%M').isnull()]
mask2 = df.loc[pd.to_datetime(df['date'], errors='coerce',format='%d/%m/%y %H:%M').isnull()]
df = pd.merge(mask1,mask2,on = ['date'],how ='inner')
print(df)
Sample run
Input df:
                       date
0             8/24/22 15:27
1             24/8/22 15:27
2                     a,b,c
3  Tuesday, August 23, 2022
Output:
                       date
0                     a,b,c
1  Tuesday, August 23, 2022
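As a possible variation on the merge above, the two "failed to parse" conditions can be combined directly with a boolean AND. This is a sketch, not a drop-in replacement: it keeps the original row labels (2 and 3 here instead of 0 and 1), and it avoids the extra rows an inner merge can produce when the same invalid value appears more than once. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date": ["8/24/22 15:27", "24/8/22 15:27",
                            "a,b,c", "Tuesday, August 23, 2022"]})

# True where the value does NOT parse under the given format.
fails_mdy = pd.to_datetime(df["date"], format="%m/%d/%y %H:%M", errors="coerce").isna()
fails_dmy = pd.to_datetime(df["date"], format="%d/%m/%y %H:%M", errors="coerce").isna()

# Keep only the rows that fail under both formats.
unmatched = df[fails_mdy & fails_dmy]
print(unmatched)
```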
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/534229.html
