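I have a column containing strings such as: Posted: 1 day ago, Posted: 2 days ago. I want to convert this column to a date column, i.e.: datetime.date(2021, 12, 22), datetime.date(2021, 12, 21).
I tried combining regex groups with df.replace() to do this in one compact operation: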
df2 = df.replace({r"Posted: (\d+) days? ago": str(date.today() - timedelta(int(r"\1")))}, regex=True)
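But this raises ValueError: invalid literal for int() with base 10: '\\1', because int() evaluates its input as a literal string rather than as a reference to the earlier regex group. Simply retrieving the matched pattern works fine: if I only wanted to keep the numeric value in the column instead of converting it to a datetime object, either of the following two would work: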
df2 = df.replace({r"Posted: (\d+) days? ago": r"\g<1>"}, regex=True)
df2 = df.replace({r"Posted: (\d+) days? ago": r"\1"}, regex=True)
How can I get the referenced regex group value so that I can pass it to timedelta()?
Full code:
import pandas as pd
from datetime import date, timedelta

df = pd.DataFrame(
    [['Posted: 1 day ago', 'xa01332cs', 101],
     ['Posted: 2 days ago', 'd11as99101', 630],
     ['Posted: 11 days ago', '12011rww1a', 301]
    ],
    columns=['Date', 'Code', 'Value']
)

def preprocess(df):
    #df2 = df.replace({r"Posted: (\d+) days? ago": r"\g<1>"}, regex=True) # this works
    #df2 = df.replace({r"Posted: (\d+) days? ago": r"\1"}, regex=True) # this works identically to previous row
    # The next line raises ValueError: int() receives the literal string "\1"
    df2 = df.replace({r"Posted: (\d+) days? ago": str(date.today() - timedelta(int(r"\1")))}, regex=True)
    return df2

preprocess(df)
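As a side note on the error described above, the behaviour can be reproduced without pandas. The sketch below uses only the standard re module; the names pattern, error_message and captured are illustrative and not part of the original code. It shows that the replacement expression is evaluated before any matching takes place, whereas a callable replacement is handed each match and can convert the group.

```python
import re

pattern = r"Posted: (\d+) days? ago"

# Function arguments are evaluated eagerly, so int() sees the
# two-character string "\1" long before any regex matching happens.
try:
    int(r"\1")
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

# A callable replacement is invoked once per match and receives the
# match object, so the captured digits are available at that point.
captured = re.sub(pattern, lambda m: m.group(1), "Posted: 11 days ago")
```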
A uj5u.com user replied:
In this vectorized form you can't use date - timedelta (it only works for plain Python scalars), but you can use datetime - timedelta:
from datetime import datetime, timedelta
# datetime is imported as a class here, so call datetime.today() directly
df['Date'] = datetime.today() - df.Date.str.extract(r'Posted: (\d+) days? ago')[0].astype(int).apply(timedelta)
Output:
>>> df
Date Code Value
0 2021-12-22 08:33:03.396630 xa01332cs 101
1 2021-12-21 08:33:03.396630 d11as99101 630
2 2021-12-12 08:33:03.396630 12011rww1a 301
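The output above still carries a time of day, while the question asked for datetime.date values. One possible way to get there, as a minimal sketch: the variable names days_ago and today are illustrative, and the use of pd.Timestamp.today().normalize(), pd.to_timedelta and .dt.date is a swapped-in variation rather than the answerer's exact method.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": ["Posted: 1 day ago", "Posted: 2 days ago", "Posted: 11 days ago"]}
)

# expand=False returns a Series (not a DataFrame) for a single capture group.
days_ago = df["Date"].str.extract(r"Posted: (\d+) days? ago", expand=False).astype(int)

# normalize() sets the time part to midnight; .dt.date then converts each
# timestamp into a plain datetime.date object.
today = pd.Timestamp.today().normalize()
df["Date"] = (today - pd.to_timedelta(days_ago, unit="D")).dt.date
```

Note that the resulting column has object dtype, since pandas has no native date-only dtype.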
A uj5u.com user replied:
You can extract the number, convert it to a timedelta, and then subtract:
import datetime
df['New Date'] = datetime.datetime.today() - df['Date'].str.extract(r"Posted: (\d+) days? ago").astype(int) * pd.Timedelta('1D')
Output:
Date Code Value New Date
0 Posted: 1 day ago xa01332cs 101 2021-12-22 10:36:13.361973
1 Posted: 2 days ago d11as99101 630 2021-12-21 10:36:13.361973
2 Posted: 11 days ago 12011rww1a 301 2021-12-12 10:36:13.361973
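For readers who prefer to stay close to the regex-replacement idea from the question, Series.str.replace accepts a callable as the replacement when regex=True. The following is a minimal sketch under that assumption; the helper name to_date_string is hypothetical, and the result is a column of ISO-formatted strings rather than date objects.

```python
import pandas as pd
from datetime import date, timedelta

df = pd.DataFrame(
    {"Date": ["Posted: 1 day ago", "Posted: 2 days ago", "Posted: 11 days ago"]}
)

def to_date_string(match):
    # match.group(1) holds the digits captured for the current row,
    # so int() now receives a real number such as "11".
    return str(date.today() - timedelta(days=int(match.group(1))))

# The callable is invoked once per match instead of being evaluated up front.
df["Date"] = df["Date"].str.replace(
    r"Posted: (\d+) days? ago", to_date_string, regex=True
)
```

If real date objects are needed afterwards, the strings could be parsed with pd.to_datetime(df["Date"]).dt.date.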