I have a time-series pivot table with a struct timestamp column that holds the start and end of each record's time range, as shown below:
import pandas as pd
pd.set_option('max_colwidth', 400)
df = pd.DataFrame({'timestamp': ['{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'],
                   "X1": [25],
                   "X2": [33],
                   })
df
#                                                                        timestamp  X1  X2
#0  {"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}  25  33
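Since I will later use the timestamp as the index for time-series analysis, I need to split it into separate start and end timestamps. Based on this post, I tried a regex-based solution, apparently without success: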
df[["start_timestamp", "end_timestamp"]] = (
    df["timestamp"].str.extractall(r"(\d+\.\d+\.\d+)").unstack().ffill(axis=1)
)
But I get:
ValueError: Columns must be same length as key
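A likely explanation, assuming the data looks like the sample above (with the offset written as +0000): the pattern (\d+\.\d+\.\d+) expects three runs of digits separated by dots, but each timestamp here contains only one dot, so extractall finds no matches. The unstacked result then has no columns to assign to the two target keys, which is what the ValueError complains about. The check below only confirms the empty match; it does not reproduce the error itself.

```python
import re
import pandas as pd

# Sample row from the question (offset assumed to be "+0000").
df = pd.DataFrame({
    "timestamp": ['{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'],
    "X1": [25],
    "X2": [33],
})

pattern = r"(\d+\.\d+\.\d+)"  # the pattern from the attempt above

# Plain re: the string contains no digits.digits.digits run
direct = re.findall(pattern, df.loc[0, "timestamp"])

# pandas: extractall returns one row per match, so here it returns zero rows
matches = df["timestamp"].str.extractall(pattern)

print(direct)        # []
print(len(matches))  # 0
```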
So this is the DataFrame I am trying to end up with:
df = pd.DataFrame({'timestamp': ['{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'],
                   'start_timestamp': ['2022-01-19T00:00:00.000+0000'],
                   'end_timestamp': ['2022-01-20T00:00:00.000+0000'],
                   "X1": [25],
                   "X2": [33]})
df
#                                                                        timestamp               start_timestamp                 end_timestamp  X1  X2
#0  {"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}  2022-01-19T00:00:00.000+0000  2022-01-20T00:00:00.000+0000  25  33
Answer:
You can extract both values with a single call to str.extract:
df[["start_timestamp", "end_timestamp"]] = df["timestamp"].str.extract(r'"start":"([^"]*)","end":"([^"]+)')
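The regex "start":"([^"]*)","end":"([^"]+) matches "start":", then captures any zero or more characters other than " into Group 1 (the start column value), then matches ","end":" and captures one or more characters other than " into Group 2 (the end column value).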
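To see the two capture groups in isolation, the sketch below runs the same pattern with Python's re module on one sample cell (the +0000 offset is assumed). It also shows a variant that is not part of the answer: naming the groups with (?P<name>...) makes str.extract use those names as the output column labels, so the result can be joined directly.

```python
import re
import pandas as pd

cell = '{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'

# Group 1: everything after "start":" up to the next quote
# Group 2: everything after ","end":" up to the next quote
m = re.search(r'"start":"([^"]*)","end":"([^"]+)', cell)
start, end = m.group(1), m.group(2)
print(start)  # 2022-01-19T00:00:00.000+0000
print(end)    # 2022-01-20T00:00:00.000+0000

# Variant with named groups: str.extract uses the group names as column names
df = pd.DataFrame({"timestamp": [cell], "X1": [25], "X2": [33]})
parts = df["timestamp"].str.extract(
    r'"start":"(?P<start_timestamp>[^"]*)","end":"(?P<end_timestamp>[^"]+)'
)
df = df.join(parts)
print(df.columns.tolist())  # ['timestamp', 'X1', 'X2', 'start_timestamp', 'end_timestamp']
```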
Also, if your data is valid JSON, you can parse it as JSON instead of using a regex:
import json

def extract_startend(x):
    # Parse the JSON string in one cell and return start/end as a two-element Series
    j = json.loads(x)
    return pd.Series([j["start"], j["end"]])

df[["start_timestamp", "end_timestamp"]] = df["timestamp"].apply(extract_startend)
Output of print(df.to_string()):
                                                                 timestamp  X1  X2               start_timestamp                 end_timestamp
0  {"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:.........  25  33  2022-01-19T00:00:00.000+0000  2022-01-20T00:00:00.000+0000
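Since the question mentions using the timestamps as a time-series index, here is a minimal sketch that goes one step further, assuming every cell is valid JSON with start and end keys in the +HHMM format shown. It parses each cell once with json.loads, builds both columns from the resulting dicts rather than returning a Series per row, and converts them with pd.to_datetime (utc=True, so the result is UTC-aware). Indexing by start_timestamp is a hypothetical choice; use end_timestamp if each row should be labelled by the end of its interval.

```python
import json
import pandas as pd

df = pd.DataFrame({
    "timestamp": ['{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'],
    "X1": [25],
    "X2": [33],
})

# Parse every cell once, then build a two-column frame from the dicts
parsed = pd.DataFrame(df["timestamp"].map(json.loads).tolist(), index=df.index)
parsed = parsed.rename(columns={"start": "start_timestamp", "end": "end_timestamp"})

# Convert the strings to real UTC datetimes so they can serve as an index
fmt = "%Y-%m-%dT%H:%M:%S.%f%z"
for col in ["start_timestamp", "end_timestamp"]:
    parsed[col] = pd.to_datetime(parsed[col], format=fmt, utc=True)

df = df.join(parsed)

# Hypothetical choice: index the series by the interval start
ts = df.set_index("start_timestamp")[["X1", "X2"]]
print(ts)
```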
Answer:
This may not be the most efficient approach, but it works:
df[['start_timestamp', 'end_timestamp']] = df['timestamp'].str.split(',', expand=True)
# Pull the full timestamp (including the +HHMM offset) out of each half
pattern = r'(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}\+\d{4})'
df['start_timestamp'] = df['start_timestamp'].str.extract(pattern, expand=False)
df['end_timestamp'] = df['end_timestamp'].str.extract(pattern, expand=False)
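The timestamp pattern from this answer also makes the question's original extractall idea work, assuming each cell holds exactly two timestamps in the format shown. extractall produces one row per match (match 0 is the start, match 1 the end), so unstacking the match level yields one column per timestamp. This is a minimal sketch; ts_pattern and wide are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp": ['{"start":"2022-01-19T00:00:00.000+0000","end":"2022-01-20T00:00:00.000+0000"}'],
    "X1": [25],
    "X2": [33],
})

# One timestamp with a +HHMM offset (adapted from the answer above)
ts_pattern = r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}\+\d{4})"

# Select capture group 0, then move the "match" level into columns
wide = df["timestamp"].str.extractall(ts_pattern)[0].unstack()

# wide now has two columns, matching the two target keys
df[["start_timestamp", "end_timestamp"]] = wide
print(df[["start_timestamp", "end_timestamp"]])
```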
Source: https://www.uj5u.com/caozuo/428713.html
