我正在嘗試更改我的資料框以創建桑基圖。
我有 300 萬行是這樣的:
client_id | | start_date | end_date | position
1234 16-07-2019 27-03-2021 3
1234 18-07-2021 09-10-2021 1
1234 28-03-2021 17-07-2021 2
1234 10-10-2021 20-11-2021 2
我希望它看起來像這樣:
client_id | | start_date | end_date | position | source | target
1234 16-07-2019 27-03-2021 3 3 2
1234 18-07-2021 09-10-2021 1 1 2
1234 28-03-2021 17-07-2021 2 2 1
1234 10-10-2021 20-11-2021 2 2 4
值 4 是我用作“流程中的退出”的值。
我不知道該怎么做。
背景:源值和目標值包含基于 start_date 和 end_date 的位置值。因此,例如在第一行中,源是位置值 3,但目標是位置值 2,因為在結束日期之后客戶端從位置 3 更改為 2。
uj5u.com熱心網友回復:
因為源和目標是按每個客戶的日期順序計算的。因此可以訂購日期并找到其下一個位置。
columns = ["client_id" ,"start_date","end_date","position"]
data = [
["1234","16-07-2019","27-03-2021",3],
["1234","18-07-2021","09-10-2021",1],
["1234","28-03-2021","17-07-2021",2],
["1234","10-10-2021","20-11-2021",2],
["5678","16-07-2019","27-03-2021",3],
["5678","18-07-2021","09-10-2021",1],
["5678","28-03-2021","17-07-2021",2],
["5678","10-10-2021","20-11-2021",2],
]
df = pd.DataFrame(
data,
columns=columns
)
df = df.assign(
start_date = pd.to_datetime(df["start_date"]),
end_date = pd.to_datetime(df["end_date"])
)
sdf = df.assign(
rank=df.groupby("client_id")["start_date"].rank()
)
sdf = sdf.assign(
next_rank=sdf["rank"] 1
)
combine_result = pd.merge(sdf,
sdf[["client_id", "position", "rank"]],
left_on=["client_id", "next_rank"],
right_on=["client_id", "rank"],
how="left",
suffixes=["", "_next"]
).fillna({"position_next": 4})
combine_result[["client_id", "start_date", "end_date", "position", "position_next"]].rename(
{"position": "source", "position_next": "target"}, axis=1).sort_values(["client_id", "start_date"])
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/488336.html
標籤:Python python-3.x 熊猫 数据框 桑基图
上一篇:合并熊貓Df中的特定行
