我有一個玩具示例如下

我想將規則中的操作列合并到原始的df。合并條件如下。
- (值 >= 下限) & (值 < 上限)
- df 中的日期必須與規則中最近的前一個日期合并
預期輸出如上圖所示。這是df和規則
df = pd.DataFrame({"date": ["2022-05-15", "2022-05-20", "2022-05-25", "2022-05-30"],
"values": [10, 20, 30, 80]})
df["date"] = pd.to_datetime(df["date"])
rules = pd.DataFrame({"lower": [0, 25, 50, 75, 0],
"upper": [25, 50, 75, float("inf"), 25],
"actions": [5, 10, 15, 20, 8],
"date": ["2022-01-01", "2022-01-01", "2022-01-01", "2022-01-01", "2022-05-18"]})
rules["date"] = pd.to_datetime(rules["date"])
我可以對有效的方法提出建議嗎?
我正在嘗試使用 pandasql 以另一種方式解決這個問題,因為這樣可以在 SQL 中輕松完成連接。這是我的代碼
from pandasql import sqldf
sql = """SELECT DISTINCT on (df.date)
df.date,
df.values,
rules.actions
FROM df
LEFT JOIN rules
ON (df.date > rules.date) AND (df.values >= rules.lower) AND (df.values < rules.upper)
ORDER BY df.date, rules.date DESC"""
pysqldf = lambda x: sqldf(x)
pysqldf(sql)
即使 sql 陳述句在 postgres 中運行,但當我使用 pandasql 運行時它也不起作用。我收到以下錯誤。
PandaSQLException: (sqlite3.OperationalError) near "on": syntax error
[SQL: SELECT DISTINCT on (df.date)
df.date,
df.values,
rules.actions
FROM df
LEFT JOIN rules
ON (df.date > rules.date) AND (df.values >= rules.lower) AND (df.values < rules.upper)
ORDER BY df.date, rules.date DESC]
(Background on this error at: https://sqlalche.me/e/14/e3q8)
我忽略了什么嗎?
uj5u.com熱心網友回復:
一種選擇是使用pyjanitor的 conditional_join ,在合并之后,您可以執行 groupby 來獲得最少的行數:
# install from dev
# pip install git https://github.com/pyjanitor-devs/pyjanitor.git
import janitor
import pandas as pd
(df
.astype({'values':float})
.conditional_join(
rules.astype({'lower':float}),
# pass the conditions as a variable arguments of tuples
('values', 'lower', '>='),
('values', 'upper', '<'),
('date', 'date', '>'),
# select required columns with df_columns, and right_columns
df_columns = ['date','values'],
right_columns={'actions':'actions', 'date':'date_right'})
# get the difference and keep the smallest days
.assign(dff = lambda df: df.date.sub(df.date_right))
.sort_values(['date', 'dff'])
.drop(columns = ['dff', 'date_right'])
.groupby('date', sort = False, as_index = False)
.nth(0)
)
date values actions
0 2022-05-15 10.0 5
3 2022-05-20 20.0 8
4 2022-05-25 30.0 10
5 2022-05-30 80.0 20
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/486602.html
