我在將一些遞回 CTE 代碼從 PostgreSQL 改編到 SQL Server 時遇到了麻煩,來自“Fighting Churn with Data”一書
這是作業的 PostgreSQL 代碼:
with recursive
active_period_params as (
select interval '30 days' as allowed_gap,
'2021-09-30'::date as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= (e.start_date-allowed_gap)::date
)
select account_id, min(start_date) as start_date
from active
group by account_id
這是我嘗試轉換為 SQL Server。它陷入了一個回圈。我相信這個問題與 SQL Server 所需的 UNION ALL 有關。
with
active_period_params as (
select 30 as allowed_gap,
cast('2021-09-30' as date) as calc_date
),
active as (
-- anchor
select distinct account_id, min(start_date) as start_date
from subscription inner join active_period_params
on start_date <= calc_date
and (end_date > calc_date or end_date is null)
group by account_id
UNION ALL
-- recursive
select s.account_id, s.start_date
from subscription s
cross join active_period_params
inner join active e on s.account_id=e.account_id
and s.start_date < e.start_date
and s.end_date >= dateadd(day, -allowed_gap, e.start_date)
)
select account_id, min(start_date) as start_date
from active
group by account_id
訂閱表是屬于客戶的訂閱串列。客戶可以有多個具有重疊日期或日期間隔的訂閱。null end_date 表示訂閱當前處于活動狀態并且沒有定義的 end_date。下面是單個客戶 (account_id = 15) 的示例資料:
subscription
---------------------------------------------------
| id | account_id | start_date | end_date |
---------------------------------------------------
| 6 | 15 | 01/06/2021 | null |
| 5 | 15 | 01/01/2021 | null |
| 4 | 15 | 01/06/2020 | 01/02/2021 |
| 3 | 15 | 01/04/2020 | 15/05/2020 |
| 2 | 15 | 01/03/2020 | 15/05/2020 |
| 1 | 15 | 01/06/2019 | 01/01/2020 |
預期查詢結果(由 PostgreSQL 代碼生成):
------------------------------
| account_id | start_date |
------------------------------
| 15 | 01/03/2020 |
問題:上面的 SQL Server 代碼卡在回圈中并且不會產生結果。
PostgreSQL 代碼說明:
- anchor block finds subs that are active as at the calc_date (30/09/2021) (id 5 & 6), and returns the min start_date (01/01/2021)
- the recursion block then looks for any earlier subs that existed within the allowed_gap, which is 30 days prior to the min_start date found in 1). id 4 meets this criteria, so the new min start_date is 01/06/2020
- recursion repeats and finds two subs within the allowed_gap (01/06/2020 - 30 days). Of these subs (id 2 & 3), the new min start_date is 01/03/2020
- recursion fails to find an earlier sub within the allowed_gap (01/03/2020 - 30 days)
- query returns a start date of 01/03/2020 for account_id 15
Any help appreciated!
uj5u.com熱心網友回復:
問題似乎與 SQL Server 處理遞回 CTE 的方式有關。
這是一種間隙和孤島問題,實際上并不需要遞回。
有很多解決方案,這里是一個。根據您的要求,可能有更有效的方法,但這應該可以幫助您入門。
- 使用
LAG我們識別下一行的指定間隙內的行 - 我們使用一個運行
COUNT來給每一個連續的行集一個 ID - 我們按那個 ID 分組,取最小值
start_date,過濾掉不符合條件的組 - 再次分組以獲得每個帳戶的最小值
DECLARE @allowed_gap int = 30,
@calc_date datetime = cast('2021-09-30' as date);
WITH PrevValues AS (
SELECT *,
IsStart = CASE WHEN ISNULL(LAG(end_date) OVER (PARTITION BY account_id
ORDER BY start_date), '2099-01-01') < DATEADD(day, -@allowed_gap, start_date)
THEN 1 END
FROM subscription
),
Groups AS (
SELECT *,
GroupId = COUNT(IsStart) OVER (PARTITION BY account_id
ORDER BY start_date ROWS UNBOUNDED PRECEDING)
FROM PrevValues
),
ByGroup AS (
SELECT
account_id,
GroupId,
start_date = MIN(start_date)
FROM Groups
GROUP BY account_id, GroupId
HAVING COUNT(CASE WHEN start_date <= @calc_date
and (end_date > @calc_date or end_date is null) THEN 1 END) > 0
)
SELECT
account_id,
start_date = MIN(start_date)
FROM ByGroup
GROUP BY account_id;
資料庫<>小提琴
轉載請註明出處,本文鏈接:https://www.uj5u.com/ruanti/323265.html
標籤:sql sql-server tsql
