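I would like to implement a "capped" cumulative sum in BigQuery using SQL.
Here is what I mean: I have a table whose rows contain the amount by which a value increases or decreases each day, but the value cannot go below 0 or above 100. I want to compute the cumulative sum of the changes to keep track of this value.
For example, consider the following table: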
day | change
--------------
1 | 70
2 | 50
3 | 20
4 | -30
5 | 10
6 | -90
7 | 20
I would like to create a column with the capped cumulative sum, so that it looks like this:
day | change | capped cumsum
----------------------------
1 | 70 | 70
2 | 50 | 100
3 | 20 | 100
4 | -30 | 70
5 | 10 | 80
6 | -90 | 0
7 | 20 | 20
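Simply computing SUM(change) OVER (ORDER BY day) and then capping the result at 100 and 0 does not work. I need some kind of recursive loop, but I don't know how to implement that in BigQuery.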
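To make the difference concrete, here is a minimal Python sketch (not BigQuery SQL) that compares clamping a plain cumulative sum with clamping the running value at every step. The helper name `clamp` and the variable names are illustrative assumptions; only the sample changes and the 0 to 100 bounds come from the question.

```python
from itertools import accumulate


def clamp(value, low=0, high=100):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


changes = [70, 50, 20, -30, 10, -90, 20]

# Naive: take the ordinary cumulative sum, then clamp each total afterwards.
naive = [clamp(total) for total in accumulate(changes)]

# Stateful: clamp after every step, so each day starts from the capped value.
# initial=0 seeds the running value; [1:] drops that seed from the result.
capped = list(
    accumulate(changes, lambda running, change: clamp(running + change), initial=0)
)[1:]

print(naive)   # [70, 100, 100, 100, 100, 30, 50]
print(capped)  # [70, 100, 100, 70, 80, 0, 20]
```

The two lists diverge from day 4 onward: the naive version still carries the overshoot above 100 that the cap should have discarded, which is why each row has to depend on the previous capped result.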
Eventually I also want to do this over partitions, so if I have something like
day | class | change
--------------
1 | A | 70
1 | B | 12
2 | A | 50
2 | B | 83
3 | A | -30
3 | B | 17
4 | A | 10
5 | A | -90
6 | A | 20
I can compute the capped cumulative sum partitioned by each class.
Answer:
"I need some kind of recursive loop, but I don't know how to implement that in BigQuery"
A super naive, cursor-based approach:
declare cumulative_change int64 default 0;
-- empty temp table with the source columns plus capped_cumsum
create temp table temp_table as (
  select *, 0 as capped_cumsum from your_table where false
);
for rec in (select * from your_table order by day)
do
  -- add the day's change, then clamp the running value to [0, 100]
  set cumulative_change = cumulative_change + rec.change;
  set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
  insert into temp_table (select rec.*, cumulative_change);
end for;
select * from temp_table order by day;
Applied to the sample data in your question, this produces the capped_cumsum values 70, 100, 100, 70, 80, 0, 20.

A slightly modified option that uses an array instead of a temp table:
declare cumulative_change int64 default 0;
declare result array<struct<day int64, change int64, capped_cumsum int64>>;
for rec in (select * from your_table order by day)
do
  -- add the day's change, then clamp the running value to [0, 100]
  set cumulative_change = cumulative_change + rec.change;
  set cumulative_change = case when cumulative_change < 0 then 0 when cumulative_change > 100 then 100 else cumulative_change end;
  -- append the current row to the result array
  set result = array(select as struct * from unnest(result) union all select as struct rec.*, cumulative_change);
end for;
select * from unnest(result) order by day;
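P.S. So far I don't like either of the options above :o)
That said, this approach may be workable for relatively small tables and datasets.
Answer:
Using a RECURSIVE CTE can be another option: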
DECLARE sample ARRAY<STRUCT<day INT64, change INT64>> DEFAULT [
  (1, 70), (2, 50), (3, 20), (4, -30), (5, 10), (6, -90), (7, 20)
];
WITH RECURSIVE ccsum AS (
  -- base case: the first change, clamped to [0, 100]
  SELECT 0 AS n, vals[OFFSET(0)] AS change,
    CASE
      WHEN vals[OFFSET(0)] > 100 THEN 100
      WHEN vals[OFFSET(0)] < 0 THEN 0
      ELSE vals[OFFSET(0)]
    END AS cap_csum
  FROM sample
  UNION ALL
  -- recursive step: add the next change to the previous capped sum, then clamp
  SELECT n + 1 AS n, vals[OFFSET(n + 1)] AS change,
    CASE
      WHEN cap_csum + vals[OFFSET(n + 1)] > 100 THEN 100
      WHEN cap_csum + vals[OFFSET(n + 1)] < 0 THEN 0
      ELSE cap_csum + vals[OFFSET(n + 1)]
    END AS cap_csum
  FROM ccsum, sample
  WHERE n < ARRAY_LENGTH(vals) - 1
),
sample AS (
  -- all changes collected into one array, ordered by day
  SELECT ARRAY_AGG(change ORDER BY day) vals FROM UNNEST(sample)
)
SELECT * EXCEPT(n) FROM ccsum ORDER BY n;
Output: the cap_csum column comes out as 70, 100, 100, 70, 80, 0, 20.

Answer:
"Eventually I also want to do this over partitions..."
Consider the solution below:
create temp function cap_value(value int64, lower_boundary int64, upper_boundary int64) as (
  least(greatest(value, lower_boundary), upper_boundary)
);
with recursive temp_table as (
  -- number the rows within each class in day order
  select *, row_number() over(partition by class order by day) as n from your_table
), iterations as (
  -- base case: the first row of each class
  select 1 as n, day, class, change, cap_value(change, 0, 100) as capped_cumsum
  from temp_table
  where n = 1
  union all
  -- recursive step: next row of the same class, built on the previous capped sum
  select t.n, t.day, t.class, t.change, cap_value(i.capped_cumsum + t.change, 0, 100) as capped_cumsum
  from temp_table t
  join iterations i
  on t.n = i.n + 1
  and t.class = i.class
)
select * except(n) from iterations
order by class, day
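Applied to the partitioned sample data in your question, the capped_cumsum column comes out as 70, 100, 70, 80, 0, 20 for class A and 12, 95, 100 for class B.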
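As a cross-check for the partitioned case, the following is a minimal Python sketch (not BigQuery SQL) of the same per-class logic: one running value per class, clamped to the 0 to 100 range after every change. The function name `capped_cumsum_by_class`, the helper `clamp`, and the tuple layout are assumptions made for illustration; the rows are the partitioned sample from the question.

```python
from collections import defaultdict


def clamp(value, low=0, high=100):
    """Restrict value to the closed range [low, high]."""
    return max(low, min(high, value))


def capped_cumsum_by_class(rows):
    """rows: iterable of (day, class, change) tuples.

    Returns (day, class, change, capped_cumsum) tuples ordered by class, day.
    """
    running = defaultdict(int)  # one running value per class, starting at 0
    result = []
    for day, cls, change in sorted(rows, key=lambda r: (r[1], r[0])):
        running[cls] = clamp(running[cls] + change)
        result.append((day, cls, change, running[cls]))
    return result


rows = [
    (1, "A", 70), (1, "B", 12), (2, "A", 50), (2, "B", 83), (3, "A", -30),
    (3, "B", 17), (4, "A", 10), (5, "A", -90), (6, "A", 20),
]

for row in capped_cumsum_by_class(rows):
    print(row)
```

A small script like this can be handy for validating query results on a data sample before running the SQL on a full table.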

