I have a table that looks like this:
| time | group | sub_group | count |
|---|---|---|---|
| 2022-01-01 | A | True | 3 |
| 2022-01-01 | A | False | 1 |
| 2022-01-01 | B | True | 2 |
| 2022-01-01 | B | False | 1 |
| 2022-01-02 | A | False | 2 |
| 2022-01-02 | A | True | 5 |
| 2022-01-02 | B | False | 3 |
| 2022-01-03 | A | False | 3 |
| 2022-01-03 | B | False | 4 |
| 2022-01-03 | B | True | 3 |
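To experiment with the sample data, the table could be set up roughly as follows. This is only a sketch: the table name `my_table` is taken from the queries later in this post, but the column types (`timestamp`, `text`, `boolean`, `integer`) are assumptions, since the post does not state them.

```sql
-- Assumed schema: only the column names come from the post, the types are guesses.
CREATE TABLE my_table (
    "time"    timestamp NOT NULL,
    "group"   text      NOT NULL,  -- GROUP is a reserved word, so it must be quoted
    sub_group boolean   NOT NULL,
    count     integer   NOT NULL
);

-- The ten rows from the sample table above.
INSERT INTO my_table ("time", "group", sub_group, count) VALUES
    ('2022-01-01', 'A', true,  3),
    ('2022-01-01', 'A', false, 1),
    ('2022-01-01', 'B', true,  2),
    ('2022-01-01', 'B', false, 1),
    ('2022-01-02', 'A', false, 2),
    ('2022-01-02', 'A', true,  5),
    ('2022-01-02', 'B', false, 3),
    ('2022-01-03', 'A', false, 3),
    ('2022-01-03', 'B', false, 4),
    ('2022-01-03', 'B', true,  3);
```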
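So the count for each group/sub_group grows from day to day, except that on a day when the count for a group/sub_group did not change, that row is missing.
In the example above, the missing rows would be:
...
| 2022-01-02 | B | True | 2 |
...
| 2022-01-03 | A | True | 5 |
...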
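One way to list exactly those gaps is to build every day × group/sub_group combination and keep the ones that have no matching row. A minimal sketch, assuming the table is called `my_table` (the name used in the queries later in this post) and that every day of interest appears at least once in the table:

```sql
-- Every (day, group, sub_group) combination that has no row in my_table.
SELECT d."time", g."group", g.sub_group
FROM (SELECT DISTINCT "time" FROM my_table) AS d
CROSS JOIN (SELECT DISTINCT "group", sub_group FROM my_table) AS g
WHERE NOT EXISTS (
    SELECT 1
    FROM my_table AS t
    WHERE t."time"    = d."time"
      AND t."group"   = g."group"
      AND t.sub_group = g.sub_group
)
ORDER BY d."time", g."group", g.sub_group;
```

For the sample data this should return the two combinations listed above (2022-01-02 / B / True and 2022-01-03 / A / True). It only sees days that already occur in the table; a day missing for every group would need a generated calendar instead.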
To make data processing easier, I need a continuous series of timestamps, with one row per day for every group/sub_group. The result should look like this:
| time | group | sub_group | count |
|---|---|---|---|
| 2022-01-01 | A | True | 3 |
| 2022-01-01 | A | False | 1 |
| 2022-01-01 | B | True | 2 |
| 2022-01-01 | B | False | 1 |
| 2022-01-02 | A | False | 2 |
| 2022-01-02 | A | True | 5 |
| 2022-01-02 | B | False | 3 |
| 2022-01-02 | B | True | 2 |
| 2022-01-03 | A | False | 3 |
| 2022-01-03 | A | True | 5 |
| 2022-01-03 | B | False | 4 |
| 2022-01-03 | B | True | 3 |
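How can I achieve this? Probably with some `partition by ... over` select construct, but in this case I can't work out how to partition using the timestamps of the other groups, because I have no rows with a NULL count that I could forward-fill per group as an intermediate step.
Update: so far I seem to have reached an intermediate state in which the missing timestamps are filled in across groups (a daily frequency is basically fine here):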
with time_range as (
    select min("time") as start_time,  -- or: current_timestamp - interval '2 day'
           max("time") as end_time     -- or: current_timestamp
    from my_table
),
interested_events as (
    -- "group" is a reserved word in PostgreSQL, so it has to be quoted
    select e."group", e.sub_group, e."time", e.count
    from my_table e
),
classes_having_events as (
    -- every group/sub_group pair that occurs at least once
    select distinct "group", sub_group
    from interested_events
    order by "group", sub_group
),
periods as (
    -- one half-open period [period_start, period_end) per day
    select ts as period_start, ts + interval '1 day' as period_end
    from generate_series(
        (select start_time from time_range),
        (select end_time from time_range),  -- end is inclusive, so the day containing max(time) gets its own period
        interval '1 day') ts
), resampled as (
    -- all periods x all pairs; count is NULL where no row existed
    select period_start,
           period_end,
           classes_having_events."group",
           classes_having_events.sub_group,
           interested_events.count
    from periods
    cross join classes_having_events
    left join interested_events
           on interested_events."time" >= period_start
          and interested_events."time" < period_end
          and interested_events."group" = classes_having_events."group"
          and interested_events.sub_group = classes_having_events.sub_group
    order by period_start desc
)
select * from resampled;
Answer:
OK, it looks like I was already very close; rubber-duck debugging helped.
This seems to do what I want:
WITH time_range AS (
    SELECT MIN("time") AS start_time,  -- or: current_timestamp - interval '2 day'
           MAX("time") AS end_time     -- or: current_timestamp
    FROM my_table
),
interested_events AS (
    -- "group" is a reserved word in PostgreSQL, so it has to be quoted
    SELECT e."group", e.sub_group, e."time", e.count
    FROM my_table e
),
classes_having_events AS (
    -- every group/sub_group pair that occurs at least once
    SELECT DISTINCT "group", sub_group
    FROM interested_events
    ORDER BY "group", sub_group
),
periods AS (
    -- one half-open period [period_start, period_end) per day
    SELECT ts AS period_start, ts + INTERVAL '1 day' AS period_end
    FROM GENERATE_SERIES(
        (SELECT start_time FROM time_range),
        (SELECT end_time FROM time_range),  -- end is inclusive, so the day containing MAX(time) gets its own period
        INTERVAL '1 day') ts
),
resampled AS (
    -- all periods x all pairs; count is NULL where no row existed
    SELECT period_start,
           period_end,
           classes_having_events."group",
           classes_having_events.sub_group,
           interested_events.count
    FROM periods
    CROSS JOIN classes_having_events
    LEFT JOIN interested_events
           ON interested_events."time" >= period_start
          AND interested_events."time" < period_end
          AND interested_events."group" = classes_having_events."group"
          AND interested_events.sub_group = classes_having_events.sub_group
    ORDER BY period_start DESC
)
-- The counts never decrease, so a running MAX acts as a forward fill for the NULL gaps.
SELECT period_start AS time,
       "group",
       sub_group,
       MAX(count) OVER (PARTITION BY "group", sub_group ORDER BY period_start) AS count
FROM resampled
ORDER BY period_start DESC, "group", sub_group;
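The running `MAX` works as a forward fill only because the counts never decrease. If that assumption did not hold, one common alternative is to number the runs of rows with a running `COUNT` of non-NULL values and then take the single non-NULL value within each run. The query below is a sketch of that idea, not a tested replacement; it assumes the same `my_table`, and additionally that each row's `time` falls exactly on a daily step (for example midnight), so that an equality join is enough. The names `bounds`, `days`, `grid`, `islands`, `cnt`, and `island` are made up for this example.

```sql
WITH bounds AS (
    SELECT MIN("time") AS first_day, MAX("time") AS last_day
    FROM my_table
),
days AS (
    -- one row per day between the first and the last observation, inclusive
    SELECT d AS day
    FROM bounds,
         generate_series(bounds.first_day, bounds.last_day, INTERVAL '1 day') AS d
),
grid AS (
    -- every day x every group/sub_group pair; cnt is NULL where the row is missing
    SELECT days.day, g."group", g.sub_group, t.count AS cnt
    FROM days
    CROSS JOIN (SELECT DISTINCT "group", sub_group FROM my_table) AS g
    LEFT JOIN my_table AS t
           ON t."time"    = days.day
          AND t."group"   = g."group"
          AND t.sub_group = g.sub_group
),
islands AS (
    -- COUNT ignores NULLs, so the number only advances on days that have a value;
    -- a value and the gap rows that follow it share the same island number
    SELECT *,
           COUNT(cnt) OVER (PARTITION BY "group", sub_group ORDER BY day) AS island
    FROM grid
)
SELECT day AS time,
       "group",
       sub_group,
       -- each island holds at most one non-NULL value, so MAX simply picks it
       MAX(cnt) OVER (PARTITION BY "group", sub_group, island) AS count
FROM islands
ORDER BY day, "group", sub_group;
```

Rows that precede the first observation of a group/sub_group fall into island 0 and keep a NULL count, because there is nothing earlier to carry forward.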
