我在 BigQuery 中有以下結構的資料:
select
1 as user_id, date('2021-01-01') as transaction_date, 1 as value
union all (select 1, '2021-01-02', 2)
union all (select 1, '2021-01-05', 2)
union all (select 1, '2021-02-01', 2)
union all (select 1, '2021-02-03', 2)
union all (select 2, '2021-01-02', 2)
union all (select 2, '2021-02-01', 2)
union all (select 2, '2021-02-03', 3)
我想為每個用戶和每個月填寫我們有資料的月份的第一天和最后一天之間的缺失值。 這對用戶的輸出應該是:
1,2021-01-01,1
1,2021-01-02,2
1,2021-01-03,null
1,2021-01-04,null
1,2021-01-05,2
1,2021-02-01,2
1,2021-02-02,null
1,2021-02-03,2
2,2021-01-02,2
2,2021-02-01,2
2,2021-02-02,null
2,2021-02-03,3
在 bigquery 中執行此操作的最簡單方法是什么?
uj5u.com熱心網友回復:
一種方法是使用GENERATE_ARRAY()和UNNEST制作模板,并LEFT OUTER JOIN使用原始資料對模板進行操作。
詢問
WITH org_data AS (
select 1 as user_id, date('2021-01-01') as transaction_date, 1 as value
union all (select 1, '2021-01-02', 2) union all (select 1, '2021-01-05', 2)
union all (select 1, '2021-02-01', 2) union all (select 1, '2021-02-03', 2)
union all (select 2, '2021-01-02', 2) union all (select 2, '2021-02-01', 2)
union all (select 2, '2021-02-03', 3)
),
candidate_user_month AS (
SELECT
user_id,
EXTRACT(YEAR FROM transaction_date) AS transaction_year,
EXTRACT(MONTH FROM transaction_date) AS transaction_month,
COUNT(1) as date_count,
MIN(EXTRACT(DAY FROM transaction_date)) as min_day,
MAX(EXTRACT(DAY FROM transaction_date)) as max_day,
FROM org_data
GROUP BY user_id, transaction_year, transaction_month
HAVING date_count >= 2
),
template AS (
WITH
date_array AS (
SELECT
user_id, transaction_year, transaction_month,
GENERATE_ARRAY(min_day, max_day, 1) as dates,
FROM candidate_user_month
)
SELECT
user_id,
DATE(transaction_year, transaction_month, dates) as transaction_date,
NULL as value_with_null,
FROM date_array, UNNEST(dates) as dates
)
SELECT user_id, transaction_date, value,
FROM template
FULL OUTER JOIN org_data USING (user_id, transaction_date)
ORDER BY user_id, transaction_date
;
結果

轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/363127.html
