如何使用嵌套列中“特殊”值的索引(例如:該嵌套列中最大值的索引)來使用該索引從另一個嵌套列中選擇一個值?
例如,考慮一個具有以下架構的表:
| 欄位名稱 | 型別 | 模式 |
|---|---|---|
| ID | 細繩 | 可空 |
| 用戶名 | 細繩 | 可空 |
| ▼ 產品 | 記錄 | 可空 |
| ▼ 串列 | 記錄 | 重復 |
| 物品 | 細繩 | 可空 |
| ▼ 訂購 | 記錄 | 可空 |
| ▼ 串列 | 記錄 | 重復 |
| 物品 | 整數 | 可空 |
| 訂單總數 | 細繩 | 可空 |
| 更新時間 | 時間戳 | 可空 |
| 更新 ID | 整數 | 可空 |
前幾行看起來像:
| 排 | ID | 用戶名 | 產品.list.item | 有序串列項 | 訂單總數 | 更新時間 | 更新 ID |
|---|---|---|---|---|---|---|---|
| 1 | 1234 | 圖靈 | 蘋果 | 1 | 3 | 2021-08-14 20:03:22.100846 UTC | 121231 |
| 橘子 | 0 | ||||||
| 梨 | 2 | ||||||
| 2 | 5678 | g_hopper | 蘋果 | 0 | 2 | 2021-08-15 09:36:48.220464 UTC | 121232 |
| 橘子 | 2 | ||||||
| 梨 | 0 | ||||||
| 3 | 1122 | a_lovelace | 蘋果 | 0 | 1 | 2021-08-15 13:59:03.441506 UTC | 121233 |
| 橘子 | 1 | ||||||
| 梨 | 0 | ||||||
| 4 | 3344 | v_nabokov | 蘋果 | 1 | 2 | 2021-08-17 17:34:53.415406 UTC | 121234 |
| 橘子 | 0 | ||||||
| 梨 | 1 |
I want to select the most ordered product for each id's most recent order and exclude orders that don't have a most ordered product (for instance if a customer ordered the same number of Apple, Orange and Pear).
The query I currently use is a chain of CTEs, one for each product type plus an extra column which is the max number of products ordered by each user (max_ordered). I then join together the CTEs using the id column:
WITH RANKED_ORDERS AS(
SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY update_time DESC) AS rn
FROM mycompany.engagement.products_ordered),
LATEST_ORDERS AS(
SELECT * FROM RANKED_ORDERS WHERE rn = 1),
-- ---------------------- Apples Ordered -----------------------
APPLES_INDEXED AS(
SELECT id, username, ordered, flattened_products, offset_nk, update_time, rn
FROM LATEST_ORDERS
CROSS JOIN UNNEST(LATEST_ORDERS.products.list) AS flattened_products
WITH OFFSET as offset_nk
WHERE flattened_products.item in ('Apple')
ORDER BY offset_nk),
APPLES_ORDERED AS(
SELECT id, username, update_time, ordered.list[OFFSET(offset_nk)].item as apples_ordered
FROM APPLES_INDEXED
ORDER BY
update_time ASC),
-- ---------------------- Oranges Ordered ----------------------
ORANGES_INDEXED AS(
SELECT id, username, ordered, flattened_products, offset_nk, update_time, rn
FROM LATEST_ORDERS
CROSS JOIN UNNEST(LATEST_ORDERS.products.list) AS flattened_products
WITH OFFSET as offset_nk
WHERE flattened_products.item in ('Orange')
ORDER BY offset_nk),
ORANGES_ORDERED AS(
SELECT id, username, update_time, ordered.list[OFFSET(offset_nk)].item as oranges_ordered
FROM ORANGES_INDEXED
ORDER BY
update_time ASC),
-- ---------------------- Pears Ordered -----------------------
PEARS_INDEXED AS(
SELECT id, username, ordered, flattened_products, offset_nk, update_time, rn
FROM LATEST_ORDERS
CROSS JOIN UNNEST(LATEST_ORDERS.products.list) AS flattened_products
WITH OFFSET as offset_nk
WHERE flattened_products.item in ('Pear')
ORDER BY offset_nk),
PEARS_ORDERED AS(
SELECT id, username, update_time, ordered.list[OFFSET(offset_nk)].item as pears_ordered
FROM PEARS_INDEXED
ORDER BY
update_time ASC),
-- --------------- Max Product Ordered per Order --------------
MAX_ORDERED AS(
SELECT
id, username, MAX(orders_per_username.item) as max_ordered, total_orders
FROM
LATEST_ORDERS, UNNEST(ordered.list) as orders_per_username
GROUP BY id, username, total_orders),
-- -------------------- Orders In Columns ---------------------
ORDERS_IN_COLUMNS AS(
SELECT APPLES_ORDERED.username, APPLES_ORDERED.update_time, APPLES_ORDERED.apples_ordered,
ORANGES_ORDERED.oranges_ordered, PEARS_ORDERED.pears_ordered, MAX_ORDERED.max_ordered
FROM APPLES_ORDERED
LEFT JOIN ORANGES_ORDERED ON ORANGES_ORDERED.id = APPLES_ORDERED.id
LEFT JOIN PEARS_ORDERED ON PEARS_ORDERED.id = APPLES_ORDERED.id
LEFT JOIN MAX_ORDERED ON MAX_ORDERED.id = APPLES_ORDERED.id),
-- ------- Orders with a most ordered product -----------------
NO_CONFLICTS AS(
SELECT * FROM ORDERS_IN_COLUMNS
WHERE
max_ordered > 0 AND
(
(apples_ordered not in (oranges_ordered, pears_ordered) AND apples_ordered = max_ordered)
OR
(oranges_ordered not in (apples_ordered, pears_ordered) AND oranges_ordered = max_ordered)
OR
(pears_ordered not in (apples_ordered, oranges_ordered) AND pears_ordered = max_ordered)
)
)
SELECT * FROM NO_CONFLICTS
This returns the following table:
| Row | username | update_time | apples_ordered | oranges_ordered | pears_ordered | max_ordered |
|---|---|---|---|---|---|---|
| 1 | a_turing | 2021-08-14 20:03:22.100846 UTC | 1 | 0 | 2 | 2 |
| 2 | g_hopper | 2021-08-15 09:36:48.220464 UTC | 0 | 2 | 0 | 2 |
| 3 | a_lovelace | 2021-08-15 13:59:03.441506 UTC | 0 | 1 | 0 | 1 |
which is expected.
However, I can't figure out how to simply return a table that looks like:
| Row | username | update_time | max_product_ordered |
|---|---|---|---|
| 1 | a_turing | 2021-08-14 20:03:22.100846 UTC | Pear |
| 2 | g_hopper | 2021-08-15 09:36:48.220464 UTC | Orange |
| 3 | a_lovelace | 2021-08-15 13:59:03.441506 UTC | Orange |
I'm also fairly certain that although this query basically works (I end up doing post-processing in Python to get to the last step) it might be extremely inefficient given the extensive use of "common table expressions".
Is there a more efficient way to query my BigQuery table than what I've written or would I need to completely restructure the table to get any speedup? It currently takes ~10s to run this query on a table with ~10,000 rows and 12 columns and I believe the slowness is due to the multiple CTEs.
I've been banging my head against the wall for the past two weeks trying to improve my query and haven't made much headway. Any help is sincerely appreciated!
uj5u.com熱心網友回復:
考慮以下方法
with latest_orders as (
select * from `mycompany.engagement.products_ordered`
where true
qualify 1 = row_number() over(partition by id order by update_time desc)
), qualified_items as (
select *,
array(
select offset from t.ordered.list with offset
where true
qualify 1 = rank() over(order by item desc)
) items
from latest_orders t
)
select id, username, update_time,
products.list[offset(items[offset(0)])] as max_product_ordered,
from qualified_items
where array_length(items) = 1
如果應用于您問題中的樣本資料 - 輸出是

轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/336023.html
標籤:sql google-bigquery common-table-expression
上一篇:你能告訴我為什么我的ORDERBYDESC功能不能正常作業嗎?
下一篇:在子查詢中聯合所有
