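I have a table whose rows look like the ones below. One column (Rank) ranks the rows within each ticketID partition, ordered by Timestamp descending.
Each row can have only one flag equal to 1.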
ticketID | flag 1 | flag 2 | flag 3 | flag 4 | Timestamp | Rank | stringvalue |
----------------------------------------------------------------------------------------|
1 | 0 | 0 | 1 | 0 | xxxxxx | 2 | aaaaaa |
1 | 0 | 0 | 0 | 1 | xxxxxx | 1 | bbbbbb |
1 | 0 | 1 | 0 | 0 | xxxxxx | 3 | aaaaaa |
2 | 1 | 0 | 0 | 0 | xxxxxx | 2 | bbbbbb |
2 | 0 | 0 | 0 | 1 | xxxxxx | 1 | xxxxxx |
3 | 0 | 0 | 1 | 0 | xxxxxx | 4 | aaaaaa |
3 | 0 | 1 | 0 | 0 | xxxxxx | 3 | bbbbbb |
3 | 1 | 0 | 0 | 0 | xxxxxx | 1 | ssssss |
3 | 0 | 0 | 0 | 1 | xxxxxx | 2 | nnnnnn |
4 | 0 | 1 | 0 | 0 | xxxxxx | 2 | gggggg |
4 | 0 | 0 | 0 | 1 | xxxxxx | 1 | iiiiii |
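For each ticketID, I need the first row by rank, except in the following flag cases:
When a ticket's rank 1 row has flag 4 = 1, I need to take the rank 2 row as the first one instead. If that rank 2 row has flag 3 = 1, I need to concatenate the stringvalue of the rank 1 row (flag 4) with the stringvalue of the rank 2 row (flag 3).
If the rank 2 row has flag 1 = 1 or flag 2 = 1, just drop the rank 1 row and return the rank 2 row as the first.
I hope my question is clear.
Thanks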
EDIT
Sample output:
----------------------------------------------------------------------------------------
ticketID | flag 1 | flag 2 | flag 3 | Timestamp | Rank | stringvalue |
---------------------------------------------------------------------------------------|
1 | 0 | 0 | 1 | xxxxxx | 1 | aaaaaa / bbbbbbb |
2 | 1 | 0 | 0 | xxxxxx | 1 | bbbbbb |
3 | 1 | 0 | 0 | xxxxxx | 1 | ssssss |
4 | 0 | 1 | 0 | xxxxxx | 1 | gggggg |
----------------------------------------------------------------------------------------
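A minimal sketch of the selection rule in plain Python, assuming each ticket's rows are available as dicts with the hypothetical keys flag_1 to flag_4, rank and stringvalue (make_row and pick_row are illustrative names, not part of the question). It concatenates the rank 1 value first, as the prose says; the sample output shows the opposite order ("aaaaaa / bbbbbbb"), so swap the operands if that is what is intended.

```python
def make_row(flag, rank, stringvalue):
    """Hypothetical helper: build one row with exactly one flag set to 1."""
    row = {f"flag_{i}": 0 for i in range(1, 5)}
    row[f"flag_{flag}"] = 1
    row["rank"] = rank
    row["stringvalue"] = stringvalue
    return row


def pick_row(rows):
    """Return the output row for one ticket, following the rule above."""
    ordered = sorted(rows, key=lambda r: r["rank"])
    first = ordered[0]
    # Rank 1 is not a flag 4 row, or there is no rank 2 row: keep rank 1.
    if first["flag_4"] != 1 or len(ordered) < 2:
        return {**first, "rank": 1}
    second = ordered[1]
    if second["flag_3"] == 1:
        # Rank 2 is a flag 3 row: keep it, but join both string values.
        joined = first["stringvalue"] + " / " + second["stringvalue"]
        return {**second, "rank": 1, "stringvalue": joined}
    # Rank 2 is any other flag: drop rank 1 and promote rank 2.
    return {**second, "rank": 1}


# The sample table from the question: make_row(flag that is 1, Rank, stringvalue)
tickets = {
    1: [make_row(3, 2, "aaaaaa"), make_row(4, 1, "bbbbbb"), make_row(2, 3, "aaaaaa")],
    2: [make_row(1, 2, "bbbbbb"), make_row(4, 1, "xxxxxx")],
    3: [make_row(3, 4, "aaaaaa"), make_row(2, 3, "bbbbbb"),
        make_row(1, 1, "ssssss"), make_row(4, 2, "nnnnnn")],
    4: [make_row(2, 2, "gggggg"), make_row(4, 1, "iiiiii")],
}

for ticket_id, rows in tickets.items():
    print(ticket_id, pick_row(rows)["stringvalue"])
# 1 bbbbbb / aaaaaa
# 2 bbbbbb
# 3 ssssss
# 4 gggggg
```

Only the first two ranks ever matter, which is why a query for this can safely restrict itself to Rank 1 and 2 per ticket.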
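Answer:
I would use a few subqueries with struct and group by. This lets us ask questions about multiple rows without using a window function, and since no window state has to be maintained, it may run faster.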
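Grouping with structs works because structs compare field by field, so a struct whose first field is the rank sorts by rank. A rough emulation in plain Python, with tuples standing in for Spark structs and a dict of lists standing in for group by plus collect_list and array_sort; the names are illustrative only and this is not how Spark executes the query.

```python
from collections import defaultdict

# (ticketID, Rank, flag_1, flag_2, flag_3, flag_4, stringvalue) for tickets 1 and 2
rows = [
    (1, 2, 0, 0, 1, 0, "aaaaaa"),
    (1, 1, 0, 0, 0, 1, "bbbbbb"),
    (1, 3, 0, 1, 0, 0, "aaaaaa"),
    (2, 2, 1, 0, 0, 0, "bbbbbb"),
    (2, 1, 0, 0, 0, 1, "xxxxxx"),
]


def group_first_two(rows):
    """Emulate WHERE rank IN (1, 2) GROUP BY ticketID with a sorted list of structs.

    Each "struct" is a tuple whose first field is the rank, so sorting the
    tuples orders them by rank (tuples, like structs, compare field by field).
    """
    grouped = defaultdict(list)
    for ticket_id, rank, *rest in rows:
        if rank in (1, 2):  # only look at the first two ranks
            grouped[ticket_id].append((rank, *rest))
    return {tid: sorted(structs) for tid, structs in grouped.items()}


row_lists = group_first_two(rows)
print(row_lists[1])
# [(1, 0, 0, 0, 1, 'bbbbbb'), (2, 0, 0, 1, 0, 'aaaaaa')]
```

After sorting, index 0 always holds the rank 1 row and index 1 the rank 2 row, which is what the 0-based indexes row_list[0] and row_list[1] rely on.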
create table theRanks (ticketID int, flag_1 int, flag_2 int, flag_3 int, flag_4 int, Timestamp string, Rank int, stringvalue string);
-- create some dummy data
insert into theRanks values (1, 0, 0, 1, 0, 'xxxxxx', 2, 'aaaaaa');
insert into theRanks values (1, 0, 0, 0, 1, 'xxxxxx', 1, 'bbbbbb');
insert into theRanks values (1, 0, 1, 0, 0, 'xxxxxx', 3, 'aaaaaa');
with struct_table as -- sub-query syntax
(
  select
    ticketID,
    struct( -- struct will allow us to group rows together
      Rank as rawRank, -- this has to be the first field in the struct, as we use it for sorting
      flag_1,
      flag_2,
      flag_3,
      flag_4,
      Timestamp,
      stringvalue
    ) as myRow
  from
    theRanks
  where
    rank in (1, 2) -- only look at the first two ranks
),
constants as -- subquery
(
  select 0 as rank1, 1 as rank2 -- not strictly needed, just makes the code more readable
),
grouped_rows as -- subquery
(
  select
    ticketID,
    array_sort(collect_list(myRow)) as row_list -- collects the structs into a list sorted by rank
  from struct_table
  group by ticketID
),
raw_rows as ( -- sub-query syntax
  select
    ticketID,
    case
      when -- rank 1 is flag 4 and rank 2 is not flag 3 (i.e. flag 1, 2 or 4): use rank 2
        row_list[constants.rank1].flag_4 = 1 and row_list[constants.rank2].flag_3 = 0
      then
        row_list[constants.rank2]
      when -- rank 1 is flag 4 and rank 2 is flag 3: concatenate the strings
        row_list[constants.rank1].flag_4 = 1 and row_list[constants.rank2].flag_3 = 1
      then
        struct( -- this struct must match the original one we created
          row_list[constants.rank2].rawRank as rawRank,
          row_list[constants.rank2].flag_1 as flag_1,
          row_list[constants.rank2].flag_2 as flag_2,
          row_list[constants.rank2].flag_3 as flag_3,
          row_list[constants.rank2].flag_4 as flag_4,
          row_list[constants.rank2].Timestamp as Timestamp,
          concat(
            row_list[constants.rank1].stringvalue,
            ' / ',
            row_list[constants.rank2].stringvalue) as stringvalue
        )
      else -- rank 1 is not flag 4: keep rank 1
        row_list[constants.rank1]
    end as rankedRow,
    1 as Rank
  from grouped_rows
  cross join constants) -- not strictly needed: replace constants.rank1 with 0 and constants.rank2 with 1 in production
select rankedRow.*, 1 as Rank from raw_rows; -- expands the struct fields into table columns
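The CASE logic can be sanity-checked without Spark. The sketch below is a swapped-in technique, not the answer's: SQLite has no struct or collect_list, so it pairs each ticket's rank 1 and rank 2 rows with a LEFT JOIN instead, using Python's standard sqlite3 module and an in-memory database loaded with the full sample table. Column names follow the answer's table, and the concatenation order (rank 1 first) follows the answer rather than the sample output.

```python
import sqlite3

SAMPLE = [  # (ticketID, flag_1, flag_2, flag_3, flag_4, Timestamp, Rank, stringvalue)
    (1, 0, 0, 1, 0, "xxxxxx", 2, "aaaaaa"),
    (1, 0, 0, 0, 1, "xxxxxx", 1, "bbbbbb"),
    (1, 0, 1, 0, 0, "xxxxxx", 3, "aaaaaa"),
    (2, 1, 0, 0, 0, "xxxxxx", 2, "bbbbbb"),
    (2, 0, 0, 0, 1, "xxxxxx", 1, "xxxxxx"),
    (3, 0, 0, 1, 0, "xxxxxx", 4, "aaaaaa"),
    (3, 0, 1, 0, 0, "xxxxxx", 3, "bbbbbb"),
    (3, 1, 0, 0, 0, "xxxxxx", 1, "ssssss"),
    (3, 0, 0, 0, 1, "xxxxxx", 2, "nnnnnn"),
    (4, 0, 1, 0, 0, "xxxxxx", 2, "gggggg"),
    (4, 0, 0, 0, 1, "xxxxxx", 1, "iiiiii"),
]

QUERY = """
SELECT r1.ticketID,
       CASE
         -- rank 1 is flag 4 and rank 2 is flag 3: join both strings
         WHEN r1.flag_4 = 1 AND r2.flag_3 = 1
           THEN r1.stringvalue || ' / ' || r2.stringvalue
         -- rank 1 is flag 4 and a rank 2 row exists: use rank 2
         WHEN r1.flag_4 = 1 AND r2.ticketID IS NOT NULL
           THEN r2.stringvalue
         ELSE r1.stringvalue
       END AS stringvalue
FROM theRanks AS r1
LEFT JOIN theRanks AS r2
       ON r2.ticketID = r1.ticketID AND r2.Rank = 2
WHERE r1.Rank = 1
ORDER BY r1.ticketID
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE theRanks (ticketID INTEGER, flag_1 INTEGER, flag_2 INTEGER, "
    "flag_3 INTEGER, flag_4 INTEGER, Timestamp TEXT, Rank INTEGER, stringvalue TEXT)"
)
conn.executemany("INSERT INTO theRanks VALUES (?, ?, ?, ?, ?, ?, ?, ?)", SAMPLE)
results = conn.execute(QUERY).fetchall()
for ticket_id, value in results:
    print(ticket_id, value)
# 1 bbbbbb / aaaaaa
# 2 bbbbbb
# 3 ssssss
# 4 gggggg
```

Only ticketID and stringvalue are returned to keep it short; the flag and Timestamp columns would follow the same CASE shape, taking r2's value whenever rank 1 is a flag 4 row and a rank 2 row exists.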
