假設我有表A和B,兩者都有超過 1 億行,其中包含 8 年的資料。它們都有列row_id(=唯一的行識別符號,例如行號)person_id,,dt(=日期時間)。一行A表示用戶打開電子郵件的時間。一行B代表用戶訪問特定網站的時間。
我希望創建一個 table C,它正好附加A了幾個(假設是 10 個)列C_i。這些列C_i必須衡量,在用戶打開電子郵件后(即在 A 中的一行),i 他訪問了該網站的次數(在打開電子郵件后的幾個小時內)(即在 中的幾行的計數B) . 我怎樣才能有效地構建C?
我在 Oracle PL/SQL 中嘗試僅添加一列C_1:
SELECT
any_value(A.id),
any_value(A.person_id),
any_value(A.dt),
count(1) response_1h
FROM A, B
WHERE
A.person_id = B.person_id AND
0 <= B.dt - A.dt AND B.dt-A.dt<=1/24
GROUP BY A.row_id;
我收到一個錯誤(不是因為這個,而是因為一個類似的查詢),我懷疑是因為連接的表太大:
ORA-01652:無法在表空間 TEMP_DEV 01652 中將臨時段擴展 256。00000 - “無法在表空間 %s 中將臨時段擴展 %s” *原因:無法為臨時段分配所需數量的塊的范圍在指示的表空間中。*操作:使用 ALTER TABLESPACE ADD DATAFILE 陳述句將一個或多個檔案添加到指定的表空間。
以這種方式對 10 個新列執行此操作似乎遠非最佳。有人愿意告訴我這是如何正確完成的嗎?
uj5u.com熱心網友回復:
您可以通過條件聚合計算網站訪問。下面的示例查看每條訊息后的十小時并有條件地計數。
select
a.id, a.person_id, a.dt,
count(case when b.dt <= a.dt interval '1' hour then 1 end) as response_1h,
count(case when b.dt <= a.dt interval '2' hour then 1 end) as response_2h,
count(case when b.dt <= a.dt interval '3' hour then 1 end) as response_3h,
count(case when b.dt <= a.dt interval '4' hour then 1 end) as response_4h,
count(case when b.dt <= a.dt interval '5' hour then 1 end) as response_5h,
count(case when b.dt <= a.dt interval '6' hour then 1 end) as response_6h,
count(case when b.dt <= a.dt interval '7' hour then 1 end) as response_7h,
count(case when b.dt <= a.dt interval '8' hour then 1 end) as response_8h,
count(case when b.dt <= a.dt interval '9' hour then 1 end) as response_9h,
count(b.dt) as response_10h
from a
left join b on b.person_id = a.person_id
and b.dt > a.dt
and b.dt <= a.dt interval '10' hour
group by a.id, a.person_id, a.dt
order by a.id;
與橫向連接相同:
select *
from a
outer apply
(
select
count(case when b.dt <= a.dt interval '1' hour then 1 end) as response_1h,
count(case when b.dt <= a.dt interval '2' hour then 1 end) as response_2h,
count(case when b.dt <= a.dt interval '3' hour then 1 end) as response_3h,
count(case when b.dt <= a.dt interval '4' hour then 1 end) as response_4h,
count(case when b.dt <= a.dt interval '5' hour then 1 end) as response_5h,
count(case when b.dt <= a.dt interval '6' hour then 1 end) as response_6h,
count(case when b.dt <= a.dt interval '7' hour then 1 end) as response_7h,
count(case when b.dt <= a.dt interval '8' hour then 1 end) as response_8h,
count(case when b.dt <= a.dt interval '9' hour then 1 end) as response_9h,
count(b.dt) as response_10h
from b
where b.person_id = a.person_id
and b.dt > a.dt
and b.dt <= a.dt interval '10' hour
)
order by a.id;
查看解釋計劃,看看是否應該比另一個運行得更快。
對于任一查詢,您都需要表 b 上的復合索引,以便快速找到其匹配的行。由于索引中列的順序很重要,我會添加兩個可能的索引:
create index idx1 on b (person_id, dt);
create index idx2 on b (dt, person_id);
您始終可以洗掉未使用的索引。
As to ORA-01652: I cannot answer this. Maybe you can just increase the size of the temp tablespace and be done with the problem. I don't know.
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/358408.html
