SQL查詢：對于每個用戶，一個表/時間序列對另一個表/時間序列的反應-有解無憂

假設我有表A和B，兩者都有超過 1 億行，其中包含 8 年的資料。它們都有列row_id（=唯一的行識別符號，例如行號）person_id，，dt（=日期時間）。一行A表示用戶打開電子郵件的時間。一行B代表用戶訪問特定網站的時間。

我希望創建一個 table C，它正好附加A了幾個（假設是 10 個）列C_i。這些列C_i必須衡量，在用戶打開電子郵件后（即在 A 中的一行），i 他訪問了該網站的次數（在打開電子郵件后的幾個小時內）（即在中的幾行的計數B） . 我怎樣才能有效地構建C？

我在 Oracle PL/SQL 中嘗試僅添加一列C_1：

SELECT 
    any_value(A.id), 
    any_value(A.person_id), 
    any_value(A.dt), 
    count(1) response_1h 
FROM A, B 
WHERE 
    A.person_id = B.person_id AND 
    0 <= B.dt - A.dt AND B.dt-A.dt<=1/24 
GROUP BY A.row_id;

我收到一個錯誤（不是因為這個，而是因為一個類似的查詢），我懷疑是因為連接的表太大：

ORA-01652：無法在表空間 TEMP_DEV 01652 中將臨時段擴展 256。00000 - “無法在表空間 %s 中將臨時段擴展 %s” *原因：無法為臨時段分配所需數量的塊的范圍在指示的表空間中。*操作：使用 ALTER TABLESPACE ADD DATAFILE 陳述句將一個或多個檔案添加到指定的表空間。

以這種方式對 10 個新列執行此操作似乎遠非最佳。有人愿意告訴我這是如何正確完成的嗎？

uj5u.com熱心網友回復：

您可以通過條件聚合計算網站訪問。下面的示例查看每條訊息后的十小時并有條件地計數。

select
  a.id, a.person_id, a.dt,
  count(case when b.dt <= a.dt   interval '1' hour then 1 end) as response_1h,
  count(case when b.dt <= a.dt   interval '2' hour then 1 end) as response_2h,
  count(case when b.dt <= a.dt   interval '3' hour then 1 end) as response_3h,
  count(case when b.dt <= a.dt   interval '4' hour then 1 end) as response_4h,
  count(case when b.dt <= a.dt   interval '5' hour then 1 end) as response_5h,
  count(case when b.dt <= a.dt   interval '6' hour then 1 end) as response_6h,
  count(case when b.dt <= a.dt   interval '7' hour then 1 end) as response_7h,
  count(case when b.dt <= a.dt   interval '8' hour then 1 end) as response_8h,
  count(case when b.dt <= a.dt   interval '9' hour then 1 end) as response_9h,
  count(b.dt) as response_10h
from a
left join b on b.person_id = a.person_id
            and b.dt > a.dt
            and b.dt <= a.dt   interval '10' hour
group by a.id, a.person_id, a.dt
order by a.id;

與橫向連接相同：

select *
from a
outer apply 
(
  select 
    count(case when b.dt <= a.dt   interval '1' hour then 1 end) as response_1h,
    count(case when b.dt <= a.dt   interval '2' hour then 1 end) as response_2h,
    count(case when b.dt <= a.dt   interval '3' hour then 1 end) as response_3h,
    count(case when b.dt <= a.dt   interval '4' hour then 1 end) as response_4h,
    count(case when b.dt <= a.dt   interval '5' hour then 1 end) as response_5h,
    count(case when b.dt <= a.dt   interval '6' hour then 1 end) as response_6h,
    count(case when b.dt <= a.dt   interval '7' hour then 1 end) as response_7h,
    count(case when b.dt <= a.dt   interval '8' hour then 1 end) as response_8h,
    count(case when b.dt <= a.dt   interval '9' hour then 1 end) as response_9h,
    count(b.dt) as response_10h
  from b 
  where b.person_id = a.person_id
    and b.dt > a.dt
    and b.dt <= a.dt   interval '10' hour
)
order by a.id;

查看解釋計劃，看看是否應該比另一個運行得更快。

對于任一查詢，您都需要表 b 上的復合索引，以便快速找到其匹配的行。由于索引中列的順序很重要，我會添加兩個可能的索引：

create index idx1 on b (person_id, dt);
create index idx2 on b (dt, person_id);

您始終可以洗掉未使用的索引。

As to ORA-01652: I cannot answer this. Maybe you can just increase the size of the temp tablespace and be done with the problem. I don't know.

轉載請註明出處，本文鏈接：https://www.uj5u.com/gongcheng/358408.html

標籤：sql 甲骨文表现

上一篇：回圈串列以計算每對元素的出現（在子串列中）

下一篇：python初學者表現：for回圈Vs迭代器VsPandas