Calculating per-row totals and percentages for a relation in time-boxed windows

OK, so I have two tables: jobs and job runs. I'm using Postgres.

I want to look at two periods: from 7 days ago until now, and from 14 days ago until 7 days ago.

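For each job, I want the total number of runs, plus the percentage of successful and unsuccessful runs in each period. I have cobbled together this horrible query: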

WITH results AS (
select
        coalesce(count(case when succeeded = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_passes,
        coalesce(count(case when succeeded = false AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_failures,
        coalesce(count(case when timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_total_runs,
        coalesce(count(case when infrastructure_failure = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_infrastructure_failures,
        
        coalesce(count(case when succeeded = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_passes,
        coalesce(count(case when succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_failures,        
        coalesce(count(case when timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_total_runs,
        coalesce(count(case when infrastructure_failure = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_infrastructure_failures
FROM
        prow_job_runs JOIN prow_jobs ON prow_jobs.id = prow_job_runs.prow_job_id WHERE prow_jobs.name = 'promote-release-openshift-machine-os-content-e2e-aws-4.10'
)
SELECT *,
        previous_passes * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage,
        previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage,
        current_passes * 100.0 / NULLIF(current_total_runs, 0) AS current_pass_percentage,
        current_failures * 100.0 / NULLIF(current_total_runs, 0) AS current_failure_percentage       
FROM results;

This gives me the result I want:

-[ RECORD 1 ]--------------------+-----------------------
previous_passes                  | 591
previous_failures                | 4
previous_total_runs              | 595
previous_infrastructure_failures | 1
current_passes                   | 67
current_failures                 | 0
current_total_runs               | 67
current_infrastructure_failures  | 0
previous_pass_percentage         | 99.3277310924369748
previous_failure_percentage      | 0.67226890756302521008
current_pass_percentage          | 100.0000000000000000
current_failure_percentage       | 0.00000000000000000000

Here is the execution plan:

                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Subquery Scan on results  (cost=661.12..661.19 rows=1 width=192)
   ->  Aggregate  (cost=661.12..661.13 rows=1 width=64)
         ->  Hash Join  (cost=8.30..650.89 rows=93 width=10)
               Hash Cond: (prow_job_runs.prow_job_id = prow_jobs.id)
               ->  Seq Scan on prow_job_runs  (cost=0.00..603.60 rows=14460 width=18)
               ->  Hash  (cost=8.29..8.29 rows=1 width=8)
                     ->  Index Scan using prow_jobs_name_key on prow_jobs  (cost=0.27..8.29 rows=1 width=8)
                           Index Cond: (name = 'promote-release-openshift-machine-os-content-e2e-aws-4.10'::text)
(8 rows)

But this only works for a single job. How can I get the results for every job without looping over the jobs in application code?

I also think my query is really slow: it takes over 8 ms for just one job.

Thanks

Reply from a uj5u.com user:

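You need to provide the query's execution plan. In any case, make sure you have the necessary indexes. Limiting the number of rows that go into the join should also help: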

WITH results AS (
        select prow_jobs.name,
                coalesce(count(case when succeeded = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_passes,
                coalesce(count(case when succeeded = false AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_failures,
                coalesce(count(case when timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_total_runs,
                coalesce(count(case when infrastructure_failure = true AND timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() - INTERVAL '7 DAY' then 1 end), 0) as previous_infrastructure_failures,
                coalesce(count(case when succeeded = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_passes,
                coalesce(count(case when succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_failures,        
                coalesce(count(case when timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_total_runs,
                coalesce(count(case when infrastructure_failure = true AND timestamp > NOW() - INTERVAL '7 DAY' then 1 end), 0) as current_infrastructure_failures
        FROM prow_job_runs 
        JOIN prow_jobs 
                ON prow_jobs.id = prow_job_runs.prow_job_id                 
                and timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW() 
        group by prow_jobs.name
)
SELECT *,
        previous_passes * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage,
        previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage,
        current_passes * 100.0 / NULLIF(current_total_runs, 0) AS current_pass_percentage,
        current_failures * 100.0 / NULLIF(current_total_runs, 0) AS current_failure_percentage       
FROM results;
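
The same per-job aggregation can also be written with the aggregate FILTER clause, which Postgres supports from version 9.4 onward. The following is a minimal sketch, not a tested replacement: it assumes the same tables and columns as the queries above, and it moves the 14-day restriction into a WHERE clause so that the "previous" window only needs an upper bound. Since count never returns NULL, the coalesce wrappers are omitted.

```sql
-- Sketch under the assumptions stated above; table and column names
-- are taken from the question.
WITH results AS (
    SELECT
        prow_jobs.name,
        -- previous window: 14 days ago up to and including 7 days ago
        count(*) FILTER (WHERE succeeded = true  AND timestamp <= NOW() - INTERVAL '7 DAY') AS previous_passes,
        count(*) FILTER (WHERE succeeded = false AND timestamp <= NOW() - INTERVAL '7 DAY') AS previous_failures,
        count(*) FILTER (WHERE timestamp <= NOW() - INTERVAL '7 DAY')                       AS previous_total_runs,
        count(*) FILTER (WHERE infrastructure_failure = true
                           AND timestamp <= NOW() - INTERVAL '7 DAY')                       AS previous_infrastructure_failures,
        -- current window: the last 7 days
        count(*) FILTER (WHERE succeeded = true  AND timestamp > NOW() - INTERVAL '7 DAY')  AS current_passes,
        count(*) FILTER (WHERE succeeded = false AND timestamp > NOW() - INTERVAL '7 DAY')  AS current_failures,
        count(*) FILTER (WHERE timestamp > NOW() - INTERVAL '7 DAY')                        AS current_total_runs,
        count(*) FILTER (WHERE infrastructure_failure = true
                           AND timestamp > NOW() - INTERVAL '7 DAY')                        AS current_infrastructure_failures
    FROM prow_job_runs
    JOIN prow_jobs ON prow_jobs.id = prow_job_runs.prow_job_id
    -- only rows from the last 14 days take part in the aggregation
    WHERE timestamp BETWEEN NOW() - INTERVAL '14 DAY' AND NOW()
    GROUP BY prow_jobs.name
)
SELECT *,
       previous_passes   * 100.0 / NULLIF(previous_total_runs, 0) AS previous_pass_percentage,
       previous_failures * 100.0 / NULLIF(previous_total_runs, 0) AS previous_failure_percentage,
       current_passes    * 100.0 / NULLIF(current_total_runs, 0)  AS current_pass_percentage,
       current_failures  * 100.0 / NULLIF(current_total_runs, 0)  AS current_failure_percentage
FROM results;
```

Because this is an inner join with a time filter, jobs that have no runs at all in the last 14 days do not appear in the output.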

It also looks like you don't have any index on the prow_job_runs table. Add an index on that table with the columns (id, succeeded, infrastructure_failure, timestamp, prow_job_id).
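
As a rough illustration of the indexing advice, the sketch below creates an index and then measures the actual run time. The index name is hypothetical, and the column choice is an assumption that differs from the list above: it leads with the timestamp column because that is what the queries filter on, and a multi-column B-tree index works best when its leading columns are constrained. It also assumes that timestamp is a column of prow_job_runs.

```sql
-- Hypothetical index name; column order is an assumption, not the answer's list.
CREATE INDEX IF NOT EXISTS prow_job_runs_timestamp_job_id_idx
    ON prow_job_runs ("timestamp", prow_job_id);

-- EXPLAIN alone only shows estimates; ANALYZE actually runs the query
-- and reports real timings, and BUFFERS adds block I/O counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT prow_job_id, count(*)
FROM prow_job_runs
WHERE "timestamp" >= NOW() - INTERVAL '14 DAY'
GROUP BY prow_job_id;
```

With only about 14,000 estimated rows in prow_job_runs, as shown in the plan in the question, the planner may still prefer a sequential scan, so it is worth comparing the EXPLAIN (ANALYZE) output before and after adding the index.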

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/370188.html

Tags: sql, PostgreSQL, statistics, query-optimization
