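I have the following data (all table DDL and data DML can be found on the fiddles here (SQL Server) and here (PostgreSQL)).
I already have a solution; this question is about efficiency. What is the best way to do this?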
CREATE TABLE ticket
(
ticket_id INTEGER NOT NULL,
working_time VARCHAR (30) NULL DEFAULT NULL CHECK (working_time != '')
);
And the data:
ticket_id working_time
18 20.02.2021,15:00,17:00
18 20.02.2021,15:00,17:00
18 20.02.2021,15:00,17:00
20 20.02.2021,12:00,14:15
20 _rubbish__ -- <--- deliberate
20 20.02.2021,12:00,14:15
20
20 21.02.2021,12:00,14:15
20 _rubbish__
20 21.02.2021,12:00,14:15
20
11 rows
The _rubbish__ entries in the data are deliberate: the column is free text, and I have to be able to handle bad data!
Now, I want a result like this:
Ticket ID The date hrs_worked_per_ticket
18 2021-02-20 06:00:00
20 2021-02-20 04:30:00
20 2021-02-21 04:30:00
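There is no need to tell me that the schema is appalling. I find the idea of storing a date (in a non-ISO format) together with times in a single field repulsive! I have no choice in the matter.
I have my own answers for both PostgreSQL and SQL Server (see below), but I would like to know whether there is a more efficient way to do this.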
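uj5u.com user replied:
So you have a working version, and although it is clunky, its verbosity does not necessarily mean that it performs badly.
However, you asked whether there is a more efficient approach. For SQL Server (I can't comment on Postgres), you can greatly simplify the query and improve performance by adding persisted computed columns for the date and the duration, plus a supporting index.
This removes the non-sargability of the query, lets the optimizer make full use of the index for filtering and aggregation, and avoids the (small) overhead of parsing and converting the string values at query time, because that work is now done when a row is inserted or updated.
Add the computed columns: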
alter table ticket add WorkingDate as Try_convert(date,Concat(Substring(working_time, 7, 4),SUBSTRING(working_time, 4, 2),SUBSTRING(working_time, 1, 2)),112) persisted
alter table ticket add WorkingDuration as DateDiff(minute,Try_convert(time,Substring (working_time, 12, 5),114 ) , Try_convert(time, Substring (working_time, 18, 5),114 )) persisted
Add a supporting index:
create clustered index Ix_Id_WorkingDuration on ticket(ticket_id,workingdate)
Your query then becomes:
with w as (
select ticket_Id, workingDate, Sum(workingDuration) d
from ticket
group by ticket_id, workingDate
)
select ticket_id,
workingdate as [The date],
format(d / 60 * 100 + d % 60, '#:0#') hrs_worked_per_ticket
from w
where d>0;
See the modified fiddle.
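To make the formatting expression in that query easier to follow, here is a small worked example in T-SQL. It assumes a hypothetical total of 270 minutes (the variable @d is introduced only for illustration) and shows how the hours and minutes are packed into one integer before FORMAT is applied:

```sql
-- Hypothetical total for one ticket and day: 270 minutes (4 h 30 min).
declare @d int = 270;

select @d / 60                 as whole_hours,   -- integer division: 4
       @d % 60                 as leftover_min,  -- remainder: 30
       @d / 60 * 100 + @d % 60 as packed,        -- 4 * 100 + 30 = 430
       -- same expression and format string as in the query above
       format(@d / 60 * 100 + @d % 60, '#:0#') as hrs_worked_per_ticket;
```

The packed integer carries the hours in the hundreds and the minutes in the last two digits, so the format string only has to place a colon before the last two digits. Note that this yields an hours-and-minutes string rather than the hh:mm:ss shape shown in the question.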
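On these few rows this will not produce any significant improvement over your original query, but performance will be noticeably better on a large data set, especially if you need to filter further by date or date range.
Even so, the estimated execution plans put this version at 18% of the combined cost, against 82% for the original.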
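As an illustration of the filtering case mentioned above, the following sketch restricts the aggregation to one ticket and one month. The ticket id and the date range are made up for the example, and it assumes the WorkingDate and WorkingDuration computed columns and the clustered index on (ticket_id, workingdate) from this answer are in place:

```sql
-- Hypothetical filter: ticket 20, February 2021.
-- Both predicates are on real columns (ticket_id, then WorkingDate),
-- which matches the key order of the clustered index, so no string
-- parsing of working_time is needed at query time.
select ticket_id,
       WorkingDate,
       Sum(WorkingDuration) as minutes_worked
from ticket
where ticket_id = 20
  and WorkingDate >= '20210201'
  and WorkingDate <  '20210301'
group by ticket_id, WorkingDate;
```

Rows whose working_time could not be parsed have a NULL WorkingDate and so fall outside the range automatically. If you need to filter by date alone, across all tickets, an additional index that leads on WorkingDate would probably serve better than the one shown in this answer.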
uj5u.com user replied:
I have a PostgreSQL solution here. try_cast_time and try_cast_date are functions that I wrote, inspired by this post (the whole thread is helpful!):
SELECT DISTINCT
ticket_id,
try_cast_date(working_time)::DATE,
SUM((try_cast_date(working_time) + try_cast_time(working_time, 18, 5)) -
(try_cast_date(working_time) + try_cast_time(working_time, 12, 5)))
OVER (PARTITION BY ticket_id, try_cast_date(working_time)::DATE)
AS ts_diff
FROM ticket
WHERE try_cast_date(working_time)::DATE IS NOT NULL
ORDER BY ticket_id, try_cast_date(working_time)::DATE
Result:
ticket_id try_cast_date ts_diff
18 2021-02-20 06:00:00
20 2021-02-20 04:30:00
20 2021-02-21 04:30:00
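The definitions of try_cast_date and try_cast_time are not shown in this answer. The following is only a minimal sketch of what such helpers might look like in PL/pgSQL, not the author's actual code. It assumes try_cast_date takes the whole string and returns a DATE built from the leading DD.MM.YYYY part, and that try_cast_time takes the string plus a start position and length and returns a TIME, which is what the call sites in the query suggest:

```sql
-- Assumed helper: returns the leading DD.MM.YYYY part as a DATE,
-- or NULL when the text does not look like a date or is not a valid one.
CREATE OR REPLACE FUNCTION try_cast_date(p_in TEXT)
RETURNS DATE
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    -- Cheap shape check first, so obvious rubbish never reaches to_date().
    IF p_in IS NULL OR p_in !~ '^[0-9]{2}[.][0-9]{2}[.][0-9]{4}' THEN
        RETURN NULL;
    END IF;
    RETURN to_date(substring(p_in FROM 1 FOR 10), 'DD.MM.YYYY');
EXCEPTION
    WHEN OTHERS THEN
        -- e.g. a well-shaped but impossible date
        RETURN NULL;
END;
$$;

-- Assumed helper: casts a slice of the text to TIME, or returns NULL
-- when the slice is empty or not a valid time.
CREATE OR REPLACE FUNCTION try_cast_time(p_in TEXT, p_start INT, p_len INT)
RETURNS TIME
LANGUAGE plpgsql
STABLE
AS $$
BEGIN
    RETURN substring(p_in FROM p_start FOR p_len)::TIME;
EXCEPTION
    WHEN OTHERS THEN
        RETURN NULL;
END;
$$;
```

With these definitions, DATE + TIME gives a TIMESTAMP, so the subtraction in the query yields an INTERVAL that SUM can aggregate. Be aware that a PL/pgSQL block with an EXCEPTION clause is more expensive to enter than a plain expression, which is relevant to the efficiency question when the functions are called several times per row.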
uj5u.com user replied:
I have a SQL Server solution here (it's horrible!):
WITH cte AS
(
SELECT
ticket_id,
-- start: date part rebuilt as YYYY.MM.DD, plus the start time
CAST
(
TRY_CONVERT
(
DATE,
SUBSTRING(working_time, 7, 4) + '.' +
SUBSTRING(working_time, 4, 2) + '.' +
SUBSTRING(working_time, 1, 2)
) AS DATETIME
)
+
CAST
(
CAST(SUBSTRING(working_time, 12, 5) AS TIME) AS DATETIME
) AS st_dt,
-- end: same date part, plus the end time
CAST
(
TRY_CONVERT
(
DATE,
SUBSTRING(working_time, 7, 4) + '.' +
SUBSTRING(working_time, 4, 2) + '.' +
SUBSTRING(working_time, 1, 2)
) AS DATETIME
)
+
CAST
(
CAST(SUBSTRING(working_time, 18, 5) AS TIME) AS DATETIME
) AS et_dt
FROM
ticket
)
SELECT
ticket_id AS "Ticket ID",
TRY_CONVERT(date, et_dt) AS "The date",
TRY_CONVERT
(
VARCHAR(8),
dateadd
(
second,
COALESCE(SUM
(
DATEDIFF(SECOND, st_dt, et_dt)
), 0),
0
),
108
) AS hrs_worked_per_ticket
FROM
cte
WHERE TRY_CONVERT(DATE, et_dt) IS NOT NULL
GROUP BY ticket_id, TRY_CONVERT(DATE, et_dt)
ORDER BY ticket_id, TRY_CONVERT(DATE, et_dt);
Result:
Ticket ID The date hrs_worked_per_ticket
18 2021-02-20 06:00:00
20 2021-02-20 04:30:00
20 2021-02-21 04:30:00
