我得到了一些維護很差的日期表來整理,其中包含有關作業人員在特定時間擔任哪些職位的資訊。在許多情況下,員工按日期列出了重疊的職位(因此資料意味著他們一次擔任兩個職位,這應該是不可能的)。正在開展更正資料的作業,因此我可以假設后來的資訊是對舊資訊的更正。
我想清理資料,以便在任何時間范圍內洗掉重復項,將最新修改的記錄作為其申請的職位記錄的開始日期和結束日期之間的“真相”。
我認為圖表會有所幫助,因此下面顯示了一些可能的臟資料,其中一名員工在一段時間內獲得了 6 個職位。每條線代表一個職位的跨度(例如,員工在 1 月 10 日和 20 日之間擔任職位#1)。y 軸表示修改日期 - 位置 #1 在位置 #2 之后被修改,依此類推。

我做了一個表格來表示上面顯示的資料:
DROP TABLE IF EXISTS #Test;
CREATE TABLE #Test
( Id INT IDENTITY(1,1)
,Person INT
,Job INT
,JobStart DATE
,JobEnd DATE
,Modified DATE
);
INSERT #Test
VALUES
(1,1,'2020-01-10','2020-01-20',GETDATE())
,(1,2,'2020-01-03','2020-01-10',DATEADD(DAY,-1,GETDATE()))
,(1,3,'2020-01-03','2020-01-13',DATEADD(DAY,-2,GETDATE()))
,(1,4,'2020-01-11','2020-01-20',DATEADD(DAY,-3,GETDATE()))
,(1,5,'2020-01-15','2020-01-25',DATEADD(DAY,-4,GETDATE()))
,(1,6,'2020-01-01','2020-01-30',DATEADD(DAY,-5,GETDATE()))
SELECT * FROM #Test;
DROP TABLE IF EXISTS #Test;
如果你想象你可以從 y 軸的頂部“向下”看第一個圖表,你只會看到任何給定時間最近修改的位置 - 這就是我想要回傳的(我不在乎關于結果中的修改日期):

正如上面試圖證明的那樣,輸出 I 如下:
| 記錄 | 人 | 作業 | 作業開始 | 作業結束 |
|---|---|---|---|---|
| 1 | 1 | 6 | 2020-01-01 | 2020-01-03 |
| 2 | 1 | 2 | 2020-01-03 | 2020-01-10 |
| 3 | 1 | 1 | 2020-01-10 | 2020-01-20 |
| 6 | 1 | 5 | 2020-01-20 | 2020-01-25 |
| 7 | 1 | 6 | 2020-01-25 | 2020-01-30 |
I have attempted to modify the answers given in this question and this question, but haven't had any luck applying the modified date priority - they work to a certain extent but always seem to give confusingly incorrect results, and always return one row for each position, whereas I want to potentially create and or discard rows depending on their dates. I even tried using an extensive CASE statement, but that went about as well as you might expect.
I would welcome any suggestions of how to clean the data in this way as I have been at it all morning and spent my lunchbreak making diagrams and still no closer to a solution!
I am working in SQL Server V13.0.4
P.S. The closest I have come is using a Date Tally table that just contains a list of dates:
;WITH AllDays AS
( SELECT
Person
,Dated.[Date]
,MAX(Modified) AS LastModified
FROM #Test AS Test
INNER JOIN ListOfDates AS Dated ON Test.JobStart <= Dated.[Date]
AND Test.JobEnd >= Dated.[Date]
GROUP BY Person,[Date])
,Thing AS
(
SELECT
Test.Person
,Test.Job
,MIN([Date]) OVER (PARTITION BY Test.Job ORDER BY [Date]) AS JobStart
,MAX([Date]) OVER (PARTITION BY Test.Job ORDER BY [Date]) AS JobEnd
FROM
#Test AS Test
INNER JOIN AllDays AS Limit ON Test.Person = Limit.Person
AND Test.Modified = Limit.LastModified
)
The problem in this case is Position #6, which is still listed as only starting on the first... It feels really close, though!
I have several ways of getting to that same result, but the real sticking problem seems to be with Position #6, which needs to appear twice in the results...
;WITH Thing AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY T1.Person, [Date] ORDER BY [Modified] DESC) AS MY_Rank
,*
FROM
#Test AS T1
CROSS APPLY ListOfDates AS Dated
WHERE
Dated.[Date] >= (SELECT MIN(JobStart) FROM #Test)
AND Dated.[Date] <= (SELECT MAX(JobEnd) FROM #Test)
AND Dated.[Date] >= T1.JobStart AND Dated.[Date] <= T1.JobEnd
)
SELECT DISTINCT Id, Person, Job, JobStart, JobEnd
FROM Thing
WHERE
MY_Rank = 1
uj5u.com熱心網友回復:
這是一個嘗試。但我對這個 SQL 謎題的理解就到此為止。
這個想法是為所有日期生成一個 Tally CTE。
將日期加入范圍。
使用 row_number 為日期分配優先級作業。
過濾優先級 1 并生成排名。
將所有內容與排名組合在一起。
WITH RCTE_DATES AS (
SELECT MIN(JobStart) AS JobDate, MAX(JobEnd) AS LastDate
FROM #Test
UNION ALL
SELECT DATEADD(day, 1, JobDate), LastDate
FROM RCTE_DATES
WHERE JobDate < LastDate
)
, CTE_RANGES AS
(
SELECT JobDate, Person, Job, JobStart, JobEnd
, ROW_NUMBER() OVER (PARTITION BY Person, JobDate ORDER BY Modified DESC, JobEnd) rn
FROM RCTE_DATES d
JOIN #Test t ON d.JobDate BETWEEN t.JobStart AND t.JobEnd
)
, CTE_RANKED AS
(
SELECT JobDate, Person, Job
, ROW_NUMBER() OVER (PARTITION BY Person, Job ORDER BY JobDate DESC)
ROW_NUMBER() OVER (ORDER BY JobDate) AS Rnk
FROM CTE_RANGES
WHERE rn = 1
GROUP BY JobDate, Person, Job
)
SELECT Person, Job
, MIN(JobDate) AS JobStart
, MAX(JobDate) AS JobEnd
FROM CTE_RANKED
GROUP BY Person, Job, Rnk
ORDER BY JobStart
人 | 作業 | 作業開始 | 作業結束
-----: | --: | :--------- | :---------
1 | 6 | 2020-01-01 | 2020-01-02
1 | 2 | 2020-01-03 | 2020-01-09
1 | 1 | 2020-01-10 | 2020-01-20
1 | 5 | 2020-01-21 | 2020-01-25
1 | 6 | 2020-01-26 | 2020-01-30
db<>在這里擺弄
旁注:如果第一張圖的線放在不同的高度上,您可以獲得不同的第二張俯視圖。在可以通過更改 CTE_RANGES 中的順序來完成的 sql
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/374196.html
標籤:sql-server tsql
下一篇:LINQ查詢表
