我有一個按代碼和描述存盤工資獎勵的維度。Award_Code 和 Award_Desc 組合形成一個自然鍵。每個代碼應該只有一個描述,每個描述應該只有一個代碼,但多年來人們添加了相同的獎勵代碼但具有不同的描述或相同的描述但具有不同的獎勵代碼,結果如下表所示。在此示例中,一個獎勵代碼使用不同的描述(Award_SK 6 和 Award_SK 2270)被發現兩次,并且獎勵描述之一被發現兩次具有不同的代碼(Award_SK 6 和 Award_SK 2209)。只有 Award_SK 6 是正確的 Award_Code/Award_Desc 組合,我需要從維度中洗掉其他組合。
| 獎項_SK | 獎項_代碼 | Award_Desc |
|---|---|---|
| 6 | AWDTEA | 教師獎 |
| 2209 | TEAAWD | 教師獎 |
| 2270 | AWDTEA | 教師獎 |
為了確定上表中哪些描述和代碼相互關聯,我運行了以下代碼,該代碼獲取在 Award_Code 或 Award_Description 上連接不止一次的行。
--get the list of awards that are associated either by code or description, and put them in a temporary table
SELECT * INTO #DuplicatedAwards
FROM
(
SELECT Dim_Award_SK,AWARD_CODE, AWARD_DESC
FROM
(
--Type 1: different Award codes, same award description
SELECT Dim_Award_SK, award_code,AWD.Award_Desc FROM
DM.DIM_AWARD AWD
INNER JOIN
(SELECT Award_Desc, COUNT(Dim_Award_SK) as total_of_Same_Description_different_code FROM DM.DIM_AWARD
GROUP BY Award_Desc, Award_Class_Desc
HAVING count(Award_Desc)>1
) A ON AWD.Award_Desc=A.Award_Desc
UNION ALL
--Type 2: different award description, same award code
SELECT Dim_Award_SK, A.Award_Code,AWD.Award_Desc FROM
DM.DIM_AWARD AWD
INNER JOIN
(SELECT Award_Code,COUNT(Dim_Award_SK) as Total_of_Same_Code_Different_Description FROM DM.DIM_AWARD
GROUP BY Award_Code
HAVING count(DISTINCT Award_Desc)>1
) A ON AWD.Award_Code=A.Award_Code
)B
)C
--Join the temporary table to the dimension on award code OR award description. This will show an Award_SK in the first column
--and its matched Award_SK's in the second column
--When a new SK starts in the first column we are looking at a new group of matched awards
SELECT DISTINCT
AW.Dim_Award_SK as Award_SK,
DIM.Dim_Award_SK as Matching_Award_SK
FROM #DuplicatedAwards AW
INNER JOIN DM.DIM_AWARD DIM
ON DIM.Award_Code=AW.Award_Code OR DIM.Award_Desc=AW.Award_Desc
--exclude rows where the affected SK is matched with itself
WHERE DIM.Dim_Award_SK <> AW.Dim_Award_SK
ORDER BY AW.Dim_Award_SK, DIM.Dim_Award_SK
DROP TABLE #DuplicatedAwards
這給了我這樣的結果:
| 獎項_SK | 配對獎_SK |
|---|---|
| 6 | 2209 |
| 6 | 2270 |
| 8 | 1853年 |
| 8 | 2278 |
| 17 | 2052 |
| 17 | 2442 |
| 22 | 1895年 |
| 22 | 2282 |
| 22 | 2428 |
| 1853年 | 8 |
| 1853年 | 2278 |
| 1895年 | 22 |
| 1895年 | 2282 |
| 1895年 | 2428 |
| 2052 | 17 |
| 2052 | 2442 |
| 2209 | 6 |
| 2209 | 2270 |
| 2270 | 6 |
| 2270 | 2209 |
| 2278 | 8 |
| 2278 | 1853年 |
| 2282 | 22 |
| 2282 | 1895年 |
| 2282 | 2428 |
| 2428 | 22 |
| 2428 | 1895年 |
| 2428 | 2282 |
| 2442 | 17 |
| 2442 | 2052 |
The first two values in the left column are the same, so I know I need to look at the details of Award_SK 6, 2209 and 2270 in the dimension to work out business-wise, which is the right Award_SK to keep and which other two can be discarded. Next, rows 3 and 4 both show Award_SK 8 so I know that I need to look at Award_SK 8, 1853 and 2278 together, and so on. However, these combinations will show up more than once in a different arrangement as I move through the table. Award_SK 1853 eventually shows up again in the first column with Award_SK 8 and Award_SK 2278 in the second column. I have 8000 rows in my table but if I stop the combinations recurring, the table will be significantly smaller and I'll end up with a table like this. I'm not sure what to add to my code to make that happen. Maybe I could even do it in Excel, but again, not sure how.
| 獎項_SK | 配對獎_SK |
|---|---|
| 6 | 2209 |
| 6 | 2270 |
| 8 | 1853年 |
| 8 | 2278 |
| 17 | 2052 |
| 17 | 2442 |
| 22 | 1895年 |
| 22 | 2282 |
| 22 | 2428 |
我真的很感激這方面的任何幫助。謝謝。
uj5u.com熱心網友回復:
您可以將其用作與更高的匹配,而不是避免與相同的匹配。
SELECT DISTINCT
AW.Dim_Award_SK as Award_SK,
DIM.Dim_Award_SK as Matching_Award_SK
FROM #DuplicatedAwards AW
JOIN DM.DIM_AWARD DIM
ON ( DIM.Award_Code = AW.Award_Code OR
DIM.Award_Desc = AW.Award_Desc
)
AND AW.Dim_Award_SK < DIM.Dim_Award_SK
ORDER BY AW.Dim_Award_SK, DIM.Dim_Award_SK
轉載請註明出處,本文鏈接:https://www.uj5u.com/net/382463.html
標籤:sql sql-server 查询语句 数据仓库 奥拉普
下一篇:C 計算器檢查
