洗掉兩列中重復出現的值-有解無憂

我有一個按代碼和描述存盤工資獎勵的維度。Award_Code 和 Award_Desc 組合形成一個自然鍵。每個代碼應該只有一個描述，每個描述應該只有一個代碼，但多年來人們添加了相同的獎勵代碼但具有不同的描述或相同的描述但具有不同的獎勵代碼，結果如下表所示。在此示例中，一個獎勵代碼使用不同的描述（Award_SK 6 和 Award_SK 2270）被發現兩次，并且獎勵描述之一被發現兩次具有不同的代碼（Award_SK 6 和 Award_SK 2209）。只有 Award_SK 6 是正確的 Award_Code/Award_Desc 組合，我需要從維度中洗掉其他組合。

獎項_SK	獎項_代碼	Award_Desc
6	AWDTEA	教師獎
2209	TEAAWD	教師獎
2270	AWDTEA	教師獎

為了確定上表中哪些描述和代碼相互關聯，我運行了以下代碼，該代碼獲取在 Award_Code 或 Award_Description 上連接不止一次的行。

--get the list of awards that are associated either by code or description, and put them in a temporary table
    SELECT * INTO #DuplicatedAwards
    FROM
    (
        SELECT Dim_Award_SK,AWARD_CODE, AWARD_DESC
        FROM
        (
            --Type 1: different Award codes, same award description
            SELECT  Dim_Award_SK, award_code,AWD.Award_Desc FROM    
            DM.DIM_AWARD AWD    
            INNER JOIN  
                (SELECT Award_Desc, COUNT(Dim_Award_SK) as total_of_Same_Description_different_code FROM DM.DIM_AWARD
                GROUP BY Award_Desc, Award_Class_Desc
                HAVING count(Award_Desc)>1 
                ) A ON AWD.Award_Desc=A.Award_Desc 
            
            UNION ALL
    
            --Type 2: different award description, same award code
            SELECT  Dim_Award_SK, A.Award_Code,AWD.Award_Desc FROM
            DM.DIM_AWARD AWD
            INNER JOIN
                (SELECT Award_Code,COUNT(Dim_Award_SK) as Total_of_Same_Code_Different_Description FROM DM.DIM_AWARD
                GROUP BY Award_Code
                HAVING count(DISTINCT Award_Desc)>1 
            ) A ON AWD.Award_Code=A.Award_Code 
        )B
    )C
    
    --Join the temporary table to the dimension on award code OR award description.  This will show an Award_SK in the first column 
    --and its matched Award_SK's in the second column
 --When a new SK starts in the first column we are looking at a new group of matched awards
    
    SELECT DISTINCT
    AW.Dim_Award_SK as Award_SK,
    DIM.Dim_Award_SK as Matching_Award_SK 
    FROM #DuplicatedAwards AW
    INNER JOIN DM.DIM_AWARD DIM
    ON DIM.Award_Code=AW.Award_Code OR DIM.Award_Desc=AW.Award_Desc
    --exclude rows where the affected SK is matched with itself
    WHERE DIM.Dim_Award_SK <> AW.Dim_Award_SK
    ORDER BY  AW.Dim_Award_SK, DIM.Dim_Award_SK
    
    DROP TABLE #DuplicatedAwards

這給了我這樣的結果：

獎項_SK	配對獎_SK
6	2209
6	2270
8	1853年
8	2278
17	2052
17	2442
22	1895年
22	2282
22	2428
1853年	8
1853年	2278
1895年	22
1895年	2282
1895年	2428
2052	17
2052	2442
2209	6
2209	2270
2270	6
2270	2209
2278	8
2278	1853年
2282	22
2282	1895年
2282	2428
2428	22
2428	1895年
2428	2282
2442	17
2442	2052

The first two values in the left column are the same, so I know I need to look at the details of Award_SK 6, 2209 and 2270 in the dimension to work out business-wise, which is the right Award_SK to keep and which other two can be discarded. Next, rows 3 and 4 both show Award_SK 8 so I know that I need to look at Award_SK 8, 1853 and 2278 together, and so on. However, these combinations will show up more than once in a different arrangement as I move through the table. Award_SK 1853 eventually shows up again in the first column with Award_SK 8 and Award_SK 2278 in the second column. I have 8000 rows in my table but if I stop the combinations recurring, the table will be significantly smaller and I'll end up with a table like this. I'm not sure what to add to my code to make that happen. Maybe I could even do it in Excel, but again, not sure how.

獎項_SK	配對獎_SK
6	2209
6	2270
8	1853年
8	2278
17	2052
17	2442
22	1895年
22	2282
22	2428

我真的很感激這方面的任何幫助。謝謝。

uj5u.com熱心網友回復：

您可以將其用作與更高的匹配，而不是避免與相同的匹配。

SELECT DISTINCT
  AW.Dim_Award_SK as Award_SK,
  DIM.Dim_Award_SK as Matching_Award_SK 
FROM #DuplicatedAwards AW
JOIN DM.DIM_AWARD DIM
  ON ( DIM.Award_Code = AW.Award_Code OR
       DIM.Award_Desc = AW.Award_Desc 
     ) 
 AND AW.Dim_Award_SK < DIM.Dim_Award_SK
ORDER BY AW.Dim_Award_SK, DIM.Dim_Award_SK

轉載請註明出處，本文鏈接：https://www.uj5u.com/net/382463.html

標籤：sql sql-server 查询语句数据仓库奥拉普

上一篇：在SSMS中匯入平面檔案時的小數分隔符

下一篇：C 計算器檢查