在where子句中匹配多個列-有解無憂

我在 SQL Server 2019 中有數百萬條記錄的表中有四列（Col1、Col2、Col3、Col4）。

在存盤程序中，我必須傳遞四個輸入引數@Col1、@Col2、@Col3、@Col4，它應該回傳成功/失敗是否找到所有四個值，而與列順序無關。例如@Col1 可以與 Col2 匹配。

Col2、Col3、Col4 中的某些值可以為 null，但 Col1 中總會有一些資料。

我準備了一些我測驗過的樣本資料和場景。

CREATE TABLE SampleData(Id INT IDENTITY(1,1), Col1 VARCHAR(20), Col2 VARCHAR(20), Col3 VARCHAR(20), Col4 VARCHAR(20))

INSERT INTO SampleData(Col1, Col2, Col3, Col4)
SELECT 'ABC','DEF','GHI','JKL' UNION 
SELECT '123','456','789','100' UNION 
SELECT 'ABC','XYZ','','' UNION 
SELECT '9898','6565',NULL,NULL UNION 
SELECT '989844','D656555','','' UNION 
SELECT '8888','9999','7777','6666' UNION 
SELECT '1234','5678','4321',NULL UNION
SELECT '465456465',NULL,NULL,NULL

存盤程序

CREATE PROC dbo.ValidateSampleData(
 @Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL
)
AS
BEGIN
    
    Declare @a as bit = 0, @Message VARCHAR(50) = 'Data Not Matched'
    if(@Col1 is NULL or @Col2 is NULL or @Col3 is NULL or @Col4 is NULL )
    begin
        set @a = 1
    end
 
    SELECT  @Message = 'Data Matched ' 
    FROM    SampleData SD
    where   (Col1 in (@Col1,@Col2,@Col3,@Col4) or (Col1 is null and @a = 1))
    and     (Col2 in (@Col1,@Col2,@Col3,@Col4) or (Col2 is null and @a = 1))
    and     (Col3 in (@Col1,@Col2,@Col3,@Col4) or (Col3 is null and @a = 1))
    and     (Col4 in (@Col1,@Col2,@Col3,@Col4) or (Col4 is null and @a = 1))

    and     (select sum(    (case when Col1 is null then 1 else 0 end)
                        (case when Col2 is null then 1 else 0 end)
                        (case when Col3 is null then 1 else 0 end)
                        (case when Col4 is null then 1 else 0 end)
                    ) from SampleData where Id = SD.Id) = 
            (select sum(    (case when @Col1 is null then 1 else 0 end)
                    (case when @Col2 is null then 1 else 0 end)
                    (case when @Col3 is null then 1 else 0 end)
                    (case when @Col4 is null then 1 else 0 end))) 

    SELECT @Message
END

這是我嘗試過的一些樣本資料集

DECLARE 
@Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL 



--Case 0  - Should not matched 
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = 'testtest' , @Col4 = ''  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 1 
SELECT @Col1 = 'ABC', @Col2 = 'DEF' , @Col3 = 'GHI' , @Col4 = 'JKL'  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 2
SELECT @Col1 = 'DEF', @Col2 = 'JKL' , @Col3 = 'ABC' , @Col4 = 'GHI' 
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 3 
SELECT @Col1 = '123', @Col2 = '456' , @Col3 = '789' , @Col4 = '100'  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 4 
SELECT @Col1 = '1234', @Col2 = '5678' , @Col3 = '4321' , @Col4 = NULL  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 5 
SELECT @Col1 = '465456465', @Col2 = NULL , @Col3 = NULL , @Col4 = NULL  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 6 
SELECT @Col1 = '8888', @Col2 = '9999' , @Col3 = '7777' , @Col4 = '6666'  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 7 
SELECT @Col1 = '9898', @Col2 = '6565' , @Col3 = NULL , @Col4 = NULL
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 8 
SELECT @Col1 = '989844', @Col2 = 'D656555' , @Col3 = '' , @Col4 = '' 
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  

--Case 9   
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = '' , @Col4 = ''  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4  
 
 
--Case 10  - Should not matched
SELECT @Col1 = 'ABC', @Col2 = 'XYZ' , @Col3 = 'tet' , @Col4 = ''  
EXEC ValidateSampleData @Col1=@Col1, @Col2=@Col2, @Col3=@Col3, @Col4=@Col4

在某些情況下，我已經達到了預期的結果，但它與表中的 NULL 值不正確。案例 0 不應匹配

此外，它在大量資料上看起來沒有優化并且非常慢。

In simple words, I want to check if all 4 values appear in the table, with NULL matching NULL? If they do, return a match, if not, a non match?

uj5u.com熱心網友回復：

我的方法是首先創建一個包含 4 個值的所有可能組合的表。這可以通過創建一個有 4 行的派生表來完成，然后將其連接到自身 4 次，每次連接確保您沒有選擇已經選擇的行，即

DECLARE @Values TABLE
(
    Col1 VARCHAR(20),
    Col2 VARCHAR(20),
    Col3 VARCHAR(20),
    Col4 VARCHAR(20)
    UNIQUE (Col1, Col2, Col3, Col4)
);

WITH Data AS
(   SELECT  Value, v.Ordinal
    FROM    (VALUES (1, @Col1), (2, @Col2), (3, @Col3), (4, @Col4)) AS v (Ordinal, Value)
)
INSERT @Values(Col1, Col2, Col3, Col4)
SELECT  DISTINCT d1.Value, d2.Value, d3.Value, d4.Value
FROM    Data AS d1
        INNER JOIN Data AS d2
            ON d2.Ordinal NOT IN (d1.Ordinal)
        INNER JOIN Data AS d3
            ON d3.Ordinal NOT IN (d1.Ordinal, d2.Ordinal)
        INNER JOIN Data AS d4
            ON d4.Ordinal NOT IN (d1.Ordinal, D2.Ordinal, d3.Ordinal);

有一個非常簡單的例子 where @Col1= 'A', and all others areNULL你最終會得到類似的東西：

Col1	Col2	Col3	Col4
空值	空值	空值	一個
空值	空值	一個	空值
空值	一個	空值	空值
一個	空值	空值	空值

然后，您可以使用INTERSECT來檢查您的表。INTERSECT這樣做的好處是NULL匹配，即SELECT NULL INTERSECT SELECT NULL會回傳一條記錄，而SELECT NULL WHERE NULL = NULL不會。

IF EXISTS
    (   SELECT  Col1, Col2, Col3, Col4
        FROM    SampleData
        INTERSECT
        SELECT  Col1, Col2, Col3, Col4
        FROM    @Values
    )
BEGIN
    SELECT 'Data Matched';
END
ELSE
BEGIN
    SELECT 'Data not Matched';
END

此方法的主要優點（除了回傳您的預期結果）是它可以利用 SampleData 上的索引，所以如果您添加：

CREATE INDEX IX_SampleData ON dbo.SampleData (Col1, Col2, Col3, Col4);

然后這將在使用時使用INTERSECT，但不符合您的邏輯：

在 where 子句中匹配多個列

就目前的小資料集而言，這并沒有太大的區別，但是對于更大的資料集，它會起作用。以上是 SampleData 中的 200k 行。

因此，您的完整程式將類似于：

CREATE OR ALTER PROC dbo.ValidateSampleData(
 @Col1 VARCHAR(20) = NULL
,@Col2 VARCHAR(20) = NULL
,@Col3 VARCHAR(20) = NULL
,@Col4 VARCHAR(20) = NULL
)
AS
BEGIN

    DECLARE @Values TABLE
    (
        Col1 VARCHAR(20),
        Col2 VARCHAR(20),
        Col3 VARCHAR(20),
        Col4 VARCHAR(20)
        UNIQUE (Col1, Col2, Col3, Col4)
    );

    WITH Data AS
    (   SELECT  Value, v.Ordinal
        FROM    (VALUES (1, @Col1), (2, @Col2), (3, @Col3), (4, @Col4)) AS v (Ordinal, Value)
    )
    INSERT @Values(Col1, Col2, Col3, Col4)
    SELECT  DISTINCT d1.Value, d2.Value, d3.Value, d4.Value
    FROM    Data AS d1
            INNER JOIN Data AS d2
                ON d2.Ordinal NOT IN (d1.Ordinal)
            INNER JOIN Data AS d3
                ON d3.Ordinal NOT IN (d1.Ordinal, d2.Ordinal)
            INNER JOIN Data AS d4
                ON d4.Ordinal NOT IN (d1.Ordinal, D2.Ordinal, d3.Ordinal);

    IF EXISTS
        (   SELECT  Col1, Col2, Col3, Col4
            FROM    SampleData
            INTERSECT
            SELECT  Col1, Col2, Col3, Col4
            FROM    @Values
        )
    BEGIN
        SELECT 'Data Matched';
    END
    ELSE
    BEGIN
        SELECT 'Data not Matched';
    END
END

加號

To answer the question in the comment:

I got this warning while creating Index on actual table. Can you explain ? Warning! The maximum key length for a nonclustered index is 1700 bytes. The index 'IX_SampleData' has maximum length of 2000 bytes. For some combination of large values, the insert/update operation will fail.

Presumably your actual columns are larger than specified in the question (VARCHAR(20))? As the warning says, the maximum key length is 1700bytes., so the required storage for all 4 columns must be less than that. Each VARCHAR character requires one byte of storage, so the maximum for VARCHAR(20) is 20 bytes, for 4 columns your maximum is 80 bytes. If however your columns are VARCHAR(500) then your maximum size is 2000 bytes. Since VARCHAR is variable, each value will only take as much space as needed. So if you enter A into a column defined as VARCHAR(500) it will still only take ` byte.

Assuming that your columns are VARCHAR(500) and you try and add the following:

INSERT INTO SampleData(Col1, Col2, Col3, Col4)
SELECT REPLICATE('A', 500), REPLICATE('B', 500), REPLICATE('C', 500), REPLICATE('D', 500)

In summary, if you are going to use all 500 characters in all 4 columns, then your insert will fail. If you have been very liberal with your column length, and the total length in all 4 columns is always going to be less than 1700 characters, then you will be fine (although, if you have been liberal with your column length, then perhaps resize them accordingly - although there is no cost in terms of additional storage there are other costs with defining columns larger than you need, so best to be as accurate as possible with column lengths).

uj5u.com熱心網友回復：

一種解決方案是將四列拆分為行，對它們進行排序和編號；對四個值執行相同的操作。然后，您可以比較這兩組：

with cols as (
    select id, value, row_number() over (partition by id order by value) as rn
    from t
    cross apply (values (col1), (col2), (col3), (col4) ) as x(value)
), vals as (
    select value, row_number() over (order by value) as rn
    from (values (@col1), (@col2), (@col3), (@col4) ) as x(value)
)
select id
from cols
join vals on cols.rn = vals.rn and (cols.value = vals.value or cols.value is null and vals.value is null)
group by id
having count(*) = 4

DB<>小提琴

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/425848.html

標籤：sql sql-server tsql stored-procedures null

上一篇：我需要洗掉Magento2的螢屏截圖中提到的兩個欄位

下一篇：在SQL查詢中決議XML元素的屬性