String matching and sorting logic

I was wondering if anyone here could suggest a solution that takes 2 strings, compares them word by word, and gives a match percentage for the whole string.

Example: if I want to compare these two strings

The Elf on the Shelf: A Christmas Musical (Touring)
The Elf on the Shelf Musical Baltimore

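In SQL, a LIKE comparison won't match these. But if I could break each string into words and check each one, we'd see that 6 of the 7 words in string 2 match string 1. Then I could say it's an 85% match.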

Thanks!

Answer:

You need to compute the similarity between the two strings. Many algorithms can do this; let's try Levenshtein distance and Longest Common Subsequence, each of which has its own strengths.
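The queries below call dbo.LongestCommonSubsequence and dbo.LEVENSHTEIN, which are user-defined functions that this answer does not include. As a rough sketch only, assuming SQL Server 2016 SP1 or later (for CREATE OR ALTER), here is a hypothetical plain dynamic-programming stand-in for dbo.LEVENSHTEIN that returns the edit distance as an INT, plus a hypothetical dbo.LcsLength that returns just the length of the longest common subsequence (the answer's function returns the subsequence itself). Both keep the DP matrix in a table variable and loop cell by cell, so they are O(n·m) and suited to short strings such as titles, not large tables.

```sql
-- Hypothetical stand-ins, not the answerer's own implementations.
CREATE OR ALTER FUNCTION dbo.LEVENSHTEIN (@a NVARCHAR(4000), @b NVARCHAR(4000))
RETURNS INT
AS
BEGIN
    DECLARE @la INT = LEN(@a), @lb INT = LEN(@b), @i INT = 0, @j INT = 1, @cost INT, @v INT;
    -- d(i, j) = edit distance between the first i chars of @a and the first j chars of @b
    DECLARE @d TABLE (i INT NOT NULL, j INT NOT NULL, v INT NOT NULL, PRIMARY KEY (i, j));

    -- Base cases: turning a prefix into the empty string costs its length
    WHILE @i <= @la BEGIN INSERT @d VALUES (@i, 0, @i); SET @i += 1; END;
    WHILE @j <= @lb BEGIN INSERT @d VALUES (0, @j, @j); SET @j += 1; END;

    SET @i = 1;
    WHILE @i <= @la
    BEGIN
        SET @j = 1;
        WHILE @j <= @lb
        BEGIN
            -- Substituting a character is free when the two characters compare equal
            SET @cost = CASE WHEN SUBSTRING(@a, @i, 1) = SUBSTRING(@b, @j, 1) THEN 0 ELSE 1 END;
            SELECT @v = MIN(c) FROM (
                SELECT v + 1     FROM @d WHERE i = @i - 1 AND j = @j       -- deletion
                UNION ALL
                SELECT v + 1     FROM @d WHERE i = @i     AND j = @j - 1   -- insertion
                UNION ALL
                SELECT v + @cost FROM @d WHERE i = @i - 1 AND j = @j - 1   -- substitution
            ) AS t(c);
            INSERT @d VALUES (@i, @j, @v);
            SET @j += 1;
        END;
        SET @i += 1;
    END;

    RETURN (SELECT v FROM @d WHERE i = @la AND j = @lb);
END;
GO

CREATE OR ALTER FUNCTION dbo.LcsLength (@a NVARCHAR(4000), @b NVARCHAR(4000))
RETURNS INT
AS
BEGIN
    DECLARE @la INT = LEN(@a), @lb INT = LEN(@b), @i INT = 1, @j INT, @v INT;
    -- c(i, j) = LCS length of the first i chars of @a and the first j chars of @b;
    -- rows for i = 0 or j = 0 are never stored and are treated as 0
    DECLARE @c TABLE (i INT NOT NULL, j INT NOT NULL, v INT NOT NULL, PRIMARY KEY (i, j));

    WHILE @i <= @la
    BEGIN
        SET @j = 1;
        WHILE @j <= @lb
        BEGIN
            IF SUBSTRING(@a, @i, 1) = SUBSTRING(@b, @j, 1)
            BEGIN
                -- Characters match: extend the diagonal subsequence by one
                SET @v = 1 + ISNULL((SELECT v FROM @c WHERE i = @i - 1 AND j = @j - 1), 0);
            END
            ELSE
            BEGIN
                -- No match: keep the better result of dropping a character of @a or of @b
                SELECT @v = ISNULL(MAX(v), 0) FROM @c
                WHERE (i = @i - 1 AND j = @j) OR (i = @i AND j = @j - 1);
            END;
            INSERT @c VALUES (@i, @j, @v);
            SET @j += 1;
        END;
        SET @i += 1;
    END;

    RETURN ISNULL((SELECT v FROM @c WHERE i = @la AND j = @lb), 0);
END;
GO
```

With these in place, LEN(dbo.LongestCommonSubsequence(@string1,@string2)) can be swapped for dbo.LcsLength(@string1,@string2). Character comparisons follow the database's default collation, so under a case-insensitive collation 'T' and 't' count as equal.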

-- Sample strings
DECLARE 
 @string1 VARCHAR(100) = 'The Elf on the Shelf: A Christmas Musical (Touring)',
 @string2 VARCHAR(100) = 'The Elf on the Shelf Musical Baltimore';

--uncomment to test:
--SELECT @string1 = 'Their', @string2 = 'Theirs'

-- Longest Common Subsequence: LCS length / length of the longer string
SELECT Similarity = 1.*LEN(dbo.LongestCommonSubsequence(@string1,@string2))/L2
FROM
(
  SELECT MIN(f.S), MAX(f.S)
  FROM  (VALUES(LEN(@string1)),(LEN(@string2))) AS f(S)
) f(L1,L2);

-- Levenshtein: (length of the shorter string - edit distance) / length of the longer string
SELECT Similarity = (1.*L1-Lev)/L2
FROM
(
  SELECT MIN(f.S), MAX(f.S), dbo.LEVENSHTEIN(@string1,@string2)
  FROM  (VALUES(LEN(@string1)),(LEN(@string2))) AS f(S)
) f(L1,L2,Lev);

These return, respectively:

Similarity
---------------------------------------
0.62745098039


Similarity
---------------------------------------
0.31372549019

For 'Their' and 'Theirs' you get:

Similarity
---------------------------------------
0.83333333333

Similarity
---------------------------------------
0.66666666666
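Written out, with |a| and |b| the LEN of the two strings, the two queries compute the scores below. Plugging in 'Their' and 'Theirs' (lengths 5 and 6; the LCS is 'Their', and a single insertion turns one into the other) reproduces the results above:

$$\mathrm{sim}_{\mathrm{LCS}}(a,b)=\frac{|\mathrm{LCS}(a,b)|}{\max(|a|,|b|)}=\frac{5}{6}\approx 0.8333$$

$$\mathrm{sim}_{\mathrm{Lev}}(a,b)=\frac{\min(|a|,|b|)-\mathrm{lev}(a,b)}{\max(|a|,|b|)}=\frac{5-1}{6}\approx 0.6667$$

Because the Levenshtein score subtracts the distance from the shorter length instead of computing 1 - lev/max, it is stricter than that more common normalization.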

Answer:

Here is one possible way to get the percentage of matching words. It assumes words match between the two strings regardless of their position.

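I appreciate it may not be exactly what you're after, and it doesn't handle "similar" words, but hopefully it meets the percentage-match criterion. If it's not quite what's needed, there's plenty of room for tweaking.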

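After removing common punctuation, the two strings are split into rows and combined into one table. A row_number window function then partitions the rows by word and numbers each matching pair. Finally, only the matching pairs are counted, that count is added to the count of repeated words common to both, and the total is expressed as a percentage of the shorter string.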

declare 
    @s1 varchar(100)='The Elf on the Shelf: A Christmas Musical. (Touring)',
    @s2 varchar(100)='The Elf on the Shelf Musical, Baltimore';

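with words as (
    -- strip common punctuation, split on spaces, and tag each word with its source string
    select 1 s, Replace(Replace(Replace(Replace(Replace(value,':',''),',',''),'(',''),')',''),'.','') word
    from String_Split(@s1,' ')
    union all
    select 2, Replace(Replace(Replace(Replace(Replace(value,':',''),',',''),'(',''),')',''),'.','') 
    from String_Split(@s2,' ')
), matching as (
  -- number the occurrences of each word, string 1 first
  select *, Row_Number() over(partition by word order by s) rn
  from words
), final as (
  -- p = 1 when a word's second occurrence comes from string 2 (once in string 1, also in string 2);
  -- repeating = occurrences of the word within its own string
  select * , Count(*) over(partition by word, s) repeating, Count(*) over() * 1.0 totwords, sum(Iif(s=1,1,0)) over() s1words
  from matching
  outer apply(values(Iif(rn=2 and rn=s,1,0)))x(p)
)
-- matched pairs + repeated words, as a percentage of the shorter string's word count
select (Sum(p) + IsNull(max(case when s=1 and repeating>1 then repeating end), 0))
        / Max(Iif(s1words/totwords>0.5, totwords-s1words, s1words)) * 100 [Matching Words %]
from final;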

Here, 6 words in each string match, so the result is the desired 6 out of the shorter, 7-word string: 85.7%.
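The query above adds only the largest repeat count found in string 1 (via max), whether or not that word appears in string 2. A hypothetical variant that handles any number of repeated words is to count, for each distinct word, the smaller of its occurrence counts in the two strings, then divide by the shorter string's word count. This is a sketch assuming SQL Server 2016+ (STRING_SPLIT) and a case-insensitive collation, so that 'The' and 'the' group together; for the sample strings it also finds 6 matches out of 7, about 85.71%.

```sql
-- Hypothetical variant: matches per distinct word = min(occurrences in @s1, occurrences in @s2)
DECLARE
    @s1 VARCHAR(100) = 'The Elf on the Shelf: A Christmas Musical. (Touring)',
    @s2 VARCHAR(100) = 'The Elf on the Shelf Musical, Baltimore',
    @pct DECIMAL(9, 4);

WITH words AS (
    -- Same punctuation stripping and space splitting as above; s tags the source string
    SELECT 1 AS s, REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(value, ':', ''), ',', ''), '(', ''), ')', ''), '.', '') AS word
    FROM STRING_SPLIT(@s1, ' ')
    UNION ALL
    SELECT 2, REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(value, ':', ''), ',', ''), '(', ''), ')', ''), '.', '')
    FROM STRING_SPLIT(@s2, ' ')
), counts AS (
    -- Occurrences of each word in each string; empty tokens from doubled spaces are skipped
    SELECT word,
           SUM(CASE WHEN s = 1 THEN 1 ELSE 0 END) AS n1,
           SUM(CASE WHEN s = 2 THEN 1 ELSE 0 END) AS n2
    FROM words
    WHERE word <> ''
    GROUP BY word
)
-- A word seen k times in one string and m times in the other contributes min(k, m) matches;
-- the total is divided by the word count of the shorter string
SELECT @pct = 100.0 * SUM(CASE WHEN n1 < n2 THEN n1 ELSE n2 END)
              / NULLIF(CASE WHEN SUM(n1) < SUM(n2) THEN SUM(n1) ELSE SUM(n2) END, 0)
FROM counts;

SELECT @pct AS [Matching Words %];
```

Using min(k, m) means a word repeated twice in one string but present once in the other counts as a single match, which keeps the numerator from exceeding the shorter string's word count.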

Example db<>fiddle
