如何獲得最長的“重疊”單詞-有解無憂

前提：假設您有一個包含單詞的表格，其中一些可能不同，而一些“可能重疊”，這意味著較長的單詞以較短的單詞開頭，例如：

---------------
|    word     |
---------------
| dog         | *
| games       | *
| stat        |
| state       | 
| statement   | *
| fulfill     |
| fulfilled   | *
| fulfillment | *
---------------

問題：在這種情況下，如何撰寫回傳非重疊串列最長重疊單詞的查詢？

在上面的示例中，所需的單詞由 a 標識*，根據以下擴展說明：

dog并且games不與任何東西重疊，因此它們是“獨奏/獨特”類別中最長的
statement與stateand重疊stat并且是最長的
fulfilled與重疊fulfill并且更長（不與重疊fulfillment）
fulfillment與重疊fulfill并且更長（不與重疊fulfilled）

注意：請注意，為簡單起見，資料樣本有所減少。實際上有幾百萬條記錄要查詢，并且事先沒有已知的搜索詞，因此不可能直接使用諸如WHERE word LIKE 'stat%'. 不確定是否相關，但單詞的最大長度相對較短，例如 20。

uj5u.com熱心網友回復：

就像是

select word
from   your_table t1
where  not exists (
                    select word
                    from   your_table t2
                    where  t2.word like t1.word || '_%'
                  )
;

查詢將受益于上索引的存在word。但即便如此，也可能需要很長時間。無論如何，您都可以嘗試一下，讓我們知道會發生什么。

uj5u.com熱心網友回復：

只要您比較前綴，并且如果一個詞完全作為另一個詞的前綴包含在內，您可以嘗試match_recognize在一次通過中順序檢查前綴匹配。

但exists更清楚的是，盡管您應該檢查索引為 on 的真實資料集的性能word。

with a as (
  select
    column_value as word
  from sys.odcivarchar2list(
    'dog'
    , 'games'
    , 'stat'
    , 'state'
    , 'statement'
    , 'fulfill'
    , 'fulfilled'
    , 'fulfillment'
  )
)
select *
from a 
match_recognize(
  order by word desc /*Longest first*/
  measures
    a.word as word
  one row per match
  /*Exclude any number of words
    each of which is a prefix of the previous one
  */
  pattern (a {- b* -})
  define
    /*Current word is a prefix of the previous*/
    b as prev(word) like word || '%'
) t

| 詞 |
| :---------- |
| 宣告 |
| 游戲 |
| 履行|
| 完成 |
| 狗 |

db<>在這里擺弄

uj5u.com熱心網友回復：

在這種情況下，如何撰寫一個回傳非重疊串列最長重疊單詞的查詢？

您可以使用分層查詢在單個表掃描中執行此操作：

SELECT word,
       MAX(PRIOR word) AS substring
FROM   words
WHERE  CONNECT_BY_ISLEAF = 1
AND    LEVEL IN (1,2)
CONNECT BY word LIKE PRIOR word || '%'
AND        PRIOR ROWID != ROWID
GROUP BY word;

其中，對于樣本資料：

CREATE TABLE words (word) AS
SELECT 'dog' FROM DUAL UNION ALL
SELECT 'games' FROM DUAL UNION ALL
SELECT 'stat' FROM DUAL UNION ALL
SELECT 'state' FROM DUAL UNION ALL
SELECT 'statement' FROM DUAL UNION ALL
SELECT 'fulfill' FROM DUAL UNION ALL
SELECT 'fulfilled' FROM DUAL UNION ALL
SELECT 'fulfillment' FROM DUAL;

輸出：

單詞子串

狗

游戲

陳述狀態

完成實作

履行實作

單詞	子串
狗
游戲
陳述	狀態
完成	實作
履行	實作

db<>在這里擺弄

更新

您還可以使用MATCH_RECOGNIZE以下方法獲得最長的子字串匹配：

SELECT word,
       substring
FROM   words
MATCH_RECOGNIZE (
  ORDER BY word DESC
  MEASURES
    PREV(word.word) AS prev_word,
    word.word AS word,
    substring.word AS substring
  ONE ROW PER MATCH
  AFTER MATCH SKIP TO NEXT ROW
  PATTERN (word (non_substring*? substring)?)
  DEFINE
    substring     AS word.word LIKE word || '_%'
)
WHERE  prev_word IS NULL
OR     prev_word NOT LIKE word || '%'

對于樣本資料，其輸出與上述相同。

db<>在這里擺弄

uj5u.com熱心網友回復：

你可以簡單排序的話，并丟棄所有的話那是前綴的的下列單詞。

你必須考慮最后一個詞（見條件中的OR謂詞where）

with words as (
select WORD,
lead(WORD) over (order by WORD) WORD_LEAD
from tabs )
select word 
from words
where word != substr( WORD_LEAD, 1, length(word)) or WORD_LEAD is NULL /* last word */
order by word;

結果

WORD 
-----------
dog
fulfilled
fulfillment
games
statement

該解決方案也涵蓋了重復項，并且不包含FILTER像其他建議那樣會降低大表性能的禁止性操作（match_recognize解決方案除外）。

轉載請註明出處，本文鏈接：https://www.uj5u.com/qukuanlian/346161.html

標籤：sql 甲骨文

上一篇：為什么子查詢在Snowflake中不起作用？

下一篇：如何在oracle中根據組中的最小和最大日期獲取值？