前提:假設您有一個包含單詞的表格,其中一些可能不同,而一些“可能重疊”,這意味著較長的單詞以較短的單詞開頭,例如:
---------------
| word |
---------------
| dog | *
| games | *
| stat |
| state |
| statement | *
| fulfill |
| fulfilled | *
| fulfillment | *
---------------
問題:在這種情況下,如何撰寫回傳非重疊串列 最長重疊單詞的查詢?
在上面的示例中,所需的單詞由 a 標識*,根據以下擴展說明:
dog并且games不與任何東西重疊,因此它們是“獨奏/獨特”類別中最長的statement與stateand重疊stat并且是最長的fulfilled與 重疊fulfill并且更長(不與 重疊fulfillment)fulfillment與 重疊fulfill并且更長(不與 重疊fulfilled)
注意:請注意,為簡單起見,資料樣本有所減少。實際上有幾百萬條記錄要查詢,并且事先沒有已知的搜索詞,因此不可能直接使用諸如WHERE word LIKE 'stat%'. 不確定是否相關,但單詞的最大長度相對較短,例如 20。
uj5u.com熱心網友回復:
就像是
select word
from your_table t1
where not exists (
select word
from your_table t2
where t2.word like t1.word || '_%'
)
;
查詢將受益于 上索引的存在word。但即便如此,也可能需要很長時間。無論如何,您都可以嘗試一下,讓我們知道會發生什么。
uj5u.com熱心網友回復:
只要您比較前綴,并且如果一個詞完全作為另一個詞的前綴包含在內,您可以嘗試match_recognize在一次通過中順序檢查前綴匹配。
但exists更清楚的是,盡管您應該檢查索引為 on 的真實資料集的性能word。
with a as ( select column_value as word from sys.odcivarchar2list( 'dog' , 'games' , 'stat' , 'state' , 'statement' , 'fulfill' , 'fulfilled' , 'fulfillment' ) ) select * from a match_recognize( order by word desc /*Longest first*/ measures a.word as word one row per match /*Exclude any number of words each of which is a prefix of the previous one */ pattern (a {- b* -}) define /*Current word is a prefix of the previous*/ b as prev(word) like word || '%' ) t| 詞 | | :---------- | | 宣告 | | 游戲 | | 履行| | 完成 | | 狗 |
db<>在這里擺弄
uj5u.com熱心網友回復:
在這種情況下,如何撰寫一個回傳非重疊串列 最長重疊單詞的查詢?
您可以使用分層查詢在單個表掃描中執行此操作:
SELECT word,
MAX(PRIOR word) AS substring
FROM words
WHERE CONNECT_BY_ISLEAF = 1
AND LEVEL IN (1,2)
CONNECT BY word LIKE PRIOR word || '%'
AND PRIOR ROWID != ROWID
GROUP BY word;
其中,對于樣本資料:
CREATE TABLE words (word) AS
SELECT 'dog' FROM DUAL UNION ALL
SELECT 'games' FROM DUAL UNION ALL
SELECT 'stat' FROM DUAL UNION ALL
SELECT 'state' FROM DUAL UNION ALL
SELECT 'statement' FROM DUAL UNION ALL
SELECT 'fulfill' FROM DUAL UNION ALL
SELECT 'fulfilled' FROM DUAL UNION ALL
SELECT 'fulfillment' FROM DUAL;
輸出:
單詞 子串 狗 游戲 陳述 狀態 完成 實作 履行 實作
db<>在這里擺弄
更新
您還可以使用MATCH_RECOGNIZE以下方法獲得最長的子字串匹配:
SELECT word,
substring
FROM words
MATCH_RECOGNIZE (
ORDER BY word DESC
MEASURES
PREV(word.word) AS prev_word,
word.word AS word,
substring.word AS substring
ONE ROW PER MATCH
AFTER MATCH SKIP TO NEXT ROW
PATTERN (word (non_substring*? substring)?)
DEFINE
substring AS word.word LIKE word || '_%'
)
WHERE prev_word IS NULL
OR prev_word NOT LIKE word || '%'
對于樣本資料,其輸出與上述相同。
db<>在這里擺弄
uj5u.com熱心網友回復:
你可以簡單排序的話,并丟棄所有的話那是前綴的的 下列單詞。
你必須考慮最后一個詞(見條件中的OR謂詞where)
with words as (
select WORD,
lead(WORD) over (order by WORD) WORD_LEAD
from tabs )
select word
from words
where word != substr( WORD_LEAD, 1, length(word)) or WORD_LEAD is NULL /* last word */
order by word;
結果
WORD
-----------
dog
fulfilled
fulfillment
games
statement
該解決方案也涵蓋了重復項,并且不包含FILTER像其他建議那樣會降低大表性能的禁止性操作(match_recognize解決方案除外)。
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/346161.html
