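I have been wondering what the simplest way is to convert a Sphinx search query like the one below into the kind of query commonly used by typical web searches or portals, such as a boolean search string, and vice versa.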
(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R
needs to be converted to
(A OR B) AND "C D" AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
A slight variation, for the sake of example:
(A | B) C D (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R
should become
(A OR B) AND C AND D AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
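For clarity, "A" can be any word in any case; matching is case-insensitive. Outside quotes, a space means AND in the source syntax. So AB is just one word, e.g. Java. Spaces around the pipe are not significant: (A|B) is the same as (A | B), (A| B), and so on. Each letter stands for one word.
Some of these queries will be long, up to 500 terms. That is not a huge processing overhead, but I am wondering what the best (most efficient) way to convert them would be: tokenization, regular expressions/pattern matching, simple substitution, recursion, etc. What would you recommend?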
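Answer from a uj5u.com user:
Readers are probably looking for an elegant, or at least not hackish, solution to this problem. That was my goal too but, alas, this is the best I could come up with.
Code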
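def convert(str)
  # 1. Collapse each pipe (plus surrounding spaces) to '|' and park every
  #    quoted phrase in subs, leaving '*' as its placeholder.
  # 2. Turn each remaining run of spaces into ' AND '.
  # 3. Expand '|' to ' OR ' and put the phrases back in order.
  subs = []
  str.gsub(/"[^"]*"| *\| */) do |s|
    if s.match?(/\A *\| *\z/)
      '|'
    else
      subs << s
      '*'
    end
  end.gsub(/ +/, ' AND ').
    gsub(/[*|]/) { |s| s == '|' ? ' OR ' : subs.shift }
end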
Examples
puts convert(%Q{(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R})
#-> (A OR B) AND "C D" AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
puts convert(%Q{(A|B) C D (E| "F G" |"H I J") ("K L" ("M N" | "O P")) Q R})
#-> (A OR B) AND C AND D AND (E OR "F G" OR "H I J") AND ("K L" AND ("M N" OR "O P")) AND Q AND R
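Note that in this example some pipes have no space before and/or after them, and in places outside the double-quoted strings there is more than one space.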
puts convert(%Q{(Ant | Bat) Cat Dog (Emu | "Frog Gorilla" | "Hen Ibex Jackel") ("Khawla Lynx" ("Magpie Newt" | "Ocelot Penguin")) Quail Rabbit})
#-> (Ant OR Bat) AND Cat AND Dog AND (Emu OR "Frog Gorilla" OR "Hen Ibex Jackel") AND ("Khawla Lynx" AND ("Magpie Newt" OR "Ocelot Penguin")) AND Quail AND Rabbit
Here I have replaced the capital letters with words.
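The question also asks about the reverse direction. A minimal sketch of that, assuming the boolean operators are always upper-case AND / OR surrounded by whitespace and that quoted phrases should pass through untouched; to_sphinx is a hypothetical name, not something from the answer:

```ruby
# Hypothetical helper: boolean string -> Sphinx-style query.
# Assumption: operators are upper-case AND / OR with whitespace on both sides.
def to_sphinx(query)
  query.gsub(/"[^"]*"|\s+(?:AND|OR)\s+/) do |m|
    if m.start_with?('"')
      m                      # quoted phrase: copy through unchanged
    elsif m.strip == 'OR'
      ' | '                  # explicit OR becomes a pipe
    else
      ' '                    # AND becomes the implicit space
    end
  end
end

puts to_sphinx('(A OR B) AND "C D" AND (E OR "F G") AND Q AND R')
# (A | B) "C D" (E | "F G") Q R
```

Because the quoted alternative is tried first, an AND or OR inside a phrase is never rewritten. An unquoted search term that is literally AND or OR would be ambiguous under this assumption.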
Explanation
To see how this works, let
str = %Q{(A | B) "C D" (E | "F G" | "H I J") ("K L" ("M N" | "O P")) Q R}
#=> "(A | B) \"C D\" (E | \"F G\" | \"H I J\") (\"K L\" (\"M N\" | \"O P\")) Q R"
Then
subs = []
s1 = str.gsub(/"[^"]*"| *\| */) do |s|
  if s.match?(/\A *\| *\z/)
    '|'
  else
    subs << s
    '*'
  end
end
#=> "(A|B) * (E|*|*) (* (*|*)) Q R"
subs
#=> ["\"C D\"", "\"F G\"", "\"H I J\"", "\"K L\"", "\"M N\"", "\"O P\""]
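As you can see, I have removed the spaces around the pipes and replaced every quoted string with an asterisk, saving those strings in the array subs so that the asterisks can later be replaced with their original values. The choice of an asterisk is of course arbitrary.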
The regular expression reads, "match a double-quoted string of zero or more characters or a pipe ('|') optionally preceded and/or followed by spaces".
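To see what that pattern actually picks out, String#scan can list every match. A small sketch on a made-up input (not one of the queries above) that deliberately puts a pipe inside a quoted phrase; PATTERN and sample are illustrative names:

```ruby
# Same pattern as in convert; the input below is hypothetical.
PATTERN = /"[^"]*"| *\| */

sample  = '(A| B) "C | D" E'
matches = sample.scan(PATTERN)

p matches
# ["| ", "\"C | D\""]
```

The quoted phrase is consumed as a single match, so the pipe inside it is never seen as a separate operator by this pass.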
As a result of these substitutions, all remaining strings of spaces are to be replaced by ' AND ':
s2 = s1.gsub(/ +/, ' AND ')
#=> "(A|B) AND * AND (E|*|*) AND (* AND (*|*)) AND Q AND R"
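Treating a whole run of spaces as one separator matters once the input has more than one space in a row. A quick sketch on a made-up string (two spaces between D and E) comparing a per-space replacement with a per-run one:

```ruby
# Hypothetical input with a double space between D and E.
sample = 'C D  E'

per_space = sample.gsub(' ', ' AND ')   # one replacement per space
per_run   = sample.gsub(/ +/, ' AND ')  # one replacement per run of spaces

puts per_space  # C AND D AND  AND E
puts per_run    # C AND D AND E
```

Only the per-run form yields a well-formed boolean string in that case.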
It remains to replace '|' with ' OR ' and each asterisk by its original value:
s2.gsub(/[*|]/) { |s| s == '|' ? ' OR ' : subs.shift }
#=> "(A OR B) AND \"C D\" AND (E OR \"F G\" OR \"H I J\") AND (\"K L\" AND (\"M N\" OR \"O P\")) AND Q AND R"
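One caveat with an arbitrary placeholder: if a query already contains an asterisk (for instance as a wildcard in a term such as foo*), it would be mistaken for a parked phrase. A placeholder-free sketch, assuming well-formed input with balanced quotes, tokenizes the query and inserts AND between adjacent operands. TOKEN and convert_tokens are hypothetical names, and this is an alternative to the approach above rather than a restatement of it:

```ruby
# A token is a quoted phrase, a single '(' ')' or '|', or a bare term.
TOKEN = /"[^"]*"|[()|]|[^\s()|"]+/

def convert_tokens(query)
  tokens = query.scan(TOKEN)
  out = []
  tokens.each_with_index do |tok, i|
    prev = i.zero? ? nil : tokens[i - 1]
    if tok == '|'
      out << ' OR '
    else
      # Implicit AND: the previous token closed an operand (term, phrase
      # or ')') and the current token opens one (anything except ')').
      needs_and = prev && prev != '(' && prev != '|' && tok != ')'
      out << ' AND ' if needs_and
      out << tok
    end
  end
  out.join
end

puts convert_tokens('(A|B) "C D" (E | "F G") foo* Q')
# (A OR B) AND "C D" AND (E OR "F G") AND foo* AND Q
```

This makes a single pass over the tokens, so a query of a few hundred terms should stay cheap, though I have not benchmarked it against the gsub version.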
