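I am using Elasticsearch 7.15.0. I want to rank search results by priority, matching with or without spaces. Here is what I mean.
Query - "surf coff"
- I want to see records that contain or start with "surf coff" - (SURF COFFEE, SURF CAFFETERIA, SURFCOFFEE MAN)
- then records that contain or start with "surf" - (SURF, SURF LOVE, ENDLESS SURF)
- then records that contain or start with "coff" - (LOVE COFFEE, COFFEE MAN)
Query - "surfcoff"
- I want to see only records that contain or start with "surfcoff" - (SURF COFFEE, SURF CAFFETERIA, SURFCOFFEE MAN).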
I created an analyzer with these filters:
- lowercase
- word_delimiter_graph
- shingle
- edge n-gram
- pattern_replace to remove spaces
{
  "settings": {
    "index": {
      "max_shingle_diff": 9,
      "max_ngram_diff": 9
    },
    "analysis": {
      "analyzer": {
        "word_join_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "word_delimiter_graph",
            "my_shingle",
            "my_edge_ngram",
            "my_char_filter"
          ]
        }
      },
      "filter": {
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10
        },
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        },
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      }
    }
  }
}
So when I analyze text = "SURF COFFEE", I get this result:
{
  "tokens": [
    { "token": "su", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
    { "token": "sur", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
    { "token": "surf", "start_offset": 0, "end_offset": 4, "type": "<ALPHANUM>", "position": 0 },
    { "token": "su", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "sur", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surf", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surf", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surfc", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surfco", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surfcof", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surfcoff", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "surfcoffe", "start_offset": 0, "end_offset": 11, "type": "shingle", "position": 0, "positionLength": 2 },
    { "token": "co", "start_offset": 5, "end_offset": 11, "type": "<ALPHANUM>", "position": 1 },
    { "token": "cof", "start_offset": 5, "end_offset": 11, "type": "<ALPHANUM>", "position": 1 },
    { "token": "coff", "start_offset": 5, "end_offset": 11, "type": "<ALPHANUM>", "position": 1 },
    { "token": "coffe", "start_offset": 5, "end_offset": 11, "type": "<ALPHANUM>", "position": 1 },
    { "token": "coffee", "start_offset": 5, "end_offset": 11, "type": "<ALPHANUM>", "position": 1 }
  ]
}
As you can see, there is a "surfcoff" token.
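To see where that token comes from, the filter chain can be approximated in plain Python. This is only a rough sketch, not Elasticsearch itself: it assumes whitespace splitting stands in for the standard tokenizer, it skips word_delimiter_graph (which changes nothing for this input), and the helper names shingles, edge_ngrams and analyze are made up for illustration.

```python
def shingles(words, min_size=2, max_size=10):
    """Each word followed by the space-joined word n-grams that start at it
    (the shingle filter emits unigrams by default)."""
    out = []
    for i, word in enumerate(words):
        out.append(word)
        for size in range(min_size, max_size + 1):
            if i + size <= len(words):
                out.append(" ".join(words[i:i + size]))
    return out


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Prefixes of the token, from min_gram up to max_gram characters."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text):
    """Approximate chain: lowercase -> shingle -> edge n-gram -> remove spaces."""
    words = text.lower().split()  # assumption: stands in for the standard tokenizer
    result = []
    for token in shingles(words):
        for gram in edge_ngrams(token):
            # The space is removed last, so it still counts toward max_gram.
            result.append(gram.replace(" ", ""))
    return result


tokens = analyze("SURF COFFEE")
print(tokens)
```

Under these assumptions the sketch produces the same 17 tokens as the _analyze output above. "surfcoff" is a prefix of the shingle "surf coffee" with the space removed afterwards. Because the space still counts toward max_gram = 10, the longest prefix kept is "surf coffe", so the full "surfcoffee" is never indexed.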
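How should I organize my search?
I tried combining a bool should query with query_string, match_phrase_prefix, match_prefix, and so on.
But none of them gave the correct results.
Can you help me?
How should my query be built? Or maybe I should try other analyzer filters.
For example, this query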
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "surf coff",
            "default_field": "text",
            "default_operator": "AND"
          }
        },
        {
          "query_string": {
            "query": "surf",
            "default_field": "text"
          }
        },
        {
          "query_string": {
            "query": "coff",
            "default_field": "text"
          }
        }
      ]
    }
  }
}
Or this query
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "(surf coff) OR (surf) OR (coff)",
            "default_field": "text"
          }
        }
      ]
    }
  }
}
Or this query
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "((surf AND coff)^3 OR (surf)^2 OR (coff)^1)",
            "default_field": "text"
          }
        }
      ]
    }
  }
}
Or
{
  "query": {
    "match_bool_prefix": {
      "text": "surf coff"
    }
  }
}
gives
- SURF COFFEE SURF NEVER ALONE
- CONOSUR COLCHAGUA CONO SUR
- SUNRISE CONCHA TORO SUNRISE 300 DAYS
- SUN COFFEE
- SURF COFFEE PROPAGANDA ......
But this looks strange to me; I think I have misunderstood something.
Answer:
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "(surf* AND coff*)^3 OR (surf*)^2 OR (coff*)^1",
            "default_field": "text"
          }
        }
      ]
    }
  }
}
{
  "settings": {
    "index": {
      "max_shingle_diff": 9,
      "max_ngram_diff": 9
    },
    "analysis": {
      "analyzer": {
        "word_join_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "word_delimiter_graph",
            "my_shingle",
            "my_char_filter"
          ]
        }
      },
      "filter": {
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10
        },
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      }
    }
  }
}
Removing the edge n-gram filter and adding a wildcard query with boosts solved my problem. But I still don't understand why edge n-gram didn't work.
Answer:
Finally solved it. I kept the original filter chain, including my_edge_ngram:
"filter": [
  "lowercase",
  "word_delimiter_graph",
  "my_shingle",
  "my_edge_ngram",
  "my_char_filter"
]
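The problem was the search_analyzer. As the documentation says: "Sometimes it makes sense to use a different analyzer at search time, such as when using the edge_ngram tokenizer for autocomplete or when using search-time synonyms."
So I added the standard search_analyzer to my text field:
"text": { "type": "text", "analyzer": "word_join_analyzer", "search_analyzer": "standard" }
Search query: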
{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "surf coff",
            "default_field": "text"
          }
        }
      ]
    }
  }
}
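Why the search analyzer matters can be illustrated with a small Python simulation. It is a simplified sketch, not Elasticsearch's real matching or scoring: it assumes whitespace splitting in place of the standard tokenizer, models only "does any query term equal any indexed term" (OR semantics), and uses made-up helper names (edge_ngrams, index_terms, standard_terms, matches). The record names are taken or trimmed from the question.

```python
def edge_ngrams(token, min_gram=2, max_gram=10):
    """Prefixes of the token, from min_gram up to max_gram characters."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def index_terms(text):
    """Approximation of word_join_analyzer: words and shingles, edge n-grammed,
    with spaces removed afterwards."""
    words = text.lower().split()  # assumption: stands in for the standard tokenizer
    candidates = list(words)
    for size in range(2, 11):  # shingle sizes 2..10
        for i in range(len(words) - size + 1):
            candidates.append(" ".join(words[i:i + size]))
    return {g.replace(" ", "") for c in candidates for g in edge_ngrams(c)}


def standard_terms(text):
    """Approximation of the standard analyzer: lowercase whole words only."""
    return set(text.lower().split())


RECORDS = [
    "SURF COFFEE",
    "LOVE COFFEE",
    "CONOSUR COLCHAGUA CONO SUR",
    "SUNRISE CONCHA TORO",
]


def matches(query_terms):
    """Records sharing at least one term with the query (OR semantics)."""
    return [r for r in RECORDS if query_terms & index_terms(r)]


query = "surf coff"

# Same analyzer at search time: the query itself is cut into "su", "co", ...
same_analyzer = matches(index_terms(query))

# Standard analyzer at search time: only "surf" and "coff" are looked up.
standard_search = matches(standard_terms(query))

print(same_analyzer)
print(standard_search)
```

With the same analyzer on both sides, the query also produces the two-letter terms "su" and "co", so every record with a word starting with those letters matches, which is consistent with the CONOSUR and SUNRISE hits in the question. With the standard analyzer at search time, only the whole terms "surf" and "coff" are compared against the indexed prefixes, and in this sketch only SURF COFFEE and LOVE COFFEE remain.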
