ElasticSearch - organize search by priority, with/without spaces

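I'm using Elastic 7.15.0. I want to organize search by priority, with and without spaces. What do I mean by that?

Query - "surf coff"

First, I want to see records that contain or start with "surf coff" - (SURF COFFEE, SURF CAFFETERIA, SURFCOFFEE MAN)
then records that contain or start with "surf" - (SURF, SURF LOVE, ENDLESS SURF)
then records that contain or start with "coff" - (LOVE COFFEE, COFFEE MAN)

Query - "surfcoff"

I want to see records that contain or start with "surfcoff" - only (SURF COFFEE, SURF CAFFETERIA, SURFCOFFEE MAN).

I created an analyzer with these filters:

lowercase
word_delimiter_graph
shingle
edge n-gram
pattern replace for spaces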

{
  "settings": {
    "index": {
      "max_shingle_diff": 9,
      "max_ngram_diff": 9
    },
    "analysis": {
      "analyzer": {
        "word_join_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "word_delimiter_graph",
            "my_shingle",
            "my_edge_ngram",
            "my_char_filter"
          ]
        }
      },
      "filter": {
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10
        },
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 10,
          "token_chars": ["letter", "digit"]
        },
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      }
    }
  }
}

So when I analyze the text "SURF COFFEE", I get this result:

{
    "tokens": [
        {
            "token": "su",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "sur",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "surf",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "su",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "sur",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surf",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surf",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surfc",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surfco",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surfcof",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surfcoff",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "surfcoffe",
            "start_offset": 0,
            "end_offset": 11,
            "type": "shingle",
            "position": 0,
            "positionLength": 2
        },
        {
            "token": "co",
            "start_offset": 5,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "cof",
            "start_offset": 5,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "coff",
            "start_offset": 5,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "coffe",
            "start_offset": 5,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "coffee",
            "start_offset": 5,
            "end_offset": 11,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

As you can see, the token "surfcoff" is there.
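To show where these tokens come from, here is a simplified pure-Python model of word_join_analyzer. It is not Elasticsearch code and rests on two assumptions. First, the input is plain space-separated words, so the standard tokenizer plus word_delimiter_graph reduce to lowercasing and splitting on spaces. Second, token_chars is ignored, since it is an option of the edge_ngram tokenizer rather than the edge_ngram token filter.

```python
# Simplified pure-Python model of word_join_analyzer (not Elasticsearch code).
# Assumption: plain space-separated words, so the standard tokenizer and
# word_delimiter_graph reduce to lowercasing and splitting on spaces.


def shingle_stream(words, min_size=2, max_size=10):
    """Per position: the unigram first, then shingles of min_size..max_size words."""
    for i, word in enumerate(words):
        yield word  # the shingle filter keeps unigrams by default
        for size in range(min_size, max_size + 1):
            if i + size > len(words):
                break
            yield " ".join(words[i:i + size])


def edge_ngrams(token, min_gram=2, max_gram=10):
    """Front-anchored grams of length min_gram..max_gram; shorter tokens yield none."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def word_join_analyze(text):
    """lowercase -> shingle -> edge_ngram -> pattern_replace(" " -> "")."""
    tokens = []
    for shingle in shingle_stream(text.lower().split()):
        for gram in edge_ngrams(shingle):
            tokens.append(gram.replace(" ", ""))
    return tokens


if __name__ == "__main__":
    print(word_join_analyze("SURF COFFEE"))
```

On "SURF COFFEE" the model yields the same 17 tokens, in the same order, as the analysis output above. It also explains why "surfcoffee" never appears. max_gram counts the space inside the shingle "surf coffee", so the longest gram is "surf coffe", which pattern_replace turns into "surfcoffe".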

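How should I structure my search?

I tried combining bool should queries with query_string, match_phrase_prefix, match_prefix, etc.

But none of them gave the correct results.

Can you help me?

How should my query be built? Or should I try other analyzer filters?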

For example, this query:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "surf coff",
            "default_field": "text",
            "default_operator": "AND"
          }
        },
        {
          "query_string": {
            "query": "surf",
            "default_field": "text"
          }
        },
        {
          "query_string": {
            "query": "coff",
            "default_field": "text"
          }
        }
      ]
    }
  }
}

Or this query:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "(surf coff) OR (surf) OR (coff)",
            "default_field": "text"
          }
        }
      ]
    }
  }
}

Or this query:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "((surf AND coff)^3 OR (surf)^2 OR (coff)^1)",
            "default_field": "text"
          }
        }
      ]
    }
  }
}

Or:

{
  "query": {
    "match_bool_prefix" : {
      "text" : "surf coff"
    }
  }
}

returns:

SURF COFFEE SURF NEVER ALONE
CONOSUR COLCHAGUA CONO SUR
SUNRISE CONCHA TORO SUNRISE 300 DAYS
SUN COFFEE
SURF COFFEE PROMO......

This looks strange to me. I think I'm misunderstanding something.

Answer:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "(surf* AND coff*)^3 OR (surf*)^2 OR (coff*)^1",
            "default_field": "text"
          }
        }
      ]
    }
  }
}

{
  "settings": {
    "index": {
      "max_shingle_diff": 9,
      "max_ngram_diff": 9
    },
    "analysis": {
      "analyzer": {
        "word_join_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "word_delimiter_graph",
            "my_shingle",
            "my_char_filter"
          ]
        }
      },
      "filter": {
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 10
        },
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": ""
        }
      }
    }
  }
}

Removing the edge n-gram filter and adding prioritized wildcard queries solved my problem. But I still don't understand why the edge n-gram approach didn't work.
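Here is a rough model of why this works. Without edge_ngram, the shingle and pattern_replace filters index whole words plus space-free joins such as "surfcoffee", and a wildcard like surf* matches any indexed term with that prefix. The sketch is a simplification, not Lucene scoring. It assumes each matching clause simply adds its boost, which is only meant to reproduce the intended priority order.

```python
# Simplified model of the wildcard approach (no edge_ngram). Not Lucene
# scoring: each matching clause just adds its boost, to show the priority order.


def join_tokens(text, max_shingle=10):
    """lowercase -> shingles (unigrams kept) -> spaces removed."""
    words = text.lower().split()
    tokens = []
    for i in range(len(words)):
        for size in range(1, max_shingle + 1):
            if i + size > len(words):
                break
            tokens.append("".join(words[i:i + size]))
    return tokens


def has_prefix(tokens, prefix):
    """True if any indexed token starts with prefix, like the wildcard prefix*."""
    return any(token.startswith(prefix) for token in tokens)


def priority(text, first="surf", second="coff"):
    """Mirrors (first* AND second*)^3 OR (first*)^2 OR (second*)^1."""
    tokens = join_tokens(text)
    a = has_prefix(tokens, first)
    b = has_prefix(tokens, second)
    return 3 * (a and b) + 2 * a + 1 * b


if __name__ == "__main__":
    records = ["LOVE COFFEE", "SURF", "SURF COFFEE", "CONOSUR COLCHAGUA CONO SUR"]
    ranked = sorted((r for r in records if priority(r) > 0), key=priority, reverse=True)
    print(ranked)  # ['SURF COFFEE', 'SURF', 'LOVE COFFEE']
```

Under the same model, surfcoff* matches the joined term "surfcoffee", which is how the no-space query gets covered.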

Answer:

I finally solved it, keeping the edge n-gram filter:

"filter": [
  "lowercase",
  "word_delimiter_graph",
  "my_shingle",
  "my_edge_ngram",
  "my_char_filter"
]

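The problem was the search_analyzer. The documentation says: "Sometimes, though, it can make sense to use a different analyzer at search time, such as when using the edge_ngram tokenizer for autocomplete or when using search-time synonyms." Without a search_analyzer, the query text also runs through word_join_analyzer. As a result, "surf coff" is itself split into edge n-grams such as "su" and "co", and those match unrelated records like CONOSUR COLCHAGUA CONO SUR.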

So I set the standard analyzer as the search_analyzer on my text field:

"text": { "type": "text", "analyzer": "word_join_analyzer", "search_analyzer": "standard" }
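Putting the pieces together, a create-index body might look like the sketch below. It is built as a Python dict and printed as JSON for a PUT /<index> request, where the index name is a placeholder. It reuses the question's analysis settings and the answer's mapping for the text field. token_chars is left out because it is documented for the edge_ngram tokenizer, not the edge_ngram token filter.

```python
import json

# Sketch of a create-index body: the question's analysis settings plus the
# answer's mapping. Send it with PUT /<index> (index name is a placeholder).
index_body = {
    "settings": {
        "index": {"max_shingle_diff": 9, "max_ngram_diff": 9},
        "analysis": {
            "analyzer": {
                "word_join_analyzer": {
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "word_delimiter_graph",
                        "my_shingle",
                        "my_edge_ngram",
                        "my_char_filter",
                    ],
                }
            },
            "filter": {
                "my_shingle": {"type": "shingle", "min_shingle_size": 2, "max_shingle_size": 10},
                # token_chars omitted: it is an edge_ngram tokenizer option.
                "my_edge_ngram": {"type": "edge_ngram", "min_gram": 2, "max_gram": 10},
                "my_char_filter": {"type": "pattern_replace", "pattern": " ", "replacement": ""},
            },
        },
    },
    "mappings": {
        "properties": {
            "text": {
                "type": "text",
                "analyzer": "word_join_analyzer",  # applied when indexing
                "search_analyzer": "standard",  # applied to the query text
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
```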

Search query:

{
  "query": {
    "bool": {
      "should": [
        {
          "query_string": {
            "query": "surf coff",
            "default_field": "text"
          }
        }
      ]
    }
  }
}
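To illustrate what search_analyzer changes, the sketch below compares two cases: the query text analyzed by word_join_analyzer (the behavior when no search_analyzer is set) and by the standard analyzer. It is a simplification, not Elasticsearch's query parsing. It assumes plain space-separated words and counts a record as a hit if it shares at least one term with the analyzed query.

```python
# Simplified model: query analyzed with word_join_analyzer vs. standard.
# Assumptions: plain space-separated words; a record is a hit if it shares at
# least one term with the analyzed query (OR-style matching).


def index_terms(text, min_gram=2, max_gram=10, max_shingle=10):
    """Terms word_join_analyzer would produce: shingles -> edge n-grams -> no spaces."""
    words = text.lower().split()
    terms = set()
    for i in range(len(words)):
        for size in range(1, max_shingle + 1):
            if i + size > len(words):
                break
            shingle = " ".join(words[i:i + size])
            for n in range(min_gram, min(max_gram, len(shingle)) + 1):
                terms.add(shingle[:n].replace(" ", ""))
    return terms


def standard_terms(text):
    """Rough stand-in for the standard analyzer on plain words."""
    return set(text.lower().split())


def matches(record, query, search_analyzer):
    """Hit if the indexed terms and the analyzed query terms overlap."""
    return bool(index_terms(record) & search_analyzer(query))


if __name__ == "__main__":
    records = ["SURF COFFEE", "SURF", "LOVE COFFEE", "SURFCOFFEE MAN",
               "CONOSUR COLCHAGUA CONO SUR"]
    for query in ("surf coff", "surfcoff"):
        for name, analyzer in (("word_join_analyzer", index_terms), ("standard", standard_terms)):
            hits = [r for r in records if matches(r, query, analyzer)]
            print(f"{query!r} analyzed with {name}: {hits}")
```

With standard on the query side, "surf coff" hits SURF COFFEE, SURF, LOVE COFFEE and SURFCOFFEE MAN, while "surfcoff" hits only SURF COFFEE and SURFCOFFEE MAN. With word_join_analyzer on the query side, the short grams also pull in CONOSUR COLCHAGUA CONO SUR.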
