Elasticsearch stop token filter not working

I created an index like the following in Elasticsearch 7.10:

{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "stemmer",
            "stop"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}

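As you can see, I configured a custom analyzer named my_analyzer that applies the stop token filter. Based on the documentation, I expected this filter to remove English stop words from all text-type attributes of a document at index time.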

Indeed, if I send a POST request to http://localhost:30200/my_index/_analyze with this request body:

{
  "analyzer": "my_analyzer",
  "text": "If you are a horse, I do not want that cake"
}

I get a response showing that the tokens if, a, not, and that were removed from the supplied text:

{
    "tokens": [
        {
            "token": "you",
            "start_offset": 3,
            "end_offset": 6,
            "type": "<ALPHANUM>",
            "position": 1
        },
        {
            "token": "ar",
            "start_offset": 7,
            "end_offset": 10,
            "type": "<ALPHANUM>",
            "position": 2
        },
        {
            "token": "hors",
            "start_offset": 13,
            "end_offset": 18,
            "type": "<ALPHANUM>",
            "position": 4
        },
        {
            "token": "i",
            "start_offset": 20,
            "end_offset": 21,
            "type": "<ALPHANUM>",
            "position": 5
        },
        {
            "token": "do",
            "start_offset": 22,
            "end_offset": 24,
            "type": "<ALPHANUM>",
            "position": 6
        },
        {
            "token": "want",
            "start_offset": 29,
            "end_offset": 33,
            "type": "<ALPHANUM>",
            "position": 8
        },
        {
            "token": "cake",
            "start_offset": 39,
            "end_offset": 43,
            "type": "<ALPHANUM>",
            "position": 10
        }
    ]
}
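A rough Python simulation of the my_analyzer filter chain can help read this response. It is a sketch, not Elasticsearch code: the regex only approximates the standard tokenizer, ENGLISH_STOP_WORDS is Lucene's default English stop word list (what the stop filter uses unless configured otherwise), and the stemmer step is a hypothetical two-entry lookup copied from the response above rather than a real stemmer. It reproduces the tokens, offsets and positions shown, and it also makes one detail visible: "are" is a stop word, yet it survives as "ar" because stemmer runs before stop, so the stemmed form no longer matches the stop list.

```python
import re

# Lucene's default English stop words (the stop filter's default list).
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})

# Hypothetical stand-in for the "stemmer" filter, NOT a real stemmer:
# these two entries are copied from the _analyze response above.
TOY_STEMS = {"are": "ar", "horse": "hors"}


def analyze(text, filters):
    """Tokenize roughly like the standard tokenizer, then apply filters in order.

    Each token is a (term, start_offset, end_offset, position) tuple.
    A filter returns None to drop a token; the remaining tokens keep their
    original positions, which is why the response has gaps (3, 7, 9).
    """
    tokens = [
        (m.group(), m.start(), m.end(), pos)
        for pos, m in enumerate(re.finditer(r"\w+", text))
    ]
    for token_filter in filters:
        kept = []
        for term, start, end, pos in tokens:
            new_term = token_filter(term)
            if new_term is not None:
                kept.append((new_term, start, end, pos))
        tokens = kept
    return tokens


def lowercase(term):
    return term.lower()


def stemmer(term):
    return TOY_STEMS.get(term, term)


def stop(term):
    return None if term in ENGLISH_STOP_WORDS else term


text = "If you are a horse, I do not want that cake"
# Same order as my_analyzer: lowercase, stemmer, stop.
for token in analyze(text, [lowercase, stemmer, stop]):
    print(token)
```

Running the chain as [lowercase, stop, stemmer] drops "are" as well; in the real analyzer that corresponds to listing stop before stemmer in the filter array.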

However, if I index a document whose description attribute contains the string "If you are a horse, I do not want that cake", and then query the index by making a GET request to http://localhost:30200/my_index/_search with this request body:

{
  "query": {
    "multi_match" : {
      "query": "that", 
      "fields": ["description"]
    }
  }
}

The document is returned, even though the word "that" was supposed to have been removed by the analyzer:

{
    "took": 3,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.2876821,
        "hits": [
            {
                "_index": "my_index",
                "_type": "_doc",
                "_id": "27ibobulhqhc7s96jbz6653ud",
                "_score": 0.2876821,
                "_source": {
                    "id": "27ibobulhqhc7s96jbz6653ud",
                    "title": "muscular yak",
                    "description": "If you are a horse, I do not want that cake"
                }
            }
        ]
    }
}

So what gives? If the stop filter is stripping English-language stopwords from indexed text attributes, I would expect querying one of those stop words to return zero results. Do I have to explicitly tell Elasticsearch to use my_analyzer when indexing documents or when processing queries?

For what it's worth, the other filters I configured (lowercase and stemmer) seem to work as expected. Only stop is giving me trouble.
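One way to see which analyzer the description field actually uses is to call _analyze with "field" instead of "analyzer"; Elasticsearch then analyzes the text with whatever the mapping assigns to that field. The sketch below only builds that request with Python's standard library and does not send it. The host and port are taken from the question, and field_analyze_request is a hypothetical helper name. With the mapping above, where description has no "analyzer", the field falls back to the standard analyzer, which lowercases but removes no stop words, so the response would be expected to include "that".

```python
import json
import urllib.request

# Host/port as used in the question; adjust for your cluster.
BASE_URL = "http://localhost:30200"


def field_analyze_request(index, field, text):
    """Build (but do not send) POST /<index>/_analyze for a mapped field.

    Passing "field" makes Elasticsearch use that field's analyzer from the
    mapping, rather than an analyzer named explicitly in the request.
    """
    body = {"field": field, "text": text}
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_analyze",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = field_analyze_request(
    "my_index", "description", "If you are a horse, I do not want that cake"
)

# Against a running cluster, something like this would print the tokens:
# with urllib.request.urlopen(req) as resp:
#     print([t["token"] for t in json.load(resp)["tokens"]])
```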

Answer:

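You're almost there. You just need to map your description field to the custom analyzer you created, as shown below. This ensures that the contents of the description field are analyzed with my_analyzer at both index time and search time.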

{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer"          // note this
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "stemmer",
            "stop"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
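Why the mapping change fixes the search can be sketched with a toy match query in Python. This is a simplified model, not Elasticsearch's implementation, under these assumptions: standard_analyzer mimics the default analyzer a text field gets when no "analyzer" is mapped (tokenize and lowercase, no stop words); stop_analyzer is my_analyzer without its stemmer, which does not affect "that"; and a query whose terms are all removed matches nothing, mirroring zero_terms_query defaulting to none. Note that the analyzer of an existing field cannot be changed in place, so in practice the index has to be recreated (and the data reindexed) with the new mapping.

```python
import re

# Lucene's default English stop words (the stop filter's default list).
ENGLISH_STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
})


def standard_analyzer(text):
    """Approximates the default analyzer of a text field: split and lowercase."""
    return [term.lower() for term in re.findall(r"\w+", text)]


def stop_analyzer(text):
    """my_analyzer minus its stemmer: split, lowercase, drop stop words."""
    return [t for t in standard_analyzer(text) if t not in ENGLISH_STOP_WORDS]


def match_search(docs, query, analyzer):
    """Return ids of docs sharing at least one analyzed term with the query.

    The same analyzer is applied to documents and to the query, which is
    what a field gets when only "analyzer" (no "search_analyzer") is mapped.
    """
    query_terms = set(analyzer(query))
    if not query_terms:
        # Every query term was a stop word: match nothing.
        return []
    return [doc_id for doc_id, text in docs.items()
            if query_terms & set(analyzer(text))]


docs = {"27ibobulhqhc7s96jbz6653ud": "If you are a horse, I do not want that cake"}
print(match_search(docs, "that", standard_analyzer))  # ['27ibobulhqhc7s96jbz6653ud']
print(match_search(docs, "that", stop_analyzer))      # []
```

The first call models the original mapping, where "that" ends up in the index and the query matches; the second models description mapped to my_analyzer, where "that" is dropped on both sides.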
