I created an index in Elasticsearch 7.10 that looks like this:
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text"
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "stemmer",
            "stop"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
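As you can see, I configured a custom analyzer named my_analyzer that applies the stop token filter. Based on the documentation, I would expect this filter to strip English stop words from all text-type attributes of documents at index time.
Indeed, if I send a POST request to http://localhost:30200/my_index/_analyze with this request body: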
{
  "analyzer": "my_analyzer",
  "text": "If you are a horse, I do not want that cake"
}
I get a response proving that the tokens if, a, not, and that were removed from the supplied text:
{
  "tokens": [
    {
      "token": "you",
      "start_offset": 3,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "ar",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "hors",
      "start_offset": 13,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "i",
      "start_offset": 20,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "do",
      "start_offset": 22,
      "end_offset": 24,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "want",
      "start_offset": 29,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 8
    },
    {
      "token": "cake",
      "start_offset": 39,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 10
    }
  ]
}
However, if I index a document whose description attribute contains the string "If you are a horse, I do not want that cake", and then query the index by making a GET request to http://localhost:30200/my_index/_search with this request body:
{
  "query": {
    "multi_match": {
      "query": "that",
      "fields": ["description"]
    }
  }
}
The document is returned, even though the word "that" was supposed to have been removed by the analyzer:
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "_doc",
        "_id": "27ibobulhqhc7s96jbz6653ud",
        "_score": 0.2876821,
        "_source": {
          "id": "27ibobulhqhc7s96jbz6653ud",
          "title": "muscular yak",
          "description": "If you are a horse, I do not want that cake"
        }
      }
    ]
  }
}
So what gives? If the stop filter is stripping English-language stopwords from indexed text attributes, I would expect querying one of those stop words to return zero results. Do I have to explicitly tell Elasticsearch to use my_analyzer when indexing documents or when processing queries?
For what it's worth, the other filters I configured (lowercase and stemmer) seem to work as expected. It's only stop that is giving me trouble.
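A uj5u.com user replied:
You're almost there. You just need to map your description field with the custom analyzer you created, as shown below. This ensures that the content of the description field is analyzed with my_analyzer at both index time and search time.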
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      },
      "description": {
        "type": "text",
        "analyzer": "my_analyzer" // note this
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "filter": [
            "lowercase",
            "stemmer",
            "stop"
          ],
          "type": "custom",
          "tokenizer": "standard"
        }
      }
    }
  }
}
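One way to check which analyzer a field really uses is to pass a field name to the _analyze API instead of an analyzer name. The request body below is a minimal sketch, assuming the index is still called my_index and the request is sent as a POST to http://localhost:30200/my_index/_analyze, as in the question:

```json
{
  "field": "description",
  "text": "If you are a horse, I do not want that cake"
}
```

With the original mapping, description falls back to the standard analyzer, which does not remove stop words by default, so the token that should still appear in the response. With the mapping above, it should be gone. Keep in mind that the analyzer of an existing field cannot be changed in place, so the index most likely has to be recreated with the new mapping and the documents indexed again.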
