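I have an index that contains city names. I'm trying to get my entries scored correctly, but I'm not getting the results I want. I tried creating the index without specifying any settings, with an edge-n-gram analyzer, and with an n-gram analyzer. The city names are in German, and from what I read this should be a suitable analyzer. Here are the settings I tried for the analyzer: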
{
"settings": {
"index": {
"number_of_shards": "1",
"number_of_replicas": "1"
},
"analysis": {
"analyzer": {
"e_ngram_token": {
"tokenizer": "edge_ngram_tokenizer"
}
},
"tokenizer": {
"edge_ngram_tokenizer": {
"type": "edge_ngram", // exchanged to ngram the other time
"min_gram": 2,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
},
"mappings": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
Here is some sample data for the bulk create (/cities/_bulk):
{ "create": { } }
{"name": "Münster"}
{ "create": { } }
{"name": "München"}
{ "create": { } }
{"name": "Bad-Münster Fake 2"}
{ "create": { } }
{"name": "Bad Münster Fake"}
{ "create": { } }
{"name": "Munddort fake"}
{ "create": { } }
{"name": "Stolpmünde"}
{ "create": { } }
{"name": "Swinemünde"}
{ "create": { } }
{"name": "Dortmund"}
{ "create": { } }
{"name": "Müden (Mosel)"}
{ "create": { } }
{"name": "Mannheim"}
{ "create": { } }
{"name": "Marburg"}
{ "create": { } }
{"name": "Magdeburg"}
{ "create": { } }
{"name": "Montreux"}
{ "create": { } }
{"name": "Sankt Moritz"}
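For anyone reproducing this, the bulk body above can also be generated rather than typed by hand. The following is a minimal sketch in Python; the helper name `build_bulk_body` is hypothetical and only builds the newline-delimited request body, it does not send anything to Elasticsearch.

```python
import json


def build_bulk_body(names):
    """Build an NDJSON body for POST /cities/_bulk.

    Each city becomes two lines: a "create" action line and a source line.
    """
    lines = []
    for name in names:
        lines.append(json.dumps({"create": {}}))
        # ensure_ascii=False keeps umlauts readable instead of \u escapes.
        lines.append(json.dumps({"name": name}, ensure_ascii=False))
    # The bulk API expects the last line to end with a newline as well.
    return "\n".join(lines) + "\n"


body = build_bulk_body(["Münster", "München", "Bad-Münster Fake 2"])
print(body, end="")
```

The resulting string would be sent with the `Content-Type: application/x-ndjson` header.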
So when I run a query like this:
{
"from": 0,
"size": 100,
"query": {
"match": {
"name": {
"query": "mun",
"analyzer": "e_ngram_token",
"fuzziness": "2",
"fuzzy_transpositions": true,
"operator": "or",
"max_expansions": 50,
"boost": 5
}
}
}
}
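I expect to get cities like "München", "Münster" and so on: basically every city that contains "mun" or, because of the fuzziness, cities that contain "mün", "man", "tan", etc. What I get instead is: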
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0.0,
"hits": [
{
"_index": "cities",
"_type": "_doc",
"_id": "7jX2ioQBc3BSm-EXMB2V",
"_score": 0.0,
"_source": {
"name": "Bad-Münster Fake 2"
}
}
]
}
}
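Can someone explain what I'm missing? As I understand it, the tokens are created at index time, something like ["Mü", "ün", "nc", ..., "Mün"] for "München". Since I asked for a fuzziness of 2, the term "mun" should match the token "mün" and therefore return results.
Thank you very much!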
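To make the expectation in the question concrete, here is a rough pure-Python approximation of what an `edge_ngram` tokenizer with `min_gram` 2, `max_gram` 10 and `token_chars` of `letter` and `digit` would emit, together with a plain Levenshtein distance. Both functions are illustrative stand-ins, not Elasticsearch's or Lucene's implementation (Lucene's fuzzy matching can additionally count a transposition as a single edit).

```python
def edge_ngrams(text, min_gram=2, max_gram=10):
    """Approximate an edge_ngram tokenizer with token_chars = [letter, digit]."""
    tokens = []
    word = ""
    for ch in text + " ":  # the trailing space flushes the last word
        if ch.isalnum():
            word += ch
            continue
        # Emit prefixes of the finished word; words shorter than min_gram emit nothing.
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            tokens.append(word[:n])
        word = ""
    return tokens


def levenshtein(a, b):
    """Classic edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


print(edge_ngrams("München"))  # ['Mü', 'Mün', 'Münc', 'Münch', 'Münche', 'München']
print(edge_ngrams("mun"))      # ['mu', 'mun']
print(levenshtein("mun", "mün"), levenshtein("mun", "Mün"))  # 1 2
```

Note that the analyzer in the question has no lowercase filter, so in this approximation the indexed prefix is "Mün" rather than "mün", which is two edits away from "mun" instead of one.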
Answer from a uj5u.com community member:
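You have to set the analyzer on the field in the mapping. Without it, `name` is indexed with the default `standard` analyzer, so the index contains whole words rather than n-grams; the `analyzer` given in the match query only controls how the query string is analyzed.
"name": {
"type": "text",
"analyzer": "e_ngram_token", // <-- add this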
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
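Putting the question's settings and this mapping change together, the full index body could look like the sketch below, written as a Python dict so it can be serialized to valid JSON (no comments or arrows). The shard and replica settings are omitted for brevity, and `index_time_analyzer` is a hypothetical helper used only to illustrate which analyzer the field ends up with.

```python
import json

ANALYZER = "e_ngram_token"

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {ANALYZER: {"tokenizer": "edge_ngram_tokenizer"}},
            "tokenizer": {
                "edge_ngram_tokenizer": {
                    "type": "edge_ngram",
                    "min_gram": 2,
                    "max_gram": 10,
                    "token_chars": ["letter", "digit"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "name": {
                "type": "text",
                "analyzer": ANALYZER,  # the line the answer adds
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    },
}


def index_time_analyzer(body, field):
    """Return the analyzer named in the mapping, or "standard" when none is set.

    Assumes no index-wide default analyzer has been configured.
    """
    return body["mappings"]["properties"][field].get("analyzer", "standard")


print(index_time_analyzer(index_body, "name"))
print(json.dumps(index_body, indent=2))
```

As far as I know, the analyzer of an existing field cannot be changed in place, so the index would have to be recreated with this body and the documents indexed again.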
