Here is the tokenizer -
"tokenizer": {
"filename" : {
"pattern" : "[^\\p{L}\\d]+",
"type" : "pattern"
}
},
Mapping -
"name": {
"type": "string",
"analyzer": "filename_index",
"include_in_all": true,
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
},
"lower_case_sort": {
"type": "string",
"analyzer": "naturalsort"
}
}
},
Analyzer -
"filename_index" : {
"tokenizer" : "filename",
"filter" : [
"word_delimiter",
"lowercase",
"russian_stop",
"russian_keywords",
"russian_stemmer",
"czech_stop",
"czech_keywords",
"czech_stemmer"
]
},
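I want to find the indexed document by searching for mclaren, but the indexed name is McLaren. I want to stick with query_string, because many other features are built on it. Here is the query that does not return the expected result -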
{
"query": {
"filtered": {
"query": {
"query_string" : {
"query" : "mclaren",
"default_operator" : "AND",
"analyze_wildcard" : true
}
}
}
},
"size" :50,
"from" : 0,
"sort": {}
}
How can I do this? Thanks!
Answer:
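Got it! The problem is the word_delimiter token filter. By default, it:
Splits tokens at letter case transitions. For example: PowerShot → Power, Shot
(reference documentation)
So macLaren produces two tokens, [mac, Laren], while maclaren produces a single token, [maclaren].
Analysis example: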
POST _analyze
{
"tokenizer": {
"pattern": """[^\p{L}\d]+""",
"type": "pattern"
},
"filter": [
"word_delimiter"
],
"text": ["macLaren", "maclaren"]
}
Response:
{
"tokens" : [
{
"token" : "mac",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "Laren",
"start_offset" : 3,
"end_offset" : 8,
"type" : "word",
"position" : 1
},
{
"token" : "maclaren",
"start_offset" : 9,
"end_offset" : 17,
"type" : "word",
"position" : 102
}
]
}
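The same check with the exact spelling from the question, with lowercase applied after word_delimiter as filename_index does, could look roughly like this. It is a sketch in Kibana Console syntax and assumes an Elasticsearch version whose _analyze API accepts inline tokenizer definitions (as the request above does); the Russian and Czech filters are left out because their definitions are not shown.

```
# Sketch: same pattern tokenizer as above, lowercase after word_delimiter
# as in filename_index; the Russian/Czech filters are deliberately omitted.
POST _analyze
{
  "tokenizer": {
    "pattern": """[^\p{L}\d]+""",
    "type": "pattern"
  },
  "filter": [
    "word_delimiter",
    "lowercase"
  ],
  "text": ["McLaren", "mclaren"]
}
```

With the default word_delimiter settings, McLaren should come back as the two tokens mc and laren, while mclaren stays a single token mclaren, so the indexed terms and the query term never match.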
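So one option is to set the word_delimiter parameter split_on_case_change to false (see the parameter documentation).
ps: remember to remove the setting you added earlier (cf. comments), because with that setting in place your query_string query only targets a name field that does not exist.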
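A minimal sketch of that option, assuming the index is called my_index (hypothetical) and that russian_stop, czech_stemmer and the other custom filters are already defined in its settings. It defines a word_delimiter filter under a hypothetical name, filename_word_delimiter, with split_on_case_change set to false, and uses it in place of the built-in word_delimiter in filename_index. Analysis settings can only be changed while the index is closed.

```
# Sketch only: "my_index" and "filename_word_delimiter" are hypothetical names.
# Analysis settings can only be changed while the index is closed.
POST /my_index/_close

PUT /my_index/_settings
{
  "analysis": {
    "filter": {
      "filename_word_delimiter": {
        "type": "word_delimiter",
        "split_on_case_change": false
      }
    },
    "analyzer": {
      "filename_index": {
        "tokenizer": "filename",
        "filter": [
          "filename_word_delimiter",
          "lowercase",
          "russian_stop",
          "russian_keywords",
          "russian_stemmer",
          "czech_stop",
          "czech_keywords",
          "czech_stemmer"
        ]
      }
    }
  }
}

POST /my_index/_open
```

Documents indexed before the change keep their old tokens (mc, laren) until they are reindexed, for example with _update_by_query or by reindexing into a new index. To check the effect first, replacing "word_delimiter" in the _analyze request above with {"type": "word_delimiter", "split_on_case_change": false} should yield mclaren for both McLaren and mclaren.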
