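I want to create a search API for my blog. I store all my content in Elasticsearch as HTML so that full-text search over it is as fast as possible, but the HTML tags get in the way when I search the content. After a lot of searching I found an answer on how to ignore the tags during the search, but I cannot filter them out so that they do not show up in the results. Is there any way to do this?
This is my current search and the result I get: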
POST /test/_search HTTP/1.1
Content-Type: application/json
Content-Length: 68
{
  "query": {
    "match": {
      "html": "more"
    }
  }
}
Response:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "html": "<html><body><h1 style=\"font-family: Arial\">Test</h1> <span>More test</span></body></html>"
        }
      }
    ]
  }
}
But I would like to get something like this:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "test",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "html": "Test More test"
        }
      }
    ]
  }
}
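Answer:
You need to use the HTML strip character filter (html_strip) in the analyzers of your mapping. It strips the HTML elements from the text at analysis time; the stored _source itself is not changed, so the example below reads the stripped text back through a script field instead. I used this article to try to get close to your result.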
PUT idx_test
{
  "settings": {
    "analysis": {
      "filter": {
        "my_pattern_replace_filter": {
          "type": "pattern_replace",
          "pattern": "\n",
          "replacement": ""
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "lowercase"
          ],
          "char_filter": [
            "html_strip"
          ]
        },
        "parsed_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "char_filter": [
            "html_strip"
          ],
          "filter": [
            "my_pattern_replace_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "html": {
        "type": "text",
        "analyzer": "my_analyzer",
        "fields": {
          "raw": {
            "type": "text",
            "fielddata": true,
            "analyzer": "parsed_analyzer"
          }
        }
      }
    }
  }
}
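To make the difference between the two analyzers above more concrete, here is a rough Python approximation of what each one emits for the sample document. This is only a sketch using the standard library, not Elasticsearch's implementation: `BLOCK_TAGS`, `strip_html`, `approx_my_analyzer` and `approx_parsed_analyzer` are hypothetical names, the set of tags that get turned into newlines is an assumption, and `\w+` is a crude stand-in for the `standard` tokenizer.

```python
import re
from html.parser import HTMLParser

# Assumption: a small illustrative set of tags that are replaced by a newline.
# The real html_strip char filter has its own rules for this.
BLOCK_TAGS = {"html", "body", "h1", "p", "div", "br"}


class _Stripper(HTMLParser):
    """Collects text content, emitting a newline in place of block-level tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(text):
    """Rough stand-in for the html_strip char filter."""
    parser = _Stripper()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


def approx_my_analyzer(text):
    """html_strip + word splitting + lowercase: many small tokens."""
    return [token.lower() for token in re.findall(r"\w+", strip_html(text))]


def approx_parsed_analyzer(text):
    """html_strip + keyword tokenizer + removal of newlines: a single token."""
    return [strip_html(text).replace("\n", "")]
```

If the approximation holds, `approx_my_analyzer` gives the lowercase terms that the `match` query runs against, while `approx_parsed_analyzer` gives one token holding the whole stripped text, which is what the `html.raw` sub-field exposes through fielddata.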
POST idx_test/_doc
{
  "html": """<html><body><h1 style="font-family: Arial">Test</h1> <span>More test</span></body></html>"""
}
GET idx_test/_search
{
  "script_fields": {
    "html_raw": {
      "script": "doc['html.raw']"
    }
  },
  "query": {
    "match": {
      "html": "more"
    }
  }
}
Result:
"hits": [
  {
    "_index": "idx_test",
    "_id": "0b-UqoMBCzQxtx05B-WH",
    "_score": 0.2876821,
    "fields": {
      "html_raw": [
        "Test More test"
      ]
    }
  }
]
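The stripped text comes back under `fields.html_raw` rather than under `_source.html` as in the output the question asks for. If the API should return the shape from the question, the hits can be reshaped in the application. A minimal Python sketch, assuming the search response has already been parsed into a dict; `flatten_hits` and its parameter names are hypothetical and not part of any Elasticsearch client:

```python
def flatten_hits(response, script_field="html_raw", target="html"):
    """Move the first value of a script field into a _source-like dict per hit."""
    reshaped = []
    for hit in response.get("hits", {}).get("hits", []):
        # Script field values are returned as a list; take the first one if present.
        values = hit.get("fields", {}).get(script_field, [])
        reshaped.append({
            "_index": hit.get("_index"),
            "_id": hit.get("_id"),
            "_score": hit.get("_score"),
            "_source": {target: values[0] if values else None},
        })
    return reshaped


# Sample response built from the result shown above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "idx_test",
                "_id": "0b-UqoMBCzQxtx05B-WH",
                "_score": 0.2876821,
                "fields": {"html_raw": ["Test More test"]},
            }
        ]
    }
}

clean_hits = flatten_hits(sample_response)
```

Hits without the script field get `None` for the target key, so the caller can decide how to handle them.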
Tags: elasticsearch
