Elasticsearch: preventing Markdown hyperlinks from being indexed

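I am building a content search over Markdown files with Elasticsearch. At the moment the full content of each MD file is indexed. The problem is that the search results show the raw link syntax, such as [Mylink](https://link-url-here.org) and [Mylink2](another_page.md).

I want to prevent hyperlinks and references to other pages from being indexed. When someone searches for "Mylink", the result should contain only the link text, without the URL. It would be great if someone could help me find the right solution.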

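Reply from a uj5u.com user:

You need to render the Markdown in your indexing application, then strip the HTML tags and store the resulting plain text alongside the Markdown source.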
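A minimal sketch of the tag-stripping step, assuming the Markdown has already been rendered to HTML by a renderer of your choice. The `rendered` string and the helper name `strip_html_tags` are illustrative, not part of the original answer; only the Python standard library is used.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html_tags(rendered_html: str) -> str:
    """Return the visible text of rendered HTML, with whitespace collapsed."""
    extractor = _TextExtractor()
    extractor.feed(rendered_html)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


# Hypothetical renderer output for the two links from the question.
rendered = (
    '<p><a href="https://link-url-here.org">Mylink</a>, '
    '<a href="another_page.md">Mylink2</a></p>'
)
plain_text = strip_html_tags(rendered)
print(plain_text)
```

The plain text would then be indexed in its own field, while the Markdown source stays available for display.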

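Reply from a uj5u.com user:

I think there are two main ways to solve this. First: clean the data at the source, before indexing it into Elasticsearch. Second: have Elasticsearch clean the data for you. The first option is the simpler one, but if you need to do this step inside Elasticsearch, you will need to create an ingest pipeline.

In that pipeline you can use the script processor to clean the data with a script (Painless is the processor's default language) that finds the matches for your regular expression and removes them.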
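The first option, cleaning the text before it reaches Elasticsearch, might look roughly like the sketch below. The pattern is a simplified assumption that only handles inline links of the form [text](target) without nested brackets or parentheses; the field name `purged_content` and the helper name are illustrative choices.

```python
import re

# Inline Markdown link: [text](target). Group 1 is the visible text.
MD_LINK = re.compile(r"\[([^\]\[]+)\]\(([^()]+)\)")


def strip_markdown_links(content: str) -> str:
    """Replace every [text](target) with just its text."""
    return MD_LINK.sub(r"\1", content)


doc = {"content": "[Mylink](https://link-url-here.org), [Mylink2](another_page.md)"}
# Keep the original Markdown and add a link-free copy for searching.
doc["purged_content"] = strip_markdown_links(doc["content"])
print(doc["purged_content"])
```

Reference-style links and URLs that themselves contain parentheses are not covered by this pattern and would need a real Markdown parser.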

Reply from a uj5u.com user:

You can use an ingest pipeline with a script processor to extract the link text:

1. Set up the pipeline

PUT _ingest/pipeline/clean_links
{
  "description": "...",
  "processors": [
    {
      "script": {
        "source": """
          if (ctx["content"] == null) {
            // nothing to do here
            return
          }
          
          def content = ctx["content"];
          
          // Match [text](target); group 1 captures the link text
          Pattern pattern = /\[([^\]\[]+)\](\(((?:[^\()]+)+)\))/;
          Matcher matcher = pattern.matcher(content);
          def purged_content = matcher.replaceAll("$1");
          
          ctx["purged_content"] = purged_content;
        """
      }
    }
  ]
}

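The regex can be tested in an online regex tester; it was inspired by an existing pattern for matching Markdown links.

2. Include the pipeline when ingesting documents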

POST my-index/_doc?pipeline=clean_links
{
  "content": "[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"
}

POST my-index/_doc?pipeline=clean_links
{
  "content": "[Mylink2](another_page.md)"
}

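When indexing from Python, the client's index call accepts the same pipeline parameter; see the Python client documentation.

3. Verify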
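As a rough illustration of passing the pipeline from Python: the helper below assumes a client object whose `index` method takes `index`, `document`, and `pipeline` keyword arguments, as the official Python client does in its 8.x releases. `RecordingClient` is a hypothetical stand-in so the sketch runs without a cluster.

```python
def index_with_pipeline(client, index, content, pipeline="clean_links"):
    """Index one document through an ingest pipeline.

    `client` is assumed to expose index(index=..., document=..., pipeline=...).
    """
    return client.index(index=index, document={"content": content}, pipeline=pipeline)


class RecordingClient:
    """Hypothetical stand-in that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def index(self, **kwargs):
        self.calls.append(kwargs)
        return {"result": "created"}


client = RecordingClient()
index_with_pipeline(client, "my-index", "[Mylink2](another_page.md)")
print(client.calls[0])
```

With a real cluster, `RecordingClient()` would be replaced by an actual client instance; older client versions may expect `body=` instead of `document=`.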

GET my-index/_search?filter_path=hits.hits._source

should yield

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "purged_content" : "Mylink anotherLink",
          "content" : "[Mylink](https://link-url-here.org) [anotherLink](http://dot.com)"
        }
      },
      {
        "_source" : {
          "purged_content" : "Mylink2",
          "content" : "[Mylink2](another_page.md)"
        }
      }
    ]
  }
}

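Alternatively, you could overwrite the original content field instead of adding purged_content, if you want to drop the links from your _source entirely.

Conversely, you could go a step further in the other direction and store the text/link pairs in a nested field of the form: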

{
  "content": "...",
  "links": [
    {
      "text": "Mylink",
      "href": "https://link-url-here.org"
    },
    ...
  ]
}

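That way, if you later decide to make the links searchable, you will be able to query them precisely.

Shameless plug: you can find other hands-on ingestion guides in my Elasticsearch Handbook.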
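One way to build such a document on the client side is sketched below. The pattern is a simplified assumption (inline links only, no nested brackets or parentheses), and storing the link-free text in `content` is an illustrative choice, since the structure above leaves that field unspecified.

```python
import re

# Inline Markdown link: group 1 is the text, group 2 the target.
MD_LINK = re.compile(r"\[([^\]\[]+)\]\(([^()]+)\)")


def split_links(content: str) -> dict:
    """Build a document with link-free content plus a list of text/href pairs."""
    links = [
        {"text": m.group(1), "href": m.group(2)}
        for m in MD_LINK.finditer(content)
    ]
    return {"content": MD_LINK.sub(r"\1", content), "links": links}


doc = split_links("[Mylink](https://link-url-here.org) [Mylink2](another_page.md)")
print(doc)
```

For each text/href pair to be matched as a unit, the `links` field would also need to be mapped with the `nested` type in the index mapping.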


Tags: Python, Elasticsearch, search, indexing
