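I am looking for a way to generate tokens from a URL that include both the words and the special characters.
For example, I have the URL https://www.google.com/
In Elasticsearch, I want to generate tokens such as https, www, google, com, :, /, /, ., ., /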
A uj5u.com user replied:
You can define a custom analyzer that uses the letter tokenizer, as shown below:
PUT index3
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_email": {
          "tokenizer": "letter",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  }
}
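To make the behavior of this analyzer easier to picture, here is a minimal Python sketch that approximates it outside Elasticsearch. It assumes the letter tokenizer can be modeled as "emit every run of letters", followed by lowercasing. The function name letter_tokenize and the regular expression are illustrative assumptions, not part of Elasticsearch.

```python
import re

def letter_tokenize(text):
    """Approximate the letter tokenizer followed by the lowercase filter.

    Hypothetical helper for illustration only; Elasticsearch does not
    expose a function with this name.
    """
    tokens = []
    # [^\W\d_]+ matches runs of letters (word characters minus digits and "_"),
    # so every other character acts as a separator and is dropped.
    for position, match in enumerate(re.finditer(r"[^\W\d_]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens

for tok in letter_tokenize("https://www.google.com/"):
    print(tok)
```

Because every non-letter character is treated as a separator, the sketch yields only the four words from the example URL.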
Test it with the _analyze API:
POST index3/_analyze
{
  "text": [
    "https://www.google.com/"
  ],
  "analyzer": "my_email"
}
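Output (only the words are returned, because the letter tokenizer discards non-letter characters such as :, / and .):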
{
  "tokens" : [
    {
      "token" : "https",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "www",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "google",
      "start_offset" : 12,
      "end_offset" : 18,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "com",
      "start_offset" : 19,
      "end_offset" : 22,
      "type" : "word",
      "position" : 3
    }
  ]
}
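The question also asks for the special characters as separate tokens. One possible approach, sketched here in Python rather than as a tested Elasticsearch configuration, is to match either a run of alphanumeric characters or a single non-alphanumeric, non-whitespace character. The names TOKEN_PATTERN and tokenize_with_symbols are hypothetical. In Elasticsearch, a pattern tokenizer configured with a similar regular expression and "group": 0 may give a comparable result, but that is an assumption the answer above does not confirm.

```python
import re

# Either a run of ASCII letters/digits, or one single character that is
# neither alphanumeric nor whitespace (for example ":", "/" or ".").
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9]+|[^A-Za-z0-9\s]")

def tokenize_with_symbols(text):
    """Return lowercase word tokens plus each special character as its own token."""
    return [match.group().lower() for match in TOKEN_PATTERN.finditer(text)]

print(tokenize_with_symbols("https://www.google.com/"))
# prints ['https', ':', '/', '/', 'www', '.', 'google', '.', 'com', '/']
```

Unlike the list in the question, the tokens come out in document order, so each special character appears at the place where it occurs in the URL.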
