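First of all, sorry if this is a silly question, but I'm new to Elasticsearch. Here is what I need to do: I have a set of keywords, and I need to search for them in every document in an index. This is the mapping: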
{
  "resumes": {
    "mappings": {
      "properties": {
        "text": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "timestamp": {
          "type": "date"
        }
      }
    }
  }
}
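Given this, I need to search every document for all the words in the keyword array. For each document in the resumes index, the search should return a vector with one entry per keyword: 0 if the word is not found in the document and 1 if it is.
For example: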
keywords = ["javascript", "html", "python"]
doc1 = "Hello there, I've only programmed in python."
doc2 = "Hello there, I've only programmed in python and javascript."
doc3 = "Hello there, I've only programmed in python and javascript. Im now learning html"
The search result would look something like this:
{
"doc1": [0, 0, 1], // because it contains the word python
"doc2": [1, 0, 1], // because it contains both python and javascript
"doc3": [1, 1, 1] // because it contains all words in the keyword vector
}
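To pin down the expected output, here is a minimal plain-Python sketch of the mapping from documents to vectors. The function name `presence_vector` and the tokenization rule (lowercase, split on non-word characters) are assumptions made for illustration; they only approximate how an Elasticsearch analyzer would split the text.

```python
import re


def presence_vector(text, keywords):
    """Return 1 for each keyword that occurs as a whole word in text, else 0."""
    # Lowercase and split on non-word characters (an assumed, simplified tokenizer).
    tokens = set(re.findall(r"\w+", text.lower()))
    return [1 if kw.lower() in tokens else 0 for kw in keywords]


keywords = ["javascript", "html", "python"]
docs = {
    "doc1": "Hello there, I've only programmed in python.",
    "doc2": "Hello there, I've only programmed in python and javascript.",
    "doc3": "Hello there, I've only programmed in python and javascript. Im now learning html",
}

# One vector per document, in the same order as the keyword list.
result = {name: presence_vector(text, keywords) for name, text in docs.items()}
print(result)
```

This is the behavior I would like Elasticsearch to reproduce on the server side.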
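Is it even possible to do this with Elasticsearch alone? I'm writing all of this in Python, but I think that computing it in Python itself would be much less efficient than what Elasticsearch can do.
I haven't tried much yet, because I don't really understand what Elasticsearch is capable of. I've searched a lot, but I don't even know where to start...
Answer from a uj5u.com user:
Using scripts in Elasticsearch is not ideal because they do not perform well. I managed to do what you want, but be warned about the performance cost.
The "vector_field" field in the response holds the vector for each document.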
POST idx_teste/_doc
{
  "description": "Hello there, I've only programmed in python."
}
POST idx_teste/_doc
{
  "description": "Hello there, I've only programmed in python and javascript."
}
POST idx_teste/_doc
{
  "description": "Hello there, I've only programmed in python and javascript. Im now learning html"
}
GET idx_teste/_search
{
  "_source": "*",
  "query": {
    "terms": {
      "description": [
        "javascript", "html", "python"
      ]
    }
  },
  "script_fields": {
    "vector_field": {
      "script": {
        "source": """
          // Case-sensitive substring check against the un-analyzed keyword sub-field.
          def vector = new ArrayList();
          for (int i = 0; i < params.keywords.size(); i++) {
            String text = doc['description.keyword'].value;
            if (text.contains(params.keywords[i])) {
              vector.add(1);
            } else {
              vector.add(0);
            }
          }
          return vector;
        """,
        "params": {
          "keywords": [
            "javascript", "html", "python"
          ]
        }
      }
    }
  }
}
Response:
"hits": [
  {
    "_index": "idx_teste",
    "_id": "oIyiQ4QBgXg8h_rc0Ny3",
    "_score": 1,
    "_source": {
      "description": "Hello there, I've only programmed in python."
    },
    "fields": {
      "vector_field": [
        0,
        0,
        1
      ]
    }
  },
  {
    "_index": "idx_teste",
    "_id": "oYypQ4QBgXg8h_rcH9wU",
    "_score": 1,
    "_source": {
      "description": "Hello there, I've only programmed in python and javascript."
    },
    "fields": {
      "vector_field": [
        1,
        0,
        1
      ]
    }
  },
  {
    "_index": "idx_teste",
    "_id": "ooypQ4QBgXg8h_rcJ9y2",
    "_score": 1,
    "_source": {
      "description": "Hello there, I've only programmed in python and javascript. Im now learning html"
    },
    "fields": {
      "vector_field": [
        1,
        1,
        1
      ]
    }
  }
]
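If the script cost is a concern, one alternative worth trying (not part of the approach above) is Elasticsearch's named queries: each clause gets a `_name`, and every hit lists the names that matched under `matched_queries`. The sketch below only builds the request body and converts a hit into a vector. The helper names `build_named_query` and `hit_to_vector` are hypothetical, and `sample_hit` is hand-written for illustration, not real server output.

```python
def build_named_query(field, keywords):
    """Build a search body with one named match clause per keyword."""
    return {
        "query": {
            "bool": {
                # Each clause is tagged with the keyword itself as its _name.
                "should": [
                    {"match": {field: {"query": kw, "_name": kw}}}
                    for kw in keywords
                ],
                "minimum_should_match": 1,
            }
        }
    }


def hit_to_vector(hit, keywords):
    """Turn a hit's matched_queries list into a 0/1 vector in keyword order."""
    matched = set(hit.get("matched_queries", []))
    return [1 if kw in matched else 0 for kw in keywords]


keywords = ["javascript", "html", "python"]
body = build_named_query("description", keywords)

# Hand-written sample hit (assumption), shaped like one entry of hits.hits.
sample_hit = {"_id": "example", "matched_queries": ["python", "javascript"]}
print(hit_to_vector(sample_hit, keywords))
```

The body would be sent with whatever client is already in use. As with the terms query above, documents that match none of the keywords are not returned, so an all-zero vector never appears in the results.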
