ElasticSearch7.6.1入門到整合，就看這9個關鍵點-有解無憂

1、Elasticsearch是什么？

Elasticsearch是基于Lucence的一款分布式實時搜索引擎（僅支持JDK1.8以上），其跟Lucence一樣是基于Java開發的，并以Lucence作為核心包來實作索引和搜索的功能，但是由于Lucence在開發使用時稍偏復雜，所以Elasticsearch通過Restful API的方式進行呼叫，直接隱藏了Lucence的復雜性，

2、Elasticsearch對比Solr、Lucence有什么區別？

Elasticsearch、Solr都是基于Java和Lucence的，Solr主要是通過POST請求的方式向服務器發出請求，Solr主要是通過xml的方式來進行索引的添加、洗掉、更新操作；而Solr的搜索是通過GET的方式來實作的，其對查詢回傳的Xml、Json等資料格式的查詢結果進行決議，

Elasticsearch對比Solr

單純只對已經存在的資料進行查詢的話Solr的查詢速度更快，
在實時建立索引方面，Solr由于會產生IO阻塞，所以其查詢性能較差，Elasticsearch建立索引的速度幾乎是實時的，
隨著資料量的指數性增加，Solr的查詢性能會大打折扣，而Elasticsearch的查詢性能幾乎不變，可以說是實時的，
Solr是通過Zookeeper實作分布式管理的，而Elasticsearch自身攜帶分布式協調管理能力，
Solr支持Xml、Json、Csv，而Elasticsearch僅支持Json，
Solr查詢快，但是洗掉、新增慢；Elasticsearch實時查詢快，

3、安裝，搭建Elasticsearch7.6.1平臺

Elasticsearch相關（kibana、elasticsearch、ik-analyze、head）壓縮包：
鏈接：https://pan.baidu.com/s/1aFEuBgR3VqM9O5KzfK2mFA    提取碼：llhm

步驟

解壓kibana（資料可視化操作平臺）、elasticsearch、ik-analyze（分詞器）、head（elasticsearch查詢插件）壓縮包到本地
由于每個組件都有不同埠號，所以需要配置elasticsearch支持跨域請求，es的config目錄下找到elasticsearch.yml，新增 http.cors.enabled: true 和 http.cors.allow-origin: “*” 以支持跨域請求；kibana可以改為支持中文的方式 i18n.locale: "zh-CN"
分別啟動相關組件即可，kibana默認5601埠，elasticsearch默認9200埠，head插件默認9100埠

注意，Kibana版本號一定要和Elasticsearch版本號一致

4、 Elasticsearch核心概念

Elasticsearch主要是面向檔案的操作，以下為關系型資料庫和Elasticsearch的客觀對比

關系型資料庫	Elasticsearch
資料庫	索引（indicies）
表	型別（type，基本被棄用了）
行	檔案（document）
欄位	fields

一個ES服務就是一個集群，其服務名為elasticsearch

elasticsearch是面向檔案的操作，索引和搜索的最小單位是檔案；
elasticsearch主要采用的是倒排索引的方式進行查詢的；

5、了解倒排索引 ( 倒排索引有利于快速全文搜索 )

舉例，以下有兩個檔案

檔案1：Good good study day day up！forever and go on！
檔案2：Study makes me happy！forever！

term（關鍵詞）	doc1	doc2
good	√	×
study	√	×
forever	√	×
makes	×	√
me	×	√

例如，現在要查詢good good study！這里倒排索引就會挑選 √ 出現次數最多的檔案

term（關鍵詞）	doc1	doc2
good	√	×
study	√	×

因此，在doc1檔案中的 √ 次數最多，在權重方面doc1大于doc2，其score值越高，優先搜索的就是doc1；而與MySQL中的正排索引不同，正排索引的查詢方式是先掃描索引庫中的所有內容，然后找到所有包含指定關鍵字的檔案，然后根據打分模型對資料進行打分，排名后再呈現結果給用戶，

總結：倒排索引不會查詢無關的檔案內容，而正排索引先是查詢所有檔案，再進行過濾；因此前者查詢速度更快

6、IK分詞器 ( ik-analyze是基于Lucence的中文分詞器 )

ik-analyze區分為兩種演算法，ik_smart和ik_max_word

（1）ik_smart演算法（最少劃分）

滿打滿算，按斷點打開，盡量把句子拆分成詞庫里面的每一個詞語

美麗的山河和世文 => 美麗的、山河和世文

（2）ik_max_word（最細粒度劃分）

細度劃分，窮盡詞庫的所有可能進行劃分

美麗的山河和世文 => 美麗、美麗的、山河、山河和、世文

7、配置ik-analyze的中文詞庫

首先將ik-analyze檔案解壓，并將檔案放到Elasticsearch目錄下的 plugins 檔案下，
在ik目錄下的config檔案中創建一個 xxx.dic 作為自己的分詞檔案
修改config目錄下的 IKAnalyzer.cfg.xml 檔案，將自己的分詞檔案配置進來，隨后自行在 xxx.dic 檔案中添加想要

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
	<comment>IK Analyzer 擴展配置</comment>
	<!--用戶可以在這里配置自己的擴展字典 -->
	<entry key="ext_dict">yang.dic</entry>
	 <!--用戶可以在這里配置自己的擴展停止詞字典-->
	<entry key="ext_stopwords"></entry>
	<!--用戶可以在這里配置遠程擴展字典 -->
	<!-- <entry key="remote_ext_dict">words_location</entry> -->
	<!--用戶可以在這里配置遠程擴展停止詞字典-->
	<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>

配置詳情圖示

8、kibana索引操作命令

請求方式	url地址	描述
PUT	PUT /索引名/_doc/id	創建檔案
POST	POST /索引名/_doc/id	創建檔案
POST	POST /索引名/_doc/id/_update	修改檔案
DELETE	DELETE /索引名/_doc/id	洗掉檔案
GET	GET /索引名/_doc/id	通過id查詢檔案
POST	POST /索引名/_doc/_search	查詢所有資料

新增檔案

新增索引

PUT /test1/type1/1
{
  "name":"世文",
  "age": 18
}

指定檔案型別

PUT /test3
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "age": {
        "type": "integer"
      },
      "birthday": {
        "type": "date"
      }
    }
  }
}

修改檔案

修改索引依舊可以使用PUT，不過直接使用的話會直接覆寫的，一般最好用POST，并結合使用_update

POST /test2/_doc/2/_update
{
  "doc": {
    "name": "世文33333"
  }
}

洗掉索引

DELETE /test2/_doc/2

花樣新增檔案

PUT /test4/_doc/4
{
  "name": "世文1",
  "age": 18,
  "desc": "這里是描述內容",
  "tags": ["開心","快樂","嗨皮"]
}

PUT /test4/_doc/5
{
  "name": "阿姨5",
  "age": 40,
  "desc": "阿姨的天下",
  "tags": ["嗯嗯","哦哦","嘿嘿"]
}

條件查詢

GET /test4/_doc/_search?q=name:5

花樣條件查詢1

匹配查詢

GET /yang_index_bulk/_doc/_search
{
  "query": {
      "match": {
        "age" : 1	// 精確查詢age = 1
    }
  },
  "from": 0,		// 分頁pageNo
  "size": 1,		// 分頁pageSize
  "_source": ["userName", "age"],  	// 指定展示userName
  "sort": [{
    "age": {
      "order": "desc"		// 指定使用age進行降序排序
    }
  }]
}

花樣條件查詢2

must相當于and條件查詢，must_not就是not條件查詢

GET /yang_index_bulk/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "userName": "DD"
          }
        },
        {
          "match": {
            "age": 1
          }
        }
      ]
    }
  }
}

花樣條件查詢3

相當于or查詢

GET /yang_index_bulk/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "userName": "DD"
          }
        },
        {
          "match": {
            "age": 2
          }
        }
      ]
    }
  }
}

花樣條件查詢4

filter過濾器

GET /yangshiwen/_doc/_search
{
  "query": {
      "bool": {
        "must": [
          {
            "match": {
              "name": "文文"    // 姓名包含文文
            }
          }
        ],
        "filter": {
          "range": {
            "age": {
              "gte": 10,        // 年齡大于等于10歲
              "lte": 20         // 年齡小于等于20歲
            }
          }
        }
    }
  }
}

花式查詢5

多條件精確查詢

GET testdb/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "t1": {
              "value": "22"
            }
          }
        },
        {
          "term": {
            "t1": {
              "value": "33"
            }
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 1
}

高亮查詢

GET /test3/_search
{
  "query": {
    "match": {
      "name": "世文"
    }
  },
  "highlight": {
    "pre_tags": "<p class = 'key' style = 'color:red'>",        // 前綴
    "post_tags": "</p>",                                        // 后綴
    "fields": {
      "name": {}
    }
  }
}

9、Elasticsearch整合springboot

第一步，匯入maven依賴

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

<!-- elasticsearch依賴, 其實它也是基于lucene的, 一定要和本地版本一致, 這里我用7.6.1 -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.62</version>
</dependency>

<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
    <exclusions>
        <exclusion>
            <groupId>org.junit.vintage</groupId>
            <artifactId>junit-vintage-engine</artifactId>
        </exclusion>
    </exclusions>
</dependency>

第二步，創建ES中的 RestHighLevelClient 物件，放入spring容器

@Configuration
public class ElasticsearchConfig {
    // spring = <bean id = "highLevelClient" class = "RestHighLevelClient">
    // 這里需要將物件注入spring
    @Bean
    public RestHighLevelClient highLevelClient() {
        // 如果是集群就配置多個, 如果不是集群就只配置一個
        RestHighLevelClient client = new RestHighLevelClient(
                RestClient.builder(
                        new HttpHost("localhost", 9200, "http")));
        return client;
    }
}

第三步，進行編碼

package com.yang.elasticsearch;

import com.alibaba.fastjson.JSON;
import com.yang.elasticsearch.pojo.User;
import org.elasticsearch.action.admin.indices.create.CreateIndexRequest;
import org.elasticsearch.action.admin.indices.create.CreateIndexResponse;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.*;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.FetchSourceContext;
import org.junit.jupiter.api.Test;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.boot.test.context.SpringBootTest;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

@SpringBootTest
class ElasticsearchApplicationTests {

    // 默認注入, 這里需要注意物件型別和名字都需要對應, 否則需要用 @Qualifier("")
    @Autowired
    @Qualifier("highLevelClient")
    private RestHighLevelClient client;

    // 測驗索引的創建 PUT 這里都是用rest請求 [ PUT yang_index/_doc ]
    @Test
    void createIndex() throws IOException {
        // 1、創建索引請求
        CreateIndexRequest request = new CreateIndexRequest("yang_index");
        // 2、客戶端執行創建請求, 請求完了獲得回應, 判斷是否存在
        CreateIndexResponse createIndexResponse = client.indices().create(request, RequestOptions.DEFAULT);
        System.out.println("createIndexResponse = " + createIndexResponse.index());
    }

    // 測驗獲取索引 GET [GET yang_index] , 只能判斷存不存在
    @Test
    void existIndex() throws IOException {
        GetIndexRequest request = new GetIndexRequest("yang_index");
        boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
        System.out.println("getIndexResponse = " + exists);
    }

    // 測驗洗掉索引 DELETE [DELETE yang_index]
    @Test
    void deleteIndex() throws IOException {
        DeleteIndexRequest request = new DeleteIndexRequest("yang_index");
        AcknowledgedResponse response = client.indices().delete(request, RequestOptions.DEFAULT);
        // 判斷是否洗掉成功
        System.out.println(response.isAcknowledged());
    }

    // 測驗, 新增檔案
    @Test
    void testAddDocument() throws IOException {
        // 創建物件
        User user = new User("世文", 3);
        // 創建請求
        IndexRequest request = new IndexRequest("yang_index");
        // PUT /索引/_doc/1 不默認id為1的話會產生一個隨機的id
        request.id("1");
        // 設定請求超時時間 request.timeout("1s");
        request.timeout(TimeValue.timeValueSeconds(1));
        // 將我們的資料放入請求 json 發出請求
        request.source(JSON.toJSONString(user), XContentType.JSON);
        // 客戶端發出請求 獲取回應結果
        IndexResponse indexResponse = client.index(request, RequestOptions.DEFAULT);
        System.out.println("indexResponse.toString() = " + indexResponse.toString());
        // 對應命令回傳狀態
        System.out.println("indexResponse.status() = " + indexResponse.status());
    }

    // 獲取檔案 判斷是否存在 get /index/_doc/1
    @Test
    void existDocument() throws IOException {
        GetRequest request = new GetRequest("yang_index", "1");
        // 不獲取 _source 背景關系了
        request.fetchSourceContext(new FetchSourceContext(false));
        // 根據指定欄位排序
        request.storedFields("_none_");
        // 檔案id
        System.out.println("request.id() = " + request.id());
        // 命令
        System.out.println("request.toString() = " + request.toString());
        System.out.println("client.exists(request, RequestOptions.DEFAULT) = " + client.exists(request, RequestOptions.DEFAULT));
    }

    // 獲取檔案內容
    @Test
    void getDocument() throws IOException {
        GetRequest request = new GetRequest("yang_index", "1");
        GetResponse response = client.get(request, RequestOptions.DEFAULT);
        // 獲取指定欄位 userName
        response.getField("userName");
        // 獲取內容 或者可以封裝成map或者其他內容
        System.out.println("response.getSourceAsString() = " + response.getSourceAsString());
        // 這里回傳的內容跟是命令完全相同的
        System.out.println("response = " + response);
    }

    // 更新檔案
    @Test
    void updateDocument() throws IOException {
        UpdateRequest request = new UpdateRequest("yang_index", "1");
        // 請求最大時長1s
        request.timeout(TimeValue.timeValueSeconds(1));
        User user = new User("世文說java", 18);
        // 更新檔案 POST /索引/_doc/檔案/_update  doc [field = ""]
        request.doc(JSON.toJSONString(user), XContentType.JSON);
        // 執行請求
        UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
        System.out.println("response = " + response);
    }

    // 洗掉檔案
    @Test
    void deleteDocument() throws IOException {
        DeleteRequest request = new DeleteRequest("yang_index", "1");
        // 請求時間1s, 超時不請求了
        request.timeout(TimeValue.timeValueSeconds(1));
        DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
        System.out.println("response = " + response.status());
    }

    // 批量插入資料
    @Test
    void batchAdd() throws IOException {
        BulkRequest request = new BulkRequest();
        request.timeout(TimeValue.timeValueSeconds(10));
        List<User> userList = new ArrayList<>();
        userList.add(new User("DD", 1));
        userList.add(new User("CC", 2));
        userList.add(new User("BB", 3));
        userList.add(new User("EE", 4));
        userList.add(new User("FF", 5));
        userList.add(new User("GG", 6));
        userList.add(new User("HH", 7));
        // 批處理請求
        for (int i = 0; i < userList.size(); i++) {
            // 對應的批量更新和批量洗掉都在這里執行
            request.add(
                new IndexRequest("yang_index_bulk")
                        .id("" + (i + 1))
                        .source(JSON.toJSONString(userList.get(i)), XContentType.JSON));
        }
        // 執行請求
        BulkResponse bulk = client.bulk(request, RequestOptions.DEFAULT);
        System.out.println("bulk = " + bulk.hasFailures());     // 是否失敗, false代表成功
    }

    // 批量查詢
    @Test
    void bulkGet() throws IOException {
        SearchRequest searchRequest = new SearchRequest("yang_index_bulk");
        // 構建搜索條件
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // 分頁
        searchSourceBuilder.from(0);
        searchSourceBuilder.size(10);
        // 高亮構建
        // searchSourceBuilder.highlighter();
        // 構建條件查詢
        // 模糊查詢
        // MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("userName", "AA");
        // 查詢所有
        // MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
        // 精確查詢
        MatchPhraseQueryBuilder matchPhraseQueryBuilder = QueryBuilders.matchPhraseQuery("userName", "CC");
        // 這個失效了, 可以使用上面的
        // TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("userName", "AA");
        searchSourceBuilder.query(matchPhraseQueryBuilder);

        // 時間不能超過10s
        // searchSourceBuilder.timeout(TimeValue.timeValueSeconds(10));
        searchSourceBuilder.timeout(new TimeValue(10, TimeUnit.SECONDS));
        // 放入請求
        searchRequest.source(searchSourceBuilder);
        SearchResponse response = client.search(searchRequest, RequestOptions.DEFAULT);
        // 所有的資料都在hits里面
        System.out.println("response.getHits() = " + JSON.toJSONString(response.getHits()));
        System.out.println(" ============== ");
        for (SearchHit documentFields : response.getHits().getHits()) {
            // 列印資料
            System.out.println(documentFields.getSourceAsMap());
        }
    }

}

轉載請註明出處，本文鏈接：https://www.uj5u.com/qita/281616.html

標籤：其他

上一篇：實驗一：大資料可視化工具—Excel

下一篇：解決org.apache.spark.SparkException: Couldn‘t find leader offsets for Set