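[Elasticsearch] Elasticsearch Essentials

Introduction to Elasticsearch

Elasticsearch is a distributed data store that can ingest, index, and manage many types of data in near real time, making that data both searchable and analyzable. With purpose-built user interfaces and tools, it gives you the flexibility to create, deploy, and run a wide range of applications, from search to analytics to AI-driven solutions.

Typical Elasticsearch use cases include: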

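Use case | Business goals | Technical requirements
Vector search / hybrid search | Run nearest-neighbor search, combine with text for hybrid results | Dense embeddings, sparse embeddings, combined with text/BM25
E-commerce / product catalog search | Provide fast, relevant, and up-to-date results, faceted navigation | Inventory sync, user behavior tracking, results caching
Workplace / knowledge base search | Search across data sources, enforce permissions | Third-party connectors, document-level security, role mapping
Website search | Deliver relevant, up-to-date results | Web crawling, incremental indexing, query caching
Customer support search | Surface relevant solutions, manage access controls, track metrics | Knowledge graph, role-based access, analytics
Chatbots / RAG | Enable natural conversations, provide context, maintain knowledge | Vector search, ML models, knowledge base integration
Geospatial search | Process location-based queries, sort by distance, filter by region | Geo-mapping, spatial indexing, distance calculations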

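My recent work has involved Elasticsearch, including query load testing and performance tuning, so I am writing this post to record the relevant points.

Index basics

An index is the fundamental unit of storage in Elasticsearch. It is a collection of documents uniquely identified by a name or an alias. This unique name matters because it is how search queries and other operations target the index.

An index is made up of the following components:

  • Documents
  • Metadata fields
  • Mappings and data types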

Documents

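Elasticsearch serializes and stores data in the form of JSON documents. A document is a set of fields, which are key-value pairs that contain your data. Each document has a unique ID, which you can either supply yourself or let Elasticsearch generate automatically.

A simple Elasticsearch document might look like this: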

{
  "_index": "my-first-elasticsearch-index",
  "_id": "DyFpo5EBxE8fzbb95DOa",
  "_version": 1,
  "_seq_no": 0,
  "_primary_term": 1,
  "found": true,
  "_source": {
    "email": "[email protected]",
    "first_name": "John",
    "last_name": "Smith",
    "info": {
      "bio": "Eco-warrior and defender of the weak",
      "age": 25,
      "interests": [
        "dolphins",
        "whales"
      ]
    },
    "join_date": "2024/05/01"
  }
}

Metadata fields

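An indexed document contains both data and metadata. Metadata fields are system fields that store information about the document. In Elasticsearch, metadata field names start with an underscore. For example, the following fields are metadata fields:

  • _index: the name of the index where the document is stored.
  • _id: the document's ID. IDs must be unique within each index.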

Mappings and data types

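Each index has a mapping, or schema, that describes how the fields in its documents are indexed. A mapping defines the data type of each field, how the field should be indexed, and how it should be stored.

Create an index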

PUT /books

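Add a single document

Use the following request to add a single document to the books index. If the index does not exist yet, this request creates it automatically.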

POST books/_doc
{
  "name": "Snow Crash",
  "author": "Neal Stephenson",
  "release_date": "1992-06-01",
  "page_count": 470
}

Add multiple documents

POST /_bulk
{ "index" : { "_index" : "books" } }
{"name": "Revelation Space", "author": "Alastair Reynolds", "release_date": "2000-03-15", "page_count": 585}
{ "index" : { "_index" : "books" } }
{"name": "1984", "author": "George Orwell", "release_date": "1985-06-01", "page_count": 328}
{ "index" : { "_index" : "books" } }
{"name": "Fahrenheit 451", "author": "Ray Bradbury", "release_date": "1953-10-15", "page_count": 227}
{ "index" : { "_index" : "books" } }
{"name": "Brave New World", "author": "Aldous Huxley", "release_date": "1932-06-01", "page_count": 268}
{ "index" : { "_index" : "books" } }
{"name": "The Handmaids Tale", "author": "Margaret Atwood", "release_date": "1985-06-01", "page_count": 311}
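
The _bulk body above is newline-delimited JSON: each document is preceded by an action line, and the body has to end with a newline. As a minimal sketch of how such a body could be assembled in Python, the helper below (build_bulk_body is a hypothetical name, not part of any Elasticsearch client) only builds the string; sending it to a cluster is left out.

```python
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON string: one action line followed by one source line per document."""
    lines = []
    for doc in docs:
        # Action line: tells _bulk to index the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


books = [
    {"name": "Revelation Space", "author": "Alastair Reynolds",
     "release_date": "2000-03-15", "page_count": 585},
    {"name": "Fahrenheit 451", "author": "Ray Bradbury",
     "release_date": "1953-10-15", "page_count": 227},
]

body = build_bulk_body("books", books)
print(body)
```

Two documents produce four lines. When posting a body like this over HTTP, the Content-Type header should be application/x-ndjson.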

Query DSL

Full-text search is built on the _search API and Query DSL.

First, create the index

PUT /cooking_blog

Create the mapping

PUT /cooking_blog/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "standard",
      "fields": { 
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "description": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "author": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "date": {
      "type": "date",
      "format": "yyyy-MM-dd"
    },
    "category": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "tags": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "rating": {
      "type": "float"
    }
  }
}
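
Most fields in this mapping repeat the same pattern: a text field for full-text search plus a keyword sub-field for exact matching. If the mapping is generated from code, that repetition can be factored out. The sketch below assumes a hypothetical helper named text_with_keyword and only builds the request body as a Python dict.

```python
import json


def text_with_keyword(analyzer=None, ignore_above=None):
    """Build a text field mapping that carries a keyword sub-field."""
    keyword = {"type": "keyword"}
    if ignore_above is not None:
        keyword["ignore_above"] = ignore_above
    field = {"type": "text"}
    if analyzer is not None:
        field["analyzer"] = analyzer
    field["fields"] = {"keyword": keyword}
    return field


mapping = {
    "properties": {
        "title": text_with_keyword(analyzer="standard", ignore_above=256),
        "description": text_with_keyword(),
        "author": text_with_keyword(),
        "date": {"type": "date", "format": "yyyy-MM-dd"},
        "category": text_with_keyword(),
        "tags": text_with_keyword(),
        "rating": {"type": "float"},
    }
}

print(json.dumps(mapping, indent=2))
```

The printed JSON has the same content as the body of the PUT /cooking_blog/_mapping request above.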

Add data in bulk

POST /cooking_blog/_bulk?refresh=wait_for
{"index":{"_id":"1"}}
{"title":"Perfect Pancakes: A Fluffy Breakfast Delight","description":"Learn the secrets to making the fluffiest pancakes, so amazing you won't believe your tastebuds. This recipe uses buttermilk and a special folding technique to create light, airy pancakes that are perfect for lazy Sunday mornings.","author":"Maria Rodriguez","date":"2023-05-01","category":"Breakfast","tags":["pancakes","breakfast","easy recipes"],"rating":4.8}
{"index":{"_id":"2"}}
{"title":"Spicy Thai Green Curry: A Vegetarian Adventure","description":"Dive into the flavors of Thailand with this vibrant green curry. Packed with vegetables and aromatic herbs, this dish is both healthy and satisfying. Don't worry about the heat - you can easily adjust the spice level to your liking.","author":"Liam Chen","date":"2023-05-05","category":"Main Course","tags":["thai","vegetarian","curry","spicy"],"rating":4.6}
{"index":{"_id":"3"}}
{"title":"Classic Beef Stroganoff: A Creamy Comfort Food","description":"Indulge in this rich and creamy beef stroganoff. Tender strips of beef in a savory mushroom sauce, served over a bed of egg noodles. It's the ultimate comfort food for chilly evenings.","author":"Emma Watson","date":"2023-05-10","category":"Main Course","tags":["beef","pasta","comfort food"],"rating":4.7}
{"index":{"_id":"4"}}
{"title":"Vegan Chocolate Avocado Mousse","description":"Discover the magic of avocado in this rich, vegan chocolate mousse. Creamy, indulgent, and secretly healthy, it's the perfect guilt-free dessert for chocolate lovers.","author":"Alex Green","date":"2023-05-15","category":"Dessert","tags":["vegan","chocolate","avocado","healthy dessert"],"rating":4.5}
{"index":{"_id":"5"}}
{"title":"Crispy Oven-Fried Chicken","description":"Get that perfect crunch without the deep fryer! This oven-fried chicken recipe delivers crispy, juicy results every time. A healthier take on the classic comfort food.","author":"Maria Rodriguez","date":"2023-05-20","category":"Main Course","tags":["chicken","oven-fried","healthy"],"rating":4.9}

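match: single-field query

The match query is the standard query for full-text search. The query text is analyzed with the analyzer configured for the field, or with one specified at query time.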

GET /cooking_blog/_search
{
  "query": {
    "match": {
      "description": {
        "query": "fluffy pancakes"
      }
    }
  }
}

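Require all terms in the query to match

Specify the and operator to require both terms in the description field. This stricter search returns no hits on the sample data, because no document contains both "fluffy" and "pancakes" in its description.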

GET /cooking_blog/_search
{
  "query": {
    "match": {
      "description": {
        "query": "fluffy pancakes",
        "operator": "and"
      }
    }
  }
}

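Specify a minimum number of terms to match

Use the minimum_should_match parameter to specify the minimum number of query terms a document must contain to appear in the search results.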

GET /cooking_blog/_search
{
  "query": {
    "match": {
      "title": {
        "query": "fluffy pancakes breakfast",
        "minimum_should_match": 2
      }
    }
  }
}
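
To see why the three match variants above behave differently on the sample data, here is a small simulation in plain Python. It is only a sketch under simplifying assumptions: analyze is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), match is a hypothetical function that ignores scoring entirely, and only integer values of minimum_should_match are handled.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def match(field_text, query, operator="or", minimum_should_match=None):
    """Return True if field_text satisfies a simplified match query."""
    field_terms = set(analyze(field_text))
    query_terms = analyze(query)
    hits = sum(1 for term in query_terms if term in field_terms)
    if operator == "and":
        # Every query term must be present.
        return hits == len(query_terms)
    # With "or", one matching term is enough unless a higher minimum is requested.
    required = minimum_should_match if minimum_should_match is not None else 1
    return hits >= required


# Excerpts from sample document 1.
description = ("Learn the secrets to making the fluffiest pancakes, "
               "so amazing you won't believe your tastebuds.")
title = "Perfect Pancakes: A Fluffy Breakfast Delight"

or_hit = match(description, "fluffy pancakes")
and_hit = match(description, "fluffy pancakes", operator="and")
min_hit = match(title, "fluffy pancakes breakfast", minimum_should_match=2)
print(or_hit, and_hit, min_hit)  # True False True
```

The description contains "fluffiest" but not "fluffy", and there is no stemming here, so the default or operator matches on "pancakes" alone while and fails. The title contains all three query terms, so a minimum of 2 is satisfied.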

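multi_match: search across multiple fields

When you type a search query, you may not know whether the search terms appear in one particular field. The multi_match query lets you search several fields at once.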

GET /cooking_blog/_search
{
  "query": {
    "multi_match": {
      "query": "vegetarian curry",
      "fields": ["title", "description", "tags"]
    }
  }
}

Use field boosting to adjust the importance of each field

GET /cooking_blog/_search
{
  "query": {
    "multi_match": {
      "query": "vegetarian curry",
      "fields": ["title^3", "description^2", "tags"]
    }
  }
}
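
The ^3 and ^2 suffixes multiply the score a field contributes. With the default multi_match type, best_fields, the document's score comes from its best-scoring field. The arithmetic can be sketched as follows; the raw per-field scores are made-up numbers for illustration (real ones come from BM25), and best_fields_score is a hypothetical helper.

```python
def best_fields_score(field_scores, boosts, tie_breaker=0.0):
    """Combine per-field scores the way best_fields does: the best boosted field wins."""
    boosted = [score * boosts.get(field, 1.0) for field, score in field_scores.items()]
    if not boosted:
        return 0.0
    best = max(boosted)
    # With the default tie_breaker of 0.0, the other fields add nothing.
    return best + tie_breaker * (sum(boosted) - best)


# Hypothetical raw scores for one document.
raw = {"title": 1.0, "description": 1.25, "tags": 0.5}

unboosted = best_fields_score(raw, boosts={})
boosted = best_fields_score(raw, boosts={"title": 3.0, "description": 2.0})
print(unboosted, boosted)  # 1.25 3.0
```

Without boosts the description is the best field. With title^3 the title takes over, which is the point of boosting: a hit in the title counts for more than a hit elsewhere.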

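filter: filtering results

filter is a clause of the bool query. Queries placed in it only decide whether a document matches; they do not contribute to the relevance score.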

GET /cooking_blog/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "category.keyword": "Breakfast" } }
      ]
    }
  }
}

range: range query

GET /cooking_blog/_search
{
  "query": {
    "range": {
      "date": {
        "gte": "2023-05-01",
        "lte": "2023-05-31"
      }
    }
  }
}

term: find exact matches

GET /cooking_blog/_search
{
  "query": {
    "term": {
      "author.keyword": "Maria Rodriguez"
    }
  }
}
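
The query above targets author.keyword rather than author because a term query does not analyze its input. The sketch below illustrates the difference in plain Python; analyze is a rough stand-in for the standard analyzer and term_matches is a hypothetical helper, so this shows the idea rather than how Elasticsearch is implemented.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def term_matches(indexed_terms, value):
    """A term query compares the value as given against the indexed terms."""
    return value in indexed_terms


author = "Maria Rodriguez"
text_terms = analyze(author)  # what the text field `author` indexes
keyword_terms = [author]      # what the sub-field `author.keyword` indexes

print(term_matches(keyword_terms, "Maria Rodriguez"))  # True
print(term_matches(text_terms, "Maria Rodriguez"))     # False
print(term_matches(text_terms, "maria"))               # True
```

The text field only holds the lowercased tokens, so the full name never matches there as a single term, while the keyword sub-field stores the original string unchanged.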

Combine multiple search criteria

You can use a bool query to combine multiple query clauses and build complex searches.

GET /cooking_blog/_search
{
  "query": {
    "bool": {
      "must": [
        { "term": { "tags": "vegetarian" } },
        {
          "range": {
            "rating": {
              "gte": 4.5
            }
          }
        }
      ],
      "should": [
        {
          "term": {
            "category.keyword": "Main Course"
          }
        },
        {
          "multi_match": {
            "query": "curry spicy",
            "fields": [
              "title^2",
              "description"
            ]
          }
        },
        {
          "range": {
            "date": {
              "gte": "now-1M/d"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "category.keyword": "Dessert"
          }
        }
      ]
    }
  }
}

Compound queries

Boolean query

A bool query is built from one or more boolean clauses, each with a typed occurrence. The occurrence types are:

  • must
  • must_not
  • should
  • filter

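must and should contribute to the relevance score. must_not and filter run in filter context: they do not affect the score, and their clauses are considered for caching.

Example: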

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user.id" : "kimchy" }
      },
      "filter": {
        "term" : { "tags" : "production" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tags" : "env1" } },
        { "term" : { "tags" : "deployed" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}
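
The matching rules of the four occurrence types can be written down as a short Python sketch. It treats each clause as a plain predicate over a dict and ignores scoring; bool_matches, term, and age_between are hypothetical helpers, and the two documents are made up to exercise the example query above.

```python
def bool_matches(doc, must=(), filter_=(), must_not=(), should=(), minimum_should_match=None):
    """Decide whether doc matches a bool query whose clauses are plain predicates."""
    if minimum_should_match is None:
        # Default: 1 when the query has should clauses but no must or filter, else 0.
        minimum_should_match = 1 if should and not (must or filter_) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filter_):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    should_hits = sum(1 for clause in should if clause(doc))
    return should_hits >= minimum_should_match


def term(field, value):
    """Predicate: the field equals value, or contains it when the field is a list."""
    def check(doc):
        actual = doc.get(field)
        return value in actual if isinstance(actual, list) else actual == value
    return check


def age_between(low, high):
    """Predicate: the document has an age within [low, high]."""
    def check(doc):
        return "age" in doc and low <= doc["age"] <= high
    return check


query = dict(
    must=[term("user.id", "kimchy")],
    filter_=[term("tags", "production")],
    must_not=[age_between(10, 20)],
    should=[term("tags", "env1"), term("tags", "deployed")],
    minimum_should_match=1,
)

doc_a = {"user.id": "kimchy", "tags": ["production", "deployed"], "age": 30}
doc_b = {"user.id": "kimchy", "tags": ["production", "env1"], "age": 15}
print(bool_matches(doc_a, **query), bool_matches(doc_b, **query))  # True False
```

doc_b satisfies must, filter, and should, but its age falls in the must_not range, so it is excluded.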

Boosting query

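Returns documents that match a positive query while reducing the relevance score of documents that also match a negative query. You can use the boosting query to demote certain documents without excluding them from the search results.

Example: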

GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "text": "apple"
        }
      },
      "negative": {
        "term": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}
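
The effect of negative_boost is a simple multiplication, sketched below. boosting_score is a hypothetical helper and the input score of 2.0 is an arbitrary illustration; only documents that already matched the positive query are considered.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score of a document that matched the positive query of a boosting query."""
    if not 0.0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    # Documents that also match the negative query are demoted, not removed.
    return positive_score * negative_boost if matches_negative else positive_score


print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))  # 2.0
print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))   # 1.0
```

With negative_boost set to 0.5 as in the example query, a document that matches both queries keeps half of its original score and therefore sinks in the ranking instead of disappearing.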

Constant score query

Wraps a filter query and returns every matching document with a relevance score equal to the boost parameter value.

Example query:

GET /_search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": { "user.id": "kimchy" }
      },
      "boost": 1.2
    }
  }
}

Disjunction max query

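Returns documents that match one or more wrapped queries, called query clauses or clauses.

If a returned document matches multiple query clauses, the dis_max query assigns it the highest relevance score from any matching clause, plus a tie-breaking increment for each additional matching clause.

Example query: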

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "term": { "title": "Quick pets" } },
        { "term": { "body": "Quick pets" } }
      ],
      "tie_breaker": 0.7
    }
  }
}

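Query parameters:

  • queries: (required) an array of one or more query clauses. A returned document must match at least one of them.
  • tie_breaker: (optional) a floating-point number between 0 and 1.0 that raises the score of documents matching more than one clause. Defaults to 0.0.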
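
The dis_max scoring rule is: take the highest clause score, then add tie_breaker times the sum of the other matching clause scores. A minimal sketch of that arithmetic follows; dis_max_score is a hypothetical helper and the clause scores are made-up inputs.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Combine clause scores as dis_max does; None marks a clause that did not match."""
    matching = [score for score in clause_scores if score is not None]
    if not matching:
        return None  # the document matches no clause and is not returned
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)


both = dis_max_score([1.0, 0.5], tie_breaker=0.7)
title_only = dis_max_score([1.0, None], tie_breaker=0.7)
print(both, title_only)
```

With tie_breaker at 0.7, the document matching both clauses scores about 1.35 (1.0 plus 0.7 times 0.5), while the document matching only one clause keeps its score of 1.0. With the default of 0.0, both would score 1.0.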

Function score query

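function_score lets you modify the score of documents retrieved by a query.

To use function_score, you define a query and one or more functions that compute a new score for each document the query returns.

Example query: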

GET /_search
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "boost": "5",
      "functions": [
        {
          "filter": { "match": { "test": "bar" } },
          "random_score": {},
          "weight": 23
        },
        {
          "filter": { "match": { "test": "cat" } },
          "weight": 42
        }
      ],
      "max_boost": 42,
      "score_mode": "max",
      "boost_mode": "multiply",
      "min_score": 42
    }
  }
}
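
How the parameters in this query interact can be sketched in a few lines of Python. This is a simplified model, not the actual implementation: it covers only score_mode max and boost_mode multiply, treats each function as a bare weight, and leaves out random_score and the top-level boost. The function name and its arguments are hypothetical.

```python
def function_score(query_score, matched_weights, max_boost, min_score):
    """Simplified function_score with score_mode=max and boost_mode=multiply.

    matched_weights holds the weight of every function whose filter matched.
    Returns the final score, or None when the document falls below min_score.
    """
    # score_mode "max": keep the largest function score; 1.0 when no function applies.
    combined = max(matched_weights) if matched_weights else 1.0
    # max_boost caps the combined function score.
    combined = min(combined, max_boost)
    # boost_mode "multiply": query score times function score.
    final = query_score * combined
    # min_score drops documents whose final score is too low.
    return final if final >= min_score else None


print(function_score(1.0, [42], max_boost=42, min_score=42))  # 42.0
print(function_score(1.0, [], max_boost=42, min_score=42))    # None
```

Under these assumptions, a document whose test field matches "cat" reaches the threshold, while a document matching neither filter keeps a neutral function score and is filtered out by min_score.
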
Licensed under CC BY-NC-SA 4.0