[Elasticsearch] Multi-field Search (4): Cross-fields Entity Search

December 11, 2021

aqzt


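Cross-fields Entity Search

Let's now look at a common pattern: cross-fields entity search. For entities such as person, product, or address, the identifying information is spread across several fields. We might have a person entity indexed as follows: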


{
    "firstname":  "Peter",
    "lastname":   "Smith"
}

An address entity, on the other hand, might look like this:


{
    "street":   "5 Poland Street",
    "city":     "London",
    "country":  "United Kingdom",
    "postcode": "W1V 3DG"
}

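This may look a lot like the example described in Multiple Query Strings, but there is one significant difference. There, we used a separate query string for each field. Here, we want to search across multiple fields with a single query string.

A user might search for the person "Peter Smith" or for the address "Poland Street W1V". Each of the query words appears in a different field, so using a dis_max/best_fields query to find the single best-matching field is clearly the wrong approach.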
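To make the point about dis_max/best_fields concrete, here is a minimal Python sketch under loud assumptions: a field's "score" is simply the number of query terms it contains, which is not how Elasticsearch actually scores (real relevance scoring is far more involved), and tie_breaker is ignored. The second document and all function names are hypothetical, invented only for illustration.

```python
# Toy model (assumption): a field's score is the count of query terms it contains.
# This is NOT Elasticsearch's real scoring; it only shows max-vs-sum behaviour.

def field_score(field_text, query):
    tokens = set(field_text.lower().split())
    return sum(1 for term in query.lower().split() if term in tokens)

def best_field_score(doc, query):
    # dis_max / best_fields style (tie_breaker ignored): keep only the best field
    return max(field_score(text, query) for text in doc.values())

def summed_score(doc, query):
    # bool/should style: add up the score of every field
    return sum(field_score(text, query) for text in doc.values())

# The address from the article: matches in both street and postcode.
full_match = {"street": "5 Poland Street", "city": "London",
              "country": "United Kingdom", "postcode": "W1V 3DG"}
# Hypothetical address that matches on street only.
street_only = {"street": "5 Poland Street", "city": "Leeds",
               "country": "United Kingdom", "postcode": "LS1 4AP"}

query = "Poland Street W1V"
print("best field:", best_field_score(full_match, query),
      best_field_score(street_only, query))
print("summed:    ", summed_score(full_match, query),
      summed_score(street_only, query))
```

In this toy model the best-field score cannot tell the two addresses apart, because the extra postcode match is thrown away, while the summed score ranks the full match higher.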

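A Naive Approach

What we really want is to query each field in turn and add up the scores of every field that matches, which sounds like a job for the bool query: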


{
  "query": {
    "bool": {
      "should": [
        { "match": { "street":    "Poland Street W1V" }},
        { "match": { "city":      "Poland Street W1V" }},
        { "match": { "country":   "Poland Street W1V" }},
        { "match": { "postcode":  "Poland Street W1V" }}
      ]
    }
  }
}

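Repeating the query string for every field soon becomes tedious. We can use the multi_match query instead, and set type to most_fields to tell it to combine the scores of all matching fields: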


{
  "query": {
    "multi_match": {
      "query":       "Poland Street W1V",
      "type":        "most_fields",
      "fields":      [ "street", "city", "country", "postcode" ]
    }
  }
}
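The correspondence between the two request bodies can be sketched in Python. The helper name expand_most_fields is hypothetical, and this is a client-side illustration of the structure only, not a description of how Elasticsearch rewrites the query internally or how it weights the clauses.

```python
import json

def expand_most_fields(multi_match):
    """Rewrite a most_fields multi_match body as the bool/should form it stands in for.

    Hypothetical helper for illustration; other multi_match options are ignored.
    """
    if multi_match.get("type") != "most_fields":
        raise ValueError("this sketch only handles type=most_fields")
    text = multi_match["query"]
    return {
        "bool": {
            # one match clause per field, all sharing the same query string
            "should": [{"match": {field: text}} for field in multi_match["fields"]]
        }
    }

multi_match = {
    "query": "Poland Street W1V",
    "type": "most_fields",
    "fields": ["street", "city", "country", "postcode"],
}

expanded = expand_most_fields(multi_match)
print(json.dumps({"query": expanded}, indent=2))
```

Printing the result gives the same four match clauses as the hand-written bool query, which is why the multi_match form is the more convenient way to write it.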

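Problems with most_fields

Using the most_fields approach for entity search has some problems that are not immediately obvious:

It is designed to find the most fields matching any of the words, rather than the most matching words across all fields.
It cannot use the operator or minimum_should_match parameters to reduce the long tail of less relevant results.
Term frequencies differ from field to field and can interfere with one another, producing poorly ordered results.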
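The first two problems can be illustrated with a small Python sketch. It is a toy model, not Elasticsearch scoring: it only counts which fields and which terms match. Both documents and all function names are hypothetical. The per-field check reflects the fact that most_fields generates one match query per field, so operator applies to each field individually.

```python
QUERY_TERMS = ["poland", "street", "w1v"]

def tokens(text):
    return set(text.lower().split())

def matching_fields(doc, terms):
    """Fields containing at least one query term (what most_fields rewards)."""
    return [name for name, text in doc.items() if tokens(text) & set(terms)]

def matched_terms(doc, terms):
    """Distinct query terms found anywhere in the document."""
    found = set()
    for text in doc.values():
        found |= tokens(text) & set(terms)
    return found

def matches_with_and_per_field(doc, terms):
    """operator=and applied field by field: a single field must hold every term."""
    return any(set(terms) <= tokens(text) for text in doc.values())

# Hypothetical documents: 'good' is the address the user means,
# 'noisy' repeats a single query word across many fields.
good = {"street": "5 Poland Street", "city": "London",
        "country": "United Kingdom", "postcode": "W1V 3DG"}
noisy = {"street": "12 Poland Road", "city": "Poland",
         "country": "Poland", "postcode": "00-001"}

for name, doc in [("good", good), ("noisy", noisy)]:
    print(name,
          "fields matched:", len(matching_fields(doc, QUERY_TERMS)),
          "terms matched:", len(matched_terms(doc, QUERY_TERMS)),
          "passes per-field AND:", matches_with_and_per_field(doc, QUERY_TERMS))
```

Here the noisy document matches in three fields with only one of the query words, while the intended address matches all three words but in just two fields. Requiring every term within a single field excludes both documents, so the per-field operator cannot be used to trim the long tail.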
