Spring Boot + Lucene, Part 4: Getting-Started Code


Learning Lucene, Part 4: Getting-Started Code

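Requirement:
Search files by keyword: every file whose name or content contains the keyword must be found. (The original post shows a screenshot of the file list here.)
Versions and environment I used:
Lucene: 4.10.2
JDK: 1.8 (JDK 1.7 or later is required)
Spring Boot: 2.1.3
IDE: IntelliJ IDEA
pom.xml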


<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>4.10.2</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-common</artifactId>
    <version>4.10.2</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>4.10.2</version>
</dependency>
<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
</dependency>

<!-- Chinese analyzer. This version differs from the 4.10.2 core artifacts above;
     align it with lucene-core if you actually use smartcn. -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>7.6.0</version>
</dependency>
<!-- File I/O utilities -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.6</version>
</dependency>

Code


package com.example.test;

import org.apache.commons.io.FileUtils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.*;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import java.io.File;

public class FileTest {
    /**
     * Create the index.
     * @throws Exception
     */
    @Test
    public void createIndex() throws Exception {
        // Where the index is stored (on disk here; a RAMDirectory would keep it in memory)
        Directory directory = FSDirectory.open(new File("./index"));
        // Standard analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Create the IndexWriter
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer);
        IndexWriter indexWriter = new IndexWriter(directory, config);

        // Directory that holds the files to be searched
        File f = new File("F:\\a");
        // List all files to index
        File[] listFiles = f.listFiles();
        if (listFiles == null) {
            // listFiles() returns null when the path is not a readable directory
            indexWriter.close();
            throw new IllegalStateException("Not a readable directory: " + f);
        }
        for (File file : listFiles) {
            // Skip subdirectories; only regular files can be read as text
            if (!file.isFile()) {
                continue;
            }
            // Create the document object
            Document document = new Document();
            // File name
            String file_name = file.getName();
            Field fileNameField = new TextField("fileName", file_name, Field.Store.YES);
            // File size
            long file_size = FileUtils.sizeOf(file);
            Field fileSizeField = new LongField("fileSize", file_size, Field.Store.YES);
            // File path (stored only, not indexed)
            String file_path = file.getPath();
            Field filePathField = new StoredField("filePath", file_path);
            // File content
            String file_content = FileUtils.readFileToString(file, "UTF-8");
            Field fileContentField = new TextField("fileContent", file_content, Field.Store.YES);

            // Add the fields to the document
            document.add(fileNameField);
            document.add(fileSizeField);
            document.add(filePathField);
            document.add(fileContentField);

            // Write the document to the index
            indexWriter.addDocument(document);
        }
        // Close the writer
        indexWriter.close();
    }

    /**
     * Query the index.
     * @throws Exception
     */
    @Test
    public void searchIndex() throws Exception {
        // Step 1: preparation, create the Directory object
        Directory dir = FSDirectory.open(new File("./index"));
        // Create the IndexReader
        IndexReader reader = DirectoryReader.open(dir);
        // Create the IndexSearcher
        IndexSearcher search = new IndexSearcher(reader);

        // Step 2: create the query object
        TermQuery query = new TermQuery(new Term("fileContent", "what"));
        // Step 3: run the query (arguments: the query object, the maximum number of results to return)
        TopDocs topDocs = search.search(query, 10);
        // Step 4: process the results
        // Print the number of hits
        System.out.println("Number of hits: " + topDocs.totalHits);
        // Get the result set
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            System.out.println("Score of this doc: " + scoreDoc.score);
            // Load the document by its document ID
            Document doc = search.doc(scoreDoc.doc);
            System.out.println("File name: " + doc.get("fileName"));
            System.out.println("File path: " + doc.get("filePath"));
            System.out.println("File size: " + doc.get("fileSize"));
            System.out.println("=======================================");
        }
        // Close the IndexReader
        reader.close();
    }
}

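If an index directory (./index) appears after createIndex() runs, indexing succeeded. (The original post shows a screenshot of the index files here.)
Running searchIndex() with the search condition above prints the number of hits, followed by the score, file name, path, and size of each matching document. (The original post shows a screenshot of the output here.)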

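With the two pieces of code above we have implemented index creation and index querying.
The first piece of code does the following:
It builds a Document object for every file we want to search. The Document holds that file's information (name, size, content, path, and so on).
It adds each Document to the index (the index entries are created automatically).
The index is stored in the ./index directory.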
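
To make "the index entries are created automatically" more concrete, here is a toy inverted index in plain Java. It is only a simplified model of the idea (a map from each term to the documents containing it), not how Lucene is actually implemented; the class ToyInvertedIndex and its methods are hypothetical names invented for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ToyInvertedIndex {
    // term -> ids of the documents whose content contains that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> file name
    private final List<String> names = new ArrayList<>();

    /** Adds a document and indexes every lowercased word of its content. */
    public int addDocument(String name, String content) {
        int docId = names.size();
        names.add(name);
        for (String term : content.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return docId;
    }

    /** Returns the names of the documents that contain the exact term. */
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int docId : postings.getOrDefault(term, Collections.emptySet())) {
            hits.add(names.get(docId));
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument("a.txt", "What is Lucene");
        index.addDocument("b.txt", "Spring Boot notes");
        index.addDocument("c.txt", "what a day");
        System.out.println(index.search("what")); // prints [a.txt, c.txt]
    }
}
```

A lookup is a single map access on the term, which is why the search term has to match an indexed term exactly. That point matters again in the analyzer discussion later in this article.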

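The second piece of code:
It looks in the index directory for documents whose fileContent contains the term "what", then prints the information of each matching document.
Try changing the query, for example searching for documents named aaabbb.txt, aaabbb, or 汪浩斌.txt, and then revisit the question raised at the start of the previous article.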
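
The requirement asks for files whose name or content contains the keyword, while searchIndex() only looks at fileContent. One possible way to cover both fields, sketched here under assumptions, is a BooleanQuery with two SHOULD clauses. The class NameOrContentSearch, its helper methods, and the sample documents are made up for illustration; the sketch uses an in-memory RAMDirectory so it does not depend on F:\a, and it assumes the Lucene 4.10.2 artifacts from the pom.xml are on the classpath.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class NameOrContentSearch {

    // Hypothetical helper: index one in-memory "file" with the same field names as FileTest
    static void add(IndexWriter writer, String name, String content) throws Exception {
        Document doc = new Document();
        doc.add(new TextField("fileName", name, Field.Store.YES));
        doc.add(new TextField("fileContent", content, Field.Store.YES));
        writer.addDocument(doc);
    }

    // Counts the sample documents whose name OR content contains the (lowercase) keyword
    static int countHits(String keyword) throws Exception {
        Directory dir = new RAMDirectory();
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(Version.LUCENE_4_10_2, analyzer));
        add(writer, "lucene notes.txt", "nothing to see here");   // matches "lucene" by name
        add(writer, "todo.txt", "learn lucene today");            // matches "lucene" by content
        add(writer, "other.txt", "unrelated text");               // no match
        writer.close();

        // SHOULD + SHOULD: a document matches if at least one clause matches
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("fileName", keyword)), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("fileContent", keyword)), BooleanClause.Occur.SHOULD);

        IndexReader reader = DirectoryReader.open(dir);
        try {
            return new IndexSearcher(reader).search(query, 10).totalHits;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("hits for 'lucene': " + countHits("lucene"));
    }
}
```

A TermQuery is not passed through the analyzer, so the keyword has to be given in the form the analyzer produced at index time (lowercase for StandardAnalyzer).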

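Chinese analyzers:
We still face one problem:
How can we find the document "全文检索.txt" by searching for "全文"?
Inspecting the index with lukeall shows the cause: the text was not tokenized correctly. Our code uses the StandardAnalyzer, which is designed for Western languages and does not segment Chinese into words (it splits Chinese text into single characters), so we need a Chinese analyzer. Lucene itself ships two: the CJK analyzer and the SmartChinese analyzer.
CJK analyzer: uses bigrams. For example, 我爱写代码 is split into 我爱, 爱写, 写代, 代码.
SmartChinese: not very extensible.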
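
The difference between single-character tokens and bigrams can be sketched in plain Java. This is not the code of Lucene's analyzers, only a simplified model of the two splitting schemes; the class ChineseTokenSketch and its methods are hypothetical, and the sketch assumes the text contains only characters from the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;

public class ChineseTokenSketch {

    /** One token per character: roughly how StandardAnalyzer treats Chinese text. */
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    /** Overlapping two-character tokens: the bigram scheme described for the CJK analyzer. */
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("我爱写代码"));                  // [我爱, 爱写, 写代, 代码]
        // A term lookup only succeeds if the exact term was produced at index time
        System.out.println(unigrams("全文检索").contains("全文"));  // false
        System.out.println(bigrams("全文检索").contains("全文"));   // true
    }
}
```

With single-character tokens the term "全文" never exists in the index, which is why the TermQuery for "全文" finds nothing under the StandardAnalyzer.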
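Third-party analyzers in common use include Paoding (庖丁解牛) and mmseg4j, but neither has been updated by its author for years.
This article covers only the IK analyzer, and only how to use it: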


<dependency>
    <groupId>com.janeluo</groupId>
    <artifactId>ikanalyzer</artifactId>
    <version>2012_u6</version>
</dependency>

The earlier code used the StandardAnalyzer, which does not support Chinese word segmentation. Below we switch to the IK analyzer:


        // Standard analyzer
        // Analyzer analyzer = new StandardAnalyzer();
        // Replaced with the IK analyzer (needs: import org.wltea.analyzer.lucene.IKAnalyzer;)
        Analyzer analyzer = new IKAnalyzer();

Rebuild the index with the new analyzer by running createIndex() again, then run the query method: Chinese query terms now return results as well.
