Lucene Full-Text Search


Demo repository: https://github.com/UserFengFeng/Lucene-Maven.git

For readers who just want the downloads, here are Luke and IKAnalyzer7.2.0.jar:


Link: https://pan.baidu.com/s/1vaifZeSG5Uj5HmSYU89GXQ

Extraction code: dbnm

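I won't introduce Lucene at length here; a quick web search covers the background.

Full-text search first extracts the words from the target documents and organizes them into an index (comparable to a book's table of contents), then finds the target documents by querying that index. This process of building an index first and then searching the index is called full-text search.

Two related concepts are the forward index, which maps each document to the words it contains, and the inverted index, which maps each word to the documents that contain it.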
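To make the two index shapes concrete, here is a minimal sketch in plain Java, with no Lucene involved. The class name `InvertedIndexSketch`, the sample sentences, and the naive split on non-word characters are all assumptions made for illustration; a real analyzer does far more.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class InvertedIndexSketch {

    // Forward index: document id -> the words it contains, in order.
    // Naive tokenization: lowercase, split on non-word characters (ASCII text only).
    static Map<Integer, List<String>> buildForwardIndex(List<String> docs) {
        Map<Integer, List<String>> forward = new LinkedHashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            List<String> words = new ArrayList<>();
            for (String word : docs.get(docId).toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
            forward.put(docId, words);
        }
        return forward;
    }

    // Inverted index: word -> sorted ids of the documents that contain it.
    static Map<String, Set<Integer>> invert(Map<Integer, List<String>> forward) {
        Map<String, Set<Integer>> inverted = new TreeMap<>();
        for (Map.Entry<Integer, List<String>> entry : forward.entrySet()) {
            for (String word : entry.getValue()) {
                Set<Integer> ids = inverted.get(word);
                if (ids == null) {
                    ids = new TreeSet<>();
                    inverted.put(word, ids);
                }
                ids.add(entry.getKey());
            }
        }
        return inverted;
    }

    // Searching is a single lookup in the inverted index.
    static Set<Integer> search(Map<String, Set<Integer>> inverted, String word) {
        Set<Integer> ids = inverted.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<Integer>emptySet() : ids;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Lucene is a search library",
                "Luke inspects a Lucene index",
                "Maven builds the demo");
        Map<String, Set<Integer>> inverted = invert(buildForwardIndex(docs));
        System.out.println(search(inverted, "lucene")); // prints [0, 1]
        System.out.println(search(inverted, "maven"));  // prints [2]
    }
}
```

The point of the inverted shape is that a search never scans the documents themselves: it looks the word up once and gets the matching document ids back.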

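Lucene is an open-source full-text search toolkit from Apache. It provides a complete query engine and indexing engine, plus part of a text analysis engine. Its goal is to give developers a simple, easy-to-use toolkit for adding full-text search to a target system.

Lucene is not the same thing as a search engine. Lucene is a full-text search library written in Java (with ports to other languages) that exposes APIs for applications to call; think of it as a class library that implements full-text search. A search engine, by contrast, is a complete full-text search system that runs as standalone software.

Lucene is used here together with Luke.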

Demo: Maven+Lucene

1. Directory structure:

[Image: project directory structure]

2. The pom.xml file


```xml
<?xml version="1.0" encoding="UTF-8"?>

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>day01</groupId>
  <artifactId>day01</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>war</packaging>

  <name>day01 Maven Webapp</name>
  <!-- FIXME change it to the project's website -->
  <url>http://www.example.com</url>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <!-- Lucene 7.x requires Java 8 or later (the generated archetype default was 1.7) -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
  </properties>

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-core</artifactId>
      <version>7.2.0</version>
    </dependency>
    <!-- Common analyzers, suitable for English text -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-common</artifactId>
      <version>7.2.0</version>
    </dependency>
    <!-- Chinese analyzer -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-analyzers-smartcn</artifactId>
      <version>7.2.0</version>
    </dependency>

    <!-- Query parsing -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-queryparser</artifactId>
      <version>7.2.0</version>
    </dependency>
    <!-- Highlighting of matched keywords -->
    <dependency>
      <groupId>org.apache.lucene</groupId>
      <artifactId>lucene-highlighter</artifactId>
      <version>7.2.0</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.janeluo/ikanalyzer -->
    <!-- See section 7: the Maven-hosted IKAnalyzer did not work in this demo,
         so the offline IKAnalyzer7.2.0.jar is used instead -->
    <dependency>
      <groupId>com.janeluo</groupId>
      <artifactId>ikanalyzer</artifactId>
      <version>2012_u6</version>
    </dependency>
  </dependencies>

  <build>
    <finalName>day01</finalName>
    <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
      <plugins>
        <plugin>
          <artifactId>maven-clean-plugin</artifactId>
          <version>3.1.0</version>
        </plugin>
        <!-- see http://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_war_packaging -->
        <plugin>
          <artifactId>maven-resources-plugin</artifactId>
          <version>3.0.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>3.8.0</version>
        </plugin>
        <plugin>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.22.1</version>
        </plugin>
        <plugin>
          <artifactId>maven-war-plugin</artifactId>
          <version>3.2.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-install-plugin</artifactId>
          <version>2.5.2</version>
        </plugin>
        <plugin>
          <artifactId>maven-deploy-plugin</artifactId>
          <version>2.8.2</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
```

3. Test class

```java
package zhou;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

import static org.apache.lucene.document.Field.Store.YES;

public class LuceneTest {

    /*
     * Import documents into the index
     */
    @Test
    public void importIndex() throws IOException {
        // Location of the index: the index_loc folder under the project path
        Path path = Paths.get("D:\\个人文件\\java后端\\Lucene_Demo\\day01\\index_loc");
        // Open the index directory
        FSDirectory dir = FSDirectory.open(path);
        // Create the analyzer
        Analyzer al = new StandardAnalyzer();
        // Create the configuration object for the index writer
        IndexWriterConfig iwc = new IndexWriterConfig(al);
        // Create the index writer
        IndexWriter iw = new IndexWriter(dir, iwc);
        /*
         * Collect the source documents:
         * create a searchsource folder and put the source files in it
         */
        File sourceFile = new File("D:\\个人文件\\java后端\\Lucene_Demo\\day01\\searchsource");
        // List the files in the folder (null if the folder cannot be read)
        File[] files = sourceFile.listFiles();
        if (files == null) {
            iw.close();
            throw new IOException("Cannot list files in " + sourceFile);
        }
        for (File file : files) {
            // Read the file's attributes
            String fileName = file.getName();
            String path1 = file.getPath();

            // Accumulate the content in a StringBuilder; starting from a null String
            // and using += would prepend the text "null".
            // Assumption: the source files are UTF-8 encoded.
            StringBuilder builder = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    builder.append(line).append('\n');
                }
            }
            String content = builder.toString();

            // StringField is not analyzed: the whole value is one term
            Field fName = new StringField("fileName", fileName, YES);
            Field fcontent = new TextField("content", content, YES);
            // Placeholder value standing in for the file size
            Field fsize = new TextField("size", "1024", YES);
            Field fpath = new TextField("path", path1, YES);
            // Create the document
            Document document = new Document();
            // Add the fields to the document
            document.add(fName);
            document.add(fcontent);
            document.add(fsize);
            document.add(fpath);
            // Write the document to the index
            iw.addDocument(document);
        }
        // Commit
        iw.commit();
        iw.close();
    }
}
```
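One pitfall worth knowing when accumulating file content line by line: in Java, concatenating onto a `String` that starts as `null` produces the literal text "null" at the front. The sketch below contrasts that with a `StringBuilder`. It is plain Java over an in-memory list of lines, and the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class ReadLinesSketch {

    // Pitfall: null + "text" yields "nulltext", and line breaks are lost.
    static String joinStartingFromNull(List<String> lines) {
        String content = null;
        for (String line : lines) {
            content += line;
        }
        return content;
    }

    // Safer: accumulate in a StringBuilder and keep a separator between lines,
    // so the last word of one line does not merge with the first word of the next.
    static String joinWithBuilder(List<String> lines) {
        StringBuilder builder = new StringBuilder();
        for (String line : lines) {
            builder.append(line).append('\n');
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("first line", "second line");
        System.out.println(joinStartingFromNull(lines)); // prints nullfirst linesecond line
        System.out.print(joinWithBuilder(lines));
    }
}
```

If the "null" prefix ends up in the indexed content, it becomes a searchable term in every document, which is easy to miss until a query for "null" returns everything.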

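4. Common Field types

[Image: table of common Field types]

Note: a LongField can technically be analyzed, but the resulting terms come out garbled, so analyzing it is of little use.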
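Since the table itself did not survive, here is a rough model in plain Java of how a few commonly used field classes differ along three axes: indexed, tokenized, and stored. It reflects the behavior of `StringField`, `TextField`, `StoredField`, and `LongPoint` as I understand it for the Lucene 7 line (where, as far as I know, `LongPoint` takes over the role that `LongField` had in older releases). The enum and the `choose` helper are hypothetical names for illustration, not Lucene API.

```java
class FieldKindSketch {

    enum Stored { ALWAYS, OPTIONAL, NEVER }

    // indexed: can be searched; tokenized: run through the analyzer;
    // stored: original value can be read back from a search hit.
    enum FieldKind {
        STRING_FIELD(true, false, Stored.OPTIONAL),  // exact-match ids, file names
        TEXT_FIELD(true, true, Stored.OPTIONAL),     // full-text content
        STORED_FIELD(false, false, Stored.ALWAYS),   // display-only values
        LONG_POINT(true, false, Stored.NEVER);       // numeric exact/range queries

        final boolean indexed;
        final boolean tokenized;
        final Stored stored;

        FieldKind(boolean indexed, boolean tokenized, Stored stored) {
            this.indexed = indexed;
            this.tokenized = tokenized;
            this.stored = stored;
        }
    }

    // Hypothetical helper: pick a field kind from what the application needs.
    static FieldKind choose(boolean searchable, boolean fullText, boolean numericRange) {
        if (!searchable) {
            return FieldKind.STORED_FIELD;
        }
        if (numericRange) {
            return FieldKind.LONG_POINT;
        }
        return fullText ? FieldKind.TEXT_FIELD : FieldKind.STRING_FIELD;
    }

    public static void main(String[] args) {
        for (FieldKind kind : FieldKind.values()) {
            System.out.println(kind + ": indexed=" + kind.indexed
                    + ", tokenized=" + kind.tokenized + ", stored=" + kind.stored);
        }
    }
}
```

A value that must be both range-searchable and displayable is typically added twice under the same name, once as a point field and once as a stored field.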

5. Introduction to analyzers

[Image: overview of analyzers]

6. Analyzer example

Test class:

```java
package zhou;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.junit.Test;

import java.io.IOException;

public class LuceneTest1 {

    @Test
    public void importIndex() throws IOException {
        // Create the analyzer (StandardAnalyzer does not handle Chinese well)
        // Analyzer al = new StandardAnalyzer();
        Analyzer al = new CJKAnalyzer();
        // Tokenize; try-with-resources closes the stream afterwards
        try (TokenStream stream = al.tokenStream("content", "Serving web content with spring mvc")) {
            // Attribute holding each token's start and end offsets
            OffsetAttribute oa = stream.addAttribute(OffsetAttribute.class);
            // Attribute holding each token's text
            CharTermAttribute ca = stream.addAttribute(CharTermAttribute.class);
            // Reset the stream before consuming it
            stream.reset();
            // Iterate over the token stream
            while (stream.incrementToken()) {
                System.out.println("------------------");
                System.out.println("start offset " + oa.startOffset() + ", end offset " + oa.endOffset());
                System.out.println(ca);
            }
            // Signal the end of the stream
            stream.end();
        }
    }
}
```
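For Chinese input, `CJKAnalyzer` does not look words up in a dictionary; it emits overlapping two-character tokens (bigrams). The following plain-Java sketch imitates that idea together with the start and end offsets the test above prints. It is a simplification under stated assumptions: the whole input is treated as one run of CJK characters, punctuation and supplementary characters are not handled, and `BigramSketch` and `Token` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

class BigramSketch {

    // A token with its text and its [startOffset, endOffset) position in the input.
    static final class Token {
        final String term;
        final int startOffset;
        final int endOffset;

        Token(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        @Override
        public String toString() {
            return term + " [" + startOffset + ", " + endOffset + ")";
        }
    }

    // Emits every overlapping pair of adjacent characters.
    static List<Token> bigrams(String text) {
        List<Token> tokens = new ArrayList<>();
        if (text.length() == 1) {
            // A lone character is kept as a single-character token.
            tokens.add(new Token(text, 0, 1));
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(new Token(text.substring(i, i + 2), i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token token : bigrams("全文检索")) {
            System.out.println(token);
        }
        // prints:
        // 全文 [0, 2)
        // 文检 [1, 3)
        // 检索 [2, 4)
    }
}
```

Note the middle token "文检", which is not a word. Bigrams guarantee recall but produce such noise terms, which is the motivation for a dictionary-based analyzer like IKAnalyzer.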

**7. The IKAnalyzer Chinese analyzer**

Note: this jar has to be added separately. I tried several versions through Maven and none of them worked, so I fell back to adding the jar offline.

The IKAnalyzer jar and its configuration files: https://pan.baidu.com/s/1SrKHlv_YSKy8ffb28ZFtbQ

Directory structure:

[Image: directory structure with the IKAnalyzer configuration files]

The configuration file names must not be changed, because they are hard-coded in the IKAnalyzer source. Otherwise an exception is thrown:

```
java.lang.RuntimeException: Main Dictionary not found!!!

	at org.wltea.analyzer.dic.Dictionary.loadMainDict(Dictionary.java:200)
	at org.wltea.analyzer.dic.Dictionary.<init>(Dictionary.java:69)
	at org.wltea.analyzer.dic.Dictionary.initial(Dictionary.java:86)
	at org.wltea.analyzer.core.IKSegmenter.init(IKSegmenter.java:85)
	at org.wltea.analyzer.core.IKSegmenter.<init>(IKSegmenter.java:65)
	at org.wltea.analyzer.lucene.IKTokenizer.<init>(IKTokenizer.java:78)
	at org.wltea.analyzer.lucene.IKTokenizer.<init>(IKTokenizer.java:64)
	at org.wltea.analyzer.lucene.IKAnalyzer.createComponents(IKAnalyzer.java:64)
	at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:198)
	at zhou.LuceneTest1.importAnalyzer(LuceneTest1.java:29)...
```

The location is hard-coded in the source (you can also change the source and define your own):

[Image: IKAnalyzer source showing the hard-coded location]
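A "Main Dictionary not found" error usually comes down to a resource that is not visible on the classpath. A quick way to check before constructing the analyzer is to ask the class loader directly, as in this plain-Java sketch. The resource names in `main` are assumptions based on common IKAnalyzer distributions; check them against the files that ship with the jar you downloaded.

```java
class ClasspathCheckSketch {

    // True if the named resource can be found from the classpath root.
    static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheckSketch.class.getClassLoader();
        }
        return loader.getResource(resourceName) != null;
    }

    public static void main(String[] args) {
        // Assumed file names; verify them against your IKAnalyzer download.
        String[] expected = {"IKAnalyzer.cfg.xml", "ext.dic", "stopword.dic"};
        for (String name : expected) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

In a Maven project, files placed under `src/main/resources` (or `src/test/resources` for tests) end up at the classpath root, which is where a check like this looks.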

Test class

```java
package zhou;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.IOException;

public class LuceneTest1 {

    @Test
    public void importAnalyzer() throws IOException {
        // Create the analyzer
        // Analyzer al = new StandardAnalyzer();
        // Analyzer al = new CJKAnalyzer();
        Analyzer al = new IKAnalyzer();

        // Tokenize; try-with-resources closes the stream afterwards
        try (TokenStream stream = al.tokenStream("content", "当前市场不稳定，得赶紧稳盘抛出。")) {
            // Attribute holding each token's start and end offsets
            OffsetAttribute oa = stream.addAttribute(OffsetAttribute.class);
            // Attribute holding each token's text
            CharTermAttribute ca = stream.addAttribute(CharTermAttribute.class);
            // Reset the stream before consuming it
            stream.reset();
            // Iterate over the token stream
            while (stream.incrementToken()) {
                System.out.println("------------------");
                System.out.println("start offset " + oa.startOffset() + ", end offset " + oa.endOffset());
                System.out.println(ca);
            }
            // Signal the end of the stream
            stream.end();
        }
    }
}
```

8. Using Luke (managing the index)

[Image: Luke showing the index]

9. Adding to the index

```java
package zhou;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;
import org.wltea.analyzer.lucene.IKAnalyzer;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;

import static org.apache.lucene.document.Field.Store.YES;

public class LuceneTest2 {

    /*
     * Import a document into the index
     */
    @Test
    public void importIndex() throws IOException {
        IndexWriter iw = getIndexWriter();
        /*
         * Collect the source document:
         * create a searchsource folder and put the source files in it
         */
        File file = new File("D:\\个人文件\\java后端\\Lucene_Demo\\day01\\searchsource\\test.txt");
        String content = readFileContent(file);
        String fileName = file.getName();
        String filePath = file.getPath();
        // StringField is not analyzed: the whole value is one term
        Field fName = new StringField("fileName", fileName, YES);
        Field fcontent = new TextField("content", content, YES);
        // The 1024 here should be the real file size; I was lazy, please ignore it
        Field fsize = new TextField("size", "1024", YES);
        Field fpath = new TextField("path", filePath, YES);
        // Create the document
        Document document = new Document();
        // Add the fields to the document
        document.add(fName);
        document.add(fcontent);
        document.add(fsize);
        document.add(fpath);
        // Write the document to the index
        iw.addDocument(document);
        // Commit
        iw.commit();
        iw.close();
    }

    public IndexWriter getIndexWriter() throws IOException {
        // Location of the index: the index_loc folder under the project path
        Path path = Paths.get("D:\\个人文件\\java后端\\Lucene_Demo\\day01\\index_loc");
        // Open the index directory
        FSDirectory dir = FSDirectory.open(path);
        // Create the analyzer
        Analyzer al = new IKAnalyzer();
        // Create the configuration object for the index writer
        IndexWriterConfig iwc = new IndexWriterConfig(al);
        // Create the index writer
        return new IndexWriter(dir, iwc);
    }

    // Read the content of a file.
    // Assumption: the file is UTF-8 encoded; adjust the charset if yours differs.
    public String readFileContent(File file) throws IOException {
        StringBuilder builder = new StringBuilder();
        // try-with-resources closes the reader exactly once, even on failure
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Keep the line break so words on adjacent lines do not merge
                builder.append(line).append('\n');
            }
        }
        return builder.toString();
    }
}
```
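The hard-coded "1024" stands in for the file size. Reading the real size takes one call to `java.nio.file.Files.size`, as this self-contained sketch shows. The class name and the two helpers are hypothetical; the temporary file exists only so the example can run anywhere.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class FileSizeSketch {

    // Size of the file in bytes, as a string ready to be used as a field value.
    static String sizeAsFieldValue(Path file) {
        try {
            return String.valueOf(Files.size(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: writes the text to a temporary UTF-8 file.
    static Path writeTemp(String text) {
        try {
            Path tmp = Files.createTempFile("lucene-demo", ".txt");
            Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = writeTemp("hello lucene");
        try {
            System.out.println(sizeAsFieldValue(tmp)); // prints 12
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With a `java.io.File` in hand, as in the test class, `file.toPath()` gives the `Path` this helper expects, and `file.length()` is an equivalent shortcut.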

10. Deleting from the index

(1) Delete everything

```java
// Delete every document in the index
@Test
public void deleteIndex() throws IOException {
    IndexWriter iw = getIndexWriter();
    iw.deleteAll();
    iw.commit();
    iw.close();
}
```

(2) Delete documents that match a condition

```java
// Requires: org.apache.lucene.index.Term, org.apache.lucene.search.TermQuery
@Test
public void deleteIndexByQuery() throws IOException {
    IndexWriter iw = getIndexWriter();
    // Create the term: field "content", value "三"
    Term term = new Term("content", "三");
    // Create a query that matches this term
    TermQuery query = new TermQuery(term);
    // Delete every document the query matches
    iw.deleteDocuments(query);
    iw.commit();
    iw.close();
}
```

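11. Term queries

Creating a query

To search, build a Query object that describes what to look for; Lucene generates the final query syntax from it. Just as relational databases have SQL, Lucene has its own query syntax. For example, "name:lucene" searches for documents whose name field contains the term "lucene".

There are two ways to create a query object:

(1) Use the Query subclasses that Lucene provides

Query is an abstract class, and Lucene ships many implementations, such as TermQuery for exact term matches and NumericRangeQuery for numeric ranges (in Lucene 6 and later, numeric ranges go through point fields instead, for example LongPoint.newRangeQuery).

```java
Query query = new TermQuery(new Term("name", "lucene"));
```

(2) Use QueryParser to parse a query expression

QueryParser parses the query expression entered by the user into a Query instance.

```java
// parse() throws org.apache.lucene.queryparser.classic.ParseException
QueryParser queryParser = new QueryParser("name", new IKAnalyzer());
Query query = queryParser.parse("name:lucene");
```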
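The first constructor argument of `QueryParser` is the default field: terms written without a `field:` prefix are searched there. The toy sketch below shows only that one rule in plain Java. It is not Lucene's grammar (no operators, phrases, or analysis), and `DefaultFieldSketch.qualify` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

class DefaultFieldSketch {

    // Rewrites each whitespace-separated term as "field:term",
    // falling back to defaultField when the term names no field.
    static List<String> qualify(String defaultField, String expression) {
        List<String> clauses = new ArrayList<>();
        for (String part : expression.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue;
            }
            int colon = part.indexOf(':');
            boolean hasField = colon > 0 && colon < part.length() - 1;
            clauses.add(hasField ? part : defaultField + ":" + part);
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(qualify("name", "lucene path:index"));
        // prints [name:lucene, path:index]
    }
}
```

This is why `parse("name:lucene")` and `parse("lucene")` behave the same for a parser whose default field is `name`.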

Test class

```java
package zhou;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.junit.Test;

import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LuceneTest3 {

    @Test
    public void queryIndex() throws IOException {
        Path path = Paths.get("D:\\个人文件\\java后端\\Lucene_Demo\\day01\\index_loc");
        // try-with-resources closes the directory and the reader when done
        try (FSDirectory open = FSDirectory.open(path);
             // Create the index reader
             DirectoryReader reader = DirectoryReader.open(open)) {
            // Create the searcher over the index
            IndexSearcher is = new IndexSearcher(reader);
            // Create the term (look for documents whose fileName is test.txt)
            Term term = new Term("fileName", "test.txt");
            // Create the term query
            TermQuery tq = new TermQuery(term);
            // Search (top N hits)
            TopDocs result = is.search(tq, 100);
            // Total number of hits (a long in Lucene 7)
            long total = result.totalHits;
            System.out.println("Total hits: " + total);

            for (ScoreDoc sd : result.scoreDocs) {
                // Document id
                int id = sd.doc;
                // Load the document
                Document doc = is.doc(id);
                String fileName = doc.get("fileName");
                String size = doc.get("size");
                String content = doc.get("content");
                String path1 = doc.get("path");

                System.out.println("File name: " + fileName);
                System.out.println("Size: " + size);
                System.out.println("Content: " + content);
                // Print the stored path field, not the index location
                System.out.println("Path: " + path1);
                System.out.println("-------------------------");
            }
        }
    }
}
```

12. Numeric range queries

1. NumericRangeQuery

Query by numeric range. NumericRangeQuery comes from older Lucene releases; with the 7.2.0 dependencies declared in the pom, the equivalent is a point range query such as LongPoint.newRangeQuery:

```java
// Files whose size is between 0 and 1024
// Requires: org.apache.lucene.document.LongPoint, org.apache.lucene.search.Query
@Test
public void rangeQuery() throws IOException {
    IndexSearcher is = getDirReader();
    // Create the numeric range query (both bounds inclusive).
    // Older releases used NumericRangeQuery.newLongRange(field, min, max, true, true).
    // This only matches if "size" was indexed as a numeric point, e.g.
    // new LongPoint("size", 1024L); a TextField("size", "1024") will not match.
    Query tq = LongPoint.newRangeQuery("size", 0L, 1024L);
    printDoc(is, tq);
}

public IndexSearcher getDirReader() throws IOException {
    Path path = Paths.get("D:\\个人文件\\java后端\\Lucene_Demo\\day01\\index_loc");
    FSDirectory open = FSDirectory.open(path);
    // Create the index reader
    DirectoryReader reader = DirectoryReader.open(open);
    // Create the searcher over the index
    return new IndexSearcher(reader);
}

// Print the results
public static void printDoc(IndexSearcher is, Query tq) throws IOException {
    // Search (top N hits)
    TopDocs result = is.search(tq, 100);
    // Total number of hits (a long in Lucene 7)
    long total = result.totalHits;
    System.out.println("Total hits: " + total);

    for (ScoreDoc sd : result.scoreDocs) {
        // Document id
        int id = sd.doc;
        // Load the document
        Document doc = is.doc(id);
        String fileName = doc.get("fileName");
        String size = doc.get("size");
        String content = doc.get("content");
        String path1 = doc.get("path");

        System.out.println("File name: " + fileName);
        System.out.println("Size: " + size);
        System.out.println("Content: " + content);
        System.out.println("Path: " + path1);
        System.out.println("-------------------------");
    }
}
```
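One reason numeric ranges need a numeric field rather than a text field: numbers indexed as text compare character by character, so "200" falls outside the range "0" to "1024". The plain-Java sketch below shows the difference; the class and method names are illustrative only.

```java
class NumericOrderSketch {

    // Range check on the textual form: lexicographic comparison.
    static boolean inRangeAsText(String value, String min, String max) {
        return value.compareTo(min) >= 0 && value.compareTo(max) <= 0;
    }

    // Range check on the numeric value.
    static boolean inRangeAsNumber(long value, long min, long max) {
        return value >= min && value <= max;
    }

    public static void main(String[] args) {
        // "200" > "1024" as text, because '2' > '1' at the first character.
        System.out.println(inRangeAsText("200", "0", "1024")); // prints false
        System.out.println(inRangeAsNumber(200L, 0L, 1024L));  // prints true
    }
}
```

In this demo the size was added as a text field holding "1024", so a numeric range query over `size` would need the field to be indexed numerically first.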

13. Combining multiple queries

```java
/*
 * Combine several conditions in one query
 */
// Requires: org.apache.lucene.search.BooleanClause, org.apache.lucene.search.BooleanQuery
@Test
public void queryIndex2() throws IOException {
    IndexSearcher is = getDirReader();
    // Create the term queries to combine
    Query query = new TermQuery(new Term("fileName", "test.txt"));
    Query query1 = new TermQuery(new Term("content", "test.txt"));
    // BooleanQuery combines clauses with AND / OR / NOT semantics.
    // In Lucene 7 it is built through BooleanQuery.Builder; new BooleanQuery() no longer compiles.
    BooleanQuery bq = new BooleanQuery.Builder()
            // MUST: the clause has to match
            .add(query, BooleanClause.Occur.MUST)
            // SHOULD: optional
            .add(query1, BooleanClause.Occur.SHOULD)
            .build();
    System.out.println("Query: " + bq);
    printDoc(is, bq);
}

// getDirReader() and printDoc() are the same helpers as in section 12.
```
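The three `Occur` values map onto set operations over matching document ids. Here is a plain-Java sketch of that matching logic under the default settings as I understand them: every MUST clause has to match, no MUST_NOT clause may match, and SHOULD clauses decide matching only when there is no MUST clause (otherwise they just influence ranking, which this sketch ignores). `OccurSketch.match` is a hypothetical helper, not Lucene API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class OccurSketch {

    // Each inner set holds the ids of the documents one clause matches.
    static Set<Integer> match(List<Set<Integer>> must,
                              List<Set<Integer>> should,
                              List<Set<Integer>> mustNot) {
        Set<Integer> result = new TreeSet<>();
        if (!must.isEmpty()) {
            // MUST: intersection of all required clauses.
            result.addAll(must.get(0));
            for (Set<Integer> ids : must) {
                result.retainAll(ids);
            }
        } else {
            // Only SHOULD clauses: at least one of them has to match.
            for (Set<Integer> ids : should) {
                result.addAll(ids);
            }
        }
        // MUST_NOT: exclusion.
        for (Set<Integer> ids : mustNot) {
            result.removeAll(ids);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> fileNameHits = new TreeSet<>(java.util.Arrays.asList(1, 2));
        Set<Integer> contentHits = new TreeSet<>(java.util.Arrays.asList(2, 3));
        Set<Integer> excluded = new TreeSet<>(Collections.singleton(1));
        System.out.println(match(
                Collections.singletonList(fileNameHits),
                Collections.singletonList(contentHits),
                Collections.singletonList(excluded)));
        // prints [2]
    }
}
```

A query made only of MUST_NOT clauses matches nothing, since there is no positive clause to exclude from.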

14. Parsed queries

(1) QueryParser queries

```java
/*
 * Parsed query, first form: free text
 */
// Requires: org.apache.lucene.queryparser.classic.QueryParser,
//           org.apache.lucene.queryparser.classic.ParseException
@Test
public void queryIndex3() throws IOException, ParseException {
    IndexSearcher is = getDirReader();
    IKAnalyzer ik = new IKAnalyzer();
    // Create the query parser; "content" is the default field
    QueryParser parser = new QueryParser("content", ik);
    // The sentence is analyzed into terms, which are then searched in the content field
    Query query = parser.parse("我在学习全文检索技术Lucene");
    System.out.println("Parsed query: " + query);
    printDoc(is, query);
}

/*
 * Parsed query, second form: explicit query syntax
 */
@Test
public void queryIndex4() throws IOException, ParseException {
    IndexSearcher is = getDirReader();
    IKAnalyzer ik = new IKAnalyzer();
    // Create the query parser
    QueryParser parser = new QueryParser("content", ik);
    // Write the conditions yourself with the operators AND, OR (||), NOT (!)
    Query query = parser.parse("content: 我 AND 你是 ! 好的");
    System.out.println("Parsed query: " + query);
    printDoc(is, query);
}
```

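**(2)** Multi-field parsed queries

MultiFieldQueryParser queries a combination of fields.

Use MultiFieldQueryParser to search several fields at once. In a product search, for example, the keyword entered should be looked up in both the product name and the product description.

```java
// Fields to search together
String[] fields = {"fileName", "content"};
// Create the query parser
QueryParser queryParser = new MultiFieldQueryParser(fields, new IKAnalyzer());
// Find documents whose file name or content contains the keyword "java"
Query query = queryParser.parse("java");
```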
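Conceptually, a multi-field parser turns one keyword into one clause per field, and a document matches if any of those fields matches. The toy sketch below models that in plain Java, with naive substring matching in place of real analysis; `MultiFieldSketch` and its methods are illustrative names, not Lucene API.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MultiFieldSketch {

    // One "field:term" clause per field, roughly what printing the parsed query shows.
    static String expand(String[] fields, String term) {
        StringBuilder clauses = new StringBuilder();
        for (String field : fields) {
            if (clauses.length() > 0) {
                clauses.append(' ');
            }
            clauses.append(field).append(':').append(term);
        }
        return clauses.toString();
    }

    // A document (field name -> text) matches if any listed field contains the term.
    static boolean matchesAny(Map<String, String> doc, String[] fields, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String field : fields) {
            String text = doc.get(field);
            if (text != null && text.toLowerCase(Locale.ROOT).contains(needle)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] fields = {"fileName", "content"};
        System.out.println(expand(fields, "java")); // prints fileName:java content:java

        Map<String, String> doc = new HashMap<>();
        doc.put("fileName", "notes.txt");
        doc.put("content", "Learning Java and Lucene");
        System.out.println(matchesAny(doc, fields, "java")); // prints true
    }
}
```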

Test class:

```java
/*
 * Multi-field parsed query
 */
// Requires: org.apache.lucene.queryparser.classic.MultiFieldQueryParser,
//           org.apache.lucene.queryparser.classic.ParseException
@Test
public void multiFieldQuery() throws IOException, ParseException {
    IndexSearcher is = getDirReader();
    IKAnalyzer ik = new IKAnalyzer();

    // Search the file name and the content together
    String[] fields = {"fileName", "content"};
    MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, ik);
    Query query = parser.parse("我在学习全文检索技术Lucene");

    System.out.println("Parsed query: " + query);
    printDoc(is, query);
}
```
