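How reindexing works
Changing an analyzer and upgrading ES both require reindexing the data, so reindexing is a feature worth understanding well.
Reference: https://www.daimajiaoliu.com/daima/4ed62ea791003fc (How to rebuild an index in elasticsearch)
Preparing to reindex
I use the hanlp analyzer; adjust the parameters to suit your own needs.
Note: setting number_of_replicas to 0 speeds up reindexing, as does disabling refresh with refresh_interval set to -1.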
PUT /new_index
{
  "settings": {
    "number_of_shards": 12,
    "number_of_replicas": 0,
    "refresh_interval": -1,
    "analysis": {
      "analyzer": {
        "caseSensitive": {
          "filter": "lowercase",
          "type": "custom",
          "tokenizer": "keyword",
          "ignore_above": 256
        },
        "my_hanlp_analyzer": {
          "filter": "lowercase",
          "tokenizer": "my_hanlp"
        }
      },
      "tokenizer": {
        "my_hanlp": {
          "type": "hanlp",
          "enable_stop_dictionary": true
        }
      }
    }
  }
}
Define the mapping for the fields
PUT /new_index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "analyzer": "my_hanlp_analyzer"
    },
    "summary": {
      "type": "text",
      "analyzer": "my_hanlp_analyzer"
    },
    "key_words": {
      "type": "text",
      "analyzer": "caseSensitive"
    },
    "content": {
      "type": "text",
      "analyzer": "my_hanlp_analyzer"
    },
    "id": {
      "type": "long"
    },
    "con_md5": {
      "type": "keyword"
    },
    "portrait": {
      "type": "keyword"
    },
    "url": {
      "type": "keyword"
    },
    "generate": {
      "type": "integer"
    },
    "contype": {
      "type": "integer"
    },
    "extra": {
      "type": "keyword"
    },
    "images": {
      "type": "keyword"
    },
    "codes": {
      "type": "text"
    }
  }
}
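Before loading any data, it can help to confirm that the custom analyzer tokenizes text the way you expect. The sketch below shows one way to do that with the _analyze API. The helper name analyze_body and the ES_HOST variable are assumptions made for illustration, and the helper does not escape quotes or backslashes in the sample text.

```shell
#!/bin/bash
# Hypothetical helper: print a JSON body for the _analyze API.
# $1 = analyzer name, $2 = sample text (must not contain double quotes or backslashes).
analyze_body() {
    printf '{"analyzer":"%s","text":"%s"}\n' "$1" "$2"
}

# Example call against the new index (ES_HOST is an assumed variable, e.g. localhost:9200):
# curl -H "Content-Type: application/json" \
#      -XPOST "http://$ES_HOST/new_index/_analyze?pretty=true" \
#      -d "$(analyze_body my_hanlp_analyzer 'Elasticsearch Reindex')"
```

The response lists the tokens the analyzer produced, which makes a misconfigured tokenizer or filter visible before the full reindex runs.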
If the service has to keep running, have it access the index through an alias.
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "old_index",        // the existing index
        "alias": "old_index_latest"  // the alias the service uses
      }
    }
  ]
}
Reindexing scripts
Same-machine version
#!/bin/bash

if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 OLD_INDEX NEW_INDEX [LOCAL_HOST:LOCAL_PORT]"
    exit 1
fi

OLD_INDEX=$1
NEW_INDEX=$2
# Default to a local node when no host is given.
LOCAL_HOST=${3:-localhost:9200}

echo "---------------------------- NOTICE ----------------------------------"
echo "You must create $NEW_INDEX (settings and mapping) on $LOCAL_HOST"
echo "before you start the re-indexing process."
echo "----------------------------------------------------------------------"
sleep 3

TOTAL_DOCS_OLD=$(curl --silent "http://$LOCAL_HOST/_cat/indices/$OLD_INDEX?h=docs.count")
echo "Attempting to re-index $OLD_INDEX ($TOTAL_DOCS_OLD docs total) into $NEW_INDEX..."
# SECONDS is a bash builtin that counts the seconds since it was last assigned.
SECONDS=0
curl -H "Content-Type: application/json" -XPOST "http://$LOCAL_HOST/_reindex?wait_for_completion=true&pretty=true" -d "{
    \"conflicts\": \"proceed\",
    \"source\": {
        \"index\": \"${OLD_INDEX}\"
    },
    \"dest\": {
        \"index\": \"${NEW_INDEX}\"
    }
}"

duration=$SECONDS

# Check that the destination index exists before counting its documents.
NEW_INDEX_EXISTS=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "http://$LOCAL_HOST/$NEW_INDEX")
if [ "$NEW_INDEX_EXISTS" == "200" ]; then
    # refresh_interval is -1 on the new index, so refresh explicitly
    # to make docs.count reflect everything that was just indexed.
    curl --silent -o /dev/null -XPOST "http://$LOCAL_HOST/$NEW_INDEX/_refresh"
    TOTAL_DOCS_REINDEXED=$(curl --silent "http://$LOCAL_HOST/_cat/indices/$NEW_INDEX?h=docs.count")
else
    TOTAL_DOCS_REINDEXED=0
fi

echo "    Re-indexing results:"
echo "     -> Time taken: $((duration / 60)) minutes and $((duration % 60)) seconds"
echo "     -> Docs indexed: $TOTAL_DOCS_REINDEXED out of $TOTAL_DOCS_OLD"
echo ""

if [ "$TOTAL_DOCS_OLD" -ne "$TOTAL_DOCS_REINDEXED" ]; then
    echo "    INCOMPLETE: $TOTAL_DOCS_OLD not equal to $TOTAL_DOCS_REINDEXED"
fi
Usage:
./reindex2.sh old_index new_index 192.168.1.155:9200 > ./_reindex.log 2>&1 &
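The final comparison in the script uses -ne, which fails with "integer expression expected" if either count is empty or not a number, for example when curl cannot reach the host. A minimal sketch of a guard follows; normalize_count is a hypothetical helper name, not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: strip whitespace from a _cat docs.count value and
# fall back to 0 when the value is empty or not a plain non-negative integer.
normalize_count() {
    local raw
    raw=$(printf '%s' "$1" | tr -d '[:space:]')
    case $raw in
        ''|*[!0-9]*) echo 0 ;;
        *) echo "$raw" ;;
    esac
}
```

In the script it could be applied right after each count is fetched, for example `TOTAL_DOCS_REINDEXED=$(normalize_count "$TOTAL_DOCS_REINDEXED")`, so the later integer comparison always receives a number.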
Cross-machine version:
#!/bin/bash

if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $0 REMOTE_HOST:REMOTE_PORT INDEX_PATTERN [LOCAL_HOST:LOCAL_PORT]"
    exit 1
fi

REMOTE_HOST=$1
PATTERN=$2
# Default to a local node when no destination host is given.
LOCAL_HOST=${3:-localhost:9200}

echo "---------------------------- NOTICE ----------------------------------"
echo "You must ensure you have the following setting in your local ES host's"
echo "elasticsearch.yml config (the one re-indexing to):"
echo "    reindex.remote.whitelist: $REMOTE_HOST"
echo "Also, if an index template is necessary for this data, you must create it"
echo "locally before you start the re-indexing process"
echo "----------------------------------------------------------------------"
sleep 3

# List every remote index that matches the pattern.
INDICES=$(curl --silent "http://$REMOTE_HOST/_cat/indices/$PATTERN?h=index")
TOTAL_INCOMPLETE_INDICES=0
TOTAL_INDICES=0
TOTAL_DURATION=0
INCOMPLETE_INDICES=()

for INDEX in $INDICES; do

    TOTAL_DOCS_REMOTE=$(curl --silent "http://$REMOTE_HOST/_cat/indices/$INDEX?h=docs.count")
    echo "Attempting to re-index $INDEX ($TOTAL_DOCS_REMOTE docs total) from remote ES server..."
    # SECONDS is a bash builtin that counts the seconds since it was last assigned.
    SECONDS=0
    curl -H "Content-Type: application/json" -XPOST "http://$LOCAL_HOST/_reindex?wait_for_completion=true&pretty=true" -d "{
        \"conflicts\": \"proceed\",
        \"source\": {
            \"remote\": {
                \"host\": \"http://$REMOTE_HOST\"
            },
            \"index\": \"${INDEX}\"
        },
        \"dest\": {
            \"index\": \"${INDEX}\"
        }
    }"

    duration=$SECONDS

    LOCAL_INDEX_EXISTS=$(curl -o /dev/null --silent --head --write-out '%{http_code}' "http://$LOCAL_HOST/$INDEX")
    if [ "$LOCAL_INDEX_EXISTS" == "200" ]; then
        # Refresh so docs.count reflects everything that was just indexed.
        curl --silent -o /dev/null -XPOST "http://$LOCAL_HOST/$INDEX/_refresh"
        TOTAL_DOCS_REINDEXED=$(curl --silent "http://$LOCAL_HOST/_cat/indices/$INDEX?h=docs.count")
    else
        TOTAL_DOCS_REINDEXED=0
    fi

    echo "    Re-indexing results:"
    echo "     -> Time taken: $((duration / 60)) minutes and $((duration % 60)) seconds"
    echo "     -> Docs indexed: $TOTAL_DOCS_REINDEXED out of $TOTAL_DOCS_REMOTE"
    echo ""

    TOTAL_DURATION=$((TOTAL_DURATION + duration))

    if [ "$TOTAL_DOCS_REMOTE" -ne "$TOTAL_DOCS_REINDEXED" ]; then
        TOTAL_INCOMPLETE_INDICES=$((TOTAL_INCOMPLETE_INDICES + 1))
        INCOMPLETE_INDICES+=("$INDEX")
    fi

    TOTAL_INDICES=$((TOTAL_INDICES + 1))

done

echo "---------------------- STATS --------------------------"
echo "Total Duration of Re-Indexing Process: $((TOTAL_DURATION / 60))m $((TOTAL_DURATION % 60))s"
echo "Total Indices: $TOTAL_INDICES"
echo "Total Incomplete Re-Indexed Indices: $TOTAL_INCOMPLETE_INDICES"
if [ "$TOTAL_INCOMPLETE_INDICES" -ne "0" ]; then
    printf '%s\n' "${INCOMPLETE_INDICES[@]}"
fi
echo "-------------------------------------------------------"
echo ""
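Usage (this can also be used to migrate data; the remote host comes first, then the index pattern, then the local host):
./reindex.sh 192.168.1.155:9200 old_index 192.168.1.144:9200 > ./_reindex.log 2>&1 &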
After reindexing
Repoint the alias
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "new_index",
        "alias": "old_index_latest"
      }
    },
    {
      "remove": {
        "index": "old_index",
        "alias": "old_index_latest"
      }
    }
  ]
}
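When the swap has to be run from a script rather than from the console, the request body can be generated from the index names. The following is a minimal sketch, assuming a hypothetical helper named alias_swap_body and an assumed ES_HOST variable. Both actions go in a single _aliases request, so Elasticsearch applies them atomically and the alias never points at nothing.

```shell
#!/bin/bash
# Hypothetical helper: print the JSON body for an alias swap.
# $1 = alias name, $2 = index to detach, $3 = index to attach.
alias_swap_body() {
    local alias_name=$1 old_index=$2 new_index=$3
    printf '{"actions":[{"add":{"index":"%s","alias":"%s"}},{"remove":{"index":"%s","alias":"%s"}}]}\n' \
        "$new_index" "$alias_name" "$old_index" "$alias_name"
}

# Example call (ES_HOST is an assumed variable, e.g. localhost:9200):
# curl -H "Content-Type: application/json" -XPOST "http://$ES_HOST/_aliases" \
#      -d "$(alias_swap_body old_index_latest old_index new_index)"
```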
Delete the old index
DELETE /old_index
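My own approach, using the old index name directly as the alias: my indexing service can be stopped, so as a final step I give the new index an alias that matches the old index's name. The old index must be deleted first, because an alias cannot share a name with an existing index.
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "new_index",  // the new index
        "alias": "old_index"   // the old index's name
      }
    }
  ]
}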
Restore the settings
PUT /new_index/_settings
{
  "index": {
    "number_of_replicas": 2,
    "refresh_interval": null
  }
}