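This is the final installment of the series "RabbitMQ on Docker in Four Parts" (《Docker下RabbitMQ四部曲》). Today we try out the high availability of a RabbitMQ cluster and check whether messages can still be produced and consumed when some of the cluster's nodes go down.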
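Overview
Today's exercise consists of the following steps:
- Write the docker-compose.yml file and configure the parameters of each container;
- Start all containers: the RabbitMQ cluster, the web application that produces messages, and the web applications that consume them;
- Stop the RabbitMQ containers in the cluster one by one, verifying message production and consumption after each stop;
- Restore the RabbitMQ containers in the cluster one by one, verifying message production and consumption after each restore.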
Writing the docker-compose.yml file
This exercise creates six containers, listed below:
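| Container name | IP address | Role |
| --- | --- | --- |
| hacluster_rabbit1_1 | 172.19.0.2 | RabbitMQ master node |
| hacluster_rabbit2_1 | 172.19.0.3 | RabbitMQ slave node, RAM node |
| hacluster_rabbit3_1 | 172.19.0.4 | RabbitMQ slave node |
| hacluster_producer_1 | 172.19.0.5 | Web application that produces messages |
| hacluster_consumer1_1 | 172.19.0.6 | Web application that consumes messages |
| hacluster_consumer2_1 | 172.19.0.7 | Web application that consumes messages |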
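The earlier chapters of this series created the same six containers, and their roles are unchanged today. What changes is mainly the following three points:
- hacluster_producer_1, which produces messages, connected to a single RabbitMQ container in the earlier chapters; in this chapter it connects to all three;
- hacluster_consumer1_1, which consumes messages, connected to a single RabbitMQ container in the earlier chapters; in this chapter it connects to all three;
- hacluster_consumer2_1, which consumes messages, connected to a single RabbitMQ container in the earlier chapters; in this chapter it connects to all three.
Based on the above, the docker-compose.yml file looks like this: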
version: '2'
services:
  rabbit1:
    image: bolingcavalry/rabbitmq-server:0.0.3
    hostname: rabbit1
    ports:
      - "15672:15672"
    environment:
      - RABBITMQ_DEFAULT_USER=admin
      - RABBITMQ_DEFAULT_PASS=888888
  rabbit2:
    image: bolingcavalry/rabbitmq-server:0.0.3
    hostname: rabbit2
    depends_on:
      - rabbit1
    links:
      - rabbit1
    environment:
      - CLUSTERED=true
      - CLUSTER_WITH=rabbit1
      - RAM_NODE=true
      - HA_ENABLE=true
    ports:
      - "15673:15672"
  rabbit3:
    image: bolingcavalry/rabbitmq-server:0.0.3
    hostname: rabbit3
    depends_on:
      - rabbit2
    links:
      - rabbit1
      - rabbit2
    environment:
      - CLUSTERED=true
      - CLUSTER_WITH=rabbit1
    ports:
      - "15675:15672"
  producer:
    image: bolingcavalry/rabbitmqproducer:0.0.2-SNAPSHOT
    hostname: producer
    depends_on:
      - rabbit3
    links:
      - rabbit1:rabbitmqhost1
      - rabbit2:rabbitmqhost2
      - rabbit3:rabbitmqhost3
    ports:
      - "18080:8080"
    environment:
      - mq.rabbit.address=rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672
      - mq.rabbit.username=admin
      - mq.rabbit.password=888888
  consumer1:
    image: bolingcavalry/rabbitmqconsumer:0.0.5-SNAPSHOT
    hostname: consumer1
    depends_on:
      - producer
    links:
      - rabbit1:rabbitmqhost1
      - rabbit2:rabbitmqhost2
      - rabbit3:rabbitmqhost3
    environment:
      - mq.rabbit.address=rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672
      - mq.rabbit.username=admin
      - mq.rabbit.password=888888
      - mq.rabbit.queue.name=consumer1.queue
  consumer2:
    image: bolingcavalry/rabbitmqconsumer:0.0.5-SNAPSHOT
    hostname: consumer2
    depends_on:
      - consumer1
    links:
      - rabbit1:rabbitmqhost1
      - rabbit2:rabbitmqhost2
      - rabbit3:rabbitmqhost3
    environment:
      - mq.rabbit.address=rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672
      - mq.rabbit.username=admin
      - mq.rabbit.password=888888
      - mq.rabbit.queue.name=consumer2.queue
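Two points about this docker-compose.yml file deserve attention:
1. rabbit2 has an additional environment variable, HA_ENABLE=true. As mentioned in the image-building analysis in part two of this series (《Docker下RabbitMQ四部曲之二:细说RabbitMQ镜像制作》), the startrabbit.sh script checks this variable when the container is created. If it is true, the script runs the command rabbitmqctl set_policy HA '^(?!amq.).*' '{"ha-mode": "all"}', which puts the queues into mirrored mode so that they are synchronized across the three RabbitMQ nodes;
2. The producer, consumer1 and consumer2 containers all set the environment variable mq.rabbit.address to the address and port of all three RabbitMQ containers: rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672.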
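To see which names the HA policy pattern would select, the regular expression can be tried outside RabbitMQ. The sketch below uses Python's re module as a stand-in for the broker's own matcher, on the assumption that both treat this negative lookahead the same way. The function name is hypothetical, and the names other than consumer1.queue and consumer2.queue are made up for illustration.

```python
import re

# Pattern copied verbatim from the set_policy command quoted in this article.
# The dot after "amq" is unescaped, so it stands for any single character.
HA_PATTERN = re.compile(r"^(?!amq.).*")


def policy_applies(name: str) -> bool:
    """Return True if the HA policy pattern selects the given name."""
    return HA_PATTERN.search(name) is not None


if __name__ == "__main__":
    # consumer1.queue and consumer2.queue come from docker-compose.yml;
    # the remaining names are hypothetical examples.
    for name in ["consumer1.queue", "consumer2.queue", "amq.gen-abc123", "amqp-audit"]:
        print(f"{name}: {policy_applies(name)}")
```

Names beginning with amq. are reserved for the broker's own entities, which is why the pattern skips them. Because the dot is not escaped, a name such as amqp-audit is skipped as well; writing amq\. instead would narrow the exclusion to the literal prefix.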
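Starting all containers
In the directory that contains the docker-compose.yml file just created, run docker-compose up -d to create all the containers. Once they are created, confirm that everything started correctly as follows: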
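1. My machine's IP address is 192.168.119.155, so entering 192.168.119.155:15672 in the browser opens the RabbitMQ management page (user name: admin, password: 888888);
2. Open the "Exchanges" tab: the exchange has been created and is in HA mode;
3. Open the "Queues" tab: both queues have been created and are in HA mode;
4. Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to make the hacluster_producer_1 container produce a message;
5. Run docker logs -f hacluster_consumer1_1 in the console to see hacluster_consumer1_1 log the message it consumed: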
2018-05-19 11:21:44.217 INFO 1 --- [ main] c.b.r.RabbitmqconsumerApplication : Started RabbitmqconsumerApplication in 29.099 seconds (JVM running for 39.398)
2018-05-19 11:36:21.332 INFO 1 --- [cTaskExecutor-1] c.b.r.receiver.FanoutReceiver : receive message : hello, aaa, bbb, 2018-05-19 11:36:21
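The browser checks above can also be scripted. The following is a minimal sketch using only Python's standard library. The host IP and port mappings come from this article; the function names, the parameter names for the two path segments, and the 120-second timeout are assumptions chosen for illustration.

```python
from urllib.parse import quote
from urllib.request import urlopen

HOST = "192.168.119.155"   # IP of the Docker host used in this article
MANAGEMENT_PORTS = {       # host ports mapped to 15672 in docker-compose.yml
    "rabbit1": 15672,
    "rabbit2": 15673,
    "rabbit3": 15675,
}
PRODUCER_PORT = 18080      # host port mapped to the producer's port 8080


def management_url(node: str, host: str = HOST) -> str:
    """Build the management-page URL for one RabbitMQ container."""
    return f"http://{host}:{MANAGEMENT_PORTS[node]}"


def send_url(part1: str, part2: str, host: str = HOST) -> str:
    """Build the producer URL of the form /send/<part1>/<part2>."""
    first = quote(part1, safe="")
    second = quote(part2, safe="")
    return f"http://{host}:{PRODUCER_PORT}/send/{first}/{second}"


def send(part1: str, part2: str, timeout: float = 120.0) -> str:
    """Call the producer endpoint and return the response body as text."""
    with urlopen(send_url(part1, part2), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(management_url("rabbit1"))
    print(send_url("aaa", "bbb"))
    # print(send("aaa", "bbb"))  # uncomment once the containers are running
```

The generous timeout is deliberate: the producer can take a long time to respond while it fails over to another RabbitMQ node.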
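So message production and consumption work across the whole RabbitMQ cluster. Next, we stop containers to simulate outages in a production environment.
Stopping the RabbitMQ containers one by one
1. Stop hacluster_rabbit1_1 first by running docker stop hacluster_rabbit1_1: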
root@maven:~# docker stop hacluster_rabbit1_1
hacluster_rabbit1_1
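2. Check the management page. Because the hacluster_rabbit1_1 container has stopped, we have to use the web page served by hacluster_rabbit2_1 at http://192.168.119.155:15673, which reports the node failure;
3. The exchange and queue pages show nothing abnormal;
4. Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to send a message: the response takes noticeably longer, but the operation still reports success;
5. Run docker logs -f hacluster_producer_1 in the console to view the log of the web container that produces messages: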
2018-05-19 11:43:22.681 WARN 1 --- [172.19.0.2:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 11:43:22.703 ERROR 1 --- [172.19.0.2:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 11:53:31.836 INFO 1 --- [io-8080-exec-10] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 11:53:46.878 INFO 1 --- [io-8080-exec-10] o.s.a.r.c.CachingConnectionFactory : Created new connection: connectionFactory#4ae3c1cd:1/SimpleConnection@44028da7 [delegate=amqp://admin@172.19.0.3:5672/, localPort= 37818]
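The log clearly shows that when hacluster_rabbit1_1 stopped, the message producer immediately reported an exception but did not reconnect on its own. Only when the next message was sent did it connect to another RabbitMQ node, this time hacluster_rabbit2_1;
6. Run docker logs -f hacluster_consumer1_1 in the console to view the log of the web container that consumes messages: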
2018-05-19 11:38:14.945 INFO 1 --- [cTaskExecutor-1] c.b.r.receiver.FanoutReceiver : receive message : hello, aaa, bbb, 2018-05-19 11:38:14
2018-05-19 11:43:22.672 WARN 1 --- [172.19.0.2:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 11:43:22.726 ERROR 1 --- [172.19.0.2:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 11:43:23.163 INFO 1 --- [cTaskExecutor-1] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@9f116cc: tags=[{amq.ctag-0csUBn5OQiTGEphcGI2p3A=consumer1.queue}], channel=Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://admin@172.19.0.2:5672/,1), conn: Proxy@42dd311 Shared Rabbit Connection: SimpleConnection@5a1a52da [delegate=amqp://admin@172.19.0.2:5672/, localPort= 34240], acknowledgeMode=AUTO local queue size=0
2018-05-19 11:43:23.181 INFO 1 --- [cTaskExecutor-2] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 11:43:29.042 INFO 1 --- [cTaskExecutor-2] o.s.a.r.c.CachingConnectionFactory : Created new connection: connectionFactory#45f45fa1:1/SimpleConnection@2b9e231d [delegate=amqp://admin@172.19.0.3:5672/, localPort= 49624]
2018-05-19 11:53:46.899 INFO 1 --- [cTaskExecutor-2] c.b.r.receiver.FanoutReceiver : receive message : hello, aaa, bbb, 2018-05-19 11:53:31
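The log shows that when a RabbitMQ node goes down, the consumer immediately reconnects to another machine in the cluster (log keyword: Created new connection);
7. Stop the second container in the RabbitMQ cluster by running docker stop hacluster_rabbit2_1;
8. The management page now has to be opened through the hacluster_rabbit3_1 container at http://192.168.119.155:15675; it shows the failure of both nodes;
9. Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to send a message: the response again takes noticeably longer, but the operation still reports success;
10. Run docker logs -f hacluster_producer_1 in the console to view the log of the web container that produces messages. It reports a successful reconnect, this time to the hacluster_rabbit3_1 container: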
2018-05-19 12:07:45.322 WARN 1 --- [172.19.0.3:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:07:45.334 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:07:45.336 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:12:06.404 INFO 1 --- [nio-8080-exec-4] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:12:41.467 INFO 1 --- [nio-8080-exec-4] o.s.a.r.c.CachingConnectionFactory : Created new connection: connectionFactory#4ae3c1cd:2/SimpleConnection@6d23e50 [delegate=amqp://admin@172.19.0.4:5672/, localPort= 54310]
11. Run docker logs -f hacluster_consumer1_1 in the console to view the log of the web container that consumes messages:
2018-05-19 12:07:45.327 WARN 1 --- [172.19.0.3:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:07:45.346 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:07:45.348 ERROR 1 --- [172.19.0.3:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:07:45.427 INFO 1 --- [cTaskExecutor-2] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@317c5a8a: tags=[{amq.ctag-ZKT8Q4gcU9v7bA-lNOUEFQ=consumer1.queue}], channel=Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://admin@172.19.0.3:5672/,1), conn: Proxy@42dd311 Shared Rabbit Connection: SimpleConnection@2b9e231d [delegate=amqp://admin@172.19.0.3:5672/, localPort= 49624], acknowledgeMode=AUTO local queue size=0
2018-05-19 12:07:45.432 INFO 1 --- [cTaskExecutor-3] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:08:07.352 INFO 1 --- [cTaskExecutor-3] o.s.a.r.c.CachingConnectionFactory : Created new connection: connectionFactory#45f45fa1:2/SimpleConnection@71dadbb0 [delegate=amqp://admin@172.19.0.4:5672/, localPort= 34416]
2018-05-19 12:12:56.869 INFO 1 --- [cTaskExecutor-3] c.b.r.receiver.FanoutReceiver : receive message : hello, aaa, bbb, 2018-05-19 12:12:06
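The log shows that the consumer also connected to the hacluster_rabbit3_1 container and consumed the message successfully;
12. Stop the third (and last) container in the RabbitMQ cluster by running docker stop hacluster_rabbit3_1;
13. This time there is no management page left to look at;
14. Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to send a message: after a long wait, the page shows an error;
15. Check the log of the hacluster_producer_1 container: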
2018-05-19 12:18:27.812 WARN 1 --- [172.19.0.4:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:18:27.813 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:18:27.813 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:18:55.836 INFO 1 --- [nio-8080-exec-7] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:19:50.921 ERROR 1 --- [nio-8080-exec-7] o.a.c.c.C.[.[.[/].[dispatcherServlet] : Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Request processing failed; nested exception is org.springframework.amqp.AmqpIOException: java.net.UnknownHostException: rabbitmqhost3] with root cause

java.net.UnknownHostException: rabbitmqhost3
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184) ~[na:1.8.0_111]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[na:1.8.0_111]
    at java.net.Socket.connect(Socket.java:589) ~[na:1.8.0_111]
    at com.rabbitmq.client.impl.SocketFrameHandlerFactory.create(SocketFrameHandlerFactory.java:60) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:955) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:907) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at com.rabbitmq.client.ConnectionFactory.newConnection(ConnectionFactory.java:847) ~[amqp-client-5.1.2.jar!/:5.1.2]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:449) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:614) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createBareChannel(CachingConnectionFactory.java:564) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getCachedChannelProxy(CachingConnectionFactory.java:538) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.getChannel(CachingConnectionFactory.java:520) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
......
...
As shown above, the exception says that connecting to the RabbitMQ server failed;
16. Check the log of the hacluster_consumer1_1 container:
2018-05-19 12:18:27.815 WARN 1 --- [172.19.0.4:5672] c.r.c.impl.ForgivingExceptionHandler : An unexpected connection driver error occured (Exception message: Connection reset)
2018-05-19 12:18:27.816 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:18:27.816 ERROR 1 --- [172.19.0.4:5672] o.s.a.r.c.CachingConnectionFactory : Channel shutdown: connection error
2018-05-19 12:18:28.100 INFO 1 --- [cTaskExecutor-3] o.s.a.r.l.SimpleMessageListenerContainer : Restarting Consumer@5b0307b0: tags=[{amq.ctag-0UhQ6jE-D5Wl2ZPl4EWhDQ=consumer1.queue}], channel=Cached Rabbit Channel: PublisherCallbackChannelImpl: AMQChannel(amqp://admin@172.19.0.4:5672/,1), conn: Proxy@42dd311 Shared Rabbit Connection: SimpleConnection@71dadbb0 [delegate=amqp://admin@172.19.0.4:5672/, localPort= 34416], acknowledgeMode=AUTO local queue size=0
2018-05-19 12:18:28.104 INFO 1 --- [cTaskExecutor-4] o.s.a.r.c.CachingConnectionFactory : Attempting to connect to: [rabbitmqhost1:5672, rabbitmqhost2:5672, rabbitmqhost3:5672]
2018-05-19 12:19:23.178 ERROR 1 --- [cTaskExecutor-4] o.s.a.r.l.SimpleMessageListenerContainer : Failed to check/redeclare auto-delete queue(s).

org.springframework.amqp.AmqpIOException: java.net.UnknownHostException: rabbitmqhost3
    at org.springframework.amqp.rabbit.support.RabbitExceptionTranslator.convertRabbitAccessException(RabbitExceptionTranslator.java:71) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.AbstractConnectionFactory.createBareConnection(AbstractConnectionFactory.java:476) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
    at org.springframework.amqp.rabbit.connection.CachingConnectionFactory.createConnection(CachingConnectionFactory.java:614) ~[spring-rabbit-2.0.3.RELEASE.jar!/:2.0.3.RELEASE]
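As shown above, the connection fails here as well, and the end of the log shows the application automatically retrying its connection to RabbitMQ.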
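The difference between the producer's and the consumer's reconnect behaviour can be read off the timestamps in the logs recorded after hacluster_rabbit1_1 was stopped. The Python sketch below computes, for each application, the time between the "Connection reset" warning and the "Created new connection" line. The timestamps are copied from those logs; the helper name is an assumption.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def seconds_between(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return (t1 - t0).total_seconds()


# Timestamps copied from the logs taken after hacluster_rabbit1_1 was stopped:
# first the "Connection reset" warning, then the "Created new connection" line.
producer_gap = seconds_between("2018-05-19 11:43:22.681", "2018-05-19 11:53:46.878")
consumer_gap = seconds_between("2018-05-19 11:43:22.672", "2018-05-19 11:43:29.042")

if __name__ == "__main__":
    print(f"producer reconnected after {producer_gap:.3f} s")
    print(f"consumer reconnected after {consumer_gap:.3f} s")
```

The consumer is back within a few seconds, whereas the producer's gap of roughly ten minutes mostly reflects how long it took until the next message was sent; the connection attempt itself, from 11:53:31.836 to 11:53:46.878, took about 15 seconds.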
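That completes the simulated outage of the RabbitMQ cluster. The results show that in HA mode, as long as a usable node remains, the applications try to connect to it, and once the connection succeeds, messages continue to be produced and consumed.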
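This failover behaviour can be modelled in a few lines. The following Python sketch is a toy simulation, not the Spring AMQP implementation: it assumes the client walks the configured address list in order and keeps the first node that answers, which is consistent with the logs in this article (rabbit1, then rabbit2, then rabbit3). The class, method and parameter names are hypothetical.

```python
from typing import Iterable, List, Optional, Set


def parse_addresses(value: str) -> List[str]:
    """Split an mq.rabbit.address style string into host:port entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


class LazyFailoverClient:
    """Toy model of a producer that reconnects only when it has to send."""

    def __init__(self, addresses: Iterable[str]) -> None:
        self.addresses = list(addresses)
        self.connected_to: Optional[str] = None

    def send(self, message: str, nodes_up: Set[str]) -> str:
        """Deliver via the first reachable node, or raise ConnectionError."""
        # Drop the current connection if its node has gone away.
        if self.connected_to not in nodes_up:
            self.connected_to = None
        if self.connected_to is None:
            # Try the configured addresses in order; keep the first one that is up.
            for address in self.addresses:
                if address in nodes_up:
                    self.connected_to = address
                    break
            else:
                raise ConnectionError("no RabbitMQ node reachable")
        return f"{message} sent via {self.connected_to}"


if __name__ == "__main__":
    addresses = parse_addresses(
        "rabbitmqhost1:5672,rabbitmqhost2:5672,rabbitmqhost3:5672"
    )
    client = LazyFailoverClient(addresses)
    up = set(addresses)
    # Send once with all nodes up, then after stopping each node in turn.
    for stopped in [None] + addresses:
        up.discard(stopped)
        try:
            print(client.send("hello", up))
        except ConnectionError as error:
            print(f"send failed: {error}")
```

Passing the set of live nodes into send keeps the model free of real network calls, so the four scenarios of this section (all nodes up, one down, two down, all down) can be replayed instantly.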
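All containers of the RabbitMQ cluster are now stopped. Next, we restore them one by one to see whether service comes back.
Restoring the RabbitMQ containers one by one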
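1. Restore hacluster_rabbit1_1 first by running docker start hacluster_rabbit1_1;
2. Run docker logs -f hacluster_rabbit1_1 to view the container log: the output stops at one point and is no longer updated;
3. Open the management page at http://192.168.119.155:15672 in the browser: the page cannot be opened;
4. Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to send a message: an error page is shown;
5. The logs of the producer and consumer containers all show errors connecting to RabbitMQ;
6. These observations show that restoring only one machine is not enough to bring the cluster service back;
7. Next, restore hacluster_rabbit2_1 by running docker start hacluster_rabbit2_1;
8. The management page is still unavailable, sending a message fails, and neither the producer nor the consumer containers can connect to a RabbitMQ container;
9. Then restore hacluster_rabbit3_1 by running docker start hacluster_rabbit3_1, so that all containers of the cluster are restored;
10. The log of hacluster_rabbit1_1 now starts to change;
11. The log of hacluster_rabbit2_1 is updated as well;
12. The management page opens normally and shows all three nodes as healthy;
13. Enter http://192.168.119.155:18080/send/aaa/bbb in the browser to send a message: production and consumption both work normally again.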
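These observations match RabbitMQ's documented restart behaviour: a restarting node first tries to reach a peer that was still running when it shut down, and only the node that stopped last can boot on its own. The Python sketch below is a simplified model of that rule, under the assumption that timeouts and retries can be ignored; the function name and the short node names are hypothetical.

```python
from typing import Dict, List, Set


def boot_history(stop_order: List[str], start_order: List[str]) -> List[List[str]]:
    """Return the nodes that are fully booted after each start command."""
    # For every node, remember which peers were still running when it stopped.
    peers_at_stop: Dict[str, Set[str]] = {
        node: set(stop_order[index + 1:]) for index, node in enumerate(stop_order)
    }
    started: Set[str] = set()   # containers whose process has been launched
    serving: Set[str] = set()   # nodes that have finished booting
    history: List[List[str]] = []
    for node in start_order:
        started.add(node)
        progress = True
        while progress:
            progress = False
            for waiting in sorted(started - serving):
                peers = peers_at_stop[waiting]
                # A node boots if it stopped last (no peers to wait for)
                # or if one of the peers it remembers is already serving.
                if not peers or peers & serving:
                    serving.add(waiting)
                    progress = True
        history.append(sorted(serving))
    return history


if __name__ == "__main__":
    stopped = ["rabbit1", "rabbit2", "rabbit3"]
    print(boot_history(stopped, ["rabbit1", "rabbit2", "rabbit3"]))
    print(boot_history(stopped, ["rabbit3", "rabbit2", "rabbit1"]))
```

In this model, restarting in the same order as the shutdown leaves the cluster unavailable until rabbit3 is started, as observed in this exercise, while starting the last-stopped node first brings a serving node back after the very first start command.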
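That completes the RabbitMQ high-availability exercise. We went through the whole cycle from outage to recovery and gained a more concrete feel for how a RabbitMQ cluster behaves.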