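I. Application scenario
Our production services use RabbitMQ as the message queue middleware, so monitoring RabbitMQ is an important task for operations staff. This article explains, from start to finish, how to monitor RabbitMQ with Zabbix.
II. Key points of RabbitMQ monitoring
RabbitMQ officially provides two ways to manage and monitor RabbitMQ.
1. Managing and monitoring with rabbitmqctl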
Usage:
rabbitmqctl [-n <node>] [-q] <command> [<command options>]
List virtual hosts
# rabbitmqctl list_vhosts
List queues
# rabbitmqctl list_queues
List exchanges
# rabbitmqctl list_exchanges
List users
# rabbitmqctl list_users
List connections
# rabbitmqctl list_connections
List consumers
# rabbitmqctl list_consumers
Show the environment variables
# rabbitmqctl environment
Show the number of unacknowledged messages per queue
# rabbitmqctl list_queues name messages_unacknowledged
Show the memory used by each queue
# rabbitmqctl list_queues name memory
Show the number of ready messages per queue
# rabbitmqctl list_queues name messages_ready
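If the rabbitmqctl output is to be consumed by a script rather than read by eye, it has to be parsed first. The following is a minimal Python 3 sketch, assuming the output is tab-separated with one queue per line and that banner lines such as "Listing queues ..." can be recognized by their column count. The function name parse_list_queues and the sample text are illustrative only; the exact banner lines vary between RabbitMQ versions.

```python
def parse_list_queues(output, columns=("name", "messages_ready")):
    """Parse tab-separated `rabbitmqctl list_queues` output into a list of dicts."""
    rows = []
    for line in output.splitlines():
        fields = line.split("\t")
        # Banner lines ("Listing queues ...", "...done.") do not have
        # the expected number of columns, so they are skipped here.
        if len(fields) != len(columns):
            continue
        rows.append(dict(zip(columns, fields)))
    return rows


# Hypothetical sample output, shaped like that of an older RabbitMQ release.
sample = "Listing queues ...\napp000\t0\napp001\t12\n...done.\n"
for row in parse_list_queues(sample):
    print(row["name"], int(row["messages_ready"]))
```

Values are kept as strings so that the caller decides how to convert them. Running rabbitmqctl with -q should suppress the banner lines, which makes the parsing less fragile.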
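2. Monitoring and managing with the RabbitMQ Management plugin
Enable the Management plugin
# rabbitmq-plugins enable rabbitmq_management
Once the plugin is enabled, the management web UI, which shows the status of RabbitMQ, is served on port 15672, for example http://172.28.2.157:15672/
The rabbitmqadmin management tool can be downloaded from the same server:
http://172.28.2.157:15672/cli/rabbitmqadmin
Get the list of vhosts
# curl -i -u guest:guest http://localhost:15672/api/vhosts
Get the list of channels, sorted and restricted to selected columns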
# curl -i -u guest:guest "http://localhost:15672/api/channels?sort=message_stats.publish_details.rate&sort_reverse=true&columns=name,message_stats.publish_details.rate,message_stats.deliver_get_details.rate"
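The sort, sort_reverse and columns parameters in the command above can also be assembled programmatically. This is a small Python 3 sketch, assuming the same localhost address as the curl example; the helper name channels_url is made up for illustration.

```python
from urllib.parse import urlencode


def channels_url(base="http://localhost:15672/api"):
    """Build the /api/channels URL with sorting and column selection."""
    columns = [
        "name",
        "message_stats.publish_details.rate",
        "message_stats.deliver_get_details.rate",
    ]
    params = [
        ("sort", "message_stats.publish_details.rate"),
        ("sort_reverse", "true"),
        ("columns", ",".join(columns)),
    ]
    # safe="," keeps the commas in the column list unescaped,
    # so the result matches the URL used with curl.
    return "{0}/channels?{1}".format(base, urlencode(params, safe=","))


print(channels_url())
```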
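Show overview information
# curl -i -u guest:guest "http://localhost:15672/api/overview"
management_version: version of the management plugin
cluster_name: name of the whole RabbitMQ cluster, set with rabbitmqctl set_cluster_name
publish: total number of messages published
queue_totals: message totals across all queues (all messages, ready messages, unacknowledged messages)
statistics_db_event_queue: number of events not yet processed by the statistics database
consumers: number of consumers
queues: number of queues
exchanges: number of exchanges
connections: number of connections
channels: number of channels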
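In the JSON returned by /api/overview, the object counts sit under object_totals and the message counts under queue_totals, which is also how the script later in this article reads them. A minimal Python 3 sketch of flattening those two sections follows; the function name summarize_overview and the sample numbers are invented for illustration.

```python
def summarize_overview(overview):
    """Flatten the object and message totals of an /api/overview response."""
    object_totals = overview.get("object_totals", {})
    queue_totals = overview.get("queue_totals", {})
    summary = {}
    for key in ("channels", "connections", "consumers", "exchanges", "queues"):
        summary[key] = object_totals.get(key, 0)
    for key in ("messages", "messages_ready", "messages_unacknowledged"):
        # queue_totals can be empty on a broker without queues, hence the default.
        summary[key] = queue_totals.get(key, 0)
    return summary


# Illustrative response fragment, not real measurements.
sample_overview = {
    "object_totals": {"channels": 4, "connections": 2, "consumers": 4,
                      "exchanges": 8, "queues": 1},
    "queue_totals": {"messages": 3, "messages_ready": 1,
                     "messages_unacknowledged": 2},
}
print(summarize_overview(sample_overview))
```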
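Show node information
# curl -i -u guest:guest "http://localhost:15672/api/nodes"
disk_free: free disk space, in bytes
disk_free_limit: free disk space threshold below which the disk alarm is raised
fd_used: number of file descriptors in use
fd_total: number of file descriptors available
io_read_avg_time: average time of a read operation, in milliseconds
io_read_bytes: total amount of data read from disk, in bytes
io_read_count: total number of read operations
io_seek_avg_time: average time of a seek operation, in milliseconds
io_seek_count: total number of seek operations
io_sync_avg_time: average time of an fsync operation, in milliseconds
io_sync_count: total number of fsync operations
io_write_avg_time: average time of a disk write operation, in milliseconds
io_write_bytes: total amount of data written to disk, in bytes
io_write_count: total number of disk write operations
mem_used: memory used, in bytes
mem_limit: memory alarm threshold; the default is 40% of total physical memory
mnesia_disk_tx_count: number of Mnesia transactions that required a write to disk
mnesia_ram_tx_count: number of Mnesia transactions that did not require a write to disk
msg_store_write_count: number of messages written to the message store
msg_store_read_count: number of messages read from the message store
proc_used: number of Erlang processes in use
proc_total: maximum number of Erlang processes
queue_index_journal_write_count: number of records written to the queue index journal. Each record represents a message being published to a queue, delivered from a queue, or acknowledged in a queue
queue_index_read_count: number of records read from the queue index
queue_index_write_count: number of records written to the queue index
sockets_used: number of file descriptors used as sockets
partitions: list of network partitions this node is seeing
uptime: time since the Erlang VM started, in milliseconds
run_queue: number of Erlang processes waiting to run
processors: number of cores detected and usable by Erlang
net_ticktime: current kernel tick time setting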
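Several of these fields come in used/limit pairs, so a monitoring item is often more useful as a ratio than as a raw value. The following Python 3 sketch computes such ratios from one entry of the /api/nodes list. The function name node_pressure, the result keys and the sample numbers are assumptions made for this example, not part of the RabbitMQ API.

```python
def node_pressure(node):
    """Summarize how close a node is to its resource limits."""

    def ratio(used, total):
        # Guard against a missing or zero limit.
        return float(used) / total if total else 0.0

    return {
        "fd_ratio": ratio(node.get("fd_used", 0), node.get("fd_total", 0)),
        "proc_ratio": ratio(node.get("proc_used", 0), node.get("proc_total", 0)),
        "mem_ratio": ratio(node.get("mem_used", 0), node.get("mem_limit", 0)),
        # True when free disk space has dropped below the alarm threshold.
        "disk_below_limit": node.get("disk_free", 0) < node.get("disk_free_limit", 0),
    }


# Illustrative node entry, not real measurements.
sample_node = {
    "fd_used": 256, "fd_total": 1024,
    "proc_used": 512, "proc_total": 1048576,
    "mem_used": 1073741824, "mem_limit": 4294967296,
    "disk_free": 50000000000, "disk_free_limit": 50000000,
}
print(node_pressure(sample_node))
```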
Show channel information
# curl -i -u guest:guest "http://localhost:15672/api/channels"
Show exchange information
# curl -i -u guest:guest "http://localhost:15672/api/exchanges"
Show queue information
# curl -i -u guest:guest "http://localhost:15672/api/queues"
Show vhost information
# curl -i -u guest:guest "http://localhost:15672/api/vhosts/?name=/"
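When a vhost name appears as a path segment in an API URL, the default vhost "/" has to be percent-encoded, as the script later in this article notes for its aliveness check. A short Python 3 sketch of that encoding follows; the helper name vhost_path is hypothetical.

```python
from urllib.parse import quote


def vhost_path(resource, vhost="/"):
    """Return an API path with the vhost percent-encoded as one path segment."""
    # safe="" forces "/" itself to be encoded instead of being kept
    # as a path separator.
    return "{0}/{1}".format(resource, quote(vhost, safe=""))


print(vhost_path("aliveness-test"))
print(vhost_path("queues", "my vhost"))
```

Python emits the escape in upper case (%2F) while the script writes %2f; percent-escapes are case-insensitive, so both refer to the same vhost.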
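III. Writing the monitoring script and adding the Zabbix configuration file
The monitoring script has three main parts: monitoring the overview, monitoring the node information of the current host, and monitoring each queue.
It is adapted from a script found online: many monitoring items were added, and the filter in the original script was removed.
A side note: code found online should not be used as-is. Analyze it against your own requirements, which also improves your coding skills. If you only ever copy and paste, you will never improve.
rabbitmq_status.py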
#!/usr/bin/python
'''Python module to query the RabbitMQ Management Plugin REST API and get
results that can then be used by Zabbix.
https://github.com/jasonmcintosh/rabbitmq-zabbix

This script is tested on RabbitMQ 3.5.3.
It is written for Python 2 (urllib2 and the print statement).
'''
import json
import optparse
import socket
import urllib2
import subprocess
import tempfile
import os
import logging

logging.basicConfig(filename='/opt/logs/zabbix/rabbitmq_zabbix.log', level=logging.WARNING, format='%(asctime)s %(levelname)s: %(message)s')


class RabbitMQAPI(object):
    '''Class for RabbitMQ Management API'''

    def __init__(self, user_name='guest', password='guest', host_name='',
                 protocol='http', port=15672, conf='/opt/app/zabbix/conf/zabbix_agentd.conf', senderhostname=None):
        self.user_name = user_name
        self.password = password
        self.host_name = host_name or socket.gethostname()
        self.protocol = protocol
        self.port = port
        self.conf = conf or '/opt/app/zabbix/conf/zabbix_agentd.conf'
        self.senderhostname = senderhostname if senderhostname else host_name

    def call_api(self, path):
        '''
        All URIs serve only resources of type application/json and require
        HTTP basic authentication. The default username and password is
        guest/guest. The default virtual host '/' is encoded as %2f.
        '''
        url = '{0}://{1}:{2}/api/{3}'.format(self.protocol, self.host_name, self.port, path)
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, url, self.user_name, self.password)
        handler = urllib2.HTTPBasicAuthHandler(password_mgr)
        logging.debug('Issue a rabbit API call to get data on ' + path)
        # json.loads() converts JSON text to Python data;
        # json.dumps() converts Python data to JSON text.
        return json.loads(urllib2.build_opener(handler).open(url).read())

    def list_queues(self):
        ''' curl -i -u guest:guest http://localhost:15672/api/queues
        return a list
        '''
        queues = []
        for queue in self.call_api('queues'):
            logging.debug("Discovered queue " + queue['name'])
            element = {'{#VHOSTNAME}': queue['vhost'],
                       '{#QUEUENAME}': queue['name']
                       }
            queues.append(element)
            logging.debug('Discovered queue '+queue['vhost']+'/'+queue['name'])
        return queues

    def list_nodes(self):
        '''Lists all rabbitMQ nodes in the cluster'''
        nodes = []
        for node in self.call_api('nodes'):
            # We need to return the node name, because Zabbix
            # does not support @ as an item parameter
            name = node['name'].split('@')[1]
            element = {'{#NODENAME}': name,
                       '{#NODETYPE}': node['type']}
            nodes.append(element)
            logging.debug('Discovered nodes '+name+'/'+node['type'])
        return nodes

    def check_queue(self):
        '''Collect the items of every queue and send them to Zabbix.'''
        return_code = 0
        # Create a named temporary file on disk. Because delete=False,
        # it is not removed when it is closed.
        rdatafile = tempfile.NamedTemporaryFile(delete=False)
        for queue in self.call_api('queues'):
            self._get_queue_data(queue, rdatafile)
        rdatafile.close()
        return_code = self._send_queue_data(rdatafile)
        # os.unlink removes the temporary file once the data has been sent.
        os.unlink(rdatafile.name)
        return return_code

    def _get_queue_data(self, queue, tmpfile):
        '''Prepare the queue data for sending.

        curl -i -u guest:guest http://localhost:15672/api/queues dumps a list.
        A single queue's information looks like this:
{"memory":32064,"message_stats":{"ack":3870,"ack_details":{"rate":0.0},"deliver":3871,"deliver_details":{"rate":0.0},"deliver_get":3871,"deliver_get_details":{"rate":0.0},"disk_writes":3870,"disk_writes_details":{"rate":0.0},"publish":3870,"publish_details":{"rate":0.0},"redeliver":1,"redeliver_details":{"rate":0.0}},"messages":0,"messages_details":{"rate":0.0},"messages_ready":0,"messages_ready_details":{"rate":0.0},"messages_unacknowledged":0,"messages_unacknowledged_details":{"rate":0.0},"idle_since":"2016-03-01 22:04:22","consumer_utilisation":"","policy":"","exclusive_consumer_tag":"","consumers":4,"recoverable_slaves":"","state":"running","messages_ram":0,"messages_ready_ram":0,"messages_unacknowledged_ram":0,"messages_persistent":0,"message_bytes":0,"message_bytes_ready":0,"message_bytes_unacknowledged":0,"message_bytes_ram":0,"message_bytes_persistent":0,"disk_reads":0,"disk_writes":3870,"backing_queue_status":{"q1":0,"q2":0,"delta":["delta",0,0,0],"q3":0,"q4":0,"len":0,"target_ram_count":"infinity","next_seq_id":3870,"avg_ingress_rate":0.060962064328682466,"avg_egress_rate":0.060962064328682466,"avg_ack_ingress_rate":0.060962064328682466,"avg_ack_egress_rate":0.060962064328682466},"name":"app000","vhost":"/","durable":true,"auto_delete":false,"arguments":{},"node":"rabbit@test2"}
        '''
        for item in ['memory', 'messages', 'messages_ready', 'messages_unacknowledged', 'consumers']:
            # Example key: rabbitmq.queues[/,queue_memory,queue.helloWorld]
            key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            # Use queue[item] if the item is present, otherwise 0.
            value = queue.get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))
        # This is a non standard bit of information added after the standard items
        for item in ['deliver_get', 'publish']:
            key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(queue['vhost'], item, queue['name'])
            value = queue.get('message_stats', {}).get(item, 0)
            logging.debug("SENDER_DATA: - %s %s" % (key, value))
            tmpfile.write("- %s %s\n" % (key, value))

    def _send_queue_data(self, tmpfile):
        '''Send the queue data to Zabbix, reading the key/value pairs from the temp file.'''
        args = '/opt/app/zabbix/sbin/zabbix_sender -c {0} -i {1}'
        if self.senderhostname:
            args = args + " -s " + self.senderhostname
        return_code = 0
        process = subprocess.Popen(args.format(self.conf, tmpfile.name),
                                   shell=True, stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        logging.debug("Finished sending data")
        return_code = process.wait()
        logging.info("Found return code of " + str(return_code))
        if return_code != 0:
            logging.warning(out)
            logging.warning(err)
        else:
            logging.debug(err)
            logging.debug(out)
        return return_code

    def check_aliveness(self):
        '''Check the aliveness status of the default vhost.

        The virtual host '/' has to be encoded as %2f.
        '''
        return self.call_api('aliveness-test/%2f')['status']

    def check_overview(self, item):
        '''Return one item from the overview.

        curl -i -u guest:guest http://localhost:15672/api/overview
        '''
        # rabbitmq[overview,connections]
        if item in ['channels', 'connections', 'consumers', 'exchanges', 'queues']:
            return self.call_api('overview').get('object_totals').get(item, 0)
        # rabbitmq[overview,messages]
        elif item in ['messages', 'messages_ready', 'messages_unacknowledged']:
            return self.call_api('overview').get('queue_totals').get(item, 0)
        elif item == 'message_stats_deliver_get':
            return self.call_api('overview').get('message_stats', {}).get('deliver_get', 0)
        elif item == 'message_stats_publish':
            return self.call_api('overview').get('message_stats', {}).get('publish', 0)
        elif item == 'message_stats_ack':
            return self.call_api('overview').get('message_stats', {}).get('ack', 0)
        elif item == 'message_stats_redeliver':
            return self.call_api('overview').get('message_stats', {}).get('redeliver', 0)
        elif item == 'rabbitmq_version':
            return self.call_api('overview').get('rabbitmq_version', 'None')

    def check_server(self, item, node_name):
        '''Return the value for a specific item in a node's details.

        curl -i -u guest:guest http://localhost:15672/api/nodes returns a list.
        '''
        # hostname                           hk-prod-mq1.example.com
        # self.call_api('nodes')[0]['name']  rabbit@hk-prod-mq1
        node_name = node_name.split('.')[0]
        for nodeData in self.call_api('nodes'):
            if node_name in nodeData['name']:
                return nodeData.get(item, 0)
        return 'Not Found'


def main():
    '''Command-line parameters and decoding for Zabbix use/consumption.'''
    choices = ['list_queues', 'list_nodes', 'queues', 'check_aliveness',
               'overview', 'server']
    parser = optparse.OptionParser()
    parser.add_option('--username', help='RabbitMQ API username',
                      default='guest')
    parser.add_option('--password', help='RabbitMQ API password',
                      default='guest')
    parser.add_option('--hostname', help='RabbitMQ API host',
                      default=socket.gethostname())
    parser.add_option('--protocol', help='RabbitMQ API protocol (http or https)',
                      default='http')
    parser.add_option('--port', help='RabbitMQ API port', type='int',
                      default=15672)
    parser.add_option('--check', type='choice', choices=choices,
                      help='Type of check')
    parser.add_option('--metric', help='Which metric to evaluate', default='')
    parser.add_option('--node', help='Which node to check (valid for --check=server)')
    parser.add_option('--conf', default='/opt/app/zabbix/conf/zabbix_agentd.conf')
    parser.add_option('--senderhostname', default='', help='Allows including a sender parameter on calls to zabbix_sender')
    (options, args) = parser.parse_args()
    if not options.check:
        parser.error('At least one check should be specified')
    logging.debug("Started trying to process data")
    api = RabbitMQAPI(user_name=options.username, password=options.password,
                      host_name=options.hostname, protocol=options.protocol, port=options.port,
                      conf=options.conf, senderhostname=options.senderhostname)

    if options.check == 'list_queues':
        print json.dumps({'data': api.list_queues()}, indent=4, separators=(',', ':'))
    elif options.check == 'list_nodes':
        print json.dumps({'data': api.list_nodes()}, indent=4, separators=(',', ':'))
    elif options.check == 'queues':
        print api.check_queue()
    elif options.check == 'check_aliveness':
        print api.check_aliveness()
    elif options.check == 'overview':
        # rabbitmq[overview,connections]
        # --check=overview --metric=connections
        if not options.metric:
            parser.error('Missing required parameter: "metric"')
        else:
            # The overview is cluster-wide, so --node is ignored here.
            print api.check_overview(options.metric)
    elif options.check == 'server':
        # rabbitmq[server,sockets_used]
        # --check=server --metric=sockets_used
        if not options.metric:
            parser.error('Missing required parameter: "metric"')
        else:
            if options.node:
                print api.check_server(options.metric, options.node)
            else:
                print api.check_server(options.metric, api.host_name)


if __name__ == '__main__':
    main()

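How the script works:
It uses the urllib2 module to call the RabbitMQ API.
It processes the data returned by the API.
The overview and nodes data are fetched through zabbix_agent, while the queue data is pushed to Zabbix with zabbix_sender. Before zabbix_sender can push anything, a zabbix_agent key has to trigger the push actively.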
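The data pushed with zabbix_sender is read from an input file with one "host key value" entry per line, where "-" in the host column lets zabbix_sender take the host name from the agent configuration or from -s. The following Python 3 sketch mirrors what _get_queue_data writes for one queue. The function name sender_lines and the sample queue are illustrative, and the key layout is the one used by the script above.

```python
def sender_lines(queue):
    """Build zabbix_sender input lines ("- key value") for one queue."""
    lines = []
    for item in ("memory", "messages", "messages_ready",
                 "messages_unacknowledged", "consumers"):
        key = '"rabbitmq.queues[{0},queue_{1},{2}]"'.format(
            queue["vhost"], item, queue["name"])
        # Items missing from the API response are reported as 0.
        lines.append("- {0} {1}".format(key, queue.get(item, 0)))
    stats = queue.get("message_stats", {})
    for item in ("deliver_get", "publish"):
        key = '"rabbitmq.queues[{0},queue_message_stats_{1},{2}]"'.format(
            queue["vhost"], item, queue["name"])
        lines.append("- {0} {1}".format(key, stats.get(item, 0)))
    return lines


# Trimmed-down version of the sample queue shown in the script's docstring.
sample_queue = {
    "vhost": "/", "name": "app000", "memory": 32064, "messages": 0,
    "consumers": 4, "message_stats": {"deliver_get": 3871, "publish": 3870},
}
print("\n".join(sender_lines(sample_queue)))
```

Each key has to match a Zabbix trapper item on the monitored host; otherwise zabbix_sender reports the value as failed.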
rabbitmq_status.conf
UserParameter=rabbitmq.discovery_queue,/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=list_queues
UserParameter=rabbitmq.queues,/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=queues
UserParameter=rabbitmq[*],/usr/bin/python /opt/app/zabbix/sbin/rabbitmq_status.py --check=$1 --metric=$2
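The rabbitmq.discovery_queue key returns a JSON document that Zabbix low-level discovery expands into one set of items per queue. This Python 3 sketch shows the shape of that document, following the list_queues method of the script. The function name discovery_payload and the sample queue list are made up for the example.

```python
import json


def discovery_payload(queues):
    """Build the low-level discovery JSON for a list of queue dicts."""
    data = [
        {"{#VHOSTNAME}": queue["vhost"], "{#QUEUENAME}": queue["name"]}
        for queue in queues
    ]
    # Same json.dumps arguments as the script's list_queues check.
    return json.dumps({"data": data}, indent=4, separators=(",", ":"))


# Hypothetical queues as they would appear in the /api/queues response.
sample_queues = [
    {"vhost": "/", "name": "app000"},
    {"vhost": "/", "name": "app001"},
]
print(discovery_payload(sample_queues))
```

The {#VHOSTNAME} and {#QUEUENAME} macros are then available in the item prototypes of the template, so they can fill the vhost and queue positions of the rabbitmq.queues[...] keys.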
IV. Adding the Zabbix monitoring template
See the attachment for the template.
References:
http://blog.thomasvandoren.com/monitoring-rabbitmq-queues-with-zabbix.html
http://www.rabbitmq.com/how.html#management
https://github.com/alfss/zabbix-rabbitmq
https://cdn.rawgit.com/rabbitmq/rabbitmq-management/rabbitmq_v3_6_0/priv/www/api/index.html
https://github.com/jasonmcintosh/rabbitmq-zabbix
http://chase-seibert.github.io/blog/2011/07/01/checking-rabbitmq-queue-sizeage-with-nagios.html