博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
Open-Falcon 监控系统监控 MySQL/Redis/MongoDB 状态监控
阅读量:7018 次
发布时间:2019-06-28

本文共 12212 字,大约阅读时间需要 40 分钟。

背景:

Open-Falcon 是小米运维部开源的一款互联网企业级监控系统解决方案,具体的安装和使用说明请见官网:。

监控:

1) MySQL 收集信息脚本(mysql_monitor.py)

#!/bin/env python
#-- encoding: utf-8 --

from future import division

import MySQLdb
import datetime
import time
import os
import sys
import fileinput
import requests
import json
import re

class MySQLMonitorInfo():

def __init__(self,host,port,user,password):    self.host     = host    self.port     = port    self.user     = user    self.password = passworddef stat_info(self):    try:        m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')        query = "SHOW GLOBAL STATUS"        cursor = m.cursor()        cursor.execute(query)        Str_string = cursor.fetchall()        Status_dict = {}        for Str_key,Str_value in Str_string:            Status_dict[Str_key] = Str_value        cursor.close()        m.close()        return Status_dict    except Exception, e:        print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")        print e        Status_dict = {}        return Status_dict def engine_info(self):    try:        m = MySQLdb.connect(host=self.host,user=self.user,passwd=self.password,port=self.port,charset='utf8')        _engine_regex = re.compile(ur'(History list length) ([0-9]+\.?[0-9]*)\n')        query = "SHOW ENGINE INNODB STATUS"        cursor = m.cursor()        cursor.execute(query)        Str_string = cursor.fetchone()        a,b,c = Str_string        cursor.close()        m.close()        return dict(_engine_regex.findall(c))    except Exception, e:        print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")        print e        return dict(History_list_length=0)

if name == 'main':

open_falcon_api = 'http://192.168.200.86:1988/v1/push'db_list= []for line in fileinput.input():    db_list.append(line.strip())for db_info in db_list:

#host,port,user,password,endpoint,metric = db_info.split(',')

host,port,user,password,endpoint = db_info.split(',')

timestamp = int(time.time())    step      = 60

#tags = "port=%s" %port

tags = ""

conn = MySQLMonitorInfo(host,int(port),user,password)    stat_info = conn.stat_info()    engine_info = conn.engine_info()    mysql_stat_list = []    monitor_keys = [        ('Com_select','COUNTER'),        ('Qcache_hits','COUNTER'),        ('Com_insert','COUNTER'),        ('Com_update','COUNTER'),        ('Com_delete','COUNTER'),        ('Com_replace','COUNTER'),        ('MySQL_QPS','COUNTER'),        ('MySQL_TPS','COUNTER'),        ('ReadWrite_ratio','GAUGE'),        ('Innodb_buffer_pool_read_requests','COUNTER'),        ('Innodb_buffer_pool_reads','COUNTER'),        ('Innodb_buffer_read_hit_ratio','GAUGE'),        ('Innodb_buffer_pool_pages_flushed','COUNTER'),        ('Innodb_buffer_pool_pages_free','GAUGE'),        ('Innodb_buffer_pool_pages_dirty','GAUGE'),        ('Innodb_buffer_pool_pages_data','GAUGE'),        ('Bytes_received','COUNTER'),        ('Bytes_sent','COUNTER'),        ('Innodb_rows_deleted','COUNTER'),        ('Innodb_rows_inserted','COUNTER'),        ('Innodb_rows_read','COUNTER'),        ('Innodb_rows_updated','COUNTER'),        ('Innodb_os_log_fsyncs','COUNTER'),        ('Innodb_os_log_written','COUNTER'),        ('Created_tmp_disk_tables','COUNTER'),        ('Created_tmp_tables','COUNTER'),        ('Connections','COUNTER'),        ('Innodb_log_waits','COUNTER'),        ('Slow_queries','COUNTER'),        ('Binlog_cache_disk_use','COUNTER')    ]    for _key,falcon_type in monitor_keys:        if _key == 'MySQL_QPS':            _value = int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0))        elif _key == 'MySQL_TPS':            _value = int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))        elif _key == 'Innodb_buffer_read_hit_ratio':            try:                _value = round((int(stat_info.get('Innodb_buffer_pool_read_requests',0)) - int(stat_info.get('Innodb_buffer_pool_reads',0)))/int(stat_info.get('Innodb_buffer_pool_read_requests',0)) * 100,3)            except ZeroDivisionError:                _value = 0        elif _key == 'ReadWrite_ratio':            try:                _value = round((int(stat_info.get('Com_select',0)) + int(stat_info.get('Qcache_hits',0)))/(int(stat_info.get('Com_insert',0)) + int(stat_info.get('Com_update',0)) + int(stat_info.get('Com_delete',0)) + int(stat_info.get('Com_replace',0))),2)            except ZeroDivisionError:                _value = 0                    else:            _value = int(stat_info.get(_key,0))        falcon_format = {                'Metric': '%s' % (_key),                'Endpoint': endpoint,                'Timestamp': timestamp,                'Step': step,                'Value': _value,                'CounterType': falcon_type,                'TAGS': tags            }        mysql_stat_list.append(falcon_format)    #_key : History list length    for _key,_value in  engine_info.items():        _key = "Undo_Log_Length"        falcon_format = {                'Metric': '%s' % (_key),                'Endpoint': endpoint,                'Timestamp': timestamp,                'Step': step,                'Value': int(_value),                'CounterType': "GAUGE",                'TAGS': tags            }        mysql_stat_list.append(falcon_format)    print json.dumps(mysql_stat_list,sort_keys=True,indent=4)    requests.post(open_falcon_api, data=json.dumps(mysql_stat_list))            指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

指标 类型 说明

Undo_Log_Length GAUGE 未清除的Undo事务数
Com_select COUNTER select/秒=QPS
Com_insert COUNTER insert/秒
Com_update COUNTER update/秒
Com_delete COUNTER delete/秒
Com_replace COUNTER replace/秒
MySQL_QPS COUNTER QPS
MySQL_TPS COUNTER TPS
ReadWrite_ratio GAUGE 读写比例
Innodb_buffer_pool_read_requests COUNTER innodb buffer pool 读次数/秒
Innodb_buffer_pool_reads COUNTER Disk 读次数/秒
Innodb_buffer_read_hit_ratio GAUGE innodb buffer pool 命中率
Innodb_buffer_pool_pages_flushed COUNTER innodb buffer pool 刷写到磁盘的页数/秒
Innodb_buffer_pool_pages_free GAUGE innodb buffer pool 空闲页的数量
Innodb_buffer_pool_pages_dirty GAUGE innodb buffer pool 脏页的数量
Innodb_buffer_pool_pages_data GAUGE innodb buffer pool 数据页的数量
Bytes_received COUNTER 接收字节数/秒
Bytes_sent COUNTER 发送字节数/秒
Innodb_rows_deleted COUNTER innodb表删除的行数/秒
Innodb_rows_inserted COUNTER innodb表插入的行数/秒
Innodb_rows_read COUNTER innodb表读取的行数/秒
Innodb_rows_updated COUNTER innodb表更新的行数/秒
Innodb_os_log_fsyncs COUNTER Redo Log fsync次数/秒
Innodb_os_log_written COUNTER Redo Log 写入的字节数/秒
Created_tmp_disk_tables COUNTER 创建磁盘临时表的数量/秒
Created_tmp_tables COUNTER 创建内存临时表的数量/秒
Connections COUNTER 连接数/秒
Innodb_log_waits COUNTER innodb log buffer不足等待的数量/秒
Slow_queries COUNTER 慢查询数/秒
Binlog_cache_disk_use COUNTER Binlog Cache不足的数量/秒

使用说明:读取配置到都数据库列表执行,配置文件格式如下(mysqldb_list.txt):

IP,Port,User,Password,endpoint

192.168.2.21,3306,root,123,mysql-21:3306

192.168.2.88,3306,root,123,mysql-88:3306
最后执行:

python mysql_monitor.py mysqldb_list.txt

2) Redis 收集信息脚本(redis_monitor.py)

#!/bin/env python

#-- coding:utf-8 --

import json

import time
import re
import redis
import requests
import fileinput
import datetime

class RedisMonitorInfo():

def __init__(self,host,port,password):    self.host     = host    self.port     = port    self.password = passworddef stat_info(self):     try:        r = redis.Redis(host=self.host, port=self.port, password=self.password)        stat_info = r.info()        return stat_info     except Exception, e:        print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")        print e        return dict()def cmdstat_info(self):    try:        r = redis.Redis(host=self.host, port=self.port, password=self.password)        cmdstat_info = r.info('Commandstats')        return cmdstat_info    except Exception, e:        print (datetime.datetime.now()).strftime("%Y-%m-%d %H:%M:%S")        print e        return dict()

if name == 'main':

open_falcon_api = 'http://192.168.200.86:1988/v1/push'db_list= []for line in fileinput.input():    db_list.append(line.strip())for db_info in db_list:

#host,port,password,endpoint,metric = db_info.split(',')

host,port,password,endpoint = db_info.split(',')

timestamp = int(time.time())    step      = 60    falcon_type = 'COUNTER'

#tags = "port=%s" %port

tags = ""

conn = RedisMonitorInfo(host,port,password)    #查看各个命令每秒执行次数    redis_cmdstat_dict = {}    redis_cmdstat_list = []    cmdstat_info = conn.cmdstat_info()    for cmdkey in cmdstat_info:        redis_cmdstat_dict[cmdkey] = cmdstat_info[cmdkey]['calls']    for _key,_value in redis_cmdstat_dict.items():        falcon_format = {                'Metric': '%s' % (_key),                'Endpoint': endpoint,                'Timestamp': timestamp,                'Step': step,                'Value': int(_value),                'CounterType': falcon_type,                'TAGS': tags            }        redis_cmdstat_list.append(falcon_format)    #查看Redis各种状态,根据需要增删监控项,str的值需要转换成int    redis_stat_list = []    monitor_keys = [        ('connected_clients','GAUGE'),        ('blocked_clients','GAUGE'),        ('used_memory','GAUGE'),        ('used_memory_rss','GAUGE'),        ('mem_fragmentation_ratio','GAUGE'),        ('total_commands_processed','COUNTER'),        ('rejected_connections','COUNTER'),        ('expired_keys','COUNTER'),        ('evicted_keys','COUNTER'),        ('keyspace_hits','COUNTER'),        ('keyspace_misses','COUNTER'),        ('keyspace_hit_ratio','GAUGE'),        ('keys_num','GAUGE'),    ]    stat_info = conn.stat_info()       for _key,falcon_type in monitor_keys:        #计算命中率        if _key == 'keyspace_hit_ratio':            try:                _value = round(float(stat_info.get('keyspace_hits',0))/(int(stat_info.get('keyspace_hits',0)) + int(stat_info.get('keyspace_misses',0))),4)*100            except ZeroDivisionError:                _value = 0        #碎片率是浮点数        elif _key == 'mem_fragmentation_ratio':            _value = float(stat_info.get(_key,0))        #拿到key的数量        elif _key == 'keys_num':            _value = 0             for i in range(16):                _key = 'db'+str(i)                _num = stat_info.get(_key)                if _num:                    _value += int(_num.get('keys'))            _key = 'keys_num'        #其他的都采集成counter,int        else:            try:                _value = int(stat_info[_key])            except:                continue        falcon_format = {                'Metric': '%s' % (_key),                'Endpoint': endpoint,                'Timestamp': timestamp,                'Step': step,                'Value': _value,                'CounterType': falcon_type,                'TAGS': tags            }        redis_stat_list.append(falcon_format)    load_data = redis_stat_list+redis_cmdstat_list    print json.dumps(load_data,sort_keys=True,indent=4)    requests.post(open_falcon_api, data=json.dumps(load_data))            指标说明:收集指标里的COUNTER表示每秒执行次数,GAUGE表示直接输出值。

指标 类型 说明

connected_clients GAUGE 连接的客户端个数
blocked_clients GAUGE 被阻塞客户端的数量
used_memory GAUGE Redis分配的内存的总量
used_memory_rss GAUGE OS分配的内存的总量
mem_fragmentation_ratio GAUGE 内存碎片率,used_memory_rss/used_memory
total_commands_processed COUNTER 每秒执行的命令数,比较准确的QPS
rejected_connections COUNTER 被拒绝的连接数/秒
expired_keys COUNTER 过期KEY的数量/秒
evicted_keys COUNTER 被驱逐KEY的数量/秒
keyspace_hits COUNTER 命中KEY的数量/秒
keyspace_misses COUNTER 未命中KEY的数量/秒
keyspace_hit_ratio GAUGE KEY的命中率
keysnum GAUGE KEY的数量
cmd
* COUNTER 各种名字都执行次数/秒
使用说明:读取配置到都数据库列表执行,配置文件格式如下(redisdb_list.txt):

IP,Port,Password,endpoint

192.168.1.56,7021,zhoujy,redis-56:7021

192.168.1.55,7021,zhoujy,redis-55:7021
最后执行:

python redis_monitor.py redisdb_list.txt

3) MongoDB 收集信息脚本(mongodb_monitor.py)

...后续添加

4)其他相关的监控(需要装上agent),比如下面的指标:

告警项 触发条件 备注

load.1min all(#3)>10 Redis服务器过载,处理能力下降
cpu.idle all(#3)<10 CPU idle过低,处理能力下降
df.bytes.free.percent all(#3)<20 磁盘可用空间百分比低于20%,影响从库RDB和AOF持久化
mem.memfree.percent all(#3)<15 内存剩余低于15%,Redis有OOM killer和使用swap的风险
mem.swapfree.percent all(#3)<80 使用20% swap,Redis性能下降或OOM风险
net.if.out.bytes all(#3)>94371840 网络出口流量超90MB,影响Redis响应
net.if.in.bytes all(#3)>94371840 网络入口流量超90MB,影响Redis响应
disk.io.util all(#3)>90 磁盘IO可能存负载,影响从库持久化和阻塞写

转载于:https://blog.51cto.com/ershao/2118760

你可能感兴趣的文章
Dwr3.0纯注解(纯Java Code配置)配置与应用浅析一之零配置文件化
查看>>
IMAP共享文件夹配置
查看>>
NTFS权限设置案例分享
查看>>
什么是checkpoint
查看>>
神经网络算法对车牌价格的预测
查看>>
使用kubekit搭建k8s集群
查看>>
Hyper-V 2016 系列教程41 Windows 10 Hyper-V 系统要求
查看>>
MySQL使用裸设备
查看>>
win7无线图标显示叹号
查看>>
zabbix安装(5)
查看>>
正则表达式之二---稍复杂一点的匹配符
查看>>
查询的几种连接(摘自网络)
查看>>
linux安装Vmware的时候出现“Could not open /dev/vmmon”
查看>>
linux下常用快捷键
查看>>
Listview下拉刷新数据
查看>>
CentOS 6.3部署KVM虚拟机 V1.0
查看>>
RHCS
查看>>
Tomcat启动脚本catalina.sh
查看>>
Django解决扩展用户表时,后台Admin显示密码为明文的问题
查看>>
vCenter连接ESXi主机出错
查看>>