Your MySQL instance went down at 2 AM and nobody knew until users started complaining at 2:15 AM. A Prometheus + Grafana monitoring stack would have alerted you 20 minutes earlier when replication lag hit 60 seconds and connections maxed out. Here's how to build it in an afternoon.
- mysqld_exporter exposes 300+ MySQL metrics as a Prometheus-scrapable endpoint
- Five dashboards cover 95% of MySQL production incidents: connections, replication, InnoDB, queries, and slow log
- Alert on: connections > 80% of max, replication lag > 30s, InnoDB buffer pool hit rate < 99%
- Use Grafana dashboard ID 7362 (MySQL Overview) as your starting point
Setting Up mysqld_exporter
Step 1: Create the Monitoring User
-- MySQL: create least-privilege monitoring user
CREATE USER 'exporter'@'localhost'
IDENTIFIED BY 'strong_monitoring_password';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
GRANT SELECT ON performance_schema.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;Step 2: Deploy mysqld_exporter
# Download and install
wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.15.1/mysqld_exporter-0.15.1.linux-amd64.tar.gz
tar -xzf mysqld_exporter-0.15.1.linux-amd64.tar.gz
mv mysqld_exporter-0.15.1.linux-amd64/mysqld_exporter /usr/local/bin/
# Create credentials file
cat > /etc/mysqld_exporter/.my.cnf < /etc/systemd/system/mysqld_exporter.service < Step 3: Configure Prometheus Scrape
# prometheus.yml
scrape_configs:
- job_name: mysql
static_configs:
- targets:
- mysql-primary.internal:9104
- mysql-replica1.internal:9104
- mysql-replica2.internal:9104
relabel_configs:
- source_labels: [__address__]
target_label: instanceEssential Alerting Rules
# mysql-alerts.yml
groups:
- name: mysql
rules:
# Connection saturation
- alert: MySQLConnectionSaturation
expr: |
mysql_global_status_threads_connected /
mysql_global_variables_max_connections > 0.8
for: 2m
labels: { severity: warning }
annotations:
summary: "MySQL connections at {{ $value | humanizePercentage }} of max"
# Replication lag
- alert: MySQLReplicationLag
expr: mysql_slave_status_seconds_behind_master > 30
for: 1m
labels: { severity: critical }
annotations:
summary: "MySQL replica {{ $labels.instance }} is {{ $value }}s behind primary"
# InnoDB buffer pool hit rate
- alert: MySQLBufferPoolHitRateLow
expr: |
rate(mysql_global_status_innodb_buffer_pool_read_requests[5m]) /
(rate(mysql_global_status_innodb_buffer_pool_read_requests[5m]) +
rate(mysql_global_status_innodb_buffer_pool_reads[5m])) < 0.99
for: 5m
labels: { severity: warning }
annotations:
summary: "InnoDB buffer pool hit rate {{ $value | humanizePercentage }}"
# Slow queries spike
- alert: MySQLSlowQueriesSpike
expr: rate(mysql_global_status_slow_queries[5m]) > 10
for: 2m
labels: { severity: warning }
annotations:
summary: "{{ $value }} slow queries/sec on {{ $labels.instance }}"
# MySQL down
- alert: MySQLDown
expr: mysql_up == 0
for: 30s
labels: { severity: critical }
annotations:
summary: "MySQL instance {{ $labels.instance }} is DOWN"Key Metrics to Dashboard
| Metric | PromQL | Healthy Range |
|---|---|---|
| QPS | rate(mysql_global_status_queries[1m]) | Workload-dependent |
| Connections used | mysql_global_status_threads_connected / mysql_global_variables_max_connections | < 80% |
| Buffer pool hit rate | 1 - rate(reads[5m]) / rate(read_requests[5m]) | > 99% |
| Replication lag | mysql_slave_status_seconds_behind_master | < 5s |
| Aborted connections | rate(mysql_global_status_aborted_connects[5m]) | < 1/s |
Import Grafana dashboard ID 7362 (MySQL Overview by Percona) as your baseline. It includes all critical panels pre-built. Then add your custom alert annotations on top.
- Deploy mysqld_exporter with a least-privilege
exporterMySQL user — it needs PROCESS, REPLICATION CLIENT, and SELECT on performance_schema. - Alert on five signals: connection saturation, replication lag, buffer pool hit rate, slow query rate, and
mysql_up. - Start with Grafana dashboard 7362 — it covers the 20 most important MySQL metrics without custom panel work.
- Scrape all replicas separately — replication lag is only visible on the replica endpoint.
Working with JusDB on MySQL Monitoring
JusDB deploys Prometheus + Grafana monitoring stacks for MySQL deployments and configures alerting rules that catch problems before they become incidents. We build runbooks alongside every alert so on-call engineers know exactly what to do when a page fires.