Database SRE

Percona Monitoring and Management (PMM): Full Setup Guide

Deploy PMM to get deep MySQL and PostgreSQL visibility — query analytics, dashboards, and alerting in one tool

JusDB Team
September 6, 2022
13 min read
162 views

Running MySQL or PostgreSQL in production without deep observability is flying blind — you know something is slow, but not why. Percona Monitoring and Management (PMM) is a free, open-source platform that gives DBAs exactly what generic APM tools miss: per-query execution plans, wait event breakdowns, InnoDB internals, and PostgreSQL vacuum tracking, all in a single self-hosted dashboard. Unlike Datadog or New Relic, which treat databases as a secondary concern, PMM is built from the ground up for database engineers. This guide walks through deploying PMM end-to-end, from spinning up the server to analyzing your first slow query.

TL;DR
  • PMM Server runs as a Docker container; clients install on each database host
  • Use pmm-admin add mysql and pmm-admin add postgresql to register instances
  • Query Analytics (QAN) surfaces slow queries with execution plans and wait events — no manual log parsing
  • Key dashboards: MySQL Overview, PostgreSQL Overview, MySQL InnoDB Details, Query Analytics
  • PMM is free and self-hosted; Datadog/New Relic charge per host and lack DB-native query intelligence

What is Percona Monitoring and Management?

PMM is an open-source database monitoring platform maintained by Percona. It combines Grafana for dashboards, VictoriaMetrics as the time-series backend, Prometheus exporters for metric collection, and a custom Query Analytics engine that understands MySQL and PostgreSQL query internals. The result is a purpose-built DBA toolbox rather than a generic observability platform bolted onto database plugins.

PMM tracks metrics that matter to database engineers: InnoDB buffer pool hit ratio, replication lag, lock wait time, autovacuum activity, table bloat, and query fingerprints with normalized execution statistics. You can drill from a high-level throughput spike directly into the specific query causing it, see its execution plan, and identify which wait event is the bottleneck — all without leaving the PMM UI.

Compared to Datadog and New Relic, PMM offers several concrete advantages for database-focused teams:

  • Cost: PMM is fully free and self-hosted. Datadog charges per host per month and query-level tracing requires additional APM add-ons.
  • Query depth: PMM's QAN engine provides normalized query fingerprints, P95/P99 latency histograms, and execution plan snapshots. Generic APMs show slow query counts but rarely the plan.
  • Database-native dashboards: Hundreds of pre-built panels covering InnoDB internals, PostgreSQL statistics views, ProxySQL, and MongoDB — not available in generic tools without custom dashboard work.
  • No data egress: All telemetry stays on your infrastructure, which matters for compliance-sensitive environments.

PMM Architecture

PMM uses a server/client architecture. The PMM Server is a single container (or VM image) that hosts Grafana, VictoriaMetrics, the QAN API, and the alert manager. PMM Clients run on each database host, collecting metrics via Prometheus exporters and shipping query digest data to the server.

The key components on the client side are:

  • mysqld_exporter — scrapes MySQL performance schema and status variables
  • postgres_exporter — scrapes PostgreSQL pg_stat_* views
  • node_exporter — collects OS-level metrics (CPU, memory, disk I/O)
  • pmm-agent — the orchestration daemon that manages exporters and the QAN data pipeline

Query Analytics works by reading the MySQL slow query log or Performance Schema digest tables, and PostgreSQL's pg_stat_statements extension. The agent normalizes queries into fingerprints, computes aggregate statistics, and ships them to the QAN API on the PMM Server.

Tip

Run PMM Server on a dedicated host or VM separate from your database servers. It stores time-series data locally and will consume meaningful disk space (plan for at least 50 GB for a multi-node setup with 30 days of retention).


Installing PMM Server via Docker

The fastest way to get PMM Server running is Docker. The official image bundles every server-side component into a single container.

First, create a named volume to persist data across container restarts:

text
docker volume create pmm-data

Then run the PMM Server container:

text
docker run -d \
  --name pmm-server \
  --restart always \
  -p 443:443 \
  -p 80:80 \
  -v pmm-data:/srv \
  percona/pmm-server:2

PMM Server will be accessible at https://<your-host-ip>. The default credentials are admin / admin — you will be prompted to change the password on first login.

Warning

Do not expose PMM Server directly to the public internet. Place it behind a VPN or private network. The Grafana UI and QAN API contain sensitive query data and database credentials.

To verify the server is healthy:

text
curl -k https://localhost/v1/server/version

You should receive a JSON response with the PMM version. If you see a connection refused error, check that the container started cleanly with docker logs pmm-server.


Adding MySQL and PostgreSQL Clients

Installing pmm-client

Install the PMM client package on each database host. On Debian/Ubuntu:

text
wget https://repo.percona.com/apt/percona-release_latest.generic_all.deb
dpkg -i percona-release_latest.generic_all.deb
apt-get update
apt-get install -y pmm2-client

On RHEL/CentOS/Amazon Linux:

text
yum install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
yum install -y pmm2-client

Register the client with your PMM Server (replace the IP and password with your actual values):

text
pmm-admin config \
  --server-insecure-tls \
  --server-url="https://admin:your_password@:443"

Adding a MySQL Instance

PMM needs a MySQL user with sufficient privileges to query Performance Schema and status variables. Create a dedicated monitoring user:

text
CREATE USER 'pmm'@'localhost' IDENTIFIED BY 'strong_password' WITH MAX_USER_CONNECTIONS 10;
GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'pmm'@'localhost';
GRANT SELECT, UPDATE, DELETE, DROP ON performance_schema.* TO 'pmm'@'localhost';

Then register the MySQL instance with pmm-admin:

text
pmm-admin add mysql \
  --username=pmm \
  --password=strong_password \
  --service-name=prod-mysql-01 \
  --host=localhost \
  --port=3306 \
  --query-source=perfschema

The --query-source flag controls where QAN reads query data. Use perfschema (Performance Schema) for most setups. If Performance Schema is disabled, fall back to slowlog:

text
pmm-admin add mysql \
  --username=pmm \
  --password=strong_password \
  --service-name=prod-mysql-01 \
  --host=localhost \
  --port=3306 \
  --query-source=slowlog \
  --size-slow-logs=1GiB
Tip

Enable the slow query log with long_query_time=0 in your MySQL config to capture all queries for QAN analysis, then raise the threshold to 0.1 or 1 second once you have identified your baseline. Capturing everything for 24 hours is a good way to find unexpected heavy hitters.

Enable slow query logging in /etc/mysql/mysql.conf.d/mysqld.cnf:

text
[mysqld]
slow_query_log = ON
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0
log_queries_not_using_indexes = ON

Adding a PostgreSQL Instance

PMM's PostgreSQL integration relies on pg_stat_statements. Enable it by adding the extension to postgresql.conf:

text
shared_preload_libraries = 'pg_stat_statements'
pg_stat_statements.track = all
pg_stat_statements.max = 10000

Restart PostgreSQL, then create the extension and a monitoring user:

text
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

CREATE USER pmm WITH PASSWORD 'strong_password';
GRANT pg_monitor TO pmm;
GRANT SELECT ON pg_stat_statements TO pmm;

Register the PostgreSQL instance:

text
pmm-admin add postgresql \
  --username=pmm \
  --password=strong_password \
  --service-name=prod-postgres-01 \
  --host=localhost \
  --port=5432 \
  --database=postgres \
  --query-source=pgstatements

Verify both services are registered and reporting:

text
pmm-admin list

You should see entries for each service with a status of Running.


Key Grafana Dashboards

PMM ships with a comprehensive dashboard library. The most useful dashboards for day-to-day operations are:

MySQL Overview

The MySQL Overview dashboard provides a top-level summary of instance health: queries per second, connections, InnoDB buffer pool usage, disk I/O, and replication status. Use it as your starting point for any performance investigation. Key panels to watch:

  • MySQL Connections — compare current connections against max_connections; connection exhaustion is a common production failure mode
  • InnoDB Buffer Pool — a hit ratio below 95% indicates the buffer pool is too small for your working set
  • MySQL Slow Queries — spikes here correlate directly with application latency increases

MySQL InnoDB Details

This dashboard exposes InnoDB internals: row lock waits, deadlock rate, checkpoint age, redo log usage, and page flush activity. If you are troubleshooting lock contention or write bottlenecks, this dashboard will point you to the right subsystem.

PostgreSQL Overview

The equivalent dashboard for PostgreSQL shows active connections, transaction rates, tuple reads/writes, cache hit ratio, and replication lag. Critical panels:

  • PostgreSQL Cache Hit Ratio — should stay above 99% for most OLTP workloads; drops indicate shared_buffers or OS page cache pressure
  • PostgreSQL Autovacuum — monitors vacuum and analyze activity; excessive dead tuple accumulation causes table bloat and index degradation
  • PostgreSQL Connections — track idle connections; consider PgBouncer if you are approaching max_connections

Node Summary

Database performance problems often originate at the OS level. The Node Summary dashboard tracks CPU steal (a sign of noisy neighbors on cloud VMs), disk I/O saturation, memory pressure, and network throughput alongside your database metrics on the same timeline.


Query Analytics (QAN)

Query Analytics is the feature that separates PMM from generic monitoring tools. QAN normalizes raw SQL into query fingerprints, aggregates execution statistics, and presents a ranked view of your most expensive queries.

Access QAN from the PMM left nav under Query Analytics. The default view shows queries ranked by total query time over the selected time window. The most important columns:

  • QPS — queries per second for this fingerprint
  • Avg — average execution time
  • P99 — 99th percentile latency; this is what your slowest users experience
  • Load — total time consumed (QPS × avg); a low-QPS query with high avg can still dominate your database load

Click any query fingerprint to open the detail view, which shows:

  • Normalized query text with parameter placeholders
  • Execution count and latency histogram over time
  • EXPLAIN output (for MySQL with Performance Schema, automatic; for PostgreSQL, requires pg_stat_statements)
  • Wait event breakdown (MySQL 8.0+ with Performance Schema events instrumentation)
Warning

For MySQL QAN with Performance Schema, ensure performance_schema = ON in your MySQL config and that the events_statements_* consumer tables are enabled. On MySQL 8.0 these are on by default; on MySQL 5.7 you may need to enable them explicitly with UPDATE performance_schema.setup_consumers SET ENABLED='YES' WHERE NAME LIKE 'events_statements%';

Finding Slow Queries in Practice

A practical workflow for query optimization with QAN:

  1. Set the time range to the last 1 hour during a known slow period
  2. Sort by Load to find the highest-impact queries
  3. Click the top query, review the P99 latency and trend over time
  4. Open the Explain tab to view the execution plan — look for full table scans (type: ALL in MySQL, Seq Scan in PostgreSQL) on large tables
  5. Cross-reference with the MySQL InnoDB Details or PostgreSQL Overview dashboard to see if the slow period correlates with lock waits or I/O spikes

Alerting Setup

PMM 2.x includes Integrated Alerting built on Grafana Alertmanager. To configure basic database alerts, navigate to Alerting > Alert Rules in the PMM UI and create rules based on Prometheus metrics.

Useful starter alert rules:

text
# MySQL: connection usage above 80%
(mysql_global_status_threads_connected / mysql_global_variables_max_connections) > 0.8

# MySQL: InnoDB buffer pool hit ratio below 95%
(rate(mysql_global_status_innodb_buffer_pool_reads[5m]) /
 rate(mysql_global_status_innodb_buffer_pool_read_requests[5m])) > 0.05

# PostgreSQL: replication lag above 30 seconds
pg_replication_lag > 30

# Node: disk usage above 85%
(node_filesystem_size_bytes - node_filesystem_free_bytes) /
 node_filesystem_size_bytes > 0.85

Route alerts to Slack, PagerDuty, or email via Alerting > Contact Points. For production setups, route critical alerts (replication failure, disk full) directly to on-call and non-critical alerts (high connection count, slow query spike) to a Slack channel for async review.

Tip

PMM also ships with a library of pre-built alert rule templates under Alerting > Alert Rule Templates. These cover the most common database failure conditions and are a good starting point before writing custom PromQL rules.


Key Takeaways
  • PMM is a free, self-hosted alternative to Datadog and New Relic that is purpose-built for MySQL and PostgreSQL observability
  • Deploy PMM Server as a Docker container with a persistent volume; install pmm-client on each database host
  • Use pmm-admin add mysql --query-source=perfschema and pmm-admin add postgresql --query-source=pgstatements to register instances
  • Enable pg_stat_statements on PostgreSQL and Performance Schema consumers on MySQL before registering — QAN is not useful without them
  • Query Analytics (QAN) provides normalized query fingerprints, P99 latencies, and execution plans; sort by Load to find the highest-impact optimization targets
  • The MySQL InnoDB Details and PostgreSQL Overview dashboards surface the subsystem-level metrics needed for root cause analysis
  • Set up Alertmanager alert rules for connection exhaustion, replication lag, buffer pool hit ratio, and disk usage as your baseline production alert set

Run PMM Without Managing the Infrastructure

PMM gives you exceptional visibility into your databases, but running the PMM Server itself — storage sizing, version upgrades, backup, and availability — is another operational burden on top of your databases. At JusDB, we handle the full database operations layer for you: managed MySQL and PostgreSQL with built-in observability, query performance insights, alerting, and expert DBA support, so your engineering team can focus on the product rather than the database plumbing.

If you are evaluating whether to self-manage PMM or offload database operations entirely, talk to the JusDB team — we can walk through what production database observability looks like for your specific stack.

Share this article