Database observability goes beyond monitoring. It means having enough context to understand why the database is slow — not just that it is slow. Here is how to build a complete observability stack.
The Three Pillars
- Metrics: aggregated numbers over time (TPS, latency, cache hit rate)
- Logs: timestamped records of individual events (slow queries, errors)
- Traces: end-to-end request paths showing time spent in each layer
Key Metrics to Track
PostgreSQL:
- transactions per second (tps)
- cache hit ratio (from pg_stat_bgwriter)
- replication lag (pg_stat_replication.replay_lag)
- table/index bloat (n_dead_tup)
- connection count (pg_stat_activity)
MySQL:
- Questions/Queries per second
- InnoDB buffer pool hit ratio
- Threads_running vs Threads_connected
- Innodb_row_lock_waits
- Seconds_Behind_MasterPrometheus + postgres_exporter
# Run postgres_exporter
docker run -e DATA_SOURCE_NAME='postgresql://postgres:pass@localhost:5432/postgres?sslmode=disable' \
quay.io/prometheuscommunity/postgres-exporter
# Key metrics exposed:
# pg_stat_database_tup_fetched
# pg_stat_bgwriter_buffers_alloc
# pg_replication_lag
# pg_stat_user_tables_n_dead_tupPrometheus Alert Rules
groups:
- name: postgres
rules:
- alert: PostgreSQLHighReplicationLag
expr: pg_replication_lag > 30
for: 2m
labels:
severity: warning
annotations:
summary: PostgreSQL replication lag > 30s
- alert: PostgreSQLLowCacheHitRate
expr: |
pg_stat_database_blks_hit /
(pg_stat_database_blks_hit + pg_stat_database_blks_read) < 0.95
for: 5m
labels:
severity: criticalDistributed Tracing with OpenTelemetry
from opentelemetry import trace
from opentelemetry.instrumentation.psycopg2 import Psycopg2Instrumentor
# Auto-instrument database calls
Psycopg2Instrumentor().instrument()
# Every SQL call now appears in your trace with:
# - SQL query text
# - Execution time
# - Row counts
# - Connection attributesKey Takeaways
- Instrument metrics, logs, and traces — each answers different diagnostic questions
- Set up Prometheus alerts for replication lag, low cache hit rate, and connection count spikes
- OpenTelemetry auto-instrumentation adds database spans to distributed traces with zero code changes
- Slow query logs are the most actionable log — ensure they are enabled and shipped to your log aggregator
JusDB Can Help
Building a complete database observability stack is complex. JusDB can design and implement monitoring, alerting, and tracing for your database layer.