NoSQL Databases

Maximize MongoDB Log Understanding with Hatchet: Log Analytical Arsenal

Hatchet transforms raw MongoDB JSON logs into actionable insights — surfacing slow operations, connection storms, and namespace hotspots that are impossible to spot in raw log files.

JusDB Team
August 21, 2023
8 min read
160 views

It is 2 a.m. and your on-call engineer is staring at a 4 GB mongod.log file, tail-scrolling past thousands of lines of JSON while the P1 ticket ages in Jira. The application is crawling — median API response time has quadrupled in the past hour — but the log file is a wall of structured noise: operations, connection pool churn, replica set heartbeats, and slow queries all interleaved with no obvious signal. Grepping for durationMillis returns 80,000 lines. Sorting that output by hand is not a workflow; it is a career-limiting experience. Hatchet changes this completely: it ingests the entire log file, groups every slow operation by its query shape, computes percentile latencies across the full history, and presents a ranked summary in seconds — turning a 2 a.m. guessing session into a disciplined, evidence-based analysis.

TL;DR
  • Hatchet is an open-source MongoDB log analysis tool from Percona that groups slow queries by shape, computes P50/P95/P99 latencies, and surfaces index recommendations from your mongod.log.
  • Install with a single go install command or a pre-built binary; basic usage is hatchet --log mongod.log.
  • Hatchet natively parses MongoDB 4.4+ structured JSON logs and falls back gracefully to legacy text-format logs from older versions.
  • Query shape grouping abstracts specific literal values, so find({user_id: 123}) and find({user_id: 456}) are treated as the same shape — giving you accurate frequency and latency aggregation across millions of log lines.
  • The recommended workflow is: download logs → run Hatchet → identify top shapes by total duration → run EXPLAIN → add the missing index.
  • mtools (mloginfo, mplotqueries) remains useful for time-series visualization and quick one-liners; Hatchet wins on structured shape analysis and index recommendations for production-scale log files.
  • MongoDB Atlas log download integrates cleanly with Hatchet — no special preprocessing required.

What Hatchet Is and Why Percona Built It

Hatchet is a command-line log analysis tool maintained by Percona and released under the Apache 2.0 license. The project was created to address a real gap: MongoDB's built-in slow query log is excellent raw material, but the log itself does not aggregate, rank, or group queries. Every invocation of a slow query produces a separate log line, so a query running 500 times per minute at 200ms each generates 30,000 log lines per hour — and there is no built-in mechanism to say "this one pattern is responsible for 47% of your total slow-query time."

Hatchet's core value is query shape normalization. It strips out literal values from filter documents and replaces them with a canonical placeholder, producing a fingerprint — called a "shape" — that represents the structural pattern of the query. All log lines sharing the same shape are aggregated together, giving you count, total duration, average duration, and percentile latencies (P50, P95, P99) for each distinct access pattern. This is the same approach used by tools like pt-query-digest for MySQL, now available for MongoDB.

The project lives at github.com/percona/hatchet. It is written in Go, ships as a single statically-linked binary, and has no runtime dependencies.

Installation and Basic Usage

Hatchet can be installed via the Go toolchain, downloaded as a pre-built binary from GitHub Releases, or built from source. The Go install path is simplest if you have Go 1.21+ available:

text
# Install latest release via Go toolchain
go install github.com/percona/hatchet@latest

# Verify the binary is available
hatchet --version

If you prefer a pre-built binary, download the appropriate archive from the GitHub Releases page, extract it, and place the binary in your $PATH:

text
# Linux x86_64 — adjust version tag as needed
curl -LO https://github.com/percona/hatchet/releases/download/v1.0.0/hatchet_linux_amd64.tar.gz
tar -xzf hatchet_linux_amd64.tar.gz
sudo mv hatchet /usr/local/bin/

# macOS (Apple Silicon)
curl -LO https://github.com/percona/hatchet/releases/download/v1.0.0/hatchet_darwin_arm64.tar.gz
tar -xzf hatchet_darwin_arm64.tar.gz
sudo mv hatchet /usr/local/bin/

With the binary available, point it at a log file:

text
# Basic usage — analyzes the log and prints a summary to stdout
hatchet --log /var/log/mongodb/mongod.log

# Analyze a gzip-compressed log (common for Atlas downloads and rotated logs)
hatchet --log mongod.log.gz

# Limit analysis to the slow-query threshold used by MongoDB (default: 100ms)
hatchet --log mongod.log --slow-ms 100

# Output the full report to a file instead of stdout
hatchet --log mongod.log --output report.txt
Tip

If your log file is large (multi-gigabyte), Hatchet still processes it efficiently because it streams the file rather than loading it entirely into memory. On a typical 4-core server with a 2 GB log file, expect analysis to complete in under two minutes. You do not need to pre-filter the log — Hatchet ignores non-slow-query log lines automatically.

Parsing MongoDB JSON Logs vs Legacy Text Logs

MongoDB 4.4 introduced a new structured JSON log format. Every log line is a valid JSON object with a predictable schema including fields like t (timestamp), s (severity), c (component), msg (message template), and attr (attributes). A slow query line in this format looks like:

text
{"t":{"$date":"2025-09-14T02:17:43.812+00:00"},"s":"I","c":"COMMAND","id":51803,
"ctx":"conn4821","msg":"Slow query","attr":{"type":"command","ns":"mydb.orders",
"command":{"find":"orders","filter":{"status":"pending","user_id":{"$gt":1000}},
"sort":{"created_at":-1},"limit":50},"planSummary":"COLLSCAN",
"durationMillis":2341,"keysExamined":0,"docsExamined":1482930,"nreturned":50}}

Hatchet parses this JSON format natively, extracting the namespace, operation type, filter shape, plan summary, and duration from the structured attr block. For MongoDB versions prior to 4.4 (3.6, 4.0, 4.2), logs were plain text in a fixed format. Hatchet detects the format automatically at parse time — you do not need to pass a flag. If your fleet still has mixed-version replica sets or you are analyzing historical archives from older deployments, Hatchet handles both without any configuration change.

Warning

MongoDB 3.4 and earlier used a slightly different slow-query log structure that Hatchet may not parse with full fidelity. If you are regularly analyzing logs from pre-4.0 deployments, verify that Hatchet's output operation counts match what you expect by cross-checking against a manual grep count for durationMillis.

Understanding Hatchet's Output: Reading the Summary Report

After processing a log file, Hatchet prints a ranked table of query shapes. Each row represents one unique structural pattern found in the slow query log. Here is an abbreviated example of what that output looks like:

text
Hatchet MongoDB Log Analysis Report
====================================
Log file:   mongod.log
Duration:   2025-09-14 00:00:01 to 2025-09-14 06:00:01 (6 hours)
Slow query threshold: 100ms
Total slow operations analyzed: 14,832

SLOW QUERY SUMMARY (sorted by total duration)
==============================================
 Rank  Count   TotalMs    AvgMs   P50Ms   P95Ms   P99Ms  NS                    Op      Shape
    1   3821  9,124,500   2,388   1,940   6,210   9,881  mydb.orders           find    {status:1,user_id:{$gt:1}}
    2   2104  4,871,200   2,315   1,801   5,940   8,442  mydb.sessions         find    {token:1}
    3    988  2,204,400   2,232   1,700   4,820   7,301  mydb.products         find    {category:1,price:{$lte:1}}
    4   1542  1,851,600     1201     890   3,100   4,980  mydb.events          aggregate {user_id:1,ts:{$gte:1,$lte:1}}
    5    402    924,800   2,300   1,620   5,401   8,200  mydb.audit_log        find    {actor_id:1,action:1}

Breaking down the columns:

  • Count: how many times this query shape appeared in the slow log during the analysis window.
  • TotalMs: the cumulative wall-clock time spent executing this shape. Sorting by this column identifies the shapes causing the most overall database load — not just the slowest single execution.
  • AvgMs: arithmetic mean duration. Useful but can be skewed by outliers — always look at percentiles.
  • P50Ms: median latency. Half of executions of this shape were faster than this value.
  • P95Ms: 95th percentile. This is the latency your users at the 95th percentile are actually experiencing. If your SLA is "99% of requests under 2 seconds," P95 is the first alarm bell.
  • P99Ms: 99th percentile. Captures the worst-tail behavior. Spikes here often indicate lock contention, working-set pressure, or plan regression on a specific data distribution.
  • Shape: the normalized filter structure with all literal values replaced by 1. This is what allows Hatchet to group millions of log lines into a handful of actionable patterns.

In the example above, the top shape — {status:1, user_id:{$gt:1}} on mydb.orders — is responsible for over 9,000 seconds of cumulative slow-query time across 3,821 executions. That single access pattern, if indexed correctly, would eliminate the plurality of your slow-query load. That is the finding you bring to your 9 a.m. standup.

Plan Summaries and Index Recommendations

Hatchet also aggregates the planSummary field that MongoDB includes in each slow query log entry. The plan summary tells you how MongoDB executed the query — COLLSCAN means a full collection scan with no index used; IXSCAN shows the index that was selected.

text
PLAN SUMMARY BREAKDOWN
======================
Shape: {status:1,user_id:{$gt:1}} on mydb.orders
  COLLSCAN   :  3,821 executions  (100.0%)
  Recommendation: Consider index on { status: 1, user_id: 1 }

Shape: {token:1} on mydb.sessions
  COLLSCAN   :  2,104 executions  (100.0%)
  Recommendation: Consider index on { token: 1 }

Shape: {category:1,price:{$lte:1}} on mydb.products
  IXSCAN idx_category :    312 executions  (31.6%)
  COLLSCAN             :    676 executions  (68.4%)
  Recommendation: Consider compound index on { category: 1, price: 1 }

A shape that is 100% COLLSCAN is a near-certain missing index. A shape with mixed IXSCAN and COLLSCAN indicates a partial coverage problem — an index exists but does not cover all the fields in the filter, or the query planner is choosing a full scan because the selectivity of the existing index is too low for certain parameter ranges. Both findings are immediately actionable.

Tip

Hatchet's index recommendations are heuristic starting points, not definitive schema advice. After Hatchet identifies a candidate index, always run db.collection.explain("executionStats") with a representative query to confirm that the proposed index would actually be selected by the query planner, and to verify the executionStats.totalKeysExamined vs totalDocsExamined ratio improves as expected before creating the index in production.

Identifying Top Slow Queries by Namespace and Operation Type

Beyond the top-level summary, Hatchet provides breakdowns by namespace and operation type (find, aggregate, update, delete, getMore). This lets you answer questions like: "Is the aggregation pipeline or the read path causing more total latency on the events collection?"

text
# Filter analysis to a specific namespace
hatchet --log mongod.log --namespace mydb.orders

# Filter to a specific operation type
hatchet --log mongod.log --op find

# Combine filters: aggregations on the events collection only
hatchet --log mongod.log --namespace mydb.events --op aggregate

# Show the top N shapes (default is 10)
hatchet --log mongod.log --top 25

The --namespace and --op flags are particularly useful when you have already identified a problematic collection from application metrics (e.g., your APM tool shows mydb.events response times degraded) and you want to drill into that collection's specific access patterns without noise from the rest of the database.

Hatchet vs mtools: When to Use Each

mtools (specifically mloginfo and mplotqueries) is the older, Python-based toolkit for MongoDB log analysis and remains widely deployed. Both tools solve overlapping problems but have meaningfully different strengths:

Capability Hatchet mtools (mloginfo / mplotqueries)
Query shape grouping First-class feature — all output organized by shape. Basic grouping; less sophisticated normalization.
P95/P99 latency reporting Built-in per shape. Not available natively.
Index recommendations Yes, with plan summary aggregation. No.
Time-series visualization No built-in charts. Yes — mplotqueries generates matplotlib scatter plots.
Performance on large logs Fast — compiled Go binary, streaming parse. Slower on very large files — Python interpreter overhead.
MongoDB 4.4+ JSON logs Full native support. Partial — some mtools versions have JSON log parsing gaps.
Replica set event analysis Focused on slow queries. Yes — mloginfo also summarizes elections, rollbacks, restarts.

Practical rule: use Hatchet when your primary goal is identifying and ranking slow query patterns with actionable index guidance. Use mloginfo for quick replica set health summaries or when you need time-series charts via mplotqueries. In a mature operations workflow, both tools are present — Hatchet for deep query analysis, mtools for operational event investigation.

Workflow: From Log Collection to Index Creation

Here is the end-to-end workflow a senior DBA should follow when a performance incident implicates slow MongoDB queries:

Step 1: Collect the Log

For self-managed MongoDB, retrieve the current log file and any rotated archives covering the incident window:

text
# On the mongod host — copy log to analysis machine
scp mongo-primary:/var/log/mongodb/mongod.log ./incident-2025-09-14.log

# For rotated logs (gzip compressed)
scp mongo-primary:/var/log/mongodb/mongod.log.2025-09-14T00-00-00.gz .

For MongoDB Atlas, download logs from the Atlas UI (Cluster → Logs → Download) or via the Atlas CLI:

text
# Atlas CLI — download mongod logs for a specific host
atlas logs download  mongodb.gz --output ./incident.log.gz --start 1726272000 --end 1726293600

# Hatchet reads gzip files directly — no need to decompress
hatchet --log incident.log.gz

Step 2: Run Hatchet and Identify the Top Culprits

text
# Run with a lower slow-ms threshold to capture more context if the incident
# involved borderline-slow queries (e.g., 50ms instead of default 100ms)
hatchet --log incident.log.gz --slow-ms 50 --top 15 --output hatchet-report.txt

# Review the report — focus on Rank 1-5 by TotalMs
cat hatchet-report.txt

Step 3: Reproduce and EXPLAIN

Take the top shape from Hatchet's output and run it through MongoDB's explain with executionStats mode. Substitute representative literal values back into the normalized shape:

text
// Shape identified: {status:1, user_id:{$gt:1}} on mydb.orders
// Run EXPLAIN on a representative query

db.orders.explain("executionStats").find(
  { status: "pending", user_id: { $gt: 1000 } },
  { sort: { created_at: -1 }, limit: 50 }
)

The executionStats output will confirm whether a COLLSCAN is occurring (stage: "COLLSCAN" in the winning plan), show you how many documents were examined vs returned, and reveal the execution time in isolation.

Step 4: Create the Index

text
// Create the index identified by Hatchet — always use background build
// (In MongoDB 4.2+ background: true is the default for index builds)
db.orders.createIndex(
  { status: 1, user_id: 1, created_at: -1 },
  { name: "idx_orders_status_user_created", background: true }
)

// Monitor the index build progress
db.currentOp({ "command.createIndexes": { $exists: true } })

Step 5: Validate

text
// Re-run EXPLAIN after index creation to confirm the plan changed
db.orders.explain("executionStats").find(
  { status: "pending", user_id: { $gt: 1000 } },
  { sort: { created_at: -1 }, limit: 50 }
)
// Expect: winningPlan.stage = "FETCH" with inputStage.stage = "IXSCAN"
// Expect: executionStats.totalDocsExamined ≈ executionStats.nReturned
Warning

On a busy production replica set, index builds consume I/O and CPU. In MongoDB 4.4+, index builds hold a collection-level lock only at the very start and very end of the build — the bulk of the work happens without blocking reads or writes. However, on a heavily loaded primary, even a momentary lock acquisition at build completion can cause a latency spike. For large collections, always build indexes during a low-traffic window and monitor replication lag on secondaries during the build.

Integration with MongoDB Atlas Log Download

MongoDB Atlas stores compressed log archives for each cluster node. Hatchet integrates with Atlas logs without any preprocessing step — it reads gzip-compressed files directly and understands the 4.4+ JSON format that Atlas uses exclusively. The recommended pattern for Atlas users is to automate log download and Hatchet analysis as a scheduled task:

text
#!/bin/bash
# Scheduled script: download Atlas logs and run Hatchet analysis daily

CLUSTER_HOST="mycluster-shard-00-00.abc123.mongodb.net"
OUTPUT_DIR="/var/log/mongodb-analysis/$(date +%Y-%m-%d)"
mkdir -p "$OUTPUT_DIR"

# Download last 24 hours of logs (Unix timestamps)
START=$(date -d "24 hours ago" +%s)
END=$(date +%s)

atlas logs download "$CLUSTER_HOST" mongodb.gz \
  --output "$OUTPUT_DIR/mongod.log.gz" \
  --start "$START" \
  --end "$END"

# Run Hatchet analysis and store report
hatchet \
  --log "$OUTPUT_DIR/mongod.log.gz" \
  --slow-ms 100 \
  --top 20 \
  --output "$OUTPUT_DIR/hatchet-report.txt"

# Optional: email the report or push to a Slack channel
cat "$OUTPUT_DIR/hatchet-report.txt" | mail -s "MongoDB Slow Query Report $(date +%Y-%m-%d)" dba-team@example.com

Running this daily gives your team a proactive signal on emerging slow query patterns before they become incidents. A new shape appearing in the top 10 that was not present yesterday is an early warning that a recent deployment introduced a new access pattern without a corresponding index.

Key Takeaways
  • Hatchet is the most effective open-source tool available for MongoDB slow query analysis at scale — install it as a standard part of your DBA toolkit alongside mongostat, mongotop, and mtools.
  • Query shape normalization is what makes Hatchet valuable: aggregating millions of log lines into a ranked list of structural access patterns with P95/P99 latency makes the root cause of slow-query incidents immediately visible.
  • Always sort Hatchet output by TotalMs, not AvgMs — the highest-impact patterns to fix are those consuming the most cumulative database time, not necessarily the single slowest individual execution.
  • Plan summary aggregation (COLLSCAN vs IXSCAN ratios per shape) converts Hatchet's output from a diagnostic finding into a concrete action item: a shape that is 100% COLLSCAN needs an index.
  • Hatchet reads MongoDB 4.4+ JSON logs natively and handles gzip-compressed files directly — no preprocessing needed for Atlas log downloads.
  • Use mtools for time-series visualization and replica set event analysis; use Hatchet for query-shape analysis and index recommendations. The two tools are complementary.
  • Automate daily Hatchet runs against Atlas log downloads to catch emerging slow-query patterns before they escalate to incidents.

Diagnosing MongoDB performance at scale requires more than instinct and grep — it requires tooling that can reduce millions of log lines to a ranked, actionable list in seconds. If your team is navigating a complex MongoDB slow-query investigation, dealing with unexplained latency spikes, or planning a proactive query review before a traffic event, the team at JusDB can help. We work with engineering teams to establish systematic log analysis workflows, design compound indexes for high-traffic access patterns, and build the operational practices that prevent 2 a.m. incidents from recurring.

Share this article