NoSQL Databases

High Performance with MongoDB: A Top-Down Tuning Guide

A top-down playbook for high-performance MongoDB: measure with the profiler and explain(), model for access patterns, index by the ESR rule, keep the working set in the WiredTiger cache, pool connections, and scale reads with secondaries and sharding — with flow diagrams for each layer.

JusDB Team
June 6, 2026
14 min read
0 views

MongoDB is fast by default — until it isn't. The moment your working set outgrows RAM, an unindexed query starts collection-scanning millions of documents, or a chatty application opens a connection per request, the same database that handled 50,000 ops/sec begins timing out. High performance in MongoDB is not a single setting; it is a stack of decisions — data model, indexes, queries, memory, connections, and read scaling — where the layer you get wrong dominates everything below it. This guide is a top-down playbook: measure first, fix the highest-leverage layer, and verify, using the standard tooling (explain(), the database profiler, mongostat, $collStats) that ships with every MongoDB deployment.

TL;DR
  • Tune top-down: data model → indexes → queries → working set / RAM → connections → read scaling. The data model decides your ceiling; fix it before tuning anything below.
  • Keep the working set in RAM. WiredTiger's cache (default ~50% of RAM) must hold your hot documents and indexes, or every query pays a disk read.
  • Every query needs an index. A COLLSCAN in explain() on a large collection is the #1 cause of slow MongoDB.
  • Build indexes that match the ESR rule — Equality, then Sort, then Range — so one compound index serves filter + sort without an in-memory sort.
  • Model for your access pattern. Embed what you read together; reference what grows unbounded. Avoid the 16 MB document limit and unbounded array growth.
  • Pool connections in the driver (don't open one per request) and read-scale with secondary reads or sharding only after the single-node layers are tuned.
  • Measure with the profiler and explain("executionStats") before and after every change.

The MongoDB Performance Hierarchy

The most expensive mistake in MongoDB tuning is starting at the bottom — adding RAM or shards to compensate for a bad data model or a missing index. Work down this hierarchy in order. The leverage is highest at the top: a schema redesign or a missing index can be a 100–1000× change, while hardware and config are usually 2–5×.

The MongoDB Performance Hierarchy

MongoDB performance tuning hierarchy, highest leverage at the top Six tuning layers from data model at the top down to read scaling, each listing its focus and levers. Fix the upper layers before adding hardware or shards. Layer 1 · highest leverage · decides your ceilingDATA MODELembed vs reference · access patterns · 16 MB limit · array growth Layer 2 · ~100-1000x winsINDEXESESR rule · compound · covered · partial · kill every COLLSCAN Layer 3QUERIESprojection · $match early · avoid $where / regex scans · batch Layer 4 · keep the hot set in RAMWORKING SET / RAMWiredTiger cache · cache hit ratio · evictions Layer 5CONNECTIONSdriver pool size · reuse clients · avoid per-request connects Layer 6 · scale out only after the aboveREAD SCALINGsecondary reads · read concern · sharding + shard key
Fix the data model and indexes before adding RAM or shards — that is where the leverage lives.
Important

Change one layer at a time and measure with explain("executionStats") or the profiler before and after. Adding a shard to mask a missing index just spreads the slow scan across more machines.

Step 1: Measure First — Profiler and explain()

You cannot tune what you have not measured. MongoDB ships two essential tools: the database profiler, which logs slow operations, and explain(), which shows exactly how a query executes.

Enable the profiler to capture every operation slower than 100 ms (level 1), then read the slowest from the system.profile capped collection:

javascript
// Log operations slower than 100ms
db.setProfilingLevel(1, { slowms: 100 });

// Find the slowest recent operations
db.system.profile.find()
  .sort({ millis: -1 })
  .limit(10)
  .pretty();

For any slow query, run explain("executionStats") and look at three numbers: the stage, documents examined vs returned, and whether an index was used.

javascript
db.orders.find({ status: "pending", region: "us-east" })
  .sort({ createdAt: -1 })
  .explain("executionStats");

The four signals below catch the vast majority of MongoDB performance problems. The deciding number is the ratio of totalDocsExamined to nReturned — if MongoDB scanned 1,000,000 documents to return 10, you have a missing or wrong index.

What to look for in explain("executionStats")

MongoDB explain symptom-to-fix reference A two-column reference mapping four explain symptoms to their meaning and fix: COLLSCAN stage, docs examined far exceeding returned, in-memory SORT stage, and high keys examined. SIGNALWHAT IT MEANS / FIX stage: COLLSCANNo index used — full collection scan.→ create an index matching the filter. docsExamined >> nReturned(e.g. 1M scanned, 10 returned)Index too broad / not selective.→ add a more selective compound index. stage: SORT (in memory)hasSortStage: trueSort not served by an index (32 MB cap).→ index with sort key (ESR rule). keysExamined >> nReturnedindex scanned, many rejectedWrong key order or unbounded range.→ reorder keys; tighten the range.
The ratio of documents examined to documents returned is the single best health signal.

Step 2: Data Model — The Layer That Sets Your Ceiling

In MongoDB the schema is the performance plan. Relational instincts — normalize everything, join at read time — produce slow MongoDB. The guiding principle is model for the access pattern: data that is read together should be stored together.

  • Embed when data is accessed together and the embedded set is bounded — an order and its line items, a post and its top comments. One document read, no join.
  • Reference when data is large, grows unbounded, or is shared — a user referenced by millions of events, a product catalog. Embedding these blows past the 16 MB document limit and rewrites the whole document on every update.
  • Avoid unbounded arrays. An array that grows forever (every event appended to one document) means ever-larger documents, slower updates, and eventual document-size failure. Use the bucket pattern or a separate collection.
  • Pre-compute and denormalize the values you read on the hot path. Storing a commentCount beats counting a referenced collection on every page load.

Embed vs Reference — the modeling decision

Choosing between embedding and referencing in MongoDB A decision: is the related data read together and bounded? If yes, embed it for a single-read document. If no — large, unbounded, or shared — reference it in a separate collection. Is the related data read togetherAND bounded in size? YES NO (large / unbounded / shared) EMBED{ _id, total, items: [ {...}, {...} ] }one read, no joinorder + line items,post + top comments REFERENCEevents: { userId: ObjectId }separate collectionstays under 16 MBuser → millions of events,shared catalog
Embed what you read together and bounded; reference what is large, unbounded, or shared.

Step 3: Indexes — Kill Every COLLSCAN

After the data model, indexes are the highest-leverage fix. A collection scan over millions of documents is slow no matter how much RAM you add. The most important rule for compound indexes is ESR — Equality, Sort, Range: list equality-match fields first, then the sort field, then range fields. An index built this way serves the filter and the sort from the index alone, avoiding an in-memory sort.

javascript
// Query: equality on status + region, sort by createdAt, range on amount
db.orders.find({ status: "pending", region: "us-east", amount: { $gt: 100 } })
         .sort({ createdAt: -1 });

// ESR-ordered compound index: Equality (status, region), Sort (createdAt), Range (amount)
db.orders.createIndex({ status: 1, region: 1, createdAt: -1, amount: 1 });
  • Covered queries — if the index contains every field the query needs (filter + projection), MongoDB answers from the index without touching documents. Project only what you need and include those fields in the index.
  • Partial indexes — index only the documents you actually query (e.g. status: "active"), shrinking the index and its memory footprint.
  • Drop unused indexes — every index slows writes and consumes cache. Use $indexStats to find indexes with zero accesses.ops and remove them.
  • Build in the background on production — modern MongoDB builds indexes with minimal locking, but still schedule large builds for low-traffic windows.
javascript
// Find indexes that are never used — pure write overhead, drop them
db.orders.aggregate([ { $indexStats: {} } ])
  .forEach(i => print(i.name, i.accesses.ops));
Tip

For the full mechanics of single-field, compound, multikey, text, and wildcard indexes — plus index intersection and prefix rules — see the dedicated MongoDB indexing guide. And watch for index bloat over time; our index fragmentation guide covers reclaiming space with compact and rebuilds.

Step 4: Queries and Aggregation

With a good model and the right indexes, query-level tuning closes the remaining gap:

  • Project only the fields you need. Returning whole documents wastes network, cache, and deserialization time. find(filter, { name: 1, email: 1 }) can also enable a covered query.
  • Filter early in aggregation. Put $match and $sort as the first stages so they use an index and shrink the pipeline before $group/$lookup run. A $match after a $group cannot use an index.
  • Avoid $where, JavaScript, and unanchored regex. These force collection scans. An anchored prefix regex (/^abc/) can use an index; /abc/ cannot.
  • Batch writes with bulkWrite() and reads with $in instead of N+1 round-trips.
  • Mind $lookup. Joins are expensive; ensure the foreign field is indexed, or denormalize to avoid the lookup entirely on hot paths.

Step 5: Working Set and WiredTiger Cache

MongoDB's storage engine, WiredTiger, keeps recently used documents and indexes in an in-memory cache — by default about 50% of (RAM − 1 GB). Performance falls off a cliff when your working set (the documents and index entries you actually touch) no longer fits in this cache, because every query then triggers a disk read.

The working set, cache, and disk

How the working set, WiredTiger cache, and disk interact A query first checks the WiredTiger cache. A cache hit returns from RAM and is fast. A cache miss reads from disk, is slow, and evicts another page. When the working set exceeds the cache, miss rate climbs and performance degrades. query / read WiredTiger cache (RAM)~50% of (RAM - 1 GB) cache HIT cache MISS served from RAM — fastmicroseconds disk read — slow + evictmilliseconds working set > cache → miss rate climbs → cliff
Keep the working set in cache. A high cache-miss rate is the warning sign before the performance cliff.

Monitor the cache with mongostat and the server's WiredTiger statistics. The key metrics are the cache usage (should stay below the eviction trigger) and pages read into cache (sustained high values mean the working set doesn't fit).

javascript
// WiredTiger cache pressure signals
const wt = db.serverStatus().wiredTiger.cache;
print("bytes in cache:", wt["bytes currently in the cache"]);
print("max bytes:",      wt["maximum bytes configured"]);
print("pages read in:",  wt["pages read into cache"]);
Warning

Do not blindly raise the WiredTiger cache to 80–90% of RAM. The OS, connections, aggregation, and in-memory sorts all need memory too; over-allocating the cache causes swapping, which is far slower than a tuned cache. Add RAM (or reduce the working set with better indexes and projection) rather than starving the rest of the system.

Step 6: Connections and Read Scaling

Two final layers, in order. First, connection pooling: every MongoDB driver maintains a pool of reusable connections. The classic anti-pattern is creating a new MongoClient per request — connection churn exhausts server file descriptors and adds handshake latency. Create one client at startup, reuse it everywhere, and size the pool to your concurrency.

javascript
// Create ONCE at startup and reuse — never per request
const client = new MongoClient(uri, {
  maxPoolSize: 100,   // upper bound on concurrent connections
  minPoolSize: 10,    // keep warm connections ready
  maxIdleTimeMS: 60000
});

Only after the single-node layers are tuned does scaling out make sense:

  • Secondary reads — in a replica set, route read-heavy, latency-tolerant queries (analytics, reports) to secondaries with a read preference, leaving the primary for writes and strongly-consistent reads. Mind replication lag for data that must be current.
  • Sharding — when one node's RAM or write throughput is genuinely the ceiling, shard the collection. Everything depends on the shard key: it must distribute writes evenly (avoid monotonically increasing keys that create a hot shard) and match your common query filter so queries are targeted to one shard rather than scattered across all of them.
Important

A bad shard key is nearly impossible to change later and can make performance worse than a single node — every query becomes a scatter-gather across all shards. Choose a key with high cardinality, even write distribution, and alignment to your query patterns. Shard to solve a proven RAM or write-throughput ceiling, not preemptively.

Frequently Asked Questions

Why is my MongoDB query slow even though it returns few documents?

Run explain("executionStats") and compare totalDocsExamined to nReturned. If MongoDB examined far more documents than it returned, the query is doing a collection scan (COLLSCAN) or using a non-selective index. Create a compound index that matches the filter and sort using the ESR rule (Equality, Sort, Range).

How much RAM does MongoDB need?

Enough to hold your working set — the documents and index entries you actively query — in the WiredTiger cache, which defaults to about 50% of (RAM − 1 GB). If serverStatus shows a sustained high rate of pages read into cache, your working set no longer fits and you need more RAM, a smaller working set (better indexes and projection), or both.

What is the ESR rule for MongoDB indexes?

ESR stands for Equality, Sort, Range — the optimal field order for a compound index. Put fields matched by equality first, then the field you sort on, then fields matched by a range. This lets one index satisfy both the filter and the sort, avoiding a slow in-memory sort stage.

Embed when the related data is read together and bounded in size (an order and its line items). Reference when the data is large, grows unbounded, or is shared across many documents (events belonging to a user). Embedding unbounded data risks hitting the 16 MB document limit and rewriting the whole document on every update.

When should I shard a MongoDB collection?

Shard only after tuning the data model, indexes, queries, working set, and connections — and only when a single node's RAM or write throughput is genuinely the ceiling. Sharding success depends entirely on the shard key: it must distribute writes evenly and match your query filters so reads target a single shard instead of scattering across all of them.

Key Takeaways
  • Tune top-down — data model and indexes deliver 100–1000× wins; RAM and sharding are only 2–5×, so fix the upper layers first.
  • Measure with the profiler and explain("executionStats"); the docs-examined-to-returned ratio is your best health signal.
  • Model for the access pattern — embed what you read together and bounded, reference what is large or unbounded.
  • Index by the ESR rule (Equality, Sort, Range) and kill every COLLSCAN on a large collection.
  • Keep the working set in the WiredTiger cache; a rising cache-miss rate is the warning before the cliff.
  • Pool connections in the driver, and scale out with secondary reads or sharding only after the single-node layers are tuned.

High-Performance MongoDB with JusDB

A performance playbook gives you the method; production gives you the edge cases — the working set that quietly outgrew RAM, the shard key chosen two years ago that now hot-spots one node, the aggregation pipeline that was fast at 10,000 documents and falls over at 10 million. JusDB's MongoDB team does this work daily across self-managed, Atlas, and on-prem deployments: data-model and schema review, index and query optimization, WiredTiger and working-set tuning, replica-set and sharding architecture, and 24×7 production support.

If you want a second set of eyes on a slow cluster — or a standing partner to keep it fast as it grows — talk to the JusDB team. To go deeper on the layers in this guide, read the MongoDB indexing guide and learn to mine your logs for slow operations with Hatchet log analysis.

Share this article

JusDB Team

Official JusDB content team