MongoDB is fast by default — until it isn't. The moment your working set outgrows RAM, an unindexed query starts collection-scanning millions of documents, or a chatty application opens a connection per request, the same database that handled 50,000 ops/sec begins timing out. High performance in MongoDB is not a single setting; it is a stack of decisions — data model, indexes, queries, memory, connections, and read scaling — where the layer you get wrong dominates everything below it. This guide is a top-down playbook: measure first, fix the highest-leverage layer, and verify, using the standard tooling (explain(), the database profiler, mongostat, $collStats) that ships with every MongoDB deployment.
- Tune top-down: data model → indexes → queries → working set / RAM → connections → read scaling. The data model decides your ceiling; fix it before tuning anything below.
- Keep the working set in RAM. WiredTiger's cache (default ~50% of RAM) must hold your hot documents and indexes, or every query pays a disk read.
- Every query needs an index. A
COLLSCANinexplain()on a large collection is the #1 cause of slow MongoDB. - Build indexes that match the ESR rule — Equality, then Sort, then Range — so one compound index serves filter + sort without an in-memory sort.
- Model for your access pattern. Embed what you read together; reference what grows unbounded. Avoid the 16 MB document limit and unbounded array growth.
- Pool connections in the driver (don't open one per request) and read-scale with secondary reads or sharding only after the single-node layers are tuned.
- Measure with the profiler and
explain("executionStats")before and after every change.
The MongoDB Performance Hierarchy
The most expensive mistake in MongoDB tuning is starting at the bottom — adding RAM or shards to compensate for a bad data model or a missing index. Work down this hierarchy in order. The leverage is highest at the top: a schema redesign or a missing index can be a 100–1000× change, while hardware and config are usually 2–5×.
The MongoDB Performance Hierarchy
Change one layer at a time and measure with explain("executionStats") or the profiler before and after. Adding a shard to mask a missing index just spreads the slow scan across more machines.
Step 1: Measure First — Profiler and explain()
You cannot tune what you have not measured. MongoDB ships two essential tools: the database profiler, which logs slow operations, and explain(), which shows exactly how a query executes.
Enable the profiler to capture every operation slower than 100 ms (level 1), then read the slowest from the system.profile capped collection:
// Log operations slower than 100ms
db.setProfilingLevel(1, { slowms: 100 });
// Find the slowest recent operations
db.system.profile.find()
.sort({ millis: -1 })
.limit(10)
.pretty();For any slow query, run explain("executionStats") and look at three numbers: the stage, documents examined vs returned, and whether an index was used.
db.orders.find({ status: "pending", region: "us-east" })
.sort({ createdAt: -1 })
.explain("executionStats");The four signals below catch the vast majority of MongoDB performance problems. The deciding number is the ratio of totalDocsExamined to nReturned — if MongoDB scanned 1,000,000 documents to return 10, you have a missing or wrong index.
What to look for in explain("executionStats")
Step 2: Data Model — The Layer That Sets Your Ceiling
In MongoDB the schema is the performance plan. Relational instincts — normalize everything, join at read time — produce slow MongoDB. The guiding principle is model for the access pattern: data that is read together should be stored together.
- Embed when data is accessed together and the embedded set is bounded — an order and its line items, a post and its top comments. One document read, no join.
- Reference when data is large, grows unbounded, or is shared — a user referenced by millions of events, a product catalog. Embedding these blows past the 16 MB document limit and rewrites the whole document on every update.
- Avoid unbounded arrays. An array that grows forever (every event appended to one document) means ever-larger documents, slower updates, and eventual document-size failure. Use the bucket pattern or a separate collection.
- Pre-compute and denormalize the values you read on the hot path. Storing a
commentCountbeats counting a referenced collection on every page load.
Embed vs Reference — the modeling decision
Step 3: Indexes — Kill Every COLLSCAN
After the data model, indexes are the highest-leverage fix. A collection scan over millions of documents is slow no matter how much RAM you add. The most important rule for compound indexes is ESR — Equality, Sort, Range: list equality-match fields first, then the sort field, then range fields. An index built this way serves the filter and the sort from the index alone, avoiding an in-memory sort.
// Query: equality on status + region, sort by createdAt, range on amount
db.orders.find({ status: "pending", region: "us-east", amount: { $gt: 100 } })
.sort({ createdAt: -1 });
// ESR-ordered compound index: Equality (status, region), Sort (createdAt), Range (amount)
db.orders.createIndex({ status: 1, region: 1, createdAt: -1, amount: 1 });- Covered queries — if the index contains every field the query needs (filter + projection), MongoDB answers from the index without touching documents. Project only what you need and include those fields in the index.
- Partial indexes — index only the documents you actually query (e.g.
status: "active"), shrinking the index and its memory footprint. - Drop unused indexes — every index slows writes and consumes cache. Use
$indexStatsto find indexes with zeroaccesses.opsand remove them. - Build in the background on production — modern MongoDB builds indexes with minimal locking, but still schedule large builds for low-traffic windows.
// Find indexes that are never used — pure write overhead, drop them
db.orders.aggregate([ { $indexStats: {} } ])
.forEach(i => print(i.name, i.accesses.ops));For the full mechanics of single-field, compound, multikey, text, and wildcard indexes — plus index intersection and prefix rules — see the dedicated MongoDB indexing guide. And watch for index bloat over time; our index fragmentation guide covers reclaiming space with compact and rebuilds.
Step 4: Queries and Aggregation
With a good model and the right indexes, query-level tuning closes the remaining gap:
- Project only the fields you need. Returning whole documents wastes network, cache, and deserialization time.
find(filter, { name: 1, email: 1 })can also enable a covered query. - Filter early in aggregation. Put
$matchand$sortas the first stages so they use an index and shrink the pipeline before$group/$lookuprun. A$matchafter a$groupcannot use an index. - Avoid
$where, JavaScript, and unanchored regex. These force collection scans. An anchored prefix regex (/^abc/) can use an index;/abc/cannot. - Batch writes with
bulkWrite()and reads with$ininstead of N+1 round-trips. - Mind
$lookup. Joins are expensive; ensure the foreign field is indexed, or denormalize to avoid the lookup entirely on hot paths.
Step 5: Working Set and WiredTiger Cache
MongoDB's storage engine, WiredTiger, keeps recently used documents and indexes in an in-memory cache — by default about 50% of (RAM − 1 GB). Performance falls off a cliff when your working set (the documents and index entries you actually touch) no longer fits in this cache, because every query then triggers a disk read.
The working set, cache, and disk
Monitor the cache with mongostat and the server's WiredTiger statistics. The key metrics are the cache usage (should stay below the eviction trigger) and pages read into cache (sustained high values mean the working set doesn't fit).
// WiredTiger cache pressure signals
const wt = db.serverStatus().wiredTiger.cache;
print("bytes in cache:", wt["bytes currently in the cache"]);
print("max bytes:", wt["maximum bytes configured"]);
print("pages read in:", wt["pages read into cache"]);Do not blindly raise the WiredTiger cache to 80–90% of RAM. The OS, connections, aggregation, and in-memory sorts all need memory too; over-allocating the cache causes swapping, which is far slower than a tuned cache. Add RAM (or reduce the working set with better indexes and projection) rather than starving the rest of the system.
Step 6: Connections and Read Scaling
Two final layers, in order. First, connection pooling: every MongoDB driver maintains a pool of reusable connections. The classic anti-pattern is creating a new MongoClient per request — connection churn exhausts server file descriptors and adds handshake latency. Create one client at startup, reuse it everywhere, and size the pool to your concurrency.
// Create ONCE at startup and reuse — never per request
const client = new MongoClient(uri, {
maxPoolSize: 100, // upper bound on concurrent connections
minPoolSize: 10, // keep warm connections ready
maxIdleTimeMS: 60000
});Only after the single-node layers are tuned does scaling out make sense:
- Secondary reads — in a replica set, route read-heavy, latency-tolerant queries (analytics, reports) to secondaries with a read preference, leaving the primary for writes and strongly-consistent reads. Mind replication lag for data that must be current.
- Sharding — when one node's RAM or write throughput is genuinely the ceiling, shard the collection. Everything depends on the shard key: it must distribute writes evenly (avoid monotonically increasing keys that create a hot shard) and match your common query filter so queries are targeted to one shard rather than scattered across all of them.
A bad shard key is nearly impossible to change later and can make performance worse than a single node — every query becomes a scatter-gather across all shards. Choose a key with high cardinality, even write distribution, and alignment to your query patterns. Shard to solve a proven RAM or write-throughput ceiling, not preemptively.
Frequently Asked Questions
Why is my MongoDB query slow even though it returns few documents?
Run explain("executionStats") and compare totalDocsExamined to nReturned. If MongoDB examined far more documents than it returned, the query is doing a collection scan (COLLSCAN) or using a non-selective index. Create a compound index that matches the filter and sort using the ESR rule (Equality, Sort, Range).
How much RAM does MongoDB need?
Enough to hold your working set — the documents and index entries you actively query — in the WiredTiger cache, which defaults to about 50% of (RAM − 1 GB). If serverStatus shows a sustained high rate of pages read into cache, your working set no longer fits and you need more RAM, a smaller working set (better indexes and projection), or both.
What is the ESR rule for MongoDB indexes?
ESR stands for Equality, Sort, Range — the optimal field order for a compound index. Put fields matched by equality first, then the field you sort on, then fields matched by a range. This lets one index satisfy both the filter and the sort, avoiding a slow in-memory sort stage.
Should I embed or reference related data in MongoDB?
Embed when the related data is read together and bounded in size (an order and its line items). Reference when the data is large, grows unbounded, or is shared across many documents (events belonging to a user). Embedding unbounded data risks hitting the 16 MB document limit and rewriting the whole document on every update.
When should I shard a MongoDB collection?
Shard only after tuning the data model, indexes, queries, working set, and connections — and only when a single node's RAM or write throughput is genuinely the ceiling. Sharding success depends entirely on the shard key: it must distribute writes evenly and match your query filters so reads target a single shard instead of scattering across all of them.
- Tune top-down — data model and indexes deliver 100–1000× wins; RAM and sharding are only 2–5×, so fix the upper layers first.
- Measure with the profiler and
explain("executionStats"); the docs-examined-to-returned ratio is your best health signal. - Model for the access pattern — embed what you read together and bounded, reference what is large or unbounded.
- Index by the ESR rule (Equality, Sort, Range) and kill every
COLLSCANon a large collection. - Keep the working set in the WiredTiger cache; a rising cache-miss rate is the warning before the cliff.
- Pool connections in the driver, and scale out with secondary reads or sharding only after the single-node layers are tuned.
High-Performance MongoDB with JusDB
A performance playbook gives you the method; production gives you the edge cases — the working set that quietly outgrew RAM, the shard key chosen two years ago that now hot-spots one node, the aggregation pipeline that was fast at 10,000 documents and falls over at 10 million. JusDB's MongoDB team does this work daily across self-managed, Atlas, and on-prem deployments: data-model and schema review, index and query optimization, WiredTiger and working-set tuning, replica-set and sharding architecture, and 24×7 production support.
If you want a second set of eyes on a slow cluster — or a standing partner to keep it fast as it grows — talk to the JusDB team. To go deeper on the layers in this guide, read the MongoDB indexing guide and learn to mine your logs for slow operations with Hatchet log analysis.