
Handling Index Fragmentation in MongoDB: Detection and Remediation

MongoDB indexes fragment over time as documents are inserted, updated, and deleted. Here's how to detect fragmentation, measure its performance impact, and rebuild indexes safely.

JusDB Team
February 18, 2025
8 min read

Your MongoDB cluster is running on modern hardware, your queries are hitting the right indexes, and explain() confirms the planner is choosing the correct index path, yet p95 read latency has been drifting upward for three weeks and nobody can explain why. You have not changed application code. You have not dropped or rebuilt any indexes. The data volume grew by roughly 40% over the quarter, mostly through updates to existing documents rather than net-new inserts. What you are seeing is not a missing-index problem or a query-shape regression: it is WiredTiger B-tree fragmentation silently degrading the physical efficiency of indexes that look perfectly healthy on the surface. Fragmentation of this kind accumulates invisibly, adds measurable IXSCAN overhead, and will not resolve itself; it persists until you detect it and remediate it deliberately.

TL;DR
  • MongoDB index fragmentation occurs when frequent updates to indexed fields and deletes leave WiredTiger B-tree pages sparsely filled, increasing the number of page reads required per index scan.
  • Detect it via db.collection.stats() — compare indexSizes against logical data size, and inspect wiredTiger.cache metrics for high eviction rates and cache miss pressure.
  • Use the $indexStats aggregation to identify unused indexes (candidates for removal) and heavily used indexes (candidates for prioritised rebuild).
  • Rebuild indexes safely on replica sets with a rolling strategy: restart each secondary as a standalone, rebuild the index there, and let it rejoin; then step down the primary and rebuild it last. Never run db.collection.reIndex() on a live primary.
  • The compact command rewrites the collection's data and index files to release free space to the operating system. It is I/O-heavy (and before MongoDB 4.4 it blocked all operations on its database), so run it on secondaries in a rolling fashion.
  • Set up size-growth-rate alerts on totalIndexSize so fragmentation is caught in the drift phase, not after latency has already degraded.

What Index Fragmentation Means in MongoDB

MongoDB's default storage engine, WiredTiger, stores collection data and each index in separate B-tree structures on disk. Each B-tree is composed of pages whose maximum in-memory and on-disk sizes are configurable. When a new document is inserted or an index key is added, WiredTiger writes to a leaf page. When that page grows past its size limit, WiredTiger splits it into two pages, each roughly half full. This split-and-half-fill behavior is by design: it reserves space for future insertions without requiring an immediate rebalance.
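To make the half-full split concrete, here is a toy model, not WiredTiger's actual split algorithm: leaf pages hold at most `capacity` keys, an overflowing page is split into two halves, and keys arrive in random order. It compares the resulting average fill with a sorted bulk build that packs keys into full pages. The capacity, key count, and seeded generator are illustrative assumptions.

```javascript
// Toy model of leaf-page splits under random inserts. This is NOT
// WiredTiger's real split logic: pages hold at most `capacity` keys and an
// overflowing page is split into two halves.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function randomInsertFill(keyCount, capacity, seed) {
  const rand = mulberry32(seed); // seeded so runs are repeatable
  const pages = [[]];            // ordered leaf pages, each a sorted key array
  for (let i = 0; i < keyCount; i++) {
    const key = rand();
    // Route the key to the first page whose largest key is >= key, else the last page.
    let p = pages.findIndex((pg) => pg.length > 0 && pg[pg.length - 1] >= key);
    if (p === -1) p = pages.length - 1;
    const page = pages[p];
    let pos = page.findIndex((k) => k > key);
    if (pos === -1) pos = page.length;
    page.splice(pos, 0, key);
    if (page.length > capacity) {
      // Split: move the upper half into a new page right after this one.
      pages.splice(p + 1, 0, page.splice(Math.floor(page.length / 2)));
    }
  }
  return keyCount / (pages.length * capacity); // average leaf fill ratio
}

function bulkBuildFill(keyCount, capacity) {
  // A sorted bulk build packs every page except possibly the last one.
  return keyCount / (Math.ceil(keyCount / capacity) * capacity);
}

console.log("random inserts:", randomInsertFill(20000, 64, 42).toFixed(2));
console.log("sorted bulk build:", bulkBuildFill(20000, 64).toFixed(2));
```

In this model, random inserts typically settle somewhere near 69% average fill (the classic ln 2 result for B-trees under random insertion), while the sorted build is close to 100%. Real WiredTiger pages also depend on page-size limits and reconciliation, so treat the numbers as illustrative only.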

Fragmentation occurs when those reserved spaces never get filled. In workloads dominated by random-access updates to indexed fields or by document deletions, the B-tree can accumulate pages that are 30–50% utilized rather than the 80–90% utilization of a freshly built index. The logical size of the index (the number of key-value pairs it stores) stays the same, but the physical size on disk and in the WiredTiger cache expands. Every index range scan must now read more pages to traverse the same logical key space, and the cache must hold more pages to serve the same working set. The result is increased IXSCAN cost and cache eviction pressure, both of which manifest as elevated query latency.
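The cost claim in that paragraph comes down to simple arithmetic: the number of leaf pages a range scan touches is roughly the number of keys scanned divided by the number of keys per page. Here is a sketch that assumes, hypothetically, that 100 keys fit on a completely full leaf page and that every page is uniformly filled:

```javascript
// Leaf pages touched by a range scan of `keysScanned` keys, assuming
// `keysPerFullPage` keys fit on a completely full page and every page is
// uniformly `fillRatio` full (a simplification; real pages vary).
function leafPagesForScan(keysScanned, keysPerFullPage, fillRatio) {
  if (fillRatio <= 0 || fillRatio > 1) {
    throw new RangeError("fillRatio must be in (0, 1]");
  }
  return Math.ceil(keysScanned / (keysPerFullPage * fillRatio));
}

const packed = leafPagesForScan(10000, 100, 0.9); // freshly built index
const sparse = leafPagesForScan(10000, 100, 0.4); // fragmented index
console.log(packed, sparse, (sparse / packed).toFixed(2));
```

The same ratio applies to the cache footprint. Scanning the same 10,000 keys at 40% fill needs a bit more than twice as many pages resident as it does at 90% fill.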

Warning

Fragmentation is not visible in explain() output. The query planner will still report the index as the winning plan and show an IXSCAN stage. The fragmentation tax is paid at the storage layer as additional page reads — it shows up in execution time and in WiredTiger cache metrics, not in plan selection.

How Fragmentation Accumulates

Three workload patterns are the primary drivers of index fragmentation in production MongoDB deployments.

Frequent updates to indexed fields. Every update to an indexed field removes the old key from the B-tree and inserts the new key at a different position. The removal leaves a gap on the old leaf page; the insertion may trigger a page split at the new position. Over millions of updates, the B-tree develops chronic low page utilization throughout its leaf level.
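Only indexes that contain a modified field pay this delete-then-insert cost. The sketch below checks which ones are affected. It assumes index definitions shaped like the `{ name, key }` documents that `getIndexes()` returns, plus a hypothetical list of updated dotted field paths. A field counts as touching an index when either path is a prefix of the other, so updating `address` affects an index on `address.city`, and the reverse also holds.

```javascript
// Returns the names of indexes whose keys could change when the given
// field paths are modified. `indexes` mirrors getIndexes() output: [{ name, key }].
function indexesTouchedByUpdate(indexes, updatedPaths) {
  const overlaps = (a, b) =>
    a === b || a.startsWith(b + ".") || b.startsWith(a + ".");
  return indexes
    .filter((ix) =>
      Object.keys(ix.key).some((field) =>
        updatedPaths.some((path) => overlaps(field, path))))
    .map((ix) => ix.name);
}

const indexes = [
  { name: "_id_", key: { _id: 1 } },
  { name: "status_1_createdAt_1", key: { status: 1, createdAt: 1 } },
  { name: "address.city_1", key: { "address.city": 1 } }
];
console.log(indexesTouchedByUpdate(indexes, ["notes"]));
console.log(indexesTouchedByUpdate(indexes, ["status", "address"]));
```

This is a simplification. Wildcard (`$**`) and text indexes cover fields that do not appear literally in the key pattern, so this check would under-report for them.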

Document deletions. Deleting documents removes their index entries. Unlike a new-key insertion, a deletion does not fill in gaps on adjacent pages; it simply marks space as available for future reuse. In collections with continuous delete-heavy churn (session stores, event queues, audit logs pruned by a TTL index), large contiguous regions of index pages become sparsely populated, and the B-tree does not repack these partially emptied pages back to full density on its own.
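The delete pattern can be shown with a toy model. It assumes a hypothetical index made of fully packed pages, uniformly random deletes, and no merging of partially emptied pages. Pages that end up completely empty are dropped from the count. Under those assumptions, average fill falls roughly in line with the delete fraction while the page count barely changes.

```javascript
// Toy model: delete a random fraction of keys from fully packed pages.
// Partially emptied pages stay allocated (no merging); pages that end up
// completely empty are dropped from the count.
function fillAfterDeletes(pageCount, capacity, deleteFraction, seed) {
  let s = seed >>> 0;
  const rand = () => {
    // Small 32-bit LCG (Numerical Recipes constants) for repeatable runs.
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 4294967296;
  };
  let liveKeys = 0;
  let livePages = 0;
  for (let p = 0; p < pageCount; p++) {
    let keysOnPage = 0;
    for (let k = 0; k < capacity; k++) {
      if (rand() >= deleteFraction) keysOnPage++; // this key survives
    }
    if (keysOnPage > 0) {
      livePages++;
      liveKeys += keysOnPage;
    }
  }
  return { livePages, fill: livePages ? liveKeys / (livePages * capacity) : 0 };
}

console.log(fillAfterDeletes(1000, 100, 0.6, 7));
```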

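Document growth. This driver matters mostly for the collection's data rather than its indexes. Under the legacy MMAPv1 storage engine, a document that outgrew its allocated space was moved to a new location. Every index entry pointing to it then had to be rewritten, which produced the same delete-then-insert pattern as an explicit field update. WiredTiger does not relocate records that way. A document keeps its RecordId, and index entries reference that RecordId, so growing a document leaves its index entries untouched unless an indexed field's value changes. The growth shows up instead in the collection's own data B-tree, whose pages fill and split as documents get larger.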

Tip

Schema design that avoids unbounded array growth and uses fixed-width field types (ObjectId, integer, date) for indexed fields significantly slows the fragmentation rate. If your workload requires frequent updates to indexed string or embedded-document fields, schedule fragmentation checks more aggressively — quarterly is not sufficient for high-velocity update workloads.

Detecting Fragmentation: Collection Stats and WiredTiger Metrics

The starting point for fragmentation diagnosis is db.collection.stats(). The key fields to examine are indexSizes, totalIndexSize, and the nested wiredTiger block.

javascript
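// Get full stats with byte-level precision (no scale divisor)
var stats = db.orders.stats();

// size = uncompressed data size; storageSize = space allocated on disk (compressed)
print("Data size (bytes):        " + stats.size);
print("Storage size (bytes):     " + stats.storageSize);

// Index sizes per index
printjson(stats.indexSizes);

// Total index footprint
print("Total index size (bytes): " + stats.totalIndexSize);

// WiredTiger cache metrics for the collection's data table
// (each index has its own WiredTiger table, reported under indexDetails)
var wt = stats.wiredTiger;
print("Pages read into cache:      " + wt.cache["pages read into cache"]);
print("Unmodified pages evicted:   " + wt.cache["unmodified pages evicted"]);
print("Pages requested from cache: " + wt.cache["pages requested from the cache"]);

// Per-index cache activity: a busy index that keeps reading pages into
// cache deserves a closer fragmentation check
Object.keys(stats.indexDetails || {}).forEach(function (name) {
  var ic = stats.indexDetails[name].cache;
  print(name + ": pages read into cache = " + ic["pages read into cache"] +
        ", pages requested = " + ic["pages requested from the cache"]);
});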

A healthy index has a totalIndexSize that is proportional to the number of indexed documents and the average key size. As a rough heuristic, if totalIndexSize exceeds 20–30% of storageSize on a collection with modest cardinality indexes, or if index size has grown faster than document count over the past 30 days, fragmentation is a likely contributor.
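The heuristic above can be written as a small function over two `stats()` snapshots taken about 30 days apart. The 25% ratio threshold (the midpoint of the 20–30% range) is an assumption, and so is the idea of keeping the older snapshot somewhere between runs. Adjust both to your collections.

```javascript
// Flags likely index fragmentation from two db.collection.stats() snapshots
// taken some time apart (about 30 days in the heuristic above).
// Uses only the count, storageSize and totalIndexSize fields.
function fragmentationSignals(older, newer, ratioThreshold = 0.25) {
  const num = (v) => Number(v.toString()); // stats values may be BSON Longs in mongosh
  const ratio = num(newer.totalIndexSize) / num(newer.storageSize);
  const indexGrowth = num(newer.totalIndexSize) / num(older.totalIndexSize) - 1;
  const docGrowth = num(newer.count) / num(older.count) - 1;
  return {
    indexToStorageRatio: Number(ratio.toFixed(3)),
    indexGrowthPct: Number((indexGrowth * 100).toFixed(1)),
    docGrowthPct: Number((docGrowth * 100).toFixed(1)),
    ratioAboveThreshold: ratio > ratioThreshold,
    indexOutpacingDocs: indexGrowth > docGrowth
  };
}

// mongosh usage (the older snapshot must be stored somewhere between runs):
// const before = db.orders.stats();   // today
// const after = db.orders.stats();    // ~30 days later
// printjson(fragmentationSignals(before, after));
```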

The WiredTiger cache metrics tell a more direct story. A steadily rising pages read into cache count, combined with growing eviction of unmodified (clean) pages, indicates that the working set of pages no longer fits in the configured WiredTiger cache. The pages evicted because they exceeded the in-memory maximum counter measures something narrower: individual pages forced out because they grew too large in memory. That usually points to hot, heavily updated pages rather than an oversized working set. The link between fragmentation and latency is mechanical. More physical pages per logical key range mean a larger cache footprint for the same query patterns, which displaces other working-set data and increases cache miss rates.

javascript
// Server-level WiredTiger cache health check
var ss = db.adminCommand({ serverStatus: 1 });
var cache = ss.wiredTiger.cache;

printjson({
  "cache size (MB)":              Math.round(cache["maximum bytes configured"] / 1048576),
  "bytes currently in cache":     Math.round(cache["bytes currently in the cache"] / 1048576) + " MB",
  "bytes read into cache":        Math.round(cache["bytes read into cache"] / 1048576) + " MB",
  "pages evicted by app threads": cache["pages evicted by application threads"],
  "unmodified pages evicted":     cache["unmodified pages evicted"],
  "tracked dirty bytes":          Math.round(cache["tracked dirty bytes in the cache"] / 1048576) + " MB"
});

// Eviction pressure indicator: these counters are cumulative since startup, so
// compare two samples. If "pages evicted by application threads" keeps rising,
// eviction is happening synchronously during query execution, a strong signal
// that the cache is undersized relative to the working set, often exacerbated by fragmentation
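Because these serverStatus counters are cumulative since startup, the useful signal is how fast they are changing. The sketch below diffs two samples of `serverStatus().wiredTiger.cache` and converts each counter to a per-second rate. The 60-second interval in the usage comment is an assumption. Values are converted through `toString()` in case the shell returns them as BSON Longs.

```javascript
// Per-second rates for cumulative WiredTiger cache counters between two
// samples of db.adminCommand({ serverStatus: 1 }).wiredTiger.cache.
function cacheRates(before, after, elapsedSeconds) {
  const num = (v) => (v == null ? 0 : typeof v === "number" ? v : Number(v.toString()));
  const fields = [
    "bytes read into cache",
    "pages read into cache",
    "unmodified pages evicted",
    "pages evicted by application threads"
  ];
  const rates = {};
  for (const f of fields) {
    rates[f] = (num(after[f]) - num(before[f])) / elapsedSeconds;
  }
  return rates;
}

// mongosh usage (sleep() is a mongosh built-in; the interval is illustrative):
// const a = db.adminCommand({ serverStatus: 1 }).wiredTiger.cache;
// sleep(60 * 1000);
// const b = db.adminCommand({ serverStatus: 1 }).wiredTiger.cache;
// printjson(cacheRates(a, b, 60));
```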

Finding Unused and Overused Indexes with $indexStats

Before committing to a rebuild, use $indexStats to understand which indexes are actually being used. Rebuilding indexes that are never accessed is wasted operational effort; more importantly, indexes that are never used are pure overhead — they consume cache space and increase write amplification with every insert or update, both of which compound fragmentation on the indexes that do matter.

javascript
// Get usage statistics for all indexes on the orders collection
db.orders.aggregate([
  { $indexStats: {} },
  {
    $project: {
      name: 1,
      "accesses.ops": 1,
      "accesses.since": 1,
      host: 1
    }
  },
  { $sort: { "accesses.ops": -1 } }
]).forEach(printjson);

The accesses.ops counter resets when mongod restarts (and when the index is dropped and rebuilt), so interpret it relative to accesses.since. An index with zero ops since the last restart is a strong candidate for removal, especially if the server has been running for more than a week in production. An index with hundreds of millions of ops is your highest-priority fragmentation target: it is on the critical query path, and degraded physical efficiency there directly impacts user-facing latency.
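The interpretation rules above can be applied mechanically to `$indexStats` output. The sketch below assumes you pass in the array returned by `db.orders.aggregate([{ $indexStats: {} }]).toArray()`. Two choices are assumptions of this example: the 7-day minimum observation window, and never flagging `_id_`, which cannot be dropped anyway.

```javascript
// Splits $indexStats documents into drop candidates and rebuild priorities.
// Each document looks like { name, accesses: { ops, since } }.
function classifyIndexUsage(indexStats, now = new Date(), minWindowDays = 7) {
  const dayMs = 24 * 60 * 60 * 1000;
  const dropCandidates = [];
  const used = [];
  for (const ix of indexStats) {
    const ops = Number(ix.accesses.ops.toString()); // ops may be a BSON Long
    const windowDays = (now - new Date(ix.accesses.since)) / dayMs;
    if (ix.name !== "_id_" && ops === 0 && windowDays >= minWindowDays) {
      dropCandidates.push(ix.name);
    } else if (ops > 0) {
      used.push({ name: ix.name, ops });
    }
  }
  used.sort((a, b) => b.ops - a.ops); // busiest index first = top rebuild priority
  return { dropCandidates, rebuildPriority: used.map((u) => u.name) };
}

// mongosh usage:
// printjson(classifyIndexUsage(db.orders.aggregate([{ $indexStats: {} }]).toArray()));
```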

Tip

On replica sets, run $indexStats on both the primary and each secondary independently. Read traffic from analytics or reporting workloads often hits secondaries with different index patterns than the primary sees. An index that looks unused on the primary may be carrying heavy load on a read-preferred secondary.
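One way to follow that advice is to collect `$indexStats` from every member and sum the counters per index name before deciding anything. The sketch below assumes you have already gathered each member's output, for example by connecting to each host in turn. It uses only the `name`, `host`, and `accesses.ops` fields that `$indexStats` returns.

```javascript
// Combines $indexStats results from several replica set members so an
// index only looks unused if it is unused on every member.
function mergeIndexStats(perMemberResults) {
  const totals = new Map();
  for (const docs of perMemberResults) {
    for (const ix of docs) {
      const ops = Number(ix.accesses.ops.toString()); // handles number or BSON Long
      const entry = totals.get(ix.name) ?? { name: ix.name, totalOps: 0, opsByHost: {} };
      entry.totalOps += ops;
      entry.opsByHost[ix.host] = ops;
      totals.set(ix.name, entry);
    }
  }
  return [...totals.values()].sort((a, b) => b.totalOps - a.totalOps);
}
```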

Rebuilding Indexes: reIndex and Its Risks

db.collection.reIndex() drops and recreates all indexes on a collection in a single operation. The rebuilt indexes are written as fully packed B-trees with near-optimal page utilization, which eliminates accumulated fragmentation entirely. On standalone instances and in testing environments, this is the fastest remediation path. Recent MongoDB releases have deprecated reIndex() and allow it only on standalone mongod instances.

javascript
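// Rebuild all indexes on the orders collection
// (standalone mongod only; never on a live replica set primary)
db.orders.reIndex();

// Or rebuild a specific index by dropping and recreating it.
// The key pattern must match the dropped index: status_1_createdAt_1 = { status: 1, createdAt: 1 }.
// Since MongoDB 4.2 the background option is ignored; every build uses the
// same optimized build process.
db.orders.dropIndex("status_1_createdAt_1");
db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { name: "status_1_createdAt_1" }
);
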
Warning

db.collection.reIndex() holds an exclusive write lock on the collection for the duration of the rebuild. On large collections, this lock can be held for minutes to hours, making the collection completely unavailable for reads and writes. Never run reIndex() on the primary of a production replica set. Use the rolling rebuild procedure described below instead.
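A cheap guard before any `reIndex()` is to check what kind of node the shell is connected to. The sketch below assumes the document returned by `db.hello()` on recent versions, or by `db.isMaster()` on older ones. Replica set members report a `setName`, and `mongos` reports `msg: "isdbgrid"`. Anything else is treated as a standalone `mongod`.

```javascript
// Returns true only when the connected node looks like a standalone mongod,
// judged from a db.hello() / db.isMaster() response document.
function isStandaloneMongod(helloDoc) {
  const isReplicaSetMember = typeof helloDoc.setName === "string";
  const isMongos = helloDoc.msg === "isdbgrid";
  return !isReplicaSetMember && !isMongos;
}

// mongosh usage:
// if (isStandaloneMongod(db.hello())) { db.orders.reIndex(); }
// else { print("Refusing to reIndex(): not a standalone mongod"); }
```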

Rolling Index Rebuild on Replica Sets

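The safe production path for index rebuilds on a replica set rebuilds one member at a time while the others continue to serve traffic. Two constraints shape the procedure. Index builds run on the primary are replicated to every secondary through the oplog, and secondaries reject writes, including dropIndex and createIndex. Each member therefore has to be taken out of the set and restarted as a standalone mongod for its rebuild. Because the rebuild happens outside replication, it affects only that member's local copy of the index, and the member catches up from the oplog when it rejoins.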
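The ordering rule (secondaries first, primary last) can be derived from `rs.status()`. The sketch below assumes the `members` array that `rs.status()` returns, where each entry has `name` and `stateStr`. It lists secondaries first and the current primary last. Arbiters, which hold no data, and members in any other state are reported separately rather than scheduled.

```javascript
// Orders replica set members for a rolling index rebuild:
// secondaries first, then the current primary (after rs.stepDown()).
function rollingRebuildOrder(members) {
  const secondaries = members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => m.name)
    .sort();
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  const skipped = members
    .filter((m) => m.stateStr !== "SECONDARY" && m.stateStr !== "PRIMARY")
    .map((m) => `${m.name} (${m.stateStr})`);
  return {
    order: primary ? [...secondaries, primary.name] : secondaries,
    skipped // arbiters, recovering or unreachable members need separate attention
  };
}

// mongosh usage:
// printjson(rollingRebuildOrder(rs.status().members));
```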

javascript
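// Step 1: On each secondary (repeat for all secondaries first)
// Shut the member down cleanly, then restart it WITHOUT its replica set
// settings (remove replication.replSetName / --replSet) and on a different
// port so applications cannot reach it, for example (paths are illustrative):
//   mongod --port 27218 --dbpath /var/lib/mongodb \
//          --setParameter disableLogicalSessionCacheRefresh=true
// Then connect to it directly on that port.

// Confirm the node is running as a standalone before proceeding
db.hello().setName; // Must be undefined (use db.isMaster() on older versions)

// Drop the fragmented index
db.orders.dropIndex("status_1_createdAt_1");

// Rebuild it. Since MongoDB 4.2 every build uses the same optimized
// process and the background option is ignored.
db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { name: "status_1_createdAt_1" }
);

// Monitor build progress (from a second shell connected to the same node)
db.adminCommand({
  currentOp: true,
  $or: [
    { op: "command", "command.createIndexes": { $exists: true } },
    { op: "none", "msg": /Index Build/ }
  ]
});

// When the build finishes, shut down and restart the member with its normal
// replica set configuration and port so it rejoins and catches up from the oplog.
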
javascript
// Step 2: After all secondaries are rebuilt and caught up, step down the primary
// Connect to the current primary
rs.stepDown(60); // old primary is ineligible for re-election for 60 seconds

// Step 3: Rebuild the index on the former primary (now a secondary).
// As in Step 1, restart it as a standalone on a different port, then confirm:
db.hello().setName; // Must be undefined (no replica set name)

db.orders.dropIndex("status_1_createdAt_1");
db.orders.createIndex(
  { status: 1, createdAt: 1 },
  { name: "status_1_createdAt_1" }
);

// Finally, restart it with its replica set configuration so it rejoins the set.

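Between members, allow the rebuilt node to fully catch up on oplog replication before proceeding to the next one. Check lag with rs.printSecondaryReplicationInfo() and only proceed when lag is below your acceptable threshold. Before you start, confirm with rs.printReplicationInfo() that the oplog window comfortably exceeds the time each member will spend offline. A member that falls off the end of the oplog cannot catch up and needs a full resync.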
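The lag check can also be scripted from `rs.status()`. The sketch below assumes the `members` array from `rs.status()`, whose entries carry `name`, `stateStr`, and `optimeDate`. The 10-second threshold is an illustrative assumption, not a recommendation. It also refuses to proceed while any member is in a state such as `STARTUP2` or `RECOVERING`, since a member that just rejoined may not show up as a secondary yet.

```javascript
// Replication lag of each secondary behind the primary, from rs.status().members.
// Members in states other than PRIMARY/SECONDARY/ARBITER block progress.
function replicationLag(members, maxLagSeconds = 10) {
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY found in rs.status() members");
  const notReady = members
    .filter((m) => !["PRIMARY", "SECONDARY", "ARBITER"].includes(m.stateStr))
    .map((m) => m.name);
  const lags = members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (new Date(primary.optimeDate) - new Date(m.optimeDate)) / 1000
    }));
  return {
    lags,
    notReady,
    safeToProceed: notReady.length === 0 &&
      lags.every((l) => l.lagSeconds <= maxLagSeconds)
  };
}

// mongosh usage:
// printjson(replicationLag(rs.status().members));
```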

The compact Command: When It Helps and When It Doesn't

The compact command rewrites the collection's data and all of its indexes in place. WiredTiger moves blocks from the end of each file into free space left behind by deletes, so the files shrink and the space can be returned to the operating system. Unlike reIndex(), which only rebuilds indexes, compact also operates on the collection data file itself. It is primarily a space-reclamation tool, though: it does not regenerate indexes from sorted keys the way a fresh index build does.

javascript
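// Run compact on a secondary (never on a primary in production), in the
// database that owns the collection. The command does not return until done.
// Before MongoDB 4.4 it blocked all operations on the database; from 4.4 on it
// no longer blocks CRUD operations but still blocks some DDL (for example
// dropping the collection or creating/dropping its indexes).
db.runCommand({ compact: "orders" });

// force: true lets compact run on a replica set primary (not recommended)
// db.runCommand({ compact: "orders", force: true });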

compact is the right tool when both collection data and indexes have accumulated free space. That is common after large-scale bulk deletes, or after the TTL monitor works through a large backlog of expired documents. It is less useful when fragmentation is limited to indexes while the collection data is well-structured; in that case, a targeted index rebuild is faster and touches far less data.
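WiredTiger reports how much of each file is free and reusable, which gives a rough proxy for choosing between `compact` and a targeted rebuild. The sketch below makes two assumptions. The first is that `db.orders.stats()` includes `wiredTiger["block-manager"]["file bytes available for reuse"]` and `"file size in bytes"` for the collection. The second is that the same keys appear under `indexDetails[<index name>]["block-manager"]` for each index, as WiredTiger-backed deployments normally report. The 20% threshold is also an assumption. Free space in a file is not the same thing as page fill, so treat the suggestion as a starting point.

```javascript
// Reclaimable-space summary from a db.collection.stats() document.
// Values may be BSON Longs in mongosh, so they are converted via toString().
function reclaimableSpace(stats, threshold = 0.2) {
  const num = (v) => (v == null ? 0 : Number(v.toString()));
  const share = (bm) => {
    const size = num(bm && bm["file size in bytes"]);
    return size > 0 ? num(bm["file bytes available for reuse"]) / size : 0;
  };
  const dataShare = share(stats.wiredTiger && stats.wiredTiger["block-manager"]);
  const indexShares = {};
  for (const [name, detail] of Object.entries(stats.indexDetails || {})) {
    indexShares[name] = share(detail["block-manager"]);
  }
  const sparseIndexes = Object.keys(indexShares).filter((n) => indexShares[n] > threshold);
  let suggestion = "no action";
  if (dataShare > threshold) suggestion = "compact (data and indexes)";
  else if (sparseIndexes.length > 0) suggestion = "rebuild: " + sparseIndexes.join(", ");
  return { dataShare, indexShares, suggestion };
}

// mongosh usage:
// printjson(reclaimableSpace(db.orders.stats()));
```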

Warning

Before MongoDB 4.4, compact blocked all operations on its database for the full duration of the operation. Later versions no longer block reads and writes, but they still block some DDL and generate heavy I/O. On a 100 GB collection, compact can run for 30–90 minutes depending on storage throughput. Always run it on a secondary that has been removed from the read pool for the duration, and confirm the secondary has fully caught up before moving to the next node.

WiredTiger Cache Pressure as a Fragmentation Symptom

When fragmentation is severe enough to materially increase the physical page count required for common index scans, the WiredTiger cache begins showing characteristic pressure signals. The most actionable metrics to track are the ratio of cache bytes in use to configured cache size, and the eviction rate of unmodified (clean) pages.

javascript
// Cache utilization ratio — alert if consistently above 90%
var ss = db.adminCommand({ serverStatus: 1 });
var c = ss.wiredTiger.cache;
var utilizationPct = (c["bytes currently in the cache"] / c["maximum bytes configured"]) * 100;
print("Cache utilization: " + utilizationPct.toFixed(1) + "%");

// Clean-page evictions are normal once the cache passes its eviction target;
// a rising rate alongside high utilization suggests the working set exceeds the cache
print("Clean page evictions: " + c["unmodified pages evicted"]);

// If application threads are doing eviction work, queries are stalling
// waiting for eviction to complete before their pages can be loaded
print("App thread evictions: " + c["pages evicted by application threads"]);

Cache pressure from fragmentation is self-reinforcing. As the cache fills with sparsely packed index pages, there is less room for the collection data pages that FETCH stages need. That forces more reads from disk, which increases scan latency, which increases the number of in-flight queries, which further increases cache demand. Rebuilding the fragmented indexes breaks this cycle by reducing the physical page count required for the same logical key traversal.
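The utilization figure from the cache check above can be mapped onto WiredTiger's eviction thresholds. The sketch below assumes the default `eviction_target` (80%) and `eviction_trigger` (95%) settings. Deployments that tune these through `wiredTigerEngineRuntimeConfig` should pass their own values.

```javascript
// Maps WiredTiger cache fill onto eviction behaviour, ASSUMING the default
// eviction_target (80%) and eviction_trigger (95%) settings are in effect.
function cachePressureLevel(bytesInCache, maxBytesConfigured,
                            evictionTarget = 0.8, evictionTrigger = 0.95) {
  const num = (v) => Number(v.toString()); // serverStatus values may be BSON Longs
  const used = num(bytesInCache) / num(maxBytesConfigured);
  let level = "normal";                                // below the eviction target
  if (used >= evictionTrigger) level = "critical";     // application threads may be pulled into eviction
  else if (used >= evictionTarget) level = "elevated"; // background eviction works to hold the target
  return { usedPct: Number((used * 100).toFixed(1)), level };
}

// mongosh usage:
// const c = db.adminCommand({ serverStatus: 1 }).wiredTiger.cache;
// printjson(cachePressureLevel(c["bytes currently in the cache"], c["maximum bytes configured"]));
```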

Proactive Monitoring: Alerting on Index Size Growth Rate

The most effective way to handle fragmentation is to catch it in the growth phase — before it has degraded performance — rather than diagnosing it reactively after latency has already increased. The key metric to track is totalIndexSize relative to document count over time.

javascript
// Fragmentation monitoring script — run every 15 minutes via cron or monitoring agent
// Store output in a time-series collection or external metrics system

var collections = ["orders", "sessions", "events", "users"];
var dbName = "myapp";
var targetDb = db.getSiblingDB(dbName);

collections.forEach(function(collName) {
  var stats = targetDb[collName].stats();
  var record = {
    ts:             new Date(),
    collection:     dbName + "." + collName,
    docCount:       stats.count,
    dataSizeBytes:  stats.size,
    storageSizeBytes: stats.storageSize,
    indexSizeBytes: stats.totalIndexSize,
    // Index bytes per document: rising value indicates fragmentation growth
    indexBytesPerDoc: stats.count > 0
      ? Math.round(stats.totalIndexSize / stats.count)
      : 0
  };

  // Alert threshold: index size exceeds 2x storage size (heavy fragmentation signal)
  if (stats.totalIndexSize > stats.storageSize * 2) {
    print("ALERT: " + record.collection +
          " — index size (" + Math.round(stats.totalIndexSize / 1048576) + " MB) " +
          "exceeds 2x storage size (" + Math.round(stats.storageSize / 1048576) + " MB)");
  }

  printjson(record);
});

Feed this output into your monitoring stack (Prometheus with the MongoDB exporter, Datadog, Grafana, or a time-series collection inside MongoDB itself). Define two alert tiers. The first is a warning at 20% week-over-week growth in totalIndexSize with flat or declining document count (index growing while data is not). The second is a critical alert when totalIndexSize exceeds your configured WiredTiger cache size. At that threshold, the current cache can no longer hold every index page in memory at the same time, let alone the collection data those queries also touch.
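The two alert tiers can be evaluated from records taken a week apart. The sketch below assumes records shaped like the ones the monitoring script prints (`docCount`, `indexSizeBytes`) and a cache size you supply in bytes, for example `maximum bytes configured` from `serverStatus`. The 1% tolerance for a "flat" document count is an assumption.

```javascript
// Evaluates the warning and critical tiers for one collection from weekly
// records shaped like the monitoring script's output ({ docCount, indexSizeBytes }).
// flatTolerance (1%) is an assumed allowance for "flat" document counts.
function evaluateIndexAlerts(lastWeek, thisWeek, cacheBytes, flatTolerance = 0.01) {
  const indexGrowth = thisWeek.indexSizeBytes / lastWeek.indexSizeBytes - 1;
  const docGrowth = thisWeek.docCount / lastWeek.docCount - 1;
  const alerts = [];
  if (thisWeek.indexSizeBytes > cacheBytes) {
    alerts.push("critical: total index size exceeds the WiredTiger cache size");
  }
  if (indexGrowth >= 0.2 && docGrowth <= flatTolerance) {
    alerts.push("warning: index size up " + (indexGrowth * 100).toFixed(1) +
                "% week over week while document count is flat or falling");
  }
  return alerts;
}
```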

Tip

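For MongoDB Atlas deployments, enable the built-in "Total Index Size" alert under Database → Alerts and set the threshold at 80% of your instance's RAM. Atlas does not expose WiredTiger page-level metrics directly, so the index size alert is your primary early-warning signal. Pair it with the "Query Targeting" alert, which fires when the ratio of documents scanned to documents returned exceeds a threshold. That ratio is not itself a fragmentation symptom, because fragmentation adds page reads rather than extra documents examined. A Query Targeting alert therefore points to a missing or poorly chosen index instead, and ruling that out is what leaves fragmentation as the likely explanation when latency still drifts.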

Key Takeaways
  • MongoDB WiredTiger B-tree fragmentation is a physical-layer problem driven by updates, deletes, and document relocation — it does not appear in explain() and accumulates silently over weeks to months in high-churn collections.
  • Diagnose fragmentation using db.collection.stats() — focus on totalIndexSize relative to storageSize and document count, and on WiredTiger cache eviction metrics from db.adminCommand({ serverStatus: 1 }).
  • Use $indexStats before any rebuild to identify and drop unused indexes first — removing deadweight indexes reduces cache pressure immediately and speeds up all subsequent write operations.
  • Never run db.collection.reIndex() or compact on a live primary; always use the rolling replica-set procedure (secondaries first, then step down the primary) to avoid collection-level downtime.
  • The compact command is appropriate when both collection data and indexes are fragmented; for index-only fragmentation, a targeted drop-and-rebuild is faster and less disruptive.
  • Monitor totalIndexSize as a rate metric, not just an absolute value — a rising index size against a flat document count is the earliest detectable fragmentation signal, visible weeks before latency begins to degrade.

Index fragmentation is one of those operational problems that rewards teams who instrument their databases proactively and penalises teams who wait for user complaints before investigating. JusDB manages MongoDB for engineering teams who want fragmentation caught and remediated automatically — our DBAs monitor index size growth rates, run rolling index rebuilds during low-traffic windows, and tune WiredTiger cache allocation as workload patterns shift over time. If your MongoDB queries are drifting slower and the usual suspects have already been ruled out, talk to a JusDB DBA and let us dig into the storage layer with you.
