At 2 AM on a Tuesday, an on-call engineer at a high-volume SaaS company noticed their session collection had grown to 47 GB — nearly triple its normal size. Their TTL index on expiresAt had been in place for two years without incident. Nothing in the application had changed. The only recent infrastructure event was a MongoDB upgrade to 8.0.4, completed three days earlier. By the time they traced the growth back to silent TTL deletion failure, six million stale session documents had accumulated and their storage costs had spiked by $800 in a single billing cycle. The bug was real, it was silent, and MongoDB had already filed it as SERVER-97368.
- MongoDB 8.0.4 introduced a regression (SERVER-97368) that causes the TTL monitor background thread to stop deleting expired documents under specific write-load conditions.
- Symptoms are silent: no errors in mongod.log, TTL monitor reports as enabled, but collections grow unbounded.
- Detect it by cross-referencing collection size growth with TTL monitor pass counts and oplog delete activity.
- Immediate workarounds are manual deletion via deleteMany and downgrading to MongoDB 8.0.3.
- The official fix shipped in MongoDB 8.0.5; upgrade via a rolling restart on replica sets with no FCV change required.
- Pair any TTL index with collection-size alerts so a future silent failure is caught in minutes, not days.
How MongoDB TTL Indexes Work
A TTL (Time-To-Live) index is a special single-field index on a Date field that instructs MongoDB to automatically delete documents once a configured number of seconds has elapsed past that field's value. You create one with expireAfterSeconds:
// Store an absolute expiry time in expiresAt and expire each document as soon as that time is reached
db.sessions.createIndex(
  { expiresAt: 1 },
  { expireAfterSeconds: 0 }
);
// Or expire event documents 3600 seconds (1 hour) after createdAt
db.events.createIndex(
  { createdAt: 1 },
  { expireAfterSeconds: 3600 }
);
The deletion is performed not at query time but by a dedicated background thread called the TTL monitor. This thread wakes up every 60 seconds by default, scans all TTL indexes across every database, calculates which documents are past their expiry, and issues batched deletes. The interval is controlled by the ttlMonitorSleepSecs server parameter.
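To make the expiry rule concrete, here is a minimal sketch of how the eligibility time follows from the indexed field and expireAfterSeconds. The helper names ttlEligibleAt and isExpired are hypothetical and exist only for illustration. The sketch runs in Node or mongosh and does not touch a database.

```javascript
// Hypothetical helpers for illustration; these are not MongoDB APIs.

// Returns the Date at which a document becomes eligible for TTL deletion,
// or null if the indexed field holds no Date (such documents never expire).
function ttlEligibleAt(doc, field, expireAfterSeconds) {
  const value = doc[field];
  const dates = (Array.isArray(value) ? value : [value]).filter(v => v instanceof Date);
  if (dates.length === 0) return null;
  // When the field is an array of dates, the earliest one sets the expiry.
  const earliest = Math.min(...dates.map(d => d.getTime()));
  return new Date(earliest + expireAfterSeconds * 1000);
}

function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const at = ttlEligibleAt(doc, field, expireAfterSeconds);
  return at !== null && at.getTime() < now.getTime();
}

const session = { expiresAt: new Date("2025-01-01T12:00:00Z") };
const event = { createdAt: new Date("2025-01-01T12:00:00Z") };

console.log(ttlEligibleAt(session, "expiresAt", 0).toISOString());  // 2025-01-01T12:00:00.000Z
console.log(ttlEligibleAt(event, "createdAt", 3600).toISOString()); // 2025-01-01T13:00:00.000Z
```

Because the monitor only runs once per interval, a document can outlive its eligibility time by up to one interval, or longer on a busy server.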
The TTL Monitor Thread Internals
On each pass, the TTL monitor builds a list of all TTL-indexed namespaces from the catalog, acquires an intent lock on each collection, performs an index scan to find expired document _id values, and then issues internal delete operations. Those deletes are replicated to secondaries through the oplog like any other write. The thread tracks its own pass count and logs a summary at the DEBUG verbosity level. Under normal operation, you will see oplog entries with op: "d" originating from the TTL subsystem every 60 seconds for any collection with expiring data.
You can confirm the TTL monitor is enabled at any time with: db.adminCommand({ getParameter: 1, ttlMonitorEnabled: 1 }). A return value of true means the thread is running — but as SERVER-97368 demonstrates, "running" does not mean "deleting."
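One way to tell "running" from "deleting" is to compare the TTL counters that serverStatus exposes under metrics.ttl (passes and deletedDocuments) across two snapshots. The sketch below is an assumption about how you might interpret them, and classifyTtlProgress is a hypothetical helper. The counters are server-wide, so deletes on another TTL collection can mask a stall on the one you care about.

```javascript
// Hypothetical helper for illustration; not a MongoDB API.

// Counters may arrive as plain numbers or as BSON Long objects in mongosh.
function toNumber(v) {
  return v !== null && typeof v === "object" && typeof v.toNumber === "function"
    ? v.toNumber()
    : Number(v);
}

// before/after: two snapshots of db.serverStatus().metrics.ttl
// expiredBacklog: number of documents currently past their expiry
function classifyTtlProgress(before, after, expiredBacklog) {
  const passes = toNumber(after.passes) - toNumber(before.passes);
  const deleted = toNumber(after.deletedDocuments) - toNumber(before.deletedDocuments);
  if (passes <= 0) return "no-passes";
  if (deleted <= 0 && expiredBacklog > 0) return "passes-without-deletes";
  return "healthy";
}

// In mongosh (not executed here):
//   const before = db.serverStatus().metrics.ttl;
//   sleep(180 * 1000); // roughly three monitor intervals
//   const after = db.serverStatus().metrics.ttl;
//   const backlog = db.sessions.countDocuments({ expiresAt: { $lt: new Date() } });
//   print(classifyTtlProgress(before, after, backlog));

console.log(classifyTtlProgress(
  { passes: 100, deletedDocuments: 5000 },
  { passes: 103, deletedDocuments: 5000 },
  250000
)); // passes-without-deletes
```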
What Normally Triggers a Missed Pass
Before 8.0.4, the legitimate reasons for the TTL monitor to skip a collection were well-understood: the collection was being dropped, a chunk migration was in flight on a sharded cluster, or the monitor was temporarily disabled via db.adminCommand({ setParameter: 1, ttlMonitorEnabled: false }). None of these produce silent, persistent failure. SERVER-97368 is different.
The SERVER-97368 Bug: Root Cause
MongoDB 8.0.4 introduced a change to how the storage engine coordinates collection-level locking during write-heavy workloads as part of a broader write throughput improvement effort. The regression caused the TTL monitor thread to incorrectly interpret a lock acquisition timeout — triggered by high write concurrency — as a signal to abort the entire current pass and reschedule, rather than retrying the lock on that specific collection and continuing to the next namespace.
The critical failure mode is that under sustained write pressure, the abort-and-reschedule loop prevents the TTL monitor from ever completing a successful delete pass. The thread remains alive and reports itself as enabled, but it is caught in a perpetual cycle of attempting to acquire a collection lock, timing out under write load, and rescheduling. The ttlMonitorEnabled parameter returns true. The pass counter may even increment. But zero deletes are issued.
This bug is worst on collections with high sustained write rates — exactly the collections most likely to accumulate large volumes of expired documents. Session stores, event streams, audit logs, and time-series collections are highest risk. Read-heavy or low-write collections may experience intermittent TTL deletions rather than complete failure, making the bug harder to detect.
The bug is exclusive to the 8.0.4 release branch. MongoDB 8.0.3 and earlier are not affected. The fix was merged and shipped in 8.0.5, which correctly handles lock timeout by retrying per-collection rather than aborting the pass.
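The difference between the two behaviors is easier to see in a toy model. The sketch below is not MongoDB source code; it only mirrors the control flow described above, and the function names and the tryLock callback are assumptions made for illustration.

```javascript
// Toy model only. tryLock(ns) returns true when the simulated lock is acquired.

// Behavior described for 8.0.4: one lock timeout aborts the entire pass.
function passAbortOnTimeout(namespaces, tryLock) {
  const processed = [];
  for (const ns of namespaces) {
    if (!tryLock(ns)) return processed; // give up on every remaining namespace
    processed.push(ns);
  }
  return processed;
}

// Behavior described for 8.0.5: retry the contended collection, then move on.
function passRetryPerCollection(namespaces, tryLock, maxRetries = 2) {
  const processed = [];
  for (const ns of namespaces) {
    let locked = false;
    for (let attempt = 0; attempt <= maxRetries && !locked; attempt++) {
      locked = tryLock(ns);
    }
    if (locked) processed.push(ns); // otherwise skip only this namespace
  }
  return processed;
}

const namespaces = ["mydb.sessions", "mydb.auditLogs", "mydb.events"];
// In this model the write-heavy sessions collection never yields its lock.
const lockSucceeds = ns => ns !== "mydb.sessions";

console.log(passAbortOnTimeout(namespaces, lockSucceeds));     // []
console.log(passRetryPerCollection(namespaces, lockSucceeds)); // [ 'mydb.auditLogs', 'mydb.events' ]
```

In the model, a single permanently contended collection at the front of the list starves every other TTL collection under the abort strategy, which matches the "zero deletes" symptom.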
Detecting the Failure
There is no error log entry, no mongod crash, and no alert from the TTL subsystem itself. Detection requires correlating multiple signals.
Step 1: Verify TTL Monitor Status
// On the primary replica
db.adminCommand({ getParameter: 1, ttlMonitorEnabled: 1 });
// Expected: { ttlMonitorEnabled: true, ok: 1 }
// This confirms the thread is running, but does NOT confirm deletes are happening
Step 2: Check Collection Size Growth Rate
// Get current document count and storage size for suspected collections
db.sessions.stats({ scale: 1048576 });
// Look at: count, size (MB), storageSize (MB)
// Run this at T+0 and T+60s (one TTL monitor interval apart)
// If count is increasing or flat despite active expiresAt values in the past,
// TTL deletion is failing.
// Quick check: how many documents are already past their TTL?
db.sessions.countDocuments({
expiresAt: { $lt: new Date() }
});
// On a healthy system with a working TTL index, this number stays near zero.
// Thousands or millions here confirms the bug.
Step 3: Inspect Oplog for TTL Delete Activity
// Query local.oplog.rs for recent deletes on your collection
// (getSiblingDB also works in scripts, where the "use" shell helper is unavailable)
const oplog = db.getSiblingDB("local").oplog.rs;
oplog.find({
  op: "d",
  ns: "mydb.sessions"
}).sort({ $natural: -1 }).limit(5);
// If this returns no results in the last 5-10 minutes despite expired documents
// existing in the collection, TTL deletes have stopped.
// Check the timestamp of the most recent delete (TTL or application-issued):
oplog.find({ op: "d", ns: "mydb.sessions" })
  .sort({ $natural: -1 })
  .limit(1)
  .toArray()
  .map(e => new Date(e.ts.getHighBits() * 1000));
Step 4: Cross-Reference MongoDB Version
db.version();
// "8.0.4" confirms exposure to the bug
// "8.0.3" or "8.0.5+" means the bug is not the cause
Automate detection with a simple cron job or monitoring script: every 5 minutes, run countDocuments({ expiresAt: { $lt: new Date() } }) against your TTL-indexed collections and alert if the count exceeds a threshold (e.g., 1000 documents past expiry). This catches silent TTL failure regardless of the root cause.
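The checks in this section can be folded into a single decision. The sketch below assumes you have already collected the three inputs (server version, expired document count, and the time of the most recent oplog delete). diagnoseTtl and its return labels are hypothetical names, and the default thresholds reuse the 1000-document and 10-minute figures from above.

```javascript
// Hypothetical helper for illustration; thresholds are starting points, not recommendations.
function diagnoseTtl(
  { version, expiredCount, lastDeleteAt, now = new Date() },
  { maxExpired = 1000, maxDeleteGapMs = 10 * 60 * 1000 } = {}
) {
  const backlog = expiredCount > maxExpired;
  // lastDeleteAt is null when the oplog holds no delete for the namespace.
  const deletesStalled =
    lastDeleteAt === null || now.getTime() - lastDeleteAt.getTime() > maxDeleteGapMs;

  if (!backlog) return "healthy";
  if (!deletesStalled) return "backlog-draining";
  return version === "8.0.4" ? "matches-SERVER-97368" : "ttl-stalled-other-cause";
}

const now = new Date("2025-01-01T02:00:00Z");
console.log(diagnoseTtl({
  version: "8.0.4",
  expiredCount: 6000000,
  lastDeleteAt: new Date("2024-12-29T02:00:00Z"),
  now
})); // matches-SERVER-97368
```

Oplog delete entries also include deletes issued by the application, so a recent delete is weaker evidence of TTL health on collections the application prunes itself.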
Workarounds While Waiting for the Fix
Option 1: Manual Deletion (Immediate Relief)
If you cannot immediately downgrade or upgrade, manually drain expired documents with a rate-limited deleteMany loop. Deleting millions of documents in a single operation will spike oplog volume and cause replication lag on secondaries.
// Batch deletion loop: run from mongosh on the primary
// Adjust the batch size and sleep interval based on your write load
// deleteMany has no limit option, so fetch a batch of _id values first
let deleted = 0;
while (true) {
  const ids = db.sessions
    .find({ expiresAt: { $lt: new Date() } }, { _id: 1 })
    .limit(1000)
    .toArray()
    .map(d => d._id);
  if (ids.length === 0) break;
  const res = db.sessions.deleteMany({ _id: { $in: ids } });
  deleted += res.deletedCount;
  print(`Deleted ${deleted} documents so far...`);
  sleep(500); // 500 ms pause between batches to limit oplog pressure
}
print(`Total deleted: ${deleted}`);
Running large deletes on a replica set will generate substantial oplog volume. Monitor secondary replication lag with rs.printSecondaryReplicationInfo() during the operation. If lag exceeds your application tolerance, increase the sleep() interval between batches or reduce batch size to 100–200 documents.
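If you would rather have the loop back off on its own, the lag can be derived from rs.status() output and fed into the batch settings. This is a sketch under assumptions: maxSecondaryLagSeconds and adjustThrottle are hypothetical helpers, and the halving and doubling steps are arbitrary choices rather than tuned values.

```javascript
// Hypothetical helpers for illustration; not MongoDB APIs.

// status: the document returned by rs.status()
function maxSecondaryLagSeconds(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return null; // no primary visible, lag cannot be computed
  const lags = status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000);
  return lags.length === 0 ? 0 : Math.max(0, ...lags);
}

// Halve the batch (floor of 100) and double the pause (cap of 10 s) when lag is too high.
// A null lag leaves the settings unchanged.
function adjustThrottle({ batchSize, sleepMs }, lagSeconds, toleranceSeconds) {
  if (lagSeconds !== null && lagSeconds > toleranceSeconds) {
    return {
      batchSize: Math.max(100, Math.floor(batchSize / 2)),
      sleepMs: Math.min(sleepMs * 2, 10000)
    };
  }
  return { batchSize, sleepMs };
}

const sampleStatus = {
  members: [
    { stateStr: "PRIMARY", optimeDate: new Date("2025-01-01T00:00:30Z") },
    { stateStr: "SECONDARY", optimeDate: new Date("2025-01-01T00:00:28Z") },
    { stateStr: "SECONDARY", optimeDate: new Date("2025-01-01T00:00:10Z") }
  ]
};

const lag = maxSecondaryLagSeconds(sampleStatus);
console.log(lag); // 20
console.log(adjustThrottle({ batchSize: 1000, sleepMs: 500 }, lag, 10));
```

Inside the deletion loop you would call maxSecondaryLagSeconds(rs.status()) every few batches and pass the adjusted batchSize to limit() and sleepMs to sleep().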
Option 2: Downgrade to MongoDB 8.0.3
If your environment is version-pinned and a hotfix upgrade is not immediately feasible, a rolling downgrade to 8.0.3 is the safest path. Because 8.0.3 and 8.0.4 share the same feature compatibility version (FCV), no setFeatureCompatibilityVersion step is required. Perform a standard rolling restart: shut down each secondary in turn and restart it with the 8.0.3 binary, then step down the primary and restart it with the 8.0.3 binary as well.
# On each secondary (repeat for all secondaries first):
sudo systemctl stop mongod
# Replace the mongod binary with 8.0.3 version
sudo systemctl start mongod
# Verify secondary is back in set:
mongosh --eval "rs.status().members.filter(m => m.stateStr === 'SECONDARY')"
# Step down the primary last:
mongosh --eval "rs.stepDown()"
# Then restart the former primary with 8.0.3 binary
Option 3: Upgrade to MongoDB 8.0.5 (Recommended)
The official fix ships in 8.0.5. This is the same rolling-restart process as the downgrade path above, but forward to 8.0.5. No schema changes, no index rebuilds, and no FCV change are required. After upgrade, confirm TTL deletions resume by watching the expired document count drop over the first two TTL monitor intervals (approximately 2 minutes).
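A small helper can turn "watch the count drop" into a check you can script. The sketch assumes you sample the expired document count once per monitor interval. backlogTrend is a hypothetical name and the labels are illustrative.

```javascript
// Hypothetical helper for illustration; not a MongoDB API.
// samples: expired-document counts taken one TTL monitor interval apart, oldest first.
function backlogTrend(samples) {
  if (samples.length < 2) return "insufficient-data";
  const first = samples[0];
  const last = samples[samples.length - 1];
  if (last === 0) return "drained";
  return last < first ? "draining" : "not-draining";
}

// In mongosh (not executed here):
//   const samples = [];
//   for (let i = 0; i < 3; i++) {
//     samples.push(db.sessions.countDocuments({ expiresAt: { $lt: new Date() } }));
//     if (i < 2) sleep(60 * 1000);
//   }
//   print(backlogTrend(samples));

console.log(backlogTrend([6000000, 5400000, 4700000])); // draining
console.log(backlogTrend([6000000, 6010000, 6020000])); // not-draining
```

A large backlog can take many passes to clear, so "draining" rather than "drained" is the expected result shortly after the upgrade.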
TTL Index Best Practices
Use Partial Filter Expressions to Reduce Index Overhead
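A TTL index that covers every document in a large collection is itself large, and each monitor pass has more index entries to scan. A partial filter expression scopes the index to only the documents that actually need TTL management, reducing the index size and the per-pass scan cost. Keep in mind that documents that do not match the filter are not in the index, so this index never expires them.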
// Only index documents where status is "pending" — completed tasks don't need TTL
db.tasks.createIndex(
{ createdAt: 1 },
{
expireAfterSeconds: 604800, // 7 days
partialFilterExpression: { status: "pending" }
}
);
Store Absolute Expiry Times with expireAfterSeconds: 0
Rather than relying on a fixed offset from a creation timestamp, store the precise expiry date in the document and use expireAfterSeconds: 0. This gives you per-document TTL control and makes it trivial to extend or shorten expiry without index modification.
// Insert with a computed absolute expiry
db.sessions.insertOne({
userId: ObjectId("..."),
token: "abc123",
// Expires 2 hours from now — can be set per user tier or session type
expiresAt: new Date(Date.now() + 2 * 60 * 60 * 1000)
});
// Index expires documents the moment their expiresAt is in the past
db.sessions.createIndex({ expiresAt: 1 }, { expireAfterSeconds: 0 });
Monitor TTL Health as a First-Class Metric
Add these checks to your monitoring stack (Datadog, Prometheus + MongoDB exporter, or custom scripts):
// Monitor script: run every 5 minutes via cron or Atlas trigger
// expireAfterSeconds must match the value on each collection's TTL index
const collections = [
  { db: "mydb", coll: "sessions", ttlField: "expiresAt", expireAfterSeconds: 0 },
  { db: "mydb", coll: "auditLogs", ttlField: "purgeAt", expireAfterSeconds: 0 },
  { db: "mydb", coll: "events", ttlField: "createdAt", expireAfterSeconds: 3600 }
];
collections.forEach(({ db: dbName, coll, ttlField, expireAfterSeconds }) => {
  // A document is expired once ttlField + expireAfterSeconds is in the past
  const cutoff = new Date(Date.now() - expireAfterSeconds * 1000);
  const expiredCount = db.getSiblingDB(dbName)[coll].countDocuments({
    [ttlField]: { $lt: cutoff }
  });
  if (expiredCount > 500) {
    print(`ALERT: ${dbName}.${coll} has ${expiredCount} expired documents. TTL may be failing.`);
    // Emit metric to your monitoring system here
  }
});
Set Alerts on Collection Storage Size
In MongoDB Atlas, configure a Data Explorer alert or use the Atlas API to set a collection-level storage threshold alert. For self-managed deployments, use the MongoDB Prometheus exporter metric mongodb_dbstats_dataSize or query db.stats() periodically and alert on anomalous growth rates — a 10% hour-over-hour increase in a TTL-managed collection is a reliable early indicator of silent deletion failure.
// Baseline collection size check
const stats = db.sessions.stats({ scale: 1048576 });
print(`Collection: sessions`);
print(`Documents: ${stats.count}`);
print(`Data size: ${stats.size} MB`);
print(`Storage: ${stats.storageSize} MB`);
print(`Index size: ${stats.totalIndexSize} MB`);
For Atlas users, the built-in "Data Size" alert under Database → Alerts handles this automatically. Set a threshold 20% above your expected steady-state collection size and route it to PagerDuty. A TTL failure will breach the threshold within one billing hour on any high-write collection.
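For self-managed deployments, the two thresholds mentioned in this section (10% hour-over-hour growth and 20% above steady state) can be evaluated from periodic size samples. The sketch below is one possible shape for that check. storageAlerts, its parameter names, and the alert labels are hypothetical.

```javascript
// Hypothetical helper for illustration; sizes are in MB, e.g. from stats({ scale: 1048576 }).size.
function storageAlerts(
  { previousMB, currentMB, steadyStateMB },
  { maxHourlyGrowth = 0.10, headroom = 0.20 } = {}
) {
  const alerts = [];
  // Relative growth since the sample taken one hour earlier.
  const growth = previousMB > 0 ? (currentMB - previousMB) / previousMB : 0;
  if (growth >= maxHourlyGrowth) alerts.push("hourly-growth");
  // Absolute ceiling relative to the expected steady-state size.
  if (currentMB > steadyStateMB * (1 + headroom)) alerts.push("above-steady-state");
  return alerts;
}

// 15% growth in one hour, still under the 20% ceiling:
console.log(storageAlerts({ previousMB: 16000, currentMB: 18400, steadyStateMB: 16000 }));
// Slow growth, but already past the ceiling:
console.log(storageAlerts({ previousMB: 19000, currentMB: 19500, steadyStateMB: 16000 }));
```

Checking both conditions covers a fast stall on a busy collection as well as a slow leak that never trips the growth-rate rule.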
- MongoDB 8.0.4 contains a confirmed regression (SERVER-97368) that silently halts TTL deletion under write-heavy workloads; the fix is in 8.0.5.
- Detect the failure by combining three signals: the expired document count stays non-zero, oplog shows no recent deletes on the TTL collection, and db.version() returns 8.0.4.
- Recover immediately with batched deleteMany loops; limit batches to 500–1000 documents with a brief sleep to avoid replication lag on secondaries.
- Upgrade to 8.0.5 via rolling restart: no FCV change, no index rebuild, no application downtime required.
- Store absolute expiry timestamps and use expireAfterSeconds: 0 for per-document TTL flexibility rather than fixed offsets.
- Treat expired document count as a first-class health metric and alert on it at 5-minute intervals. Silent TTL failure should never go undetected for more than a few minutes.
Working with JusDB on MongoDB Operations
Bugs like SERVER-97368 are the reason proactive version management and deep operational monitoring matter. JusDB manages MongoDB for engineering teams who need production-grade reliability without dedicating senior DBA time to tracking upstream release notes, verifying TTL health after every upgrade, and tuning replication lag during emergency delete operations. Our DBAs monitor your TTL indexes, collection growth rates, and oplog health continuously — so a silent deletion failure is caught and resolved before it shows up on your storage invoice.