MongoDB Atlas Search: Full-Text Search on Your MongoDB Data

Build full-text search with MongoDB Atlas Search — analyzers, scoring, autocomplete, and relevance tuning

JusDB Team
December 14, 2022

If you have ever tried using MongoDB's built-in $text operator for full-text search in production, you already know its limitations: no relevance scoring control, no phrase proximity, no autocomplete, and no language-aware stemming beyond a handful of supported languages. MongoDB Atlas Search changes that picture entirely by embedding an Apache Lucene engine directly into your Atlas cluster, giving you Elasticsearch-grade full-text capabilities without running a separate service. Teams that previously maintained a sidecar Elasticsearch deployment just for search are increasingly retiring it in favor of Atlas Search. This guide walks through everything you need — index definitions, the $search aggregation stage, custom analyzers, relevance tuning, and autocomplete — so you can go from zero to production search in a single afternoon.

TL;DR
  • Atlas Search embeds Apache Lucene inside Atlas, replacing both $text and external Elasticsearch clusters.
  • Indexes use either dynamic mapping (index everything automatically) or static mapping (index specific fields for precision and cost control).
  • The $search aggregation stage drives all queries; operators include text, phrase, compound, near, and more.
  • Analyzers control tokenization and normalization — pick the right one for language, stemming, and synonym support.
  • Relevance scores surface via { $meta: "searchScore" }; use $searchMeta for faceted counts without returning documents.
  • A dedicated autocomplete index type powers prefix and edge-ngram suggestions at low latency.

Atlas Search is a fully managed full-text search layer built on top of Apache Lucene and integrated directly into MongoDB Atlas. Unlike the legacy $text operator — which uses a simple inverted index stored inside MongoDB's own storage engine — Atlas Search maintains a separate Lucene index that is kept in sync with your collection through a change stream. This architecture lets Lucene do what it was designed for: relevance scoring with BM25, rich analyzer pipelines, phrase proximity, fuzzy matching, and faceted search, all accessible through MongoDB's familiar aggregation pipeline.

Compared with $text, the difference is significant. The $text operator supports only one text index per collection, exposes a basic relevance score with little control over how it is computed, matches exact phrases only (no slop or proximity), and has no autocomplete capability. Atlas Search supports multiple search indexes, full BM25 scoring with per-field boosting, highlight snippets, autocomplete, and synonym mappings. Compared with a dedicated Elasticsearch cluster, the difference is operational: no extra infrastructure, no connector to keep in sync, and billing that scales with your Atlas tier.

Tip

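Atlas Search is available on every Atlas cluster tier, including the free and shared tiers (M0/M2/M5), but those tiers cap the number of search indexes you can create and share resources with other tenants. For production search workloads, plan on a dedicated M10+ cluster.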

Creating an Atlas Search Index

Every Atlas Search query requires at least one search index on the target collection. You create and manage indexes through the Atlas UI, the Atlas CLI, or the Atlas Administration API. The two fundamental mapping strategies are dynamic and static.

Dynamic mapping tells Atlas Search to automatically index all fields with compatible types (strings, numbers, dates, booleans). It is the fastest way to get started and is useful during development, but it can index more data than necessary, increasing index size and cost.

json
{
  "mappings": {
    "dynamic": true
  }
}

Static mapping gives you full control. You declare exactly which fields to index, which analyzer to apply, and which index types to use. This is the right choice for production workloads where you want to minimize index size and avoid indexing fields that will never be searched.

json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": {
        "type": "string",
        "analyzer": "lucene.english"
      },
      "body": {
        "type": "string",
        "analyzer": "lucene.english",
        "searchAnalyzer": "lucene.english"
      },
      "tags": {
        "type": "string",
        "analyzer": "lucene.keyword"
      },
      "publishedAt": {
        "type": "date"
      },
      "price": {
        "type": "number"
      }
    }
  }
}

You can mix dynamic and static mappings: set "dynamic": false at the top level, then declare a nested object field with "type": "document" and "dynamic": true if you want automatic indexing only within that subdocument.
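As a rough sketch of mixing the two strategies, the definition below keeps the top level static and turns on dynamic indexing for a single subdocument. The metadata field and the indexedBy helper are hypothetical, and the helper is only a simplified model of how a mapping covers a field path, not Atlas's own resolution logic.

```javascript
// Static top level; only the hypothetical "metadata" subdocument is dynamic.
const mixedDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      title: { type: "string", analyzer: "lucene.english" },
      metadata: { type: "document", dynamic: true }
    }
  }
};

// Simplified model: report how a dotted path would be covered by the mapping.
function indexedBy(definition, path) {
  let node = definition.mappings;
  for (const segment of path.split(".")) {
    if (node.fields && segment in node.fields) {
      node = node.fields[segment]; // explicitly declared field
    } else {
      // Not declared here: covered only if this level is dynamic.
      return node.dynamic ? "dynamic" : "not indexed";
    }
  }
  return "static";
}

console.log(indexedBy(mixedDefinition, "title"));           // static
console.log(indexedBy(mixedDefinition, "metadata.author")); // dynamic
console.log(indexedBy(mixedDefinition, "body"));            // not indexed
```

In mongosh connected to an Atlas cluster, a definition like this could be passed to db.articles.createSearchIndex("default", mixedDefinition), or pasted as JSON into the Atlas UI.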

Warning

Dynamic mapping indexes all string fields using the lucene.standard analyzer by default. If your documents contain large text fields you never query (such as base64-encoded attachments stored as strings), you will pay for indexing them. Switch to static mapping before going to production.

Querying with the $search Aggregation Stage

Atlas Search queries run as the first stage of an aggregation pipeline using $search. The stage accepts an operator that defines the search logic. Here are the most commonly used operators.

text — tokenizes the query string with the same analyzer used at index time and matches documents containing any of the resulting tokens.

javascript
db.articles.aggregate([
  {
    $search: {
      index: "default",
      text: {
        query: "full text search mongodb",
        path: "body",
        fuzzy: { maxEdits: 1 }
      }
    }
  },
  {
    $project: {
      title: 1,
      score: { $meta: "searchScore" }
    }
  }
])

phrase — requires query terms to appear adjacent and in order, with an optional slop parameter that allows a configurable number of intervening words.

javascript
{
  $search: {
    phrase: {
      query: "Atlas Search",
      path: "title",
      slop: 2
    }
  }
}

compound — combines multiple operators with boolean logic. The four clauses are must (required), mustNot (excluded), should (boosts score), and filter (required but does not affect score).

javascript
{
  $search: {
    compound: {
      must: [
        {
          text: {
            query: "mongodb",
            path: "body"
          }
        }
      ],
      should: [
        {
          text: {
            query: "atlas search",
            path: "title",
            score: { boost: { value: 3 } }
          }
        }
      ],
      filter: [
        {
          range: {
            path: "publishedAt",
            gte: new Date("2024-01-01")
          }
        }
      ]
    }
  }
}

Tip

Use filter instead of must for date ranges and numeric filters. Filter clauses skip BM25 scoring computation for those conditions, which is faster and produces cleaner relevance scores on the text fields that matter.
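In application code, the compound query above is usually assembled from user input. The following is a minimal sketch of one way to do that; the buildArticleSearch function and its parameters are hypothetical, and the field names simply reuse the ones from the examples in this section.

```javascript
// Build a $search pipeline: body must match, title matches are boosted,
// and the date range is a non-scoring filter.
function buildArticleSearch(terms, publishedSince, limit = 10) {
  return [
    {
      $search: {
        index: "default",
        compound: {
          must: [{ text: { query: terms, path: "body" } }],
          should: [
            { text: { query: terms, path: "title", score: { boost: { value: 3 } } } }
          ],
          filter: [{ range: { path: "publishedAt", gte: publishedSince } }]
        }
      }
    },
    { $limit: limit },
    { $project: { title: 1, score: { $meta: "searchScore" } } }
  ];
}

const articlePipeline = buildArticleSearch("atlas search", new Date("2024-01-01"));
console.log(JSON.stringify(articlePipeline[0].$search.compound.filter));
```

The resulting array can be passed to db.articles.aggregate(articlePipeline). Keeping $search as the first element preserves the requirement that it be the first stage of the pipeline.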

Analyzers and Text Processing

Analyzers define how text is broken into tokens at both index time and query time. Choosing the right analyzer has a larger impact on result quality than almost any other configuration decision.

Atlas Search ships with several built-in analyzers:

  • lucene.standard: the default. Tokenizes on word boundaries, lowercases terms, and drops punctuation. It applies no stemming, which makes it a reasonable language-neutral baseline for general-purpose search.
  • lucene.english: adds English stop word removal and Porter stemming, so "running" and "runs" both match documents containing "run". Use this for English-language content.
  • lucene.keyword: treats the entire field value as a single token. Useful for exact-match fields like tags, SKUs, or email addresses.
  • lucene.whitespace: splits only on whitespace, preserving punctuation and case. Useful for code snippets or hyphenated terms.
  • lucene.french, lucene.german, lucene.spanish, and other language analyzers apply language-specific stemming and stop word lists.
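To make the differences between these analyzers concrete, here is a rough approximation of three of them in plain JavaScript. These functions are illustrative stand-ins, not the Lucene implementations: the real standard tokenizer follows Unicode text segmentation rules and treats cases such as apostrophes and decimal numbers differently from this regular expression.

```javascript
// Approximate tokenizers, for illustration only.
const approxAnalyzers = {
  // Whole value as one token, unchanged.
  keyword: (text) => [text],
  // Split on runs of whitespace; keep case and punctuation.
  whitespace: (text) => text.split(/\s+/).filter(Boolean),
  // Lowercase, then keep runs of letters and digits.
  standard: (text) => text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || []
};

const analyzerSample = "State-of-the-art Wi-Fi";
for (const [name, analyze] of Object.entries(approxAnalyzers)) {
  console.log(name, JSON.stringify(analyze(analyzerSample)));
}
// keyword ["State-of-the-art Wi-Fi"]
// whitespace ["State-of-the-art","Wi-Fi"]
// standard ["state","of","the","art","wi","fi"]
```

The same query string produces very different tokens under each analyzer, which is why a field indexed with lucene.keyword will not match a partial-word query that succeeds against lucene.standard.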

For finer control, define a custom analyzer by composing a character filter, a tokenizer, and one or more token filters:

json
{
  "analyzers": [
    {
      "name": "myCustomAnalyzer",
      "charFilters": [
        { "type": "htmlStrip" }
      ],
      "tokenizer": {
        "type": "standard"
      },
      "tokenFilters": [
        { "type": "lowercase" },
        {
          "type": "stopword",
          "tokens": ["the", "a", "an", "and", "or"]
        },
        { "type": "porterStemming" }
      ]
    }
  ],
  "mappings": {
    "dynamic": false,
    "fields": {
      "body": {
        "type": "string",
        "analyzer": "myCustomAnalyzer"
      }
    }
  }
}

The htmlStrip character filter is particularly useful when your documents contain HTML markup stored as strings: it strips tags before tokenization so angle brackets and tag names do not pollute the token stream. Note that synonyms are not a token filter in Atlas Search. They are declared in a separate top-level "synonyms" array in the index definition, backed by a source collection, and referenced by name from a query operator such as text.
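Synonym expansion in Atlas Search is configured at the index level rather than inside a custom analyzer. The sketch below shows the general shape: a top-level synonyms array that points at a source collection, sample synonym documents, and a text query that opts in by mapping name. The search_synonyms collection name and the synonymSearchStage helper are assumptions made for illustration.

```javascript
// Index definition sketch: the synonym mapping sits beside "mappings".
const synonymIndexDefinition = {
  mappings: {
    dynamic: false,
    fields: { body: { type: "string", analyzer: "lucene.english" } }
  },
  synonyms: [
    {
      name: "mySynonymMapping",
      analyzer: "lucene.english", // kept identical to the field's analyzer
      source: { collection: "search_synonyms" } // hypothetical collection
    }
  ]
};

// Example documents for the source collection.
const synonymDocs = [
  // All terms are interchangeable.
  { mappingType: "equivalent", synonyms: ["laptop", "notebook"] },
  // One-way: a query for "db" also matches "database".
  { mappingType: "explicit", input: ["db"], synonyms: ["db", "database"] }
];

// Query stage that applies the mapping by name.
function synonymSearchStage(query) {
  return {
    $search: {
      index: "default",
      text: { query, path: "body", synonyms: "mySynonymMapping" }
    }
  };
}

console.log(JSON.stringify(synonymSearchStage("notebook")));
```

The documents in synonymDocs would be inserted into the source collection with a regular insertMany call, in the same database as the indexed collection.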

Scoring and Relevance Tuning

Atlas Search uses the BM25 algorithm to compute relevance scores. BM25 rewards documents where query terms appear frequently relative to the document length and penalizes very common terms across the corpus. You can inspect and influence scores in several ways.
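To build intuition for those two effects, here is the textbook BM25 formula for a single query term in plain JavaScript. This is a sketch using the conventional defaults k1 = 1.2 and b = 0.75; the exact constants and normalization inside Lucene, and therefore the absolute scores Atlas Search returns, may differ.

```javascript
// Textbook BM25 score contribution of one query term in one document.
function bm25(termFreq, docLength, avgDocLength, docsWithTerm, totalDocs, k1 = 1.2, b = 0.75) {
  // Rare terms get a higher inverse document frequency.
  const idf = Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Longer-than-average documents are penalized.
  const lengthNorm = 1 - b + b * (docLength / avgDocLength);
  return idf * (termFreq * (k1 + 1)) / (termFreq + k1 * lengthNorm);
}

// Same term frequency, different document lengths.
const shortDocScore = bm25(2, 50, 100, 10, 1000);
const longDocScore = bm25(2, 200, 100, 10, 1000);

// Same document, rare term versus very common term.
const rareTermScore = bm25(2, 100, 100, 10, 1000);
const commonTermScore = bm25(2, 100, 100, 900, 1000);

console.log(shortDocScore > longDocScore);    // true
console.log(rareTermScore > commonTermScore); // true
```

The term-frequency part also saturates: each additional occurrence of a term adds less than the previous one, so stuffing a document with a keyword yields diminishing returns.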

Surfacing the score with $meta:

javascript
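db.products.aggregate([
  {
    $search: {
      text: { query: "wireless headphones", path: "description" },
      // Needed for { $meta: "searchHighlights" } in the $project stage
      highlight: { path: "description" }
    }
  },
  // $search already returns results ordered by score, highest first,
  // so no explicit $sort on the score is needed.
  { $limit: 10 },
  {
    $project: {
      name: 1,
      description: 1,
      searchScore: { $meta: "searchScore" },
      highlights: { $meta: "searchHighlights" }
    }
  }
])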
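
The highlights field projected above is an array of objects, each with a path, a score, and a texts array whose entries are tagged as either "hit" or "text". As a minimal sketch, assuming the $search stage sets the highlight option for the description path, a helper like the hypothetical renderHighlight below turns one entry into a marked-up snippet.

```javascript
// Join the pieces of one highlight entry, wrapping matched terms.
function renderHighlight(highlight, open = "<mark>", close = "</mark>") {
  return highlight.texts
    .map((piece) => (piece.type === "hit" ? open + piece.value + close : piece.value))
    .join("");
}

// Hand-written sample in the shape of a searchHighlights entry.
const highlightSample = {
  path: "description",
  score: 1.0,
  texts: [
    { value: "Noise-cancelling ", type: "text" },
    { value: "wireless", type: "hit" },
    { value: " ", type: "text" },
    { value: "headphones", type: "hit" },
    { value: " with 30-hour battery.", type: "text" }
  ]
};

console.log(renderHighlight(highlightSample));
// Noise-cancelling <mark>wireless</mark> <mark>headphones</mark> with 30-hour battery.
```

If the snippet is rendered as HTML, escape each piece.value first, since the stored text may itself contain markup.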

Boosting specific fields using per-clause score modifiers:

javascript
{
  compound: {
    should: [
      {
        text: {
          query: "atlas search",
          path: "title",
          score: { boost: { value: 5 } }
        }
      },
      {
        text: {
          query: "atlas search",
          path: "body",
          score: { boost: { value: 1 } }
        }
      }
    ]
  }
}

Faceted search with $searchMeta returns aggregation counts without returning the matching documents themselves — ideal for building facet sidebars:

javascript
db.products.aggregate([
  {
    $searchMeta: {
      facet: {
        operator: {
          text: { query: "laptop", path: "description" }
        },
        facets: {
          brandFacet: {
            type: "string",
            path: "brand",
            numBuckets: 10
          },
          priceFacet: {
            type: "number",
            path: "price",
            boundaries: [0, 500, 1000, 2000, 5000]
          }
        }
      }
    }
  }
])

Warning

Faceting requires the faceted fields to be indexed with a facet-capable type: "type": "stringFacet" for strings or "type": "numberFacet" for numbers. A field indexed only as "type": "string" will not work for faceting. You can index the same field twice with different types to support both search and faceting simultaneously. Newer Atlas Search documentation lists these two facet types as deprecated in favor of token and number, so check the documentation for your cluster version.
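Indexing one field under two types is done by giving the field an array of type definitions. The following sketch, which assumes the stringFacet and numberFacet types described above, pairs such a definition with a small hypothetical helper (missingFacetTypes) that checks a facets object against it before a query is sent.

```javascript
// brand and price are each indexed twice: once for search, once for faceting.
const facetReadyDefinition = {
  mappings: {
    dynamic: false,
    fields: {
      description: { type: "string", analyzer: "lucene.english" },
      brand: [{ type: "string", analyzer: "lucene.keyword" }, { type: "stringFacet" }],
      price: [{ type: "number" }, { type: "numberFacet" }]
    }
  }
};

// Return the facet paths that lack the matching facet type in the index.
function missingFacetTypes(definition, facets) {
  const required = { string: "stringFacet", number: "numberFacet" };
  return Object.values(facets)
    .filter((facet) => {
      // A field may be declared as one object or as an array of objects.
      const declared = [].concat(definition.mappings.fields[facet.path] || []);
      return !declared.some((entry) => entry.type === required[facet.type]);
    })
    .map((facet) => facet.path);
}

const productFacets = {
  brandFacet: { type: "string", path: "brand", numBuckets: 10 },
  priceFacet: { type: "number", path: "price", boundaries: [0, 500, 1000, 2000, 5000] }
};

console.log(missingFacetTypes(facetReadyDefinition, productFacets)); // []
```

A non-empty result points at fields whose index definition needs a facet type added before the $searchMeta query will work.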

Autocomplete

Atlas Search provides a dedicated autocomplete index type that uses edge n-grams to support prefix-style suggestions as users type. The index definition opts specific fields into autocomplete tokenization:

json
{
  "mappings": {
    "dynamic": false,
    "fields": {
      "title": [
        {
          "type": "string",
          "analyzer": "lucene.english"
        },
        {
          "type": "autocomplete",
          "analyzer": "lucene.standard",
          "tokenization": "edgeGram",
          "minGrams": 2,
          "maxGrams": 15,
          "foldDiacritics": true
        }
      ]
    }
  }
}

Querying autocomplete uses the autocomplete operator inside $search:

javascript
db.articles.aggregate([
  {
    $search: {
      autocomplete: {
        query: "mongo",
        path: "title",
        tokenOrder: "sequential",
        fuzzy: { maxEdits: 1, prefixLength: 3 }
      }
    }
  },
  { $limit: 5 },
  { $project: { title: 1, _id: 0 } }
])

The tokenOrder: "sequential" setting requires matched tokens to appear in the same order as the query, which produces more natural autocomplete results. The fuzzy option handles typos — with prefixLength: 3, the first three characters must match exactly before fuzzy matching kicks in, which avoids excessively broad suggestions on short prefixes.

Tip

Keep autocomplete queries fast by projecting only the fields needed for the suggestion list (typically just the title or name) and enforcing a low $limit. Autocomplete indexes grow large with small minGrams values — prefer minGrams: 3 or higher unless you specifically need two-character prefix matching.
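The effect of minGrams and maxGrams on index size is easier to see with a small model of edge n-gram generation. The edgeGrams function below is a simplified illustration that works on one already-tokenized word; it is not the Lucene implementation and ignores analysis steps such as lowercasing and diacritic folding.

```javascript
// Emit every left-anchored prefix of a token between minGrams and maxGrams long.
function edgeGrams(token, minGrams, maxGrams) {
  const grams = [];
  const longest = Math.min(maxGrams, token.length);
  for (let length = minGrams; length <= longest; length++) {
    grams.push(token.slice(0, length));
  }
  return grams;
}

console.log(edgeGrams("mongodb", 2, 15));
// [ 'mo', 'mon', 'mong', 'mongo', 'mongod', 'mongodb' ]
console.log(edgeGrams("mongodb", 3, 15).length); // 5
```

Each indexed word contributes one gram per allowed length, so raising minGrams from 2 to 3 removes one gram per word across the whole collection, and tokens shorter than minGrams produce no grams at all.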

Atlas Search vs $text vs Elasticsearch

The $text operator covers simple keyword search scenarios with minimal setup, but it is not suitable for any production search experience that requires relevance ranking, autocomplete, or phrase queries. Atlas Search supersedes it entirely for those use cases.

Versus a self-managed or cloud Elasticsearch/OpenSearch cluster, Atlas Search trades some configurability for operational simplicity. You lose access to Elasticsearch's percolator queries, custom similarity implementations, and cross-cluster search across non-Atlas data sources. What you gain is zero infrastructure overhead, native MongoDB document model alignment (no separate mapping schema to maintain), and billing through your existing Atlas contract. For the vast majority of application search requirements — full-text, autocomplete, facets, relevance tuning — Atlas Search is the lower-friction choice when your data is already in MongoDB Atlas.

Key Takeaways
  • Use dynamic mapping to prototype quickly, then switch to static mapping before production to control index size and cost.
  • The $search aggregation stage is the entry point for all Atlas Search queries; it must be the first stage in the pipeline.
  • The compound operator with must, should, mustNot, and filter clauses handles nearly any boolean search logic you need.
  • Pick an analyzer that matches your content: lucene.english for stemmed English text, lucene.keyword for exact-match fields, custom analyzers for HTML stripping or custom stop word lists. Synonym mappings are defined separately in the index definition.
  • Surface relevance scores with { $meta: "searchScore" } and use per-field boost modifiers to weight title matches above body matches.
  • Use $searchMeta with stringFacet and numberFacet index types to build faceted navigation without fetching full documents.
  • The autocomplete index type with edge n-grams powers low-latency prefix suggestions; pair it with fuzzy matching for typo tolerance.
  • Atlas Search is a direct replacement for $text and a viable alternative to Elasticsearch for most application search workloads when your data lives in Atlas.

Run Atlas Search Without the Infrastructure Overhead — JusDB

Setting up Atlas Search indexes is straightforward, but tuning analyzers, managing index definitions across environments, monitoring search latency, and debugging relevance issues across a growing document collection still takes dedicated engineering time. JusDB gives MongoDB teams a managed environment for running Atlas Search workloads alongside the rest of their database operations — with observability tooling built in so you can see exactly which queries are slow, which index fields are underused, and where relevance tuning will have the most impact.

If you are evaluating Atlas Search for a production use case or migrating off a standalone Elasticsearch cluster, talk to the JusDB team about how we support Atlas Search deployments at scale. We can help you design your index mappings, configure custom analyzers for your content type, and set up the monitoring you need to keep search performance tight as your data grows.
