Last November, a logistics platform came to JusDB with a PostgreSQL table that had grown to 48 million rows, each storing a json column containing shipment event metadata: carrier codes, customs flags, temperature readings, and route waypoints. Every API request to their tracking dashboard ran a query like WHERE events->>'carrier' = 'DHL', and each one took between 800ms and 2.4 seconds. Their on-call engineer had already tried adding a B-tree index on the column itself, which PostgreSQL silently ignored at query time because standard json is not indexable. The column type was the problem, not the query.
Switching the column from json to jsonb took four minutes including the ALTER TABLE rewrite. Adding a single GIN index took another eleven minutes on their 48-million-row table. After those two operations, the same tracking queries ran in 4–9ms. No application code changed. No schema redesign. The column still stored the same JSON documents.
The difference between json and jsonb in PostgreSQL is not cosmetic. json stores a raw text copy of the input, validates it on write, and re-parses it on every read. jsonb decomposes the document into a binary format on write, discards redundant whitespace and duplicate keys, and exposes the result to GIN indexing. For any workload involving querying or filtering JSON content — which is nearly every workload that uses JSON at all — jsonb is the correct choice.
This guide covers exactly what that binary format means in practice, which operators and index types to use, what the performance trade-offs look like at scale, and when to abandon semi-structured JSON entirely in favor of normalized tables.
- Always use
jsonbinstead ofjsonfor any column you will query, filter, or index — the only exception is preserving exact key order or duplicate keys, which are edge cases almost no real application needs. jsonbstores a parsed binary representation; reads are faster and GIN indexes are possible.jsonstores raw text; it re-parses on every read and cannot be indexed on its contents.- GIN with
jsonb_ops(the default) indexes every key and value and supports all containment and existence operators. GIN withjsonb_path_opsindexes only values reachable via path and is smaller and faster for@>containment queries. - Operators to know:
->(key access, returns jsonb),->>(key access, returns text),#>/#>>(path access),@>(contains),<@(is contained by),?/?|/?&(key existence). - PostgreSQL 12+ jsonpath expressions (
jsonb_path_query,@@ jsonpath) provide XPath-style navigation and filtering, including arithmetic and array subscripting, inside JSON documents. - Large, deeply nested JSONB documents with many unique keys are a common anti-pattern. For any JSON structure that is fixed and well-understood, normalized columns outperform JSONB in query speed, storage, and join efficiency.
Background
PostgreSQL has supported a json type since version 9.2 (2012) and jsonb since version 9.4 (2014). Despite two years of maturity advantage, json exists today almost entirely for backwards compatibility and for the rare use case where preserving exact byte-for-byte input fidelity matters more than query performance.
The naming is misleading. Both types accept valid JSON as input. Both return valid JSON as output. The difference is entirely internal: json stores the raw text string. jsonb decomposes the input into a binary tree of typed values during the write path, discarding insignificant whitespace, normalizing duplicate keys to the last occurrence, and sorting object keys. That decomposition has a small write cost (typically 10–20% more CPU per insert compared to json) but pays dividends on every subsequent read and makes GIN indexing possible.
The practical consequences of the storage format difference are significant:
- Key ordering:
jsonpreserves the original key insertion order in output.jsonbdoes not — keys are sorted internally. If your application reads a JSONB column and compares the output string byte-for-byte to the original input, it will see a different key order. This is the most commonjsonbgotcha and the only reason to preferjsonin practice. - Duplicate keys:
jsonstores all duplicate keys and returns them all.jsonbsilently keeps only the last value for a duplicate key. Again, this matters only if your application intentionally produces JSON with repeated keys, which the JSON specification explicitly marks as undefined behavior. - Read performance:
jsonmust parse the entire text string on every expression evaluation.jsonbnavigates directly to the requested path in the binary structure. On a 48-million-row table running a filter on a key nested two levels deep, this difference shows up as seconds versus milliseconds.
If you are migrating a column from json to jsonb, audit your application for any code that compares JSON output strings directly or relies on key ordering. The migration is an ALTER TABLE ... ALTER COLUMN ... TYPE jsonb USING column::text::jsonb, which rewrites the entire table. On large tables, run this during a maintenance window or use a background migration strategy with a new column.
How It Works
The JSONB binary format stores each JSON value as a tagged element. Objects are stored as sorted arrays of key-value pairs; arrays are stored as indexed sequences. Each element carries a type tag (null, boolean, number, string, array, object) and its value. Navigation to a specific key or path does not require scanning the entire document — it performs a binary search over sorted keys at each level of the tree.
This structure is what makes GIN indexing possible. A GIN (Generalized Inverted Index) over a jsonb column inverts the document: instead of mapping document IDs to documents, it maps each JSON key, each JSON value, and each key-value pair to the set of rows that contain it. A query like WHERE metadata @> '{"status": "delivered"}' can hit the GIN index to retrieve only the matching row IDs in microseconds, without touching the table heap at all for most queries.
The Operator Family
JSONB exposes two classes of operators: navigation operators and containment/existence operators. Navigation operators work on both json and jsonb; containment and existence operators are jsonb-only and are the ones that benefit from GIN indexes.
Navigation operators (work on both json and jsonb):
-- -> returns the value as jsonb (use for further traversal)
SELECT metadata -> 'carrier' FROM shipments;
-- Returns: "DHL" (jsonb string, with quotes)
-- ->> returns the value as text (use for string comparison)
SELECT metadata ->> 'carrier' FROM shipments;
-- Returns: DHL (text, no quotes)
-- #> returns nested value as jsonb using path array
SELECT metadata #> '{address, city}' FROM shipments;
-- Equivalent to: metadata -> 'address' -> 'city'
-- #>> returns nested value as text
SELECT metadata #>> '{address, city}' FROM shipments;Containment and existence operators (jsonb only, GIN-indexable):
-- @> "contains": left jsonb contains right jsonb as a subset
-- This is the most important operator for GIN-indexed queries
SELECT * FROM shipments
WHERE metadata @> '{"status": "delivered", "carrier": "DHL"}';
-- <@ "is contained by": left is a subset of right
SELECT * FROM shipments
WHERE '{"carrier": "DHL"}' <@ metadata;
-- ? "key exists": checks if string is a top-level key
SELECT * FROM shipments WHERE metadata ? 'customs_flag';
-- ?| "any key exists": checks if any string in the array is a top-level key
SELECT * FROM shipments WHERE metadata ?| ARRAY['customs_flag', 'hazmat_code'];
-- ?& "all keys exist": checks if all strings in the array are top-level keys
SELECT * FROM shipments WHERE metadata ?& ARRAY['carrier', 'status', 'weight_kg'];When writing WHERE clauses that will hit a GIN index, always use the containment operators (@>, <@, ?, ?|, ?&). Expressions like metadata ->> 'carrier' = 'DHL' will not use a standard GIN index — they require either a functional B-tree index on that specific extracted path or a jsonb_path_ops GIN index combined with a jsonb_path_match expression. The @> operator is both the most readable and the most index-friendly option for equality filtering.
Configuration and Setup
Step 1: Define a JSONB Column
CREATE TABLE shipments (
id BIGSERIAL PRIMARY KEY,
tracking_id TEXT NOT NULL UNIQUE,
org_id UUID NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT now()
);
-- Insert a document
INSERT INTO shipments (tracking_id, org_id, metadata) VALUES (
'1Z999AA10123456784',
'550e8400-e29b-41d4-a716-446655440000',
'{
"carrier": "UPS",
"status": "in_transit",
"origin": {"city": "Chicago", "country": "US"},
"destination": {"city": "Berlin", "country": "DE"},
"weight_kg": 4.2,
"customs_flag": false,
"events": [
{"ts": "2024-11-01T08:00:00Z", "code": "PICKUP"},
{"ts": "2024-11-01T14:30:00Z", "code": "DEPARTED_ORIGIN"}
]
}'
);Step 2: Create a GIN Index
PostgreSQL provides two GIN operator classes for JSONB. Understanding the difference is important before creating the index.
jsonb_ops (default): indexes every key, every value, and every key-value pair at every level of the document. Supports @>, <@, ?, ?|, and ?& operators. Larger index, slower build, but the most flexible.
jsonb_path_ops: indexes only values reachable via a path from the document root. Does not index keys independently. Only supports @> containment queries. Significantly smaller index (typically 30–50% of jsonb_ops size) and faster for containment-heavy workloads.
-- Default GIN: supports all JSONB operators
-- Use when you need key existence checks (?), any-key (?|), all-key (?&)
CREATE INDEX shipments_metadata_gin_idx
ON shipments
USING GIN (metadata);
-- Equivalent to: USING GIN (metadata jsonb_ops)
-- Path-ops GIN: smaller, faster for @> only
-- Use when all your queries use @> containment
CREATE INDEX shipments_metadata_gin_path_idx
ON shipments
USING GIN (metadata jsonb_path_ops);
-- Functional B-tree index on a specific extracted path
-- Use when you query one specific key repeatedly with = or range operators
CREATE INDEX shipments_carrier_idx
ON shipments ((metadata ->> 'carrier'));
CREATE INDEX shipments_status_idx
ON shipments ((metadata ->> 'status'));Do not create both a jsonb_ops and a jsonb_path_ops GIN index on the same column. They overlap significantly and you will pay double the storage and write amplification with no query benefit. Choose one based on your operator usage, or use functional B-tree indexes for high-cardinality keys you filter with equality or range conditions.
Step 3: Mutation Functions
JSONB documents in PostgreSQL are immutable values. To update a field inside a JSONB column, you use the jsonb_set(), jsonb_insert(), and jsonb_strip_nulls() functions rather than directly mutating the stored value.
-- jsonb_set(target, path, new_value, create_missing)
-- Update the status field
UPDATE shipments
SET metadata = jsonb_set(metadata, '{status}', '"delivered"', false)
WHERE tracking_id = '1Z999AA10123456784';
-- Update a nested field
UPDATE shipments
SET metadata = jsonb_set(metadata, '{destination, city}', '"Munich"', true)
WHERE tracking_id = '1Z999AA10123456784';
-- jsonb_insert(target, path, new_value, insert_after)
-- Append a new event to the events array
UPDATE shipments
SET metadata = jsonb_insert(
metadata,
'{events, -1}', -- -1 means after the last element
'{"ts": "2024-11-03T09:00:00Z", "code": "DELIVERED"}',
true -- insert after the path, not before
)
WHERE tracking_id = '1Z999AA10123456784';
-- jsonb_strip_nulls: removes all keys with null values, recursively
-- Useful for compressing sparse documents before storage
SELECT jsonb_strip_nulls('{"a": 1, "b": null, "c": {"d": null, "e": 2}}');
-- Returns: {"a": 1, "c": {"e": 2}}
-- The || operator merges two JSONB objects (right side wins on key conflict)
UPDATE shipments
SET metadata = metadata || '{"status": "delivered", "delivered_at": "2024-11-03T09:00:00Z"}'
WHERE tracking_id = '1Z999AA10123456784';Step 4: PostgreSQL 12+ Jsonpath Queries
PostgreSQL 12 introduced the SQL/JSON path language, which provides XPath-style navigation inside JSONB documents. Jsonpath expressions support arithmetic, comparison, array filtering, and recursive descent — operations that are cumbersome with the standard operator set.
-- jsonb_path_query: returns all values matching a jsonpath expression
-- Find all event codes in the events array
SELECT jsonb_path_query(metadata, '$.events[*].code')
FROM shipments
WHERE tracking_id = '1Z999AA10123456784';
-- Returns rows: "PICKUP", "DEPARTED_ORIGIN", "DELIVERED"
-- @@ operator: returns boolean, true if the path matches anything
-- Find shipments where any event has code "DELIVERED"
SELECT * FROM shipments
WHERE metadata @@ '$.events[*].code == "DELIVERED"';
-- Filter events by timestamp using jsonpath arithmetic
SELECT jsonb_path_query(
metadata,
'$.events[*] ? (@.ts >= "2024-11-03T00:00:00Z")'
)
FROM shipments;
-- @? operator: returns true if the path returns any value (existence check)
-- Find shipments that have at least one customs-related event
SELECT * FROM shipments
WHERE metadata @? '$.events[*] ? (@.code starts with "CUSTOMS")';
-- jsonb_path_query_array: returns all matches as a JSONB array
SELECT jsonb_path_query_array(metadata, '$.events[*].code')
FROM shipments;
-- Returns: ["PICKUP", "DEPARTED_ORIGIN", "DELIVERED"]The @@ and @? jsonpath operators can use a GIN index with jsonb_path_ops when the expression is a simple path equality or containment check. Complex jsonpath expressions with arithmetic or string functions will fall back to a sequential scan. Use EXPLAIN ANALYZE to verify index usage before deploying jsonpath predicates on large tables.
Performance and Best Practices
Understanding the performance model for JSONB queries requires thinking about three separate layers: operator type, index type, and document structure. Getting any one of these wrong can negate the benefits of the other two.
Operator choice determines index eligibility. A query using ->> (navigation with text cast) requires a functional index on that specific path. A query using @> (containment) can use either GIN index type. A query using ?, ?|, or ?& (existence) requires jsonb_ops GIN specifically. Mixing operator types across queries on the same column often means you need multiple indexes.
For high-frequency equality queries on a specific key, a functional B-tree index on (metadata ->> 'key') is faster than GIN for that specific query because B-tree indexes support equality and range operators with lower overhead than GIN. Use GIN for multi-key containment queries where the queried key set varies at runtime.
JSONB document size matters. GIN indexes every element in every document. A column storing 5KB JSONB documents with 200 unique keys per document will produce a very large GIN index — potentially larger than the table itself. On a 48-million-row table with 5KB average JSONB, a full jsonb_ops GIN index can exceed 50GB. Monitor index sizes with pg_relation_size and consider partial indexes or jsonb_path_ops as mitigations.
-- Check index sizes
SELECT
schemaname,
tablename,
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE tablename = 'shipments'
ORDER BY pg_relation_size(indexrelid) DESC;
-- Check whether your query is using the GIN index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM shipments
WHERE metadata @> '{"carrier": "DHL", "status": "in_transit"}';
-- Partial GIN index: only index rows matching a common filter
-- Reduces index size dramatically for time-bounded data
CREATE INDEX shipments_active_gin_idx
ON shipments
USING GIN (metadata)
WHERE created_at >= now() - interval '90 days';Large JSONB documents with deeply nested structures and hundreds of unique keys per document are a significant performance anti-pattern. If your JSONB schema is effectively fixed — the same keys appear in every row — those keys belong in regular normalized columns, not in a JSONB blob. Normalized columns are faster to query, smaller to store, easier to join, and straightforward to index with standard B-tree indexes. Use JSONB for genuinely variable schemas: user-defined attributes, plugin metadata, event payloads where the keys differ across rows.
Comparison Table
| Criteria | PostgreSQL JSONB | PostgreSQL JSON | MongoDB Documents | EAV Pattern |
|---|---|---|---|---|
| Storage format | Binary decomposed tree; keys sorted; duplicate keys deduplicated. | Raw text string; exact input preserved including whitespace and key order. | BSON (Binary JSON); similar to JSONB in that values are typed and binary-encoded. | Normalized rows: one row per attribute per entity. No binary encoding overhead. |
| Read performance | Fast path navigation; no re-parsing on read. | Must re-parse entire text on every read. Slow at scale. | Fast; native binary format designed for document traversal. | Requires JOIN per attribute. Fast for single attributes; slow for wide reads. |
| Write performance | ~10–20% write overhead for binary decomposition vs. JSON. | Fastest writes: just validate and store text. | Fast; write path is BSON serialization. | Multiple INSERT/UPDATE rows per entity update. Write amplification for wide entities. |
| Indexing | GIN (jsonb_ops, jsonb_path_ops), functional B-tree on extracted paths. | Functional B-tree on extracted paths only. No GIN support. | Rich index support: single-field, compound, multikey (array), text, geo, hashed. | B-tree on attribute name + value columns. Efficient for targeted queries. |
| Schema flexibility | High: any valid JSON structure per row. | High: any valid JSON structure per row. | Very high: schema-free by default, with optional schema validation. | Low: new attribute types require schema changes or a generic value column (text, numeric, etc.). |
| SQL joins & transactions | Full ACID, JOINs, CTEs, window functions, row-level security. | Full ACID, JOINs, CTEs — same as JSONB for relational features. | No SQL joins. Multi-document transactions available but complex. | Full ACID, JOINs — EAV is purely relational. |
| Key ordering preserved | No. Keys are sorted in binary representation. | Yes. Exact input order preserved. | No. BSON does not guarantee key order. | N/A (no keys; attributes are rows). |
| Best use case | Variable-schema attributes, event payloads, plugin metadata, API response caching. | Audit logging where exact input fidelity is required; infrequently queried blobs. | Document-centric applications where the document is the primary retrieval unit. | Highly dynamic user-defined attributes in multi-tenant SaaS where schema varies per tenant. |
Performance characteristics vary significantly by workload, document size, query patterns, and hardware. Always benchmark your specific schema and query mix before making architectural decisions.
Key Takeaways
- Default to
jsonb, notjson. The only valid reason to usejsontoday is preserving exact key order or duplicate keys for round-trip fidelity, which almost no real application actually needs. For everything else,jsonbis strictly better. - Choose your GIN operator class intentionally. Use the default
jsonb_opswhen your queries mix containment (@>) and existence (?,?|,?&) operators. Usejsonb_path_opswhen all your indexed queries use@>containment — it is 30–50% smaller and faster for that operator. - Add functional B-tree indexes for high-frequency single-key filters. A query that always filters on
metadata ->> 'status'is faster with a B-tree index on(metadata ->> 'status')than with a GIN index, especially for range queries and sorts. - JSONB key ordering is not preserved. Code that compares JSONB column output byte-for-byte to the original input will break after migration. Audit your serialization logic before switching column types.
- Large documents with fixed schemas are an anti-pattern. If every row has the same 30 keys in its JSONB column, those 30 keys should be 30 regular columns. Normalized columns are smaller, faster, and far more maintainable over time.
- Use
EXPLAIN ANALYZEwith every JSONB query in development. JSONB queries that look like they should use a GIN index often do not, due to operator mismatch or planner cost estimates. Verify index usage before deploying to production.
Working with JusDB on PostgreSQL
Designing a JSONB schema that performs well at scale requires more than choosing the right column type. Index strategy, document size budgeting, query operator selection, and the boundary between semi-structured JSONB and normalized columns all interact in ways that are difficult to reason about without production data to benchmark against.
JusDB's PostgreSQL consultants have helped SaaS platforms, logistics companies, and fintech teams redesign JSONB schemas, rebuild GIN index strategies, and migrate poorly-performing json columns to properly indexed jsonb columns — without application downtime. We cover schema review, query plan analysis, index design, and ongoing capacity planning.
PostgreSQL Consulting Services Talk to a PostgreSQL Expert
If you are inheriting a PostgreSQL schema with heavily-used json columns, large unindexed JSONB blobs, or GIN indexes that are growing faster than your table, we offer a focused JSONB performance review: query plan analysis, index audit, and a prioritized remediation plan delivered within one week.
Related Reading
- PostgreSQL Explained: A Complete Guide for Modern Applications — foundational PostgreSQL concepts including storage, MVCC, and index types before diving into JSONB specifics
- PostgreSQL Full-Text Search with tsvector and GIN — GIN indexes for
tsvectorfollow the same architectural principles as GIN for JSONB; understanding both deepens your index intuition - PostgreSQL as a Vector Database: A Complete Guide — another PostgreSQL extension that uses a binary column type with specialized index support, comparable to the JSONB pattern