
Data Lakehouse Architecture: Delta Lake, Apache Iceberg, and Apache Hudi

Compare Delta Lake, Apache Iceberg, and Apache Hudi — ACID transactions, time travel, and format interoperability

JusDB Team
January 27, 2026

Data lakes promised cheap, scalable storage — but they delivered a swamp: no ACID guarantees, no schema enforcement, and query engines that choked on small files or stale partitions. The data lakehouse emerged as the answer, layering warehouse-grade reliability directly on object storage. Three open table formats now dominate that layer: Delta Lake, Apache Iceberg, and Apache Hudi. Each solves the same core problem differently, and picking the wrong one for your workload means paying the cost in engineering hours, query latency, or vendor lock-in. This post gives you the technical grounding to make that call confidently.

TL;DR
  • Lakehouse formats bring ACID transactions, schema evolution, and time travel to object storage (S3, GCS, ADLS).
  • Delta Lake (Databricks) uses a transaction log of JSON commit files and offers Z-ordering and OPTIMIZE for file compaction.
  • Apache Iceberg (Netflix) excels at hidden partitioning, partition evolution, and true multi-engine interoperability without vendor coupling.
  • Apache Hudi (Uber) is optimised for high-frequency upserts via copy-on-write and merge-on-read table types.
  • Cross-format reads are increasingly practical: Delta's UniForm feature, for example, writes Iceberg metadata alongside the Delta Log so Iceberg clients can read the same table.
  • Engine support varies — Spark is tier-1 everywhere; Flink, Trino, and Athena support differs in maturity.

What Is a Data Lakehouse?

A data lakehouse is an architecture that combines the low-cost, schema-flexible storage of a data lake with the data management and ACID transaction capabilities traditionally associated with data warehouses. The enabling insight is simple: object storage (S3, GCS, ADLS) is now fast enough and cheap enough to serve as the primary persistence layer, but raw Parquet files on object storage offer no transactional guarantees. If two writers touch the same partition simultaneously, you get corruption. If a job fails halfway through, you get partial data. If you need to understand what the table looked like last Tuesday, you have nothing.

Open table formats solve this by introducing a metadata layer that sits between the query engine and the raw Parquet files. This metadata layer tracks which files belong to the current snapshot, records schema history, enforces partition layout, and writes commit entries that allow atomic reads and writes. The result is full ACID semantics — atomicity, consistency, isolation, durability — on top of plain object storage, without a proprietary storage engine.

The three formats covered here all implement this pattern. Where they diverge is in how they structure that metadata, what operations they optimise for, and how tightly they couple to a specific compute engine.


Delta Lake

Delta Lake was open-sourced by Databricks in 2019 and became a Linux Foundation project later that same year. It is the default table format on the Databricks Lakehouse Platform, which means it has deep integration with Apache Spark and a large production user base.

The Delta Log

Delta Lake's core abstraction is the Delta Log: a sequence of JSON commit files stored in a _delta_log/ directory alongside the Parquet data files. Each commit file records the set of files added and removed by that transaction. By default, Delta checkpoints the log every ten commits into a Parquet snapshot file, so readers do not have to replay thousands of JSON entries. The log is the source of truth for which files belong to the current table version.
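The replay idea can be sketched in a few lines of Python. This is a simplified model, not the Delta Lake library: it assumes commits contain only `add` and `remove` actions keyed by file path, and the function name `replay` and the sample file names are hypothetical.

```python
import json

# Each string stands in for one commit file in _delta_log/: newline-delimited
# JSON with one action per line. Only `add` and `remove` are modelled here;
# real commits carry further action types that this sketch ignores.
COMMITS = [
    '{"add": {"path": "part-000.parquet"}}\n{"add": {"path": "part-001.parquet"}}',
    '{"add": {"path": "part-002.parquet"}}',
    '{"remove": {"path": "part-000.parquet"}}\n{"add": {"path": "part-003.parquet"}}',
]


def replay(commits, as_of_version=None):
    """Return the set of live data files after replaying commits in order."""
    last = len(commits) - 1 if as_of_version is None else as_of_version
    live = set()
    for commit in commits[: last + 1]:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


current_files = replay(COMMITS)
files_at_v1 = replay(COMMITS, as_of_version=1)
print(sorted(current_files))
print(sorted(files_at_v1))
```

In this model a checkpoint is simply a stored copy of `live` at some version, so a reader can start from it and replay only the commits that follow. Stopping the replay early is also what makes reading an older version possible.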

This design makes time travel straightforward. You can query any prior version of the table using the VERSION AS OF syntax:

```sql
-- Spark SQL
SELECT * FROM my_table VERSION AS OF 42;

-- or by timestamp
SELECT * FROM my_table TIMESTAMP AS OF '2025-11-01 00:00:00';
```

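Older versions remain queryable as long as their Parquet files have not been deleted by VACUUM, which removes files no longer referenced by the current version once they are older than the retention threshold (7 days by default), and as long as the matching log entries are still retained.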

OPTIMIZE and Z-Ordering

Small file accumulation is the canonical data lake performance killer. Delta Lake provides the OPTIMIZE command to compact small Parquet files into larger ones, reducing the number of file-open calls a query must make. You can combine OPTIMIZE with Z-ordering to co-locate related data:

```sql
OPTIMIZE my_table ZORDER BY (customer_id, event_date);
```

Z-ordering is a multi-dimensional clustering technique. Rather than sorting by a single column, it interleaves bits from multiple column values so rows with similar values in any of those columns end up in the same files. This dramatically reduces the number of files a predicate filter must open when queries filter on those columns.
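To make the bit interleaving concrete, here is a small Python sketch. It illustrates the general Z-order (Morton code) technique on two small integer columns and is not Delta Lake's actual implementation, which has to handle arbitrary column types. The names `z_value` and `files_touched` are hypothetical.

```python
def z_value(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton code."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x land on even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y land on odd positions
    return z


def files_touched(files, col, value):
    """Indexes of files whose min/max range for column `col` contains value."""
    return [
        i for i, rows in enumerate(files)
        if min(r[col] for r in rows) <= value <= max(r[col] for r in rows)
    ]


# Sixteen (x, y) rows, sorted by Z-value and cut into four files of four rows.
rows = sorted(((x, y) for x in range(4) for y in range(4)),
              key=lambda r: z_value(*r))
z_files = [rows[i:i + 4] for i in range(0, len(rows), 4)]

# The same rows sorted by x only, for comparison.
x_rows = sorted((x, y) for x in range(4) for y in range(4))
x_files = [x_rows[i:i + 4] for i in range(0, len(x_rows), 4)]

print(files_touched(z_files, 0, 3), files_touched(z_files, 1, 0))
print(files_touched(x_files, 0, 3), files_touched(x_files, 1, 0))
```

With the Z-ordered layout a filter on either column skips half the files. With the single-column sort, a filter on `x` is cheaper, but a filter on `y` has to open every file. That trade-off is the reason to Z-order only when queries filter on several columns.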

Warning

Z-ordering began as a Databricks-only optimisation, but the open-source Delta Lake library has supported OPTIMIZE compaction since version 1.2 and ZORDER BY since version 2.0. Engines other than Spark may not implement these commands at all, so if you need to stay off Databricks compute, verify that your Delta Lake version and query engine support the maintenance commands you rely on.

Schema Enforcement and Evolution

Delta Lake enforces schema on write by default, rejecting data that does not match the table schema. Schema evolution is opt-in: the mergeSchema write option adds new columns, and overwriteSchema replaces the schema outright for more destructive changes. Column mapping allows renaming columns (since Delta 1.2) and dropping them (since Delta 2.0) without rewriting data files.
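The difference between enforcement and opt-in evolution can be modelled as a small check that runs before each write. This is a sketch of the behaviour described above, not Delta Lake code: the function `resolve_schema`, its `merge_schema` flag, and the dictionary-based schemas are all hypothetical stand-ins.

```python
def resolve_schema(table_schema, batch_schema, merge_schema=False):
    """Return the table schema after a write, or raise if the write is rejected."""
    conflicts = [
        col for col, dtype in batch_schema.items()
        if col in table_schema and table_schema[col] != dtype
    ]
    if conflicts:
        raise TypeError(f"type mismatch for columns: {conflicts}")

    new_cols = {c: t for c, t in batch_schema.items() if c not in table_schema}
    if new_cols and not merge_schema:
        # Enforcement: unknown columns are rejected unless evolution is requested.
        raise ValueError(f"columns not in table schema: {sorted(new_cols)}")

    # Evolution: new columns are appended, existing ones are left untouched.
    return {**table_schema, **new_cols}


table = {"event_id": "bigint", "user_id": "bigint"}
batch = {"event_id": "bigint", "user_id": "bigint", "country": "string"}

try:
    resolve_schema(table, batch)
    rejected = False
except ValueError:
    rejected = True

evolved = resolve_schema(table, batch, merge_schema=True)
print(rejected, evolved)
```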

UniForm: Cross-Format Interoperability

Delta Lake 3.0 introduced UniForm (Universal Format), which writes Iceberg metadata alongside the Delta Log when a table is created or altered with TBLPROPERTIES ('delta.universalFormat.enabledFormats' = 'iceberg'). Iceberg-compatible engines such as Trino, Snowflake, and BigQuery can then read the table as an Iceberg table without any ETL copy. Access through the Iceberg metadata is read-only, so writes still go through Delta. This is a significant interoperability milestone for Delta Lake, which historically suffered from engine lock-in.


Apache Iceberg

Apache Iceberg was created at Netflix to solve a specific problem: Hive metastore partitioning required users to know the physical partition layout of a table and filter on partition columns explicitly, making partition changes a breaking operation. Iceberg was designed from the ground up to decouple logical table semantics from physical file layout.

Metadata Architecture

Iceberg's metadata hierarchy is more layered than Delta Lake's. At the top is a catalog (Hive Metastore, AWS Glue, Nessie, REST), which points to the current metadata file. The metadata file lists all snapshots. Each snapshot references a single manifest list, which enumerates manifest files along with a partition-range summary for each one. The manifest files in turn list the actual Parquet data files with per-file column statistics. This layered structure lets the planner prune whole manifests, and then individual files, from metadata alone, without listing object-store directories.
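A toy model of that two-level pruning is sketched below, under simplifying assumptions: one integer partition value (`day`) and one column statistic (`user_id` range) per file. The classes `DataFile` and `Manifest` and the function `plan_scan` are hypothetical names, not the Iceberg API.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    day: int           # partition value, e.g. days since epoch
    min_user_id: int   # per-file column statistics kept in the manifest
    max_user_id: int


@dataclass
class Manifest:
    path: str
    files: list

    @property
    def day_range(self):
        """Partition summary that the manifest list keeps for this manifest."""
        days = [f.day for f in self.files]
        return min(days), max(days)


def plan_scan(manifest_list, day, user_id):
    """Return (data file paths to read, number of manifests opened)."""
    opened, selected = 0, []
    for manifest in manifest_list:
        lo, hi = manifest.day_range
        if not lo <= day <= hi:
            continue  # pruned using the manifest list alone
        opened += 1
        for f in manifest.files:
            if f.day == day and f.min_user_id <= user_id <= f.max_user_id:
                selected.append(f.path)
    return selected, opened


manifest_list = [
    Manifest("m0.avro", [DataFile("d100.parquet", 100, 0, 199),
                         DataFile("d101.parquet", 101, 0, 199)]),
    Manifest("m1.avro", [DataFile("d102-a.parquet", 102, 0, 99),
                         DataFile("d102-b.parquet", 102, 100, 199),
                         DataFile("d103.parquet", 103, 0, 199)]),
]

paths, manifests_opened = plan_scan(manifest_list, day=102, user_id=50)
print(paths, manifests_opened)
```

Only one of the two manifests is opened, and only one of its three files is selected, all without touching the data files themselves.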

Hidden Partitioning and Partition Evolution

Iceberg's most celebrated feature is hidden partitioning. Instead of requiring users to create a column like event_year and filter on it, you declare a partition transform:

```sql
CREATE TABLE events (
  event_id BIGINT,
  occurred_at TIMESTAMP,
  user_id BIGINT
)
USING iceberg
PARTITIONED BY (days(occurred_at));
```

The engine derives the partition value from occurred_at automatically. Queries that filter on occurred_at get partition pruning transparently — no changes to query syntax, no user knowledge of the physical layout required.

More importantly, partitioning can be evolved without rewriting existing data. You can change the partition spec from days(occurred_at) to hours(occurred_at) going forward. Iceberg tracks which files belong to which partition spec and handles the mixed layout transparently.
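The following Python sketch shows why a mixed layout is manageable: each file carries the partition value produced by the transform that was active when it was written, and the planner applies that same transform to the query bounds. The `days` and `hours` functions mirror the idea of Iceberg's day and hour transforms (counts since the Unix epoch); the file list and the `prune` helper are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days(ts):
    """Whole days since the Unix epoch."""
    return (ts - EPOCH) // timedelta(days=1)


def hours(ts):
    """Whole hours since the Unix epoch."""
    return (ts - EPOCH) // timedelta(hours=1)


TRANSFORMS = {"day": days, "hour": hours}

ts = datetime(2025, 11, 1, 13, 30, tzinfo=timezone.utc)

# Files written before the spec change keep day values; newer files carry
# hour values. Nothing is rewritten when the spec evolves.
FILES = [
    {"path": "old.parquet", "transform": "day", "value": days(ts)},
    {"path": "new.parquet", "transform": "hour", "value": hours(ts)},
    {"path": "later.parquet", "transform": "hour", "value": hours(ts) + 5},
]


def prune(files, start, end):
    """Keep files whose partition value may hold rows in [start, end]."""
    kept = []
    for f in files:
        transform = TRANSFORMS[f["transform"]]
        if transform(start) <= f["value"] <= transform(end):
            kept.append(f["path"])
    return kept


window_start = datetime(2025, 11, 1, 13, 0, tzinfo=timezone.utc)
window_end = datetime(2025, 11, 1, 13, 59, tzinfo=timezone.utc)
matching = prune(FILES, window_start, window_end)
print(matching)
```

The query only ever mentions `occurred_at`. The older daily file is kept because the whole day might contain matching rows, while the hourly files are pruned at finer granularity.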

Tip

Partition evolution makes Iceberg especially powerful for tables that grow significantly over time. You can start with coarse daily partitions and move to hourly as data volume increases — without a disruptive rewrite of historical data.

Row-Level Deletes

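Iceberg (format version 2) supports row-level deletes via delete files: either position deletes, which reference specific row positions in a data file, or equality deletes, which match rows by column value. This allows DELETE and UPDATE statements to avoid full file rewrites for small changes, which is critical for GDPR right-to-erasure workflows. Note that deleted rows remain physically present in the data files until a later rewrite or compaction, so erasure workflows need that step as well.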
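A reader applies both kinds of delete file while scanning. The sketch below models that merge in plain Python. It deliberately ignores the sequence-number rules that decide which delete files apply to which data files, and the function `read_with_deletes` and its data shapes are hypothetical.

```python
def read_with_deletes(data_files, position_deletes, equality_deletes):
    """Return live rows after applying position and equality deletes.

    data_files:       {path: [row dict, ...]}
    position_deletes: {path: {row position, ...}}
    equality_deletes: [{column: value, ...}, ...]  (a row matching any entry is dropped)
    """
    live = []
    for path, rows in data_files.items():
        deleted_positions = position_deletes.get(path, set())
        for pos, row in enumerate(rows):
            if pos in deleted_positions:
                continue
            if any(all(row.get(col) == val for col, val in pred.items())
                   for pred in equality_deletes):
                continue
            live.append(row)
    return live


data_files = {
    "a.parquet": [{"user_id": 1, "event": "login"},
                  {"user_id": 2, "event": "login"}],
    "b.parquet": [{"user_id": 1, "event": "purchase"},
                  {"user_id": 3, "event": "login"}],
}

# Drop the row at position 1 of a.parquet, plus every row for user 1.
live_rows = read_with_deletes(
    data_files,
    position_deletes={"a.parquet": {1}},
    equality_deletes=[{"user_id": 1}],
)
print(live_rows)
```

Position deletes are cheap to apply but require the writer to know where the row lives. Equality deletes are cheap to write but cost more at read time, since every row has to be checked against them.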

Multi-Engine Support

Iceberg's catalog-based design means any engine that implements the Iceberg spec can read and write the same table. Spark, Flink, Trino, Presto, Dremio, Snowflake, BigQuery, AWS Athena, and Databricks (via UniForm or native Iceberg support) all participate in the ecosystem. This is Iceberg's strongest competitive advantage for organisations that run heterogeneous compute environments.


Apache Hudi

Apache Hudi (Hadoop Upserts Deletes and Incrementals) was built at Uber to handle a problem that batch-oriented Hadoop tables handled poorly: high-frequency, low-latency upserts into large tables. Uber's use case was ingesting database change data capture (CDC) streams, millions of row-level updates per hour, into Hadoop, and doing it with data freshness measured in minutes rather than hours.

Table Types: Copy-on-Write vs Merge-on-Read

Hudi's most distinctive architectural choice is offering two table types that trade write cost for read cost:

Copy-on-Write (CoW) rewrites entire Parquet files when rows are inserted or updated. The result is clean, fully merged Parquet files that any reader can consume without Hudi-specific logic. Read performance is equivalent to plain Parquet. The cost is high write amplification — touching one row rewrites the entire file containing it.

Merge-on-Read (MoR) appends delta files (in Avro or Parquet) for new writes and only compacts them into base files during scheduled compaction runs. This sharply reduces write latency and write amplification, at the cost of more complex reads: queries must merge base files with delta files at read time. Hudi exposes two read paths for MoR tables: a real-time view that merges base and delta files (called a snapshot query in current Hudi documentation), and a read-optimised view that reads base files only, without deltas, for engines that cannot handle the merge.
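The two read paths and the effect of compaction can be modelled in a few lines. This is a conceptual sketch rather than Hudi code: it assumes a single file group, keyed records, and a "latest write wins" merge, and the function names are hypothetical.

```python
# Base file: the last compacted state, keyed by record key.
base_file = {"k1": {"balance": 10}, "k2": {"balance": 20}}

# Delta log: writes appended since the last compaction, in commit order.
delta_log = [("k2", {"balance": 25}), ("k3", {"balance": 30})]


def read_optimized(base, log):
    """Base files only: fast, but blind to anything still in the delta log."""
    return dict(base)


def snapshot(base, log):
    """Merge base and delta log at read time; later entries win."""
    merged = dict(base)
    for key, record in log:
        merged[key] = record
    return merged


def compact(base, log):
    """Fold the delta log into a new base file and start an empty log."""
    return snapshot(base, log), []


stale_view = read_optimized(base_file, delta_log)
fresh_view = snapshot(base_file, delta_log)
new_base, new_log = compact(base_file, delta_log)
print(stale_view, fresh_view, new_base, new_log)
```

After compaction the read-optimised path returns the same result as the merged path, which is why an unbounded delta log hurts both freshness and read cost.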

Warning

Merge-on-Read tables in Hudi require a compaction strategy. Without regular compaction, the number of delta files grows unbounded and read-time merge cost increases proportionally. Factor compaction scheduling into your operational runbook before adopting MoR in production.

Incremental Processing

Hudi's timeline — a log of commits, compactions, and clean operations stored in the .hoodie/ directory — enables incremental queries: you can ask for all records that changed since a given commit instant. This makes Hudi a natural fit for CDC pipelines where downstream consumers only want to process deltas, not re-scan the entire table.
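As a rough illustration, an incremental query amounts to walking the timeline and keeping only what was written after a given instant. The sketch below is a simplification: it stores record keys directly on each timeline entry (a real timeline holds commit metadata, not the keys themselves), treats the start instant as exclusive, and uses hypothetical names throughout.

```python
# Instants are fixed-width timestamps, so string comparison orders them correctly.
TIMELINE = [
    {"instant": "20251101090000", "action": "commit", "keys": ["k1", "k2"]},
    {"instant": "20251101100000", "action": "commit", "keys": ["k2"]},
    {"instant": "20251101103000", "action": "clean", "keys": []},
    {"instant": "20251101110000", "action": "commit", "keys": ["k3"]},
]


def changed_since(timeline, begin_instant):
    """Record keys written by commits strictly after begin_instant."""
    changed = []
    for entry in timeline:
        if entry["action"] != "commit" or entry["instant"] <= begin_instant:
            continue
        for key in entry["keys"]:
            if key not in changed:
                changed.append(key)
    return changed


changes = changed_since(TIMELINE, "20251101090000")
print(changes)
```

A downstream job that remembers the last instant it processed can call this repeatedly and never rescan unchanged data.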

Upsert Performance

Hudi's upsert implementation is built around an index (BloomFilter, HBase, or in-memory) that maps record keys to file groups, allowing the engine to locate exactly which files contain the rows being updated. This avoids the full-table scans that naive upsert implementations require and is the primary reason Hudi remains the preferred choice for high-frequency CDC ingestion pipelines.
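The Bloom filter variant of this lookup can be sketched as follows. It assumes one filter per file group and confirms every filter hit against the actual keys, since a Bloom filter can report false positives but never false negatives. The classes and the `locate` function are hypothetical, not Hudi's implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class FileGroup:
    def __init__(self, name, keys):
        self.name = name
        self.keys = set(keys)  # stands in for the keys stored in the data file
        self.bloom = BloomFilter()
        for key in keys:
            self.bloom.add(key)


def locate(file_groups, key):
    """Return the file group holding `key`, or None if it is a new record."""
    for group in file_groups:
        # Cheap filter check first; read the real keys only on a possible hit.
        if group.bloom.might_contain(key) and key in group.keys:
            return group.name
    return None


groups = [
    FileGroup("fg-0", [f"user-{i}" for i in range(0, 40)]),
    FileGroup("fg-1", [f"user-{i}" for i in range(40, 80)]),
]

existing = locate(groups, "user-42")   # an update: routed to its file group
missing = locate(groups, "user-999")   # an insert: no file group owns the key
print(existing, missing)
```

Most file groups are ruled out by the filter check alone, so the expensive key lookup runs only for the few groups that might actually contain the record.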


Engine Support Matrix

Engine          | Delta Lake                   | Apache Iceberg          | Apache Hudi
Apache Spark    | Tier 1 (native)              | Tier 1 (native)         | Tier 1 (native)
Apache Flink    | Read + write (connector)     | Tier 1 (read + write)   | Tier 1 (read + write)
Trino / Presto  | Read + write (Trino)         | Tier 1 (read + write)   | Read-optimised view
AWS Athena      | Read (v2 tables)             | Tier 1 (read + write)   | Read (CoW tables)
Databricks      | Tier 1 (native)              | Via UniForm or native   | Community connector
Snowflake       | Via UniForm (Iceberg compat) | Native Iceberg tables   | Limited / manual
Google BigQuery | No                           | Native BigLake tables   | No
Dremio          | Read (Delta connector)       | Tier 1 (Arctic catalog) | Limited

Tip

"Tier 1" here means the engine has first-class support for both reads and writes, including predicate pushdown, schema evolution, and transactional writes. Community connectors may lag behind on newer spec features — always verify version compatibility before assuming full feature parity.


Choosing the Right Format

No single format is universally superior. The right choice depends on your workload characteristics, engine ecosystem, and operational constraints.

Choose Delta Lake if: Your primary compute is Databricks or Spark, you want a mature production ecosystem with a large community, and you can accept some degree of Databricks coupling in exchange for deep platform integration. Z-ordering and Databricks Liquid Clustering provide strong query performance for analytical workloads, although you still choose the clustering columns yourself. UniForm now mitigates the interoperability concern substantially.

Choose Apache Iceberg if: You run a multi-engine environment — Spark for batch, Flink for streaming, Trino for ad-hoc queries, Athena for serverless — and need all engines to read and write the same tables without transformation. Iceberg's spec-driven design and its REST catalog standard make it the most portable choice. It is also the best option for tables where partition strategy needs to evolve over time, or for GDPR deletion workflows that benefit from row-level delete files.

Choose Apache Hudi if: Your dominant workload is high-frequency CDC upserts — think database replication or event stream ingestion with frequent updates to existing records. Hudi's record-key index and MoR table type reduce write amplification in ways the other formats cannot match at the same update frequency. Incremental query support also makes Hudi attractive when downstream consumers need change-data feeds rather than full table snapshots.

Format Interoperability Is Improving
  • Delta UniForm (Delta 3.0+) lets Iceberg-compatible engines read Delta tables natively.
  • Apache XTable (formerly OneTable, now an Apache incubator project) translates metadata between all three formats bidirectionally.
  • The Apache Iceberg REST catalog spec is becoming a de facto standard, with Delta and Hudi adding REST catalog support.

Key Takeaways

  • All three formats — Delta Lake, Iceberg, and Hudi — deliver ACID transactions, time travel, and schema evolution on object storage. The differences are in how they achieve those goals and what they optimise for.
  • Delta Lake's Delta Log is simple and battle-tested. Its Z-ordering and OPTIMIZE commands provide strong out-of-the-box performance for Spark-heavy environments, with UniForm closing the interoperability gap.
  • Iceberg's hidden partitioning and partition evolution are architectural differentiators that reduce operational burden as table schemas and data volumes change over time. Its multi-engine support is unmatched in breadth.
  • Hudi's copy-on-write and merge-on-read table types make it the right tool for CDC ingestion and high-frequency upsert pipelines where write amplification is a real operational cost.
  • Engine support matters as much as format features. Map your compute engines against the matrix above before committing to a format in production.
  • Cross-format interoperability (UniForm, XTable) is maturing rapidly. Format lock-in is becoming less permanent, but metadata translation still adds operational complexity.

Query Your Lakehouse Tables with JusDB

Regardless of which open table format you choose, JusDB connects to your lakehouse directly — Delta Lake, Iceberg, and Hudi tables on S3, GCS, or ADLS — without requiring you to move or copy data. Run SQL queries across your object storage tables from a single interface, with the performance optimisations and query push-down your format supports.

If you are evaluating lakehouse table formats or migrating an existing data lake to a structured open format, JusDB gives you a neutral query layer that works across all three. No vendor lock-in, no ETL pipelines, no waiting.

Start querying your lakehouse tables with JusDB — connect in minutes and run your first query today.
