What is Apache Iceberg?
Apache Iceberg is an open-source table format for large analytical datasets in the cloud. Originally developed at Netflix, with Apple as an early major contributor, it is now an Apache top-level project supported by the major cloud vendors.
The core promise of Iceberg
Iceberg solves the fundamental problem of data lakes: how do you manage millions of files on S3 or ADLS as one coherent, reliable table — without locking yourself into a single query engine?
- ACID transactions: Safe concurrent reads and writes
- Time travel: Query data at any historical point
- Schema evolution: Add columns without rewriting data
- Partition evolution: Change partitioning without migration
- Multi-engine: Use Spark, Flink, Trino and Snowflake on the same data at the same time
In 2026, Iceberg is no longer "new": it has become a de facto industry standard. Amazon Athena, Google BigQuery, Snowflake and Databricks all support it natively. If you are building a new data platform today, Iceberg is the logical default choice.
How Does Apache Iceberg Work Under the Hood?
Iceberg stores data as ordinary Parquet (or ORC/Avro) files, but adds a metadata layer on top that keeps track of everything.
Data Files
The actual data: Parquet, ORC or Avro files on S3, ADLS or GCS. Iceberg does not write these any differently — it is just cloud storage.
Manifest Files
Lists of data files with statistics (min/max per column). Query engines use this for data pruning — skipping files that are not relevant.
Manifest List
A snapshot of the table at a given moment: which manifest files belong to this version? This is what makes time travel possible.
Metadata File
The master file: schema, partitioning, snapshot history, statistics. This is the starting point for every query engine.
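To make the hierarchy concrete, here is a minimal Python sketch of how an engine might walk from a snapshot through its manifests to the data files, using min/max statistics to skip files. The class names (DataFile, Manifest, Snapshot), the plan_scan helper and the file names are illustrative assumptions, not Iceberg's actual classes or on-disk format.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    min_order_id: int  # lower bound of order_id values in this file
    max_order_id: int  # upper bound of order_id values in this file


@dataclass
class Manifest:
    files: list  # DataFile entries tracked by this manifest


@dataclass
class Snapshot:
    snapshot_id: int
    manifests: list  # stands in for the manifest list


def plan_scan(snapshot, order_id):
    """Return paths of files whose [min, max] range could contain order_id."""
    return [
        f.path
        for m in snapshot.manifests
        for f in m.files
        if f.min_order_id <= order_id <= f.max_order_id
    ]


snapshot = Snapshot(1, [
    Manifest([DataFile("a.parquet", 1, 100), DataFile("b.parquet", 101, 200)]),
    Manifest([DataFile("c.parquet", 201, 300)]),
])
print(plan_scan(snapshot, 150))  # only b.parquet has a matching range
```

The point of the sketch: the engine decides which files to open from metadata alone, without listing the storage bucket or reading any data file.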
Snapshot-based Architecture
Every write (INSERT, UPDATE, DELETE) creates a new snapshot. Old snapshots are retained until you explicitly remove them with expire_snapshots(). This gives you:
- Atomic commits: A write either succeeds fully or not at all
- Concurrent reads: Readers always see a consistent snapshot
- Time travel: Query any historical snapshot
- Rollback: Revert a bad write in a single command
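The four properties above follow from one idea: snapshots are immutable, and the table is just a pointer to the current one. The toy model below sketches that idea in Python. ToyTable and its methods are hypothetical names for illustration, and timestamps are plain integers; real Iceberg tracks this in metadata files rather than in memory.

```python
class ToyTable:
    """Toy model: each commit stores a full, immutable set of file paths."""

    def __init__(self):
        self.snapshots = {}     # snapshot_id -> (commit_time, frozenset of paths)
        self.current_id = None  # the snapshot readers see by default
        self._next_id = 1

    def commit(self, commit_time, add=(), remove=()):
        if self.current_id is not None:
            base = self.snapshots[self.current_id][1]
        else:
            base = frozenset()
        files = (base - frozenset(remove)) | frozenset(add)
        snapshot_id = self._next_id
        self._next_id += 1
        self.snapshots[snapshot_id] = (commit_time, files)
        self.current_id = snapshot_id  # one pointer swap makes the commit visible
        return snapshot_id

    def files_as_of(self, when):
        """Time travel: the latest snapshot committed at or before `when`."""
        candidates = [(t, f) for t, f in self.snapshots.values() if t <= when]
        if not candidates:
            raise ValueError("no snapshot at or before that time")
        return max(candidates, key=lambda c: c[0])[1]

    def rollback_to(self, snapshot_id):
        if snapshot_id not in self.snapshots:
            raise KeyError(snapshot_id)
        self.current_id = snapshot_id  # history is kept, only the pointer moves

    def current_files(self):
        return self.snapshots[self.current_id][1]


table = ToyTable()
s1 = table.commit(100, add=["a.parquet"])
s2 = table.commit(200, add=["b.parquet"])
s3 = table.commit(300, remove=["a.parquet", "b.parquet"])  # a bad delete
before_delete = table.files_as_of(250)
table.rollback_to(s2)
```

After the rollback, the current state again contains both files, and the snapshot with the bad delete is still in the history until it is expired.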
Apache Iceberg vs Delta Lake vs Apache Hudi
There are three major open table formats. Here is the honest comparison:
| Feature | Apache Iceberg | Delta Lake | Apache Hudi |
|---|---|---|---|
| Origin | Netflix / Apple | Databricks | Uber |
| ACID transactions | ✅ Yes | ✅ Yes | ✅ Yes |
| Time travel | ✅ Yes (snapshots) | ✅ Yes (versions) | ⚠️ Limited |
| Multi-engine support | ✅ Best (Spark, Flink, Trino, Snowflake, BigQuery) | ⚠️ Growing (via UniForm) | ⚠️ Spark-focused |
| Partition evolution | ✅ Yes (without migration) | ❌ No | ❌ No |
| Schema evolution | ✅ Full | ✅ Full | ⚠️ Limited |
| Vendor neutrality | ✅ Maximum | ⚠️ Databricks-leaning | ✅ Good |
| AWS native support | ✅ Athena, Glue, EMR | ⚠️ Via Databricks/EMR | ✅ EMR |
| Snowflake support | ✅ Native | ⚠️ Via UniForm (limited) | ❌ None |
The Winner in 2026?
For multi-cloud or multi-engine environments, Iceberg is the clear winner thanks to its broad adoption. If you are fully on Databricks, Delta Lake is still perfectly fine. Hudi is strong for high-frequency upserts (CDC use cases).
The Most Powerful Iceberg Features Explained
Partition Evolution — Unique to Iceberg
This is the feature that sets Iceberg apart. With Delta Lake and Hudi you have to rewrite the entire table when you change partitioning. With Iceberg you can change the partitioning strategy without moving a single byte of data.
-- Start with partitioning by day
CREATE TABLE orders (
order_id BIGINT,
customer_id BIGINT,
amount DECIMAL(10,2),
order_date TIMESTAMP
)
USING iceberg
PARTITIONED BY (days(order_date));
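-- Later add hourly partitioning (without rewriting the table!)
-- Note: ADD keeps the existing days(order_date) field and adds hours(order_date)
-- next to it. To switch from daily to hourly instead, use:
--   ALTER TABLE orders REPLACE PARTITION FIELD days(order_date) WITH hours(order_date);
ALTER TABLE orders
ADD PARTITION FIELD hours(order_date);
-- Iceberg writes new data with the new partition spec;
-- old data stays unchanged. Query engines understand both.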
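Why can engines read both layouts? Each data file remembers which partition spec it was written with, so a filter is evaluated per file under that file's own spec. Below is a small Python sketch of this, assuming the spec switches from a daily to an hourly transform. The SPECS table, the files list and the files_for helper are hypothetical; the days and hours functions mirror the idea of counting from the Unix epoch.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def utc(*args):
    return datetime(*args, tzinfo=timezone.utc)


def days(ts):
    """Whole days since the Unix epoch."""
    return (ts - EPOCH).days


def hours(ts):
    """Whole hours since the Unix epoch."""
    return int((ts - EPOCH).total_seconds() // 3600)


SPECS = {0: days, 1: hours}  # spec id -> transform applied to order_date

# Each data file records the spec it was written with and its partition value.
files = [
    {"path": "old-1.parquet", "spec_id": 0, "partition": days(utc(2026, 3, 19))},
    {"path": "new-1.parquet", "spec_id": 1, "partition": hours(utc(2026, 3, 20, 9))},
    {"path": "new-2.parquet", "spec_id": 1, "partition": hours(utc(2026, 3, 20, 10))},
]


def files_for(ts):
    """Keep files whose partition value matches ts under the file's own spec."""
    return [f["path"] for f in files if SPECS[f["spec_id"]](ts) == f["partition"]]


print(files_for(utc(2026, 3, 20, 9, 30)))  # hourly files are pruned by hour
print(files_for(utc(2026, 3, 19, 23, 0)))  # daily files are pruned by day
```

Old files never need to be rewritten because nothing about them changes; only the rule for newly written files does.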
Time Travel & Rollback
Query historical data or recover accidentally deleted records:
-- Query the table as of a point in time (Spark SQL)
SELECT * FROM orders
TIMESTAMP AS OF '2026-03-20 09:00:00';
-- Query a specific snapshot
SELECT * FROM orders VERSION AS OF 4521;
-- Roll back to a previous snapshot
CALL catalog.system.rollback_to_snapshot('db.orders', 4521);
-- View snapshot history
SELECT * FROM orders.snapshots;
Schema Evolution Without Downtime
Add, rename or drop columns — safely and without rewriting data:
-- Add a column (takes effect immediately, no rewrite)
ALTER TABLE orders ADD COLUMN discount DECIMAL(5,2);
-- Rename a column
ALTER TABLE orders RENAME COLUMN amount TO order_amount;
-- Reorder a column
ALTER TABLE orders ALTER COLUMN discount AFTER order_amount;
-- Drop a column (data remains, but is no longer read)
ALTER TABLE orders DROP COLUMN legacy_flag;
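These changes are metadata-only because Iceberg identifies columns by a unique field ID rather than by name or position. The Python sketch below illustrates the principle with plain dictionaries. The row layout, the ID values and the read_row helper are assumptions for illustration, not Iceberg's storage format.

```python
# A stored row, keyed by field ID instead of column name.
data_file_row = {1: 1001, 2: 42, 3: "99.95"}

schema_v1 = {"order_id": 1, "customer_id": 2, "amount": 3}


def read_row(row, schema):
    """Project a stored row through a schema; IDs missing in the file read as None."""
    return {name: row.get(field_id) for name, field_id in schema.items()}


# ADD COLUMN: a new ID is assigned; existing files simply have no value for it.
schema_v2 = {**schema_v1, "discount": 4}

# RENAME COLUMN: the name changes, the ID (3) stays the same.
schema_v3 = {
    ("order_amount" if name == "amount" else name): field_id
    for name, field_id in schema_v2.items()
}

# DROP COLUMN: the mapping is removed; the stored bytes are just no longer read.
schema_v4 = {n: i for n, i in schema_v3.items() if n != "customer_id"}

print(read_row(data_file_row, schema_v3))
print(read_row(data_file_row, schema_v4))
```

The same untouched file is readable under every schema version, which is why none of the ALTER statements above trigger a rewrite.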
Row-level Deletes & Updates (Copy-on-Write vs Merge-on-Read)
Iceberg supports two strategies for updates and deletes:
-- Copy-on-Write (CoW): rewrites files on every update
-- → Faster reads, slower writes
-- Good for: batch updates, reporting tables
-- Merge-on-Read (MoR): stores delete/update markers
-- → Faster writes, reads require a merge
-- Good for: frequent CDC / streaming updates
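-- MERGE INTO (upsert): Spark SQL syntax shown. Trino also supports MERGE,
-- but needs explicit column lists instead of INSERT *. Flink SQL has no MERGE.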
MERGE INTO orders t
USING updates s ON t.order_id = s.order_id
WHEN MATCHED AND s.status = 'cancelled' THEN DELETE
WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.status = s.status
WHEN NOT MATCHED THEN INSERT *;
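The trade-off between the two strategies is easier to see in a toy model. The Python sketch below deletes one row both ways: copy-on-write rewrites the affected file, while merge-on-read only records the deleted position and leaves the filtering to readers. The function names and the in-memory file layout are hypothetical and only illustrate the idea.

```python
def delete_copy_on_write(files, predicate):
    """Rewrite every file containing a matching row; untouched files are reused."""
    new_files = {}
    for name, rows in files.items():
        if any(predicate(r) for r in rows):
            new_files[name + ".rewritten"] = [r for r in rows if not predicate(r)]
        else:
            new_files[name] = rows
    return new_files


def delete_merge_on_read(files, predicate):
    """Leave data files alone; record (file, position) pairs of deleted rows."""
    return {
        (name, pos)
        for name, rows in files.items()
        for pos, r in enumerate(rows)
        if predicate(r)
    }


def read_merge_on_read(files, deletes):
    """Readers skip deleted positions while scanning."""
    return [
        r
        for name, rows in files.items()
        for pos, r in enumerate(rows)
        if (name, pos) not in deletes
    ]


files = {"f1": [{"order_id": 1}, {"order_id": 2}], "f2": [{"order_id": 3}]}


def is_cancelled(row):
    return row["order_id"] == 2


cow_files = delete_copy_on_write(files, is_cancelled)
deletes = delete_merge_on_read(files, is_cancelled)
mor_rows = read_merge_on_read(files, deletes)
```

Copy-on-write pays at write time (a whole file is rewritten for one row), merge-on-read pays at read time (every scan has to apply the delete set) until compaction folds the deletes back into the data files.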
Multi-engine: Iceberg's Biggest Trump Card
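The biggest advantage of Iceberg over the alternatives is that multiple engines can read and write the same table at the same time. No engine needs to know about the others: commits go through the shared catalog, which uses optimistic concurrency to accept or reject each one: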
AWS Stack
- AWS Glue: ETL jobs on Iceberg tables
- Amazon Athena: SQL queries without a cluster
- Amazon EMR: Spark/Hive/Flink on Iceberg
- AWS Lake Formation: Governance + Iceberg
- Amazon Redshift: Spectrum on Iceberg tables
Azure / Microsoft
- Azure Databricks: Native Iceberg support
- Microsoft Fabric: OneLake + Iceberg
- Azure HDInsight: Spark on Iceberg
- Azure Synapse: Via Spark pools
Google Cloud
- BigQuery: Native Iceberg tables (BigLake)
- Dataproc: Spark + Iceberg
- Dataflow: Beam pipelines on Iceberg
Open Source Engines
- Apache Spark: Best integration
- Apache Flink: Streaming into Iceberg
- Trino / Presto: Ad-hoc SQL
- DuckDB: Local queries on Iceberg
- Dremio: BI acceleration on Iceberg
Real-world Scenario: Netflix's Iceberg Setup
Netflix, the inventor of Iceberg, uses it like this:
- Flink: Real-time streaming writes events into Iceberg (millions per second)
- Spark: Batch ETL transformations and compaction jobs
- Trino: Data analysts query the data ad-hoc via SQL
- The same S3 tables — no data copying or syncing
Result: one platform for streaming and batch, with full SQL support and no vendor lock-in.
Getting Started with Apache Iceberg
Test locally with Spark + Iceberg
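# pip install "pyspark==3.5.*"  (must match the Spark 3.5 / Scala 2.12 runtime below)
# The Iceberg Spark runtime JAR is downloaded automatically via spark.jars.packages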
from pyspark.sql import SparkSession
spark = SparkSession.builder \
.appName("IcebergDemo") \
.config("spark.jars.packages",
"org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0") \
.config("spark.sql.extensions",
"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
.config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog") \
.config("spark.sql.catalog.local.type", "hadoop") \
.config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse") \
.getOrCreate()
# Create a table
spark.sql("""
CREATE TABLE local.db.orders (
order_id BIGINT,
customer_id BIGINT,
amount DECIMAL(10,2),
order_date DATE
)
USING iceberg
PARTITIONED BY (months(order_date))
""")
# Insert data
spark.sql("""
INSERT INTO local.db.orders VALUES
(1, 101, 99.95, DATE '2026-03-01'),
(2, 102, 249.00, DATE '2026-03-15'),
(3, 101, 59.99, DATE '2026-03-20')
""")
# Inspect the snapshot history, then read the current table state
spark.sql("SELECT * FROM local.db.orders.snapshots").show()
spark.sql("SELECT * FROM local.db.orders").show()
Iceberg on AWS: Glue + Athena
-- In AWS Glue Data Catalog / Athena:
-- Create a table as Iceberg
CREATE TABLE orders (
order_id BIGINT,
customer_id BIGINT,
amount DOUBLE,
order_date DATE
)
LOCATION 's3://my-datalake/orders/'
TBLPROPERTIES (
'table_type' = 'ICEBERG',
'format' = 'parquet',
'write_compression' = 'snappy'
);
-- Insert data (no S3 copy needed!)
INSERT INTO orders VALUES (1, 101, 99.95, DATE '2026-03-21');
-- UPDATE (Athena engine v3 + Iceberg)
UPDATE orders
SET amount = 89.95
WHERE order_id = 1;
-- DELETE
DELETE FROM orders WHERE order_id = 1;
-- Time travel (engine v3 syntax; engine v2 used FOR SYSTEM_TIME AS OF)
SELECT * FROM orders
FOR TIMESTAMP AS OF TIMESTAMP '2026-03-20 12:00:00 UTC';
Migrating from Parquet to Iceberg (in-place)
# Migrate an existing Parquet table to Iceberg
# WITHOUT copying data (only metadata is created).
# PySpark; assumes spark is a SparkSession configured with the Iceberg
# extensions, as in the local example above. The catalog names
# (catalog, parquet, iceberg) are placeholders for your own catalogs.
# Option 1: migrate in place (replaces the original table)
spark.sql("""
CALL catalog.system.migrate(
'db.existing_parquet_table'
)
""")
# Option 2: snapshot (safer: the original stays intact)
spark.sql("""
CALL catalog.system.snapshot(
source_table => 'parquet.db.orders',
table => 'iceberg.db.orders'
)
""")
# Validate the migration
spark.sql("SELECT COUNT(*) FROM iceberg.db.orders").show()
spark.sql("SELECT * FROM iceberg.db.orders.snapshots").show()
Table Maintenance: Compaction & Cleanup
-- Compact small files together (compaction)
CALL catalog.system.rewrite_data_files(
table => 'db.orders',
strategy => 'sort',
sort_order => 'order_date ASC'
);
-- Remove expired snapshots (data cleanup)
CALL catalog.system.expire_snapshots(
table => 'db.orders',
older_than => TIMESTAMP '2026-03-01 00:00:00',
retain_last => 5
);
-- Remove unused metadata and orphan files
CALL catalog.system.remove_orphan_files(
table => 'db.orders',
older_than => TIMESTAMP '2026-03-01 00:00:00'
);
-- View table statistics
SELECT * FROM db.orders.files LIMIT 20;
SELECT * FROM db.orders.partitions;
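What does compaction actually decide? At its core it groups small files into rewrite jobs that each produce roughly one target-sized file. Here is a simplified bin-packing sketch in Python. The plan_compaction function, the greedy first-fit approach and the 256 MB target are assumptions for illustration, not the planner Iceberg itself uses.

```python
def plan_compaction(file_sizes_mb, target_mb=256):
    """Greedy first-fit: group small files into bins of at most target_mb."""
    bins = []
    for size in sorted(file_sizes_mb, reverse=True):
        if size >= target_mb:
            continue  # already large enough, leave it alone
        for b in bins:
            if sum(b) + size <= target_mb:
                b.append(size)
                break
        else:
            bins.append([size])  # no existing bin had room
    # A bin holding a single file would be rewritten for nothing, so skip it.
    return [b for b in bins if len(b) > 1]


groups = plan_compaction([300, 100, 100, 50, 20], target_mb=256)
print(groups)
```

In this example the 300 MB file is left untouched, three small files are merged into one 250 MB file, and the leftover 20 MB file waits for the next run.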
Iceberg Catalogs: The Central Point
A catalog is how Iceberg keeps track of which tables exist and where their metadata lives. The choice of catalog is crucial to your architecture:
| Catalog Type | When to use | Examples | Governance |
|---|---|---|---|
| AWS Glue Catalog | AWS environments | Athena, EMR, Glue Jobs | Lake Formation |
| Hive Metastore | On-premise or migration | Spark, Hive, Trino | Ranger / Kerberos |
| Nessie | Git-like data versioning | Dremio, Spark | Branch-based |
| REST Catalog | Multi-cloud / vendor-neutral | Tabular, Snowflake Open Catalog | API-based |
| Unity Catalog | Databricks ecosystem | Databricks, Delta/Iceberg via UniForm | Column-level |
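Whatever catalog you pick, its core job is the same: hold one pointer per table to the current metadata file and swap it atomically. The Python sketch below models that as a compare-and-set with retry. ToyCatalog, swap and commit_with_retry are hypothetical names, and a real catalog does this in a database or service rather than behind an in-process lock.

```python
import threading


class ToyCatalog:
    """Maps table name -> current metadata file; swap is compare-and-set."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        return self._pointers.get(table)

    def swap(self, table, expected, new):
        """Commit only if nobody else committed since `expected` was read."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # a concurrent writer won; caller must retry
            self._pointers[table] = new
            return True


def commit_with_retry(catalog, table, make_metadata, max_attempts=3):
    """Optimistic concurrency: re-read the base and try again on conflict."""
    for _ in range(max_attempts):
        base = catalog.current(table)
        if catalog.swap(table, base, make_metadata(base)):
            return True
    return False


catalog = ToyCatalog()
first = catalog.swap("db.orders", None, "v1.metadata.json")
stale = catalog.swap("db.orders", None, "v2.metadata.json")  # based on old state
retried = commit_with_retry(catalog, "db.orders", lambda base: "v2.metadata.json")
```

This is why the catalog choice matters so much: every engine that writes to a table has to commit through the same catalog for the swap to stay atomic.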
Recommendation: REST Catalog as a Future Strategy
The industry is moving toward an open REST Catalog API (the Apache Iceberg REST spec). This gives you maximum portability: switch query engine or cloud provider without migrating your catalog. Snowflake Open Catalog (based on Apache Polaris) is one of the frontrunners here; Tabular, the other early frontrunner, was acquired by Databricks in 2024.
Best Practices for Production
Partitioning Strategy
- Use hidden partitioning: Iceberg derives partition values from column transforms (days, months, hours, bucket, truncate), so queries filter on the original column
- Start conservatively: prefer partitions that are too large over too small
- Avoid high-cardinality columns as a partition key
- Use bucket(N, id) for evenly distributed data
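The bucket transform is how you partition on a high-cardinality column without creating one partition per value: hash the value, then take it modulo N. The Python sketch below shows the effect on 10,000 customer IDs. Note the assumption: the Iceberg spec defines the hash as 32-bit Murmur3, while this sketch uses zlib.crc32 as a stand-in so it runs with the standard library only. Bucket numbers will therefore differ from what Iceberg computes.

```python
import zlib


def bucket(n, value):
    """Stand-in for bucket(N, col): hash the value, then take it modulo N."""
    return zlib.crc32(str(value).encode("utf-8")) % n


counts = [0] * 16
for customer_id in range(10_000):
    counts[bucket(16, customer_id)] += 1

print(counts)  # rows per bucket
```

However many distinct IDs arrive, the table never has more than 16 partitions for this field, and a lookup on a single ID only needs to touch one of them.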
File Size Management
- Target file size: 128MB – 512MB per Parquet file
- Schedule daily compaction jobs outside peak hours
- Use rewrite_data_files with sort for better query performance
- Monitor db.table.files for file count trends
Snapshot Retention
- Keep snapshots for at least 7 days to recover from mistakes
- Set history.expire.min-snapshots-to-keep = 5
- Automate expire_snapshots and remove_orphan_files
- Monitor S3 costs: old snapshots count too
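The age cutoff and the minimum number of snapshots to keep work together: a snapshot is only expired if it is both older than the cutoff and not among the newest N. A minimal Python sketch of that rule follows. The snapshots_to_expire function and the integer timestamps are illustrative assumptions; the real procedure also takes the table's ancestry and other settings into account.

```python
def snapshots_to_expire(snapshots, older_than, retain_last):
    """snapshots: list of (snapshot_id, commit_time) in any order.

    Expire snapshots older than the cutoff, but always keep the newest
    retain_last snapshots regardless of their age.
    """
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    protected = {sid for sid, _ in newest_first[:retain_last]}
    return [
        sid
        for sid, commit_time in newest_first
        if commit_time < older_than and sid not in protected
    ]


history = [(1, 10), (2, 20), (3, 30), (4, 40)]
print(snapshots_to_expire(history, older_than=35, retain_last=2))
print(snapshots_to_expire(history, older_than=35, retain_last=3))
```

The retain-last guard is what protects a rarely written table: even if every snapshot is older than the cutoff, you keep enough history to roll back.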
Query Performance
- Use Z-ordering (sort order) on columns you filter on
- Enable bloom filter indexes for high-cardinality lookups
- Tune per-column metrics via write.metadata.metrics.column.<column-name>
- Prefer positional deletes over equality deletes (faster reads)
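Z-ordering deserves a short illustration, because it is not the same as sorting by two columns. It interleaves the bits of both values into one number, so rows that are close in either column end up close together on disk. The Python sketch below shows the bit interleaving for two small non-negative integers. The z_value function is a hypothetical helper; engines apply the same idea to encoded column values of any type.

```python
def z_value(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one sortable integer."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return z


rows = [(0, 0), (1, 1), (0, 1), (1, 0)]
z_sorted = sorted(rows, key=lambda r: z_value(*r))
print(z_sorted)
```

With a plain sort on (x, y), a filter on y alone gets little benefit from the min/max statistics. With a Z-order sort, both columns keep reasonably tight ranges per file, so filters on either one can skip files.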
Conclusion: When Should You Choose Apache Iceberg?
Perfect for Iceberg if...
✅ Choose Iceberg
- You use multiple query engines (Spark + Trino + Athena)
- You want no vendor lock-in (e.g. not purely Databricks)
- You are on AWS and use Glue/Athena
- You want Snowflake and Spark on the same data
- You are building a new platform from scratch
- You need partition evolution (growing data)
⚠️ Stick with Delta Lake if...
- You are fully on Databricks and happy with it
- You use Delta Live Tables (DLT)
- Your team is already deep in the Delta ecosystem
- You need Unity Catalog with fine-grained access control
🚀 The Future
- Databricks UniForm: Delta and Iceberg at the same time
- Open REST Catalog: universal interoperability
- Apache XTable: automatic format bridging
- All major vendors are moving toward Iceberg compatibility
Apache Iceberg is no longer a niche choice — it is the safest long-term investment for your data platform in 2026. The broad vendor adoption guarantees you can always deploy the best tools without migrating data again.