Iceberg said 321 GB. S3 said 46 TB.

Same table.

That mismatch is what orphan files look like when nobody catches them early. The table metadata can be technically correct while the object store quietly carries the cost of every failed migration, skipped cleanup, and broken selection heuristic that ever touched it.

How this happens

An Iceberg table is not just a folder full of Parquet files. The table is the set of files reachable through metadata: snapshots, manifests, and manifest lists. If a file is no longer referenced by that metadata, the query engine should ignore it.

S3 does not care. It bills for objects.
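
The logical half of that distinction is easy to query. A minimal sketch with pyiceberg, assuming a configured catalog named "prod" and a hypothetical table "analytics.events":

  # Sketch: enumerate what Iceberg considers "the table".
  # Only current-snapshot data files are walked here; older snapshots,
  # manifests, and manifest lists are also reachable and count too.
  from pyiceberg.catalog import load_catalog

  catalog = load_catalog("prod")
  table = catalog.load_table("analytics.events")

  tasks = list(table.scan().plan_files())
  reachable = {t.file.file_path for t in tasks}
  live_bytes = sum(t.file.file_size_in_bytes for t in tasks)

  print(f"{len(reachable)} reachable data files, {live_bytes / 1e9:.1f} GB live")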

This distinction is useful until it becomes expensive. A migration can leave files behind. A failed rewrite can strand data. A cleanup job can skip older tables because of a heuristic bug. A maintenance process can report success because it ran, not because it selected the right tables or removed the right files.

That is how a table with hundreds of gigabytes of live data can carry tens of terabytes of unreachable baggage.

Why metadata alone misleads platform owners

Table metadata answers a logical question: what files belong to the current table state?

Object storage answers a physical question: what bytes exist under this path?

Both are true. Neither is enough alone.
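
The physical half is a walk of the table's prefix. A sketch with boto3; the bucket and prefix are placeholders, and for very large tables an S3 Inventory report is cheaper than live listing:

  # Sketch: answer the physical question by summing every object under
  # the table path, whether or not any metadata references it.
  import boto3

  s3 = boto3.client("s3")
  paginator = s3.get_paginator("list_objects_v2")

  physical = {}  # key -> size in bytes
  for page in paginator.paginate(Bucket="lake", Prefix="analytics/events/"):
      for obj in page.get("Contents", []):
          physical[obj["Key"]] = obj["Size"]

  total = sum(physical.values())
  print(f"{len(physical)} objects, {total / 1e12:.2f} TB physically stored")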

If the platform only monitors Iceberg metadata, the table looks healthy. Query planning is fine. Row counts make sense. Storage estimates look manageable. Meanwhile, the S3 inventory may show a completely different operational reality.

That is not a dashboard problem. It is a control problem.

The compliance angle

Orphan files are not just wasted storage.

If old files contain sensitive data, they may sit outside the lifecycle and governance assumptions applied to the table. The catalog says the table was cleaned. The retention policy says old data should be gone. The object store still has files nobody is querying and nobody is looking at.

That is a nasty place to discover undeclared risk.

What to monitor

Start with a boring reconciliation job (sketched after this list):

  • live data size from Iceberg metadata
  • physical object size from storage inventory
  • reachable file count versus total file count under the table path
  • tables skipped by cleanup jobs
  • cleanup candidates rejected because of corrupt snapshots
  • cleanup results verified by object inventory after deletion
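
Stitched together, the reconciliation is a set difference and a ratio. A sketch reusing the reachable set and physical listing from above; the alert threshold is illustrative:

  # Sketch: reconcile logical table state against physical storage.
  # Caveats: Iceberg records full URIs while S3 returns bare keys, so
  # both sides must be normalized to the same form first, and metadata
  # files (manifests, manifest lists, metadata JSON) must be treated
  # as reachable before anything is flagged as an orphan.
  def reconcile(reachable: set[str], physical: dict[str, int]) -> None:
      orphans = set(physical) - reachable
      orphan_bytes = sum(physical[key] for key in orphans)
      total_bytes = sum(physical.values())

      # The metric that matters: divergence between physical storage
      # and logical table state, not whether a job reported success.
      divergence = orphan_bytes / total_bytes if total_bytes else 0.0
      print(f"{len(orphans)} unreachable objects, "
            f"{orphan_bytes / 1e12:.2f} TB orphaned ({divergence:.0%})")
      if divergence > 0.10:
          print("ALERT: storage is not converging toward table state")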

The important metric is not “did the maintenance job run?” It is “did physical storage converge toward logical table state?”

The practical checklist

For the largest tables, review four things:

  1. Does Iceberg metadata size roughly match object-store size?
  2. Are orphan-file cleanup jobs selecting every table class they should?
  3. Are failures and skipped tables visible, or buried in green job status?
  4. Can you prove deletion happened after cleanup? (See the sketch after this list.)
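
For the fourth check, proof means re-listing: take the keys the cleanup job claims it deleted and confirm the object store no longer has them. A sketch, with a hypothetical list of claimed deletions:

  # Sketch: verify a cleanup job's claims against the object store.
  # On versioned buckets a delete marker still leaves billable
  # noncurrent versions, which need list_object_versions to audit.
  import boto3
  from botocore.exceptions import ClientError

  def verify_deletion(bucket: str, claimed_deleted: list[str]) -> list[str]:
      s3 = boto3.client("s3")
      still_present = []
      for key in claimed_deleted:
          try:
              s3.head_object(Bucket=bucket, Key=key)
              still_present.append(key)  # still exists: cleanup overstated
          except ClientError as err:
              if err.response["Error"]["Code"] != "404":
                  raise  # an access or throttling error is not proof
      return still_present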

If your S3 bill keeps climbing while your Iceberg tables look small, stop trusting table metadata alone.

The table format tells you what is alive. The object store tells you what still costs money.