Your data lake probably has a 50 TB table nobody owns.

That looks like a storage problem. It usually is not.

Cheap storage is the bait. Operational fear is the bill.

Table sprawl is a governance failure

Self-service platforms make creation easy. That is good. They also make abandonment easy. That is expensive.

A table gets created for a migration, a backup, an experiment, a one-off analysis, or a dashboard that nobody opens anymore. The team changes. The owner leaves. The platform keeps paying.

The table may not be dangerous by itself. The habit is: creating tables without an owner, a purpose, a retention rule, or a deletion path.

The metadata every table should carry

At minimum:

  • owner
  • team
  • purpose
  • retention rule
  • delete or archive rule
  • expected access pattern
  • SLA or business criticality
  • data sensitivity classification
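The list above works best as a required record the platform can check, not as a wiki page. Here is a minimal sketch, assuming a hypothetical `TableMetadata` record with one field per item and a hypothetical `missing_fields` helper. A platform could call it at table-creation time and refuse to register tables with gaps.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TableMetadata:
    # Hypothetical schema: one field per item in the list above.
    owner: Optional[str] = None
    team: Optional[str] = None
    purpose: Optional[str] = None
    retention_rule: Optional[str] = None   # e.g. "90 days after last write"
    disposal_rule: Optional[str] = None    # "delete" or "archive"
    access_pattern: Optional[str] = None   # e.g. "daily batch read"
    criticality: Optional[str] = None      # SLA tier or business criticality
    sensitivity: Optional[str] = None      # e.g. "public", "internal", "pii"


def missing_fields(meta: TableMetadata) -> list[str]:
    """Return the names of metadata fields that are unset or blank."""
    return [
        f.name
        for f in fields(meta)
        if not (getattr(meta, f.name) or "").strip()
    ]


# Usage: a table registered with only an owner and a team.
meta = TableMetadata(owner="alice", team="growth")
print(missing_fields(meta))
```

The check is deliberately dumb: it only proves a field was filled in, not that the answer is true. That is still enough to stop tables from being created with no owner at all.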

This is not bureaucracy. This is the control plane that lets the platform safely ask, “does this still need to exist?”

Cleanup needs recovery

Teams will resist deletion if deletion feels irreversible. That is rational. A cleanup workflow should include review windows, exclusion lists, and recovery periods.

The goal is not to win arguments about storage. The goal is to make safe deletion normal.
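A cleanup workflow with review windows, exclusion lists, and recovery periods can be modeled as a small state machine. This is a minimal sketch under stated assumptions: the state names, the 14-day review window, and the 30-day recovery period are hypothetical placeholders, and "soft deleted" stands for whatever your platform offers for hiding a table without releasing its storage.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

# Hypothetical durations; choose values that match your own policies.
REVIEW_WINDOW = timedelta(days=14)
RECOVERY_PERIOD = timedelta(days=30)


class State(Enum):
    ACTIVE = "active"
    PROPOSED = "proposed"          # owner notified, review window open
    SOFT_DELETED = "soft_deleted"  # hidden but still recoverable
    PURGED = "purged"              # storage actually released


@dataclass
class Candidate:
    name: str
    state: State = State.ACTIVE
    since: Optional[datetime] = None  # when the current state began
    objected: bool = False            # someone objected during review


def propose(c: Candidate, now: datetime, exclusions: set[str]) -> None:
    """Open a review window unless the table is on the exclusion list."""
    if c.name in exclusions or c.state is not State.ACTIVE:
        return
    c.state, c.since = State.PROPOSED, now


def advance(c: Candidate, now: datetime) -> None:
    """Move a candidate forward once its current window has elapsed."""
    if c.state is State.PROPOSED and c.objected:
        # An objection during review cancels the proposal.
        c.state, c.since, c.objected = State.ACTIVE, None, False
    elif c.state is State.PROPOSED and now - c.since >= REVIEW_WINDOW:
        c.state, c.since = State.SOFT_DELETED, now
    elif c.state is State.SOFT_DELETED and now - c.since >= RECOVERY_PERIOD:
        c.state, c.since = State.PURGED, now


def restore(c: Candidate) -> None:
    """Undo a soft delete at any point before the table is purged."""
    if c.state is State.SOFT_DELETED:
        c.state, c.since = State.ACTIVE, None
```

The design choice that matters is the gap between "soft deleted" and "purged": nothing becomes irreversible until two separate windows have passed without anyone objecting or restoring the table.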

Start with the largest tables

Do not boil the lake.

Take the 100 largest tables and answer five questions for each:

  1. Who owns it?
  2. What uses it?
  3. When was it last read?
  4. How fast is it growing?
  5. What happens if it is archived?

If those questions are hard to answer, you have found the platform work.

The bill is just the symptom.