Why Chargeback Fails in Shared Data Platforms

Chargeback fails when the model tries to be perfectly fair before it becomes useful.

Shared data platforms break simple cost allocation. The collector, Kafka cluster, query engine, object store, scheduler, and maintenance jobs may serve dozens of teams at once. Tagging the infrastructure tells finance who owns the platform. It does not tell product teams what behavior created the bill.

Showback first

Chargeback changes incentives, so teams need to trust the numbers before any money moves. Start with showback: publish a monthly view of each team's usage and estimated cost, without billing anyone internally.

The first job is not precision. The first job is making waste visible enough that teams believe the direction.

Attribute usage, not boxes

For shared workloads, useful attribution usually comes from the metadata attached to jobs, queries, and data assets:

Airflow DAG owner
Spark application labels
Trino query tags
table owner
event owner
dashboard or notebook owner
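Here is a minimal Python sketch of that attribution step. It assumes each usage record already carries an estimated cost and a flat dictionary of metadata. The key names in `OWNER_KEYS` and the `UsageRecord` type are hypothetical, since real labels depend on how each Airflow, Spark, and Trino deployment tags its work. The resolver takes the first owner it finds in priority order. Anything it cannot resolve stays on a visible `unattributed` line instead of being spread across teams.

```python
from dataclasses import dataclass, field

# Hypothetical metadata keys, checked in priority order. Real key names
# depend on how each Airflow, Spark, and Trino deployment labels its work.
OWNER_KEYS = (
    "airflow_dag_owner",
    "spark_label_team",
    "trino_query_tag_team",
    "table_owner",
    "event_owner",
    "dashboard_owner",
)
UNATTRIBUTED = "unattributed"


@dataclass
class UsageRecord:
    """Hypothetical unit of metered usage (a task run, query, or scan)."""
    resource: str
    estimated_cost: float
    metadata: dict[str, str] = field(default_factory=dict)


def resolve_owner(record: UsageRecord) -> str:
    """Return the first non-blank owner found, in OWNER_KEYS priority order."""
    for key in OWNER_KEYS:
        owner = record.metadata.get(key, "").strip()
        if owner:
            return owner
    return UNATTRIBUTED


def showback(records: list[UsageRecord]) -> dict[str, float]:
    """Sum estimated cost per owner; unresolved cost stays on its own line."""
    totals: dict[str, float] = {}
    for record in records:
        owner = resolve_owner(record)
        totals[owner] = totals.get(owner, 0.0) + record.estimated_cost
    return totals


if __name__ == "__main__":
    records = [
        UsageRecord("spark-etl", 120.0, {"spark_label_team": "growth"}),
        UsageRecord("trino", 40.0, {"trino_query_tag_team": "finance", "table_owner": "core-data"}),
        UsageRecord("object-store-scan", 15.0),
    ]
    print(showback(records))  # {'growth': 120.0, 'finance': 40.0, 'unattributed': 15.0}
```

Keeping unattributed cost as its own line makes tagging gaps part of the showback report. It does not hide them inside other teams' numbers.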

For clickstream systems, event volume may be the cleanest starting unit. If total monthly cost and total event count are known, a flat price per 1,000 events gives every team a number they can understand.

Is it perfect? No.

It ships. It changes behavior. That matters more.
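The flat per-1,000-events model from the clickstream example fits in a few lines. This Python sketch assumes total monthly cost and per-team event counts are already available. The function names and sample figures are illustrative only. Each team's share is computed as `total_cost * team_events / total_events`. That is equivalent to multiplying the team's events (in thousands) by the flat rate, but it avoids carrying a rounded rate into every team's number.

```python
def price_per_thousand(total_monthly_cost: float, total_events: int) -> float:
    """Flat showback rate: estimated cost per 1,000 events."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return total_monthly_cost * 1000 / total_events


def allocate_flat(total_monthly_cost: float, events_by_team: dict[str, int]) -> dict[str, float]:
    """Split the monthly cost in proportion to each team's event count."""
    total_events = sum(events_by_team.values())
    if total_events <= 0:
        raise ValueError("no events to allocate against")
    return {
        team: round(total_monthly_cost * count / total_events, 2)
        for team, count in events_by_team.items()
    }


if __name__ == "__main__":
    # Hypothetical month: 50,000 in platform cost across 1 billion events.
    events = {"checkout": 400_000_000, "search": 500_000_000, "ads": 100_000_000}
    print(price_per_thousand(50_000, sum(events.values())))  # 0.05
    print(allocate_flat(50_000, events))  # {'checkout': 20000.0, 'search': 25000.0, 'ads': 5000.0}
```

Rounding each team independently can leave the sum a cent or so away from the real bill. For a showback report, that gap is usually easier to explain than a more complex rounding scheme.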

When to add complexity

Add weighted pricing only when the simple model exposes a real distortion. Payload-heavy events, runaway ad-hoc queries, and platform events consumed by several teams may need extra rules.

But starting there is a trap. The attribution model becomes the project, and six months later finance still has one big bill.
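If payload-heavy events do turn out to distort the flat model, one possible rule is to split the bill between event count and bytes ingested, controlled by a single weight. This is a hypothetical blend, not a standard formula. With `bytes_weight = 0` it reduces to the flat per-event allocation, so teams can see exactly how much the new rule moves their number. The weight value and the `TeamUsage` fields are assumptions, to be tuned against the distortion actually observed. The sketch covers only the payload case, not ad-hoc queries or shared events.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    """Hypothetical monthly usage totals for one team."""
    events: int
    payload_bytes: int


def allocate_weighted(
    total_monthly_cost: float,
    usage_by_team: dict[str, TeamUsage],
    bytes_weight: float,
) -> dict[str, float]:
    """Blend each team's event-count share with its payload-bytes share.

    cost_t = C * ((1 - w) * events_t / E + w * bytes_t / B)
    With w = 0 this is the flat per-event allocation.
    """
    if not 0.0 <= bytes_weight <= 1.0:
        raise ValueError("bytes_weight must be between 0 and 1")
    total_events = sum(u.events for u in usage_by_team.values())
    total_bytes = sum(u.payload_bytes for u in usage_by_team.values())
    if total_events <= 0 or total_bytes <= 0:
        raise ValueError("need positive event and byte totals")
    allocation = {}
    for team, usage in usage_by_team.items():
        event_share = usage.events / total_events
        byte_share = usage.payload_bytes / total_bytes
        share = (1 - bytes_weight) * event_share + bytes_weight * byte_share
        allocation[team] = round(total_monthly_cost * share, 2)
    return allocation


if __name__ == "__main__":
    # Hypothetical: "logging" sends few events, but each one is large.
    usage = {
        "search": TeamUsage(events=900, payload_bytes=100),
        "logging": TeamUsage(events=100, payload_bytes=900),
    }
    print(allocate_weighted(10_000, usage, bytes_weight=0.0))   # {'search': 9000.0, 'logging': 1000.0}
    print(allocate_weighted(10_000, usage, bytes_weight=0.25))  # {'search': 7000.0, 'logging': 3000.0}
```

Publishing both numbers side by side for a month or two, flat and weighted, keeps the rule change itself inside the showback process.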

The trust test

A chargeback model is ready when teams can answer:

What did we do that drove this cost?
What can we change next month?
Why is this number directionally fair?

If the model cannot answer those, it is theatre with spreadsheets.