
Hold-before-execute: How we prevent runaway AI costs

March 18, 2026

Most AI APIs bill you after execution. You find out what it cost when the invoice arrives. If a bug in your code creates an infinite loop of GPT-4 calls, you discover the damage days later.

The hold-before-execute model

ModelRoute uses a billing model inspired by how credit card authorizations work:

1. **Estimate**: When you submit an execution, we estimate the cost based on the model and input parameters.
2. **Hold**: We place a hold on your balance for 1.2x the estimated cost (minimum $0.01). If your balance is insufficient, the execution is rejected immediately.
3. **Execute**: The work proceeds with funds already reserved.
4. **Settle**: On completion, we settle the hold at the actual cost. The difference is released back to your balance.
5. **Release**: If the execution fails, the entire hold is released.
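
To make the lifecycle concrete, here is a minimal Python sketch of the hold, settle, and release steps. It is an illustration, not ModelRoute's implementation: the `Account` class, its method names, and `InsufficientBalance` are hypothetical. Only the 1.2x multiplier and the $0.01 minimum come from the steps above. Rounding the hold up to the cent and capping the settled charge at the held amount are assumptions.

```python
from dataclasses import dataclass, field
from decimal import Decimal, ROUND_UP

HOLD_MULTIPLIER = Decimal("1.2")  # from the post: hold 1.2x the estimate
MIN_HOLD = Decimal("0.01")        # from the post: minimum hold of $0.01


class InsufficientBalance(Exception):
    """Raised at submission time when the hold cannot be placed."""


def hold_amount(estimate: Decimal) -> Decimal:
    """Return 1.2x the estimate, never below $0.01.

    Rounding up to the nearest cent is an assumption of this sketch.
    """
    raw = (estimate * HOLD_MULTIPLIER).quantize(Decimal("0.01"), rounding=ROUND_UP)
    return max(raw, MIN_HOLD)


@dataclass
class Account:
    balance: Decimal                           # spendable funds
    holds: dict = field(default_factory=dict)  # execution id -> reserved amount

    def hold(self, execution_id: str, estimate: Decimal) -> Decimal:
        """Reserve funds before execution, or reject immediately."""
        amount = hold_amount(estimate)
        if amount > self.balance:
            raise InsufficientBalance(f"need {amount}, have {self.balance}")
        self.balance -= amount
        self.holds[execution_id] = amount
        return amount

    def settle(self, execution_id: str, actual_cost: Decimal) -> Decimal:
        """Charge the actual cost and return the unused part of the hold."""
        held = self.holds.pop(execution_id)
        charged = min(actual_cost, held)  # assumption: charge is capped at the hold
        self.balance += held - charged
        return charged

    def release(self, execution_id: str) -> None:
        """Return the entire hold after a failed execution."""
        self.balance += self.holds.pop(execution_id)


account = Account(balance=Decimal("10.00"))
account.hold("exec-1", Decimal("2.00"))    # reserves 2.40, balance is now 7.60
account.settle("exec-1", Decimal("1.85"))  # charges 1.85, balance is now 8.15
account.hold("exec-2", Decimal("1.00"))    # reserves 1.20, balance is now 6.95
account.release("exec-2")                  # failed run, balance is back to 8.15
```

Using `Decimal` rather than floats keeps the arithmetic exact, which matters once holds and refunds are added and subtracted many times against the same balance.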

Why this matters

- **No surprise bills**: You can't spend more than your balance. Ever.
- **Immediate feedback**: Insufficient balance is caught at submission time, not after a $500 batch completes.
- **Safe automation**: Your AI agents and batch pipelines can't create unbounded spend.
- **Simple reconciliation**: One balance, one bill. No per-provider invoices to reconcile.
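
As a rough illustration of the bounded-spend claim, the sketch below simulates a buggy loop that keeps submitting the same execution. The function name and the dollar figures are made up for the example, and it assumes the actual cost never exceeds the hold. The hold rule (1.2x the estimate, minimum $0.01) is the one described earlier in this post.

```python
from decimal import Decimal


def run_until_rejected(balance, estimate, actual_cost, max_calls=1_000_000):
    """Simulate a runaway loop under hold-before-execute.

    Returns (completed_calls, total_charged, remaining_balance).
    Assumes actual_cost is never larger than the hold.
    """
    hold = max(estimate * Decimal("1.2"), Decimal("0.01"))
    calls = 0
    charged = Decimal("0")
    while calls < max_calls:
        if hold > balance:
            break                       # rejected at submission time
        balance -= hold                 # hold: reserve funds up front
        balance += hold - actual_cost   # settle: release the unused part
        charged += actual_cost
        calls += 1
    return calls, charged, balance


# A $5.00 balance, a $0.50 estimate ($0.60 hold), and a $0.45 actual cost:
calls, charged, remaining = run_until_rejected(
    Decimal("5.00"), Decimal("0.50"), Decimal("0.45")
)
print(calls, charged, remaining)  # 10 calls, 4.50 charged, 0.50 remaining
```

The loop stops on the eleventh submission because the remaining $0.50 cannot cover a $0.60 hold, so total spend stays below the starting balance no matter how long the bug runs.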

Enterprise controls

For enterprise customers, we offer additional billing controls:

- **Auto top-up**: Automatically replenish your balance via Stripe when it falls below a threshold.
- **Organization-scoped balances**: Billing is per-organization, not per-user.
- **Volume discounts**: Available by separate agreement.
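
The auto top-up rule can be sketched as a simple threshold check. This is a hypothetical illustration: `maybe_top_up`, the fixed `top_up_amount`, and the `charge` callback are assumptions, and `charge` stands in for whatever payment call is actually made (no real Stripe API is shown here).

```python
from decimal import Decimal
from typing import Callable


def maybe_top_up(
    balance: Decimal,
    threshold: Decimal,
    top_up_amount: Decimal,
    charge: Callable[[Decimal], bool],
) -> Decimal:
    """Return the new balance after an optional top-up.

    `charge` is a hypothetical stand-in for the payment call; it returns
    True when the payment succeeds.
    """
    if balance >= threshold:
        return balance                  # still at or above the threshold
    if charge(top_up_amount):
        return balance + top_up_amount  # payment succeeded, credit the balance
    return balance                      # payment failed, balance is unchanged


def always_succeeds(amount: Decimal) -> bool:
    """Fake payment function used only for this example."""
    return True


new_balance = maybe_top_up(
    Decimal("4.00"), Decimal("5.00"), Decimal("20.00"), always_succeeds
)
print(new_balance)  # 24.00
```

Because the balance is credited only after a successful charge, a failed payment leaves the hold check as the last line of defense: new executions are still rejected once the balance runs out.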

This is how billing should work for AI infrastructure: predictable, safe, and transparent.
