Hold-before-execute: How we prevent runaway AI costs
March 18, 2026
Most AI APIs bill you after execution. You find out what it cost when the invoice arrives. If a bug in your code creates an infinite loop calling GPT-4, you discover the damage days later.
The hold-before-execute model
ModelRoute uses a billing model inspired by how credit card authorizations work:
1. **Estimate**: When you submit an execution, we estimate the cost based on the model and input parameters.
2. **Hold**: We place a hold on your balance for 1.2x the estimated cost (minimum $0.01). If your balance is insufficient, the execution is rejected immediately.
3. **Execute**: The work proceeds with funds already reserved.
4. **Settle**: On completion, we settle the hold at the actual cost. The difference is released back to your balance.
5. **Release**: If the execution fails, the entire hold is released.
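The five steps above can be sketched as a small in-memory ledger. This is an illustrative sketch, not ModelRoute's implementation: the `Balance` class, its method names, and the `InsufficientBalance` exception are hypothetical. Only the 1.2x multiplier and the $0.01 minimum come from the description above. The post does not say what happens if the actual cost exceeds the hold, so the sketch simply refuses to settle in that case.

```python
from decimal import Decimal

HOLD_MULTIPLIER = Decimal("1.2")  # from the post: hold is 1.2x the estimate
MIN_HOLD = Decimal("0.01")        # from the post: minimum hold of $0.01


class InsufficientBalance(Exception):
    """Raised at submission time when the hold cannot be covered."""


class Balance:
    """Hypothetical ledger tracking available and held funds."""

    def __init__(self, available):
        self.available = Decimal(available)
        self.held = Decimal("0")

    def hold(self, estimate):
        # Step 2: reserve 1.2x the estimate, or reject immediately.
        amount = max(Decimal(estimate) * HOLD_MULTIPLIER, MIN_HOLD)
        if amount > self.available:
            raise InsufficientBalance(f"need {amount}, have {self.available}")
        self.available -= amount
        self.held += amount
        return amount

    def settle(self, hold_amount, actual_cost):
        # Step 4: charge the actual cost and release the difference.
        actual_cost = Decimal(actual_cost)
        if actual_cost > hold_amount:
            # Assumption: overrun handling is not described in the post.
            raise ValueError("actual cost exceeds the hold")
        self.held -= hold_amount
        self.available += hold_amount - actual_cost

    def release(self, hold_amount):
        # Step 5: a failed execution returns the entire hold.
        self.held -= hold_amount
        self.available += hold_amount
```

With a $10.00 balance and a $1.00 estimate, `hold` reserves $1.20. Settling at an actual cost of $0.90 returns $0.30, leaving $9.10 available. Calling `release` instead would restore the full $10.00.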
Why this matters
- **No surprise bills**: You can't spend more than your balance. Ever.
- **Immediate feedback**: Insufficient balance is caught at submission time, not after a $500 batch completes.
- **Safe automation**: Your AI agents and batch pipelines can't create unbounded spend.
- **Simple reconciliation**: One balance, one bill. No per-provider invoices to reconcile.
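To make the "safe automation" point concrete, here is a rough simulation of a buggy loop that keeps submitting executions. The function `runaway_loop` and its parameters are hypothetical, and it assumes each call settles at an actual cost no larger than its hold. The loop stops as soon as the next hold cannot be covered, so total spend stays within the starting balance.

```python
from decimal import Decimal


def runaway_loop(balance, estimate, actual, max_iterations=1_000_000):
    """Simulate a loop that submits executions until one is rejected.

    Returns (completed_calls, total_spent). Assumes actual <= hold.
    """
    start = Decimal(balance)
    available = start
    actual = Decimal(actual)
    # Hold rule from the post: 1.2x the estimate, minimum $0.01.
    hold = max(Decimal(estimate) * Decimal("1.2"), Decimal("0.01"))

    completed = 0
    for _ in range(max_iterations):
        if hold > available:
            break  # rejected at submission time; the loop can spend no more
        # Hold is placed, then settled at the actual cost; the net effect
        # on the available balance is just the actual cost.
        available -= actual
        completed += 1
    return completed, start - available


calls, spent = runaway_loop("1.00", estimate="0.10", actual="0.10")
```

Starting from $1.00 with a $0.10 estimate, each submission needs a $0.12 hold. Nine calls complete and $0.90 is spent. The tenth is rejected because the remaining $0.10 cannot cover the hold.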
Enterprise controls
For enterprise customers, we offer additional billing controls:
- **Auto top-up**: Automatically replenish your balance via Stripe when it falls below a threshold.
- **Organization-scoped balances**: Billing is per-organization, not per-user.
- **Volume discounts**: Available by separate agreement.
This is how billing should work for AI infrastructure: predictable, safe, and transparent.