What Is Usage-Based Billing?

By Nick Thomson·8 min read·Updated April 2026

Usage-based billing charges customers based on their actual consumption of a product or service during each billing cycle. Instead of flat monthly fees, customers pay in proportion to what they use. This article covers how it works, the five main pricing structures, and what changes when the product you're billing is AI.

How does usage-based billing work?

Usage-based billing converts product actions into charges. A customer performs an action (generates an image, makes an API call, processes a document), and that action becomes a billable event. A pricing rule converts the event into a charge. That loop repeats for every action, every customer, every cycle.

Think of it like an electricity meter. You do not pay a flat monthly fee regardless of how much power you use. You pay for kilowatt-hours consumed. Usage-based billing applies the same principle to software: identify which product actions customers should pay for, set a price per event, and the billing system handles the rest.

The infrastructure underneath that loop has two jobs. First, it records every billable event. Second, it converts those events into charges using your pricing rules. How those two jobs are sequenced matters enormously, particularly for AI products. More on that shortly.
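The two jobs described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the event names, the prices, and the `record`, `rate`, and `charges_for` helpers are all hypothetical, and a real system would persist events durably rather than keep them in a list.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsageEvent:
    customer_id: str
    event_type: str   # e.g. "image_generated" (hypothetical name)
    quantity: int = 1


# Hypothetical per-event prices in USD; these are the "pricing rules".
PRICE_PER_EVENT = {
    "image_generated": Decimal("0.04"),
    "api_call": Decimal("0.002"),
}

event_log: list[UsageEvent] = []


def record(event: UsageEvent) -> None:
    """Job 1: record every billable event."""
    event_log.append(event)


def rate(event: UsageEvent) -> Decimal:
    """Job 2: convert one event into a charge using the pricing rules."""
    return PRICE_PER_EVENT[event.event_type] * event.quantity


def charges_for(customer_id: str) -> Decimal:
    """Total charges for one customer across all recorded events."""
    return sum(
        (rate(e) for e in event_log if e.customer_id == customer_id),
        Decimal("0"),
    )


record(UsageEvent("cust_1", "image_generated", 3))
record(UsageEvent("cust_1", "api_call", 100))
record(UsageEvent("cust_2", "api_call", 10))
print(charges_for("cust_1"))
```

Here rating runs over the whole log after the fact. Whether rating happens at event time or at period close is the sequencing question the rest of this article returns to.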

What are the main usage-based pricing models?

Usage-based billing is not one model. It is a family of models, each suited to different products and customer relationships.

| Model | How it works | Best for | Example |
|---|---|---|---|
| Per-unit | Fixed rate per event occurrence, regardless of the work done internally | Discrete outcomes: image generated, ticket resolved, document processed | Midjourney: per image generated; Intercom: per ticket resolved |
| Tiered | Rate per unit decreases as volume increases | Cloud storage, data transfer | AWS S3 storage tiers |
| Volume | Price based on a measurable quantity within each event | AI inference where cost scales with input or output size | OpenAI: tokens consumed per chat_completed event |
| Prepaid credits | Customers buy credits in advance; events deduct from balance | AI products, developer tools | Anthropic API credits |
| Hybrid | Subscription base fee plus usage charges | SaaS with AI features added | Clay: unlimited seats with recurring credit allocation |
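To make the tiered row concrete, here is a rough sketch of one common interpretation, graduated tiers, where each band of usage is charged at its own rate. The tier boundaries and prices are invented for illustration and do not reflect any vendor's actual price list.

```python
from decimal import Decimal

# (upper bound of the tier in units, price per unit). None means "no upper bound".
# All numbers are hypothetical.
TIERS = [
    (1_000, Decimal("0.10")),
    (10_000, Decimal("0.08")),
    (None, Decimal("0.05")),
]


def tiered_charge(units: int) -> Decimal:
    """Charge each band of usage at that band's rate (graduated tiers)."""
    if units < 0:
        raise ValueError("units must be non-negative")
    total = Decimal("0")
    lower = 0
    for upper, price in TIERS:
        if upper is None or units < upper:
            # Usage ends inside this tier: charge the remainder and stop.
            total += (units - lower) * price
            break
        # Usage fills this tier completely: charge the full band and move on.
        total += (upper - lower) * price
        lower = upper
    return total


print(tiered_charge(500))
print(tiered_charge(12_000))
```

With these assumed tiers, 12,000 units are charged as 1,000 at the first rate, 9,000 at the second, and 2,000 at the third, so the effective rate per unit falls as volume grows.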

A sixth pattern worth naming separately is dimensional pricing. This is where the price per event varies based on attributes of the event itself, not volume alone. A video generation service might charge $1.00 per minute for fast processing and $0.40 per minute for standard. An AI coding assistant might charge different rates depending on which model handles the request. You define one price structure, assign rates to each dimension value, and the billing system reads the attribute from each incoming event and applies the right rate automatically. For products where infrastructure cost varies by task complexity or model selection, dimensional pricing is the most accurate way to reflect that variability.
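A minimal sketch of the video example above: the billing logic reads one attribute from the event and picks the matching rate. The two per-minute rates come from the example in the text; the event shape and the `processing_speed` and `minutes` field names are assumptions made for illustration.

```python
from decimal import Decimal

# Rate per minute of generated video, keyed by the dimension value.
RATE_PER_MINUTE = {
    "fast": Decimal("1.00"),
    "standard": Decimal("0.40"),
}


def price_event(event: dict) -> Decimal:
    """Look up the rate from the event's own attribute, then apply it."""
    speed = event["processing_speed"]  # hypothetical field name
    if speed not in RATE_PER_MINUTE:
        raise ValueError(f"no rate configured for processing_speed={speed!r}")
    return RATE_PER_MINUTE[speed] * Decimal(str(event["minutes"]))


print(price_event({"processing_speed": "fast", "minutes": 3}))
print(price_event({"processing_speed": "standard", "minutes": 2.5}))
```

Adding a new dimension value (say, a new model) then means adding one entry to the rate table rather than defining a new meter.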

Who uses usage-based billing?

Usage-based billing started in cloud infrastructure. AWS, GCP, and Azure built their businesses on it. Customers pay for compute, storage, and data transfer by the unit. It remains the dominant model in that category.

Credit-based consumption pricing has surged across SaaS. The PricingSaaS Trends Report (Q1 2026) tracked 498 companies across 12 software categories and found credit model adoption grew 126% year-over-year in 2025. Companies like Figma and HubSpot added credit systems alongside their existing subscription models as AI features became core functionality rather than optional add-ons. The Metronome Pricing Index (2026) shows that 15 of the 33 major AI and SaaS companies it tracks use usage-based or hybrid pricing, including OpenAI, Anthropic, Cohere, Lovable, and AWS Lambda.

Key trade-offs to understand before committing to the model:

  • Revenue expands automatically with usage. High-consumption customers pay more without manual upgrades or upsells.
  • Customers start at low or zero cost, which reduces adoption friction and the churn that comes from customers overpaying for unused seats.
  • Revenue is harder to predict. Customers who use less in a given month generate less revenue, and forecasting requires usage modeling rather than headcount.
  • Bill shock is a significant churn risk. Customers who receive an unexpectedly large invoice tend not to renew. Real-time balance visibility and spend alerts are churn prevention, not a nice-to-have.

Why is billing AI products different?

Billing AI products is different because every product action has a direct, variable infrastructure cost that hits your account before the customer pays. Traditional SaaS billing has a forgiving property: a collaboration tool costs roughly the same per seat whether usage is light or heavy, so invoicing at month-end carries little exposure. AI products break that assumption in three specific ways.

Every product action has a direct infrastructure cost

Inference costs vary by an order of magnitude across models. Anthropic charges $25 per million output tokens for Claude Opus as of early 2026, down from $75 following a 67% price cut (PricingSaaS Trends Report, Q1 2026), while lighter open-source models can cost a fraction of that. You incur that cost before you have collected from the customer.

Concurrent requests create credit depletion races

A customer who fires ten simultaneous requests needs all ten checked against their available balance before any proceed. You cannot process them incrementally. A single atomic check determines whether all ten proceed or all ten are blocked. Traditional metering systems were not designed for this.
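One way to picture the atomic check is a wallet that authorizes a whole batch under a single lock: either the combined cost fits the balance and is deducted, or nothing is deducted. This is an in-process sketch only. The `CreditWallet` class and its method names are hypothetical, and a production system would more likely rely on a database transaction or a similar atomic primitive than on a Python lock.

```python
import threading


class CreditWallet:
    """Holds a credit balance and authorizes batches atomically."""

    def __init__(self, balance: int):
        self._balance = balance
        self._lock = threading.Lock()

    def authorize_batch(self, costs: list[int]) -> bool:
        """Deduct the total for all requests, or deduct nothing at all."""
        if any(c < 0 for c in costs):
            raise ValueError("costs must be non-negative")
        total = sum(costs)
        with self._lock:
            # Check and deduct inside one critical section, so no other
            # thread can spend the same credits between the two steps.
            if total > self._balance:
                return False
            self._balance -= total
            return True

    @property
    def balance(self) -> int:
        with self._lock:
            return self._balance


wallet = CreditWallet(balance=100)
print(wallet.authorize_batch([10] * 10))  # ten requests at 10 credits each
print(wallet.authorize_batch([1]))        # nothing left for one more request
```

Without the lock, two threads could both read the same balance, both pass the check, and jointly overspend. That is the depletion race the heading refers to.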

Your margins change every time you switch models

Say you run on Claude and your per-request cost is Y. Switch to GPT-4 and it might become 3Y. Switch to Llama and it might drop to 0.1Y. The multipliers are illustrative, but the point holds: cost is tied to model selection, not customer behavior. Traditional billing tracks what customers do. AI billing also needs to track what infrastructure you used to serve them.
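A quick worked example shows how much margin moves when only the model changes. The multipliers mirror the illustration above; the base cost Y, the price per request, and the function names are hypothetical numbers chosen to keep the arithmetic easy to follow.

```python
from decimal import Decimal

BASE_COST = Decimal("0.01")          # hypothetical value for Y, in USD
PRICE_PER_REQUEST = Decimal("0.05")  # hypothetical price charged to the customer

# Cost multipliers relative to Y, taken from the illustration in the text.
COST_MULTIPLIER = {
    "claude": Decimal("1"),
    "gpt-4": Decimal("3"),
    "llama": Decimal("0.1"),
}


def serving_cost(model: str) -> Decimal:
    """Infrastructure cost of one request on the given model."""
    return BASE_COST * COST_MULTIPLIER[model]


def gross_margin(model: str) -> Decimal:
    """Fraction of the price left after paying the serving cost."""
    return (PRICE_PER_REQUEST - serving_cost(model)) / PRICE_PER_REQUEST


for model in COST_MULTIPLIER:
    print(model, serving_cost(model), gross_margin(model))
```

Under these assumptions the same request priced at the same rate yields an 80% margin on one model and 40% on another, which is why each event needs to carry the model that served it.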

Dimensional pricing becomes essential here. When cost varies by model, quality tier, or task complexity, pricing must adapt to the attributes of each event rather than apply a static rate uniformly.

OpenAI prices by tokens consumed per completion: volume within each event. Anthropic sells prepaid API credits customers draw down as they use the API. Midjourney charges per image generated: a flat rate per event, regardless of what the model did internally to produce it. Each is a response to the same underlying reality: AI product costs are variable, and pricing infrastructure must reflect that.

Real-time billing vs post-usage invoicing

Post-usage invoicing captures usage events throughout the billing period and generates charges at cycle end. Real-time billing authorizes and deducts at the moment each event occurs, checking the customer's balance before the work happens. The distinction determines how much spend exposure you carry and whether runaway usage is possible at all.

Post-usage invoicing platforms aggregate events against meters you define upfront, then bill at period close. This works well for predictable SaaS products where costs are stable and customers are unlikely to exhaust their budget overnight.

Real-time billing closes a loop that post-usage invoicing leaves open. Authorization happens before the cost is incurred, not after. When hard spend controls are enabled, a request that would exceed the customer's balance is blocked before the work proceeds.

| | Post-usage invoicing | Real-time billing |
|---|---|---|
| When does deduction happen? | End of billing period | At the moment of each event |
| Balance checked before usage? | No | Yes |
| Spend exposure | Unbounded until period closes | Capped at current balance when hard limits are enabled |
| Best for | Predictable SaaS usage, enterprise invoicing | AI products, prepaid credit models |
| Runaway spend protection | Requires separate limit logic | Built into the authorization layer |

For AI products, post-usage invoicing introduces a specific exposure: a customer's agent can run unconstrained overnight, consuming inference at cost, and you will not know until the billing period closes. Real-time billing closes that loop.
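The difference in exposure can be simulated with a toy model of the overnight agent. This is a simplified sketch under stated assumptions: costs are whole credits, the real-time path has hard limits enabled, and both function names are made up for the example.

```python
def post_usage_exposure(event_costs: list[int]) -> int:
    """Post-usage invoicing: every event runs; the total is known at period close."""
    return sum(event_costs)


def real_time_exposure(event_costs: list[int], balance: int) -> tuple[int, int]:
    """Real-time billing with hard limits: authorize before each event runs.

    Returns (credits actually spent, number of blocked events).
    """
    spent = 0
    blocked = 0
    for cost in event_costs:
        if cost > balance - spent:
            blocked += 1  # not enough balance: the work never starts
            continue
        spent += cost     # authorized: deduct before doing the work
    return spent, blocked


# A runaway agent: 1,000 requests at 5 credits each against a 200-credit balance.
overnight = [5] * 1000
print(post_usage_exposure(overnight))
print(real_time_exposure(overnight, balance=200))
```

In this scenario post-usage invoicing lets all 5,000 credits of inference run, while the real-time path stops at the 200 credits the customer actually holds.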

How do you implement usage-based billing?

Implementing usage-based billing well comes down to five decisions made before you write any code.

1. Identify your billable events and outcomes. The starting question is not "what metric do I track?" but "what product actions should customers pay for?" For an AI coding assistant, is it per suggestion accepted? Per model call? Per session? The answer shapes everything downstream. Start with what customers understand and what maps to the value they receive.

2. Decide when billing happens. Post-usage invoicing or real-time deduction. This is a business decision, not a configuration detail. It comes down to two things: how your customers expect to be billed, and whether you can afford to front their usage costs until month-end.

3. Define pricing rules before writing code. Per-unit rates, tiered structures, and dimensional pricing need to be modeled before implementation, not discovered during it. Pricing that seems simple often has edge cases: volume discounts, promotional grants, free tiers, plan entitlements. Map them out first.

4. Give customers real-time balance and usage visibility. Customers who cannot see what they have spent or what they have left will be surprised by their bill. Spend alerts and balance displays are churn prevention tools.

5. Choose infrastructure that does not require rebuilding when pricing evolves. Pricing models change. New AI models get added. Dimensional pricing gets introduced. Scaling teams need infrastructure that absorbs this new complexity without rearchitecting.
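As a small illustration of the spend alerts mentioned in step 4, the sketch below reports which alert thresholds a single deduction crosses. The threshold percentages and the `crossed_thresholds` helper are assumptions for the example, and amounts are integers (for instance cents or credits) so the comparison stays exact.

```python
# Alert when spend passes these percentages of the budget (hypothetical values).
ALERT_THRESHOLDS_PCT = (50, 80, 100)


def crossed_thresholds(spent_before: int, spent_after: int, budget: int) -> list[int]:
    """Return thresholds that were not yet reached before but are reached now."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [
        pct
        for pct in ALERT_THRESHOLDS_PCT
        # Compare spent/budget against pct/100 using integers only.
        if spent_before * 100 < pct * budget <= spent_after * 100
    ]


print(crossed_thresholds(spent_before=40, spent_after=85, budget=100))
print(crossed_thresholds(spent_before=85, spent_after=90, budget=100))
```

Checking thresholds on each deduction means each alert fires once, at the moment it is crossed, rather than repeating on every later event.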

