LLM API Cost Calculator
Model monthly token spend across Claude, GPT-5, and Gemini. See your effective cost per request, monthly burn, and gross margin impact.
Model & Pricing
Traffic
Optimization
Why LLM Costs Need Their Own Tracking
LLM API spend is the single fastest-growing line item in startup budgets — up 380% year-over-year for the median company in 2026. Unlike per-seat SaaS, API spend scales with product traffic, not headcount. A successful product launch can spike monthly LLM bills from $2,000 to $40,000 within weeks, and founders without dedicated tracking get blindsided by the bill before they understand the unit economics.
The most important question isn't "how much do we spend on AI?" — it's "what percentage of revenue does our LLM spend represent?" If your AI feature is customer-facing, every token is cost of goods sold. Keep that ratio below 12% to maintain SaaS-typical 65-75% gross margins. Above 15%, gross margins compress to 50-60% and investors begin pricing your company as AI-services rather than SaaS — a multiple haircut of 30-50%. Cross-reference with our AI tooling spend benchmark for stage-specific context.
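The margin arithmetic above can be sketched in a few lines. This is a minimal illustration, not output from the calculator itself; the 25% baseline for non-LLM COGS (hosting, support, payment fees) is an assumption chosen for the example.

```python
def gross_margin(llm_pct_of_revenue, other_cogs_pct=0.25):
    """Gross margin with customer-facing LLM spend booked as COGS.

    other_cogs_pct (hosting, support, etc.) is an assumed baseline,
    not a benchmark from this page.
    """
    return 1.0 - other_cogs_pct - llm_pct_of_revenue

print(f"{gross_margin(0.12):.0%}")  # at the 12% threshold -> 63% margin
print(f"{gross_margin(0.18):.0%}")  # at 18% -> 57%, inside the compressed range
```

With that assumed baseline, LLM spend at 12% of revenue keeps margins in the SaaS-typical band, while 18% lands squarely in the 50-60% range investors discount.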
Prompt caching is the highest-leverage optimization most teams skip. Anthropic and OpenAI charge 90% less for cached input tokens, with a cache TTL of roughly 5 minutes. Well-engineered systems with stable system prompts and shared context achieve 60-80% cache hit rates, cutting input token costs by 54-72% with zero behavior change. Use the slider above to see how cache hit rate moves your monthly bill. Pair this with our AI agent operating cost calculator if you also have agent platform fees on top of raw API spend.
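The caching arithmetic is simple enough to sketch directly, assuming the 90% cached-input discount described above:

```python
def input_cost_reduction(cache_hit_rate, cache_discount=0.90):
    # Fraction of input-token spend saved: each cached token costs
    # cache_discount less, so savings scale linearly with hit rate.
    return cache_hit_rate * cache_discount

# The 60-80% hit rates quoted above give the 54-72% savings range:
for hit_rate in (0.60, 0.80):
    print(f"{hit_rate:.0%} hits -> {input_cost_reduction(hit_rate):.0%} input savings")
```

Note this applies to input tokens only; output tokens are never cached, so total savings depend on your input/output mix.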
Model selection is the other major lever. The cost gap between frontier models (Opus, GPT-5) and fast models (Haiku, GPT-5 mini, Gemini Flash) is 10-30x. Most production workloads do fine on cheap models for 70-80% of requests, reserving frontier models for hard reasoning tasks. Build a model-routing layer once your LLM bill crosses $5,000 per month — payback is typically under 60 days. See how this affects your runway with our burn rate calculator.
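A model-routing layer can start as something very small. The sketch below uses a hypothetical keyword heuristic and placeholder tier names purely to show the shape of the idea; production routers typically classify with a cheap LLM call or a confidence score instead.

```python
# Hypothetical markers of hard reasoning work -- tune for your workload.
HARD_TASK_HINTS = ("prove", "debug", "multi-step", "legal analysis")

def route(request_text: str) -> str:
    """Send hard reasoning tasks to a frontier model, the rest to a fast tier."""
    if any(hint in request_text.lower() for hint in HARD_TASK_HINTS):
        return "frontier"  # e.g. Opus or GPT-5
    return "fast"          # e.g. Haiku, GPT-5 mini, or Gemini Flash
```

However the classifier is built, the cost structure is the same: 70-80% of traffic billed at fast-tier prices, which is where the 10-30x price gap turns into the sub-60-day payback.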
Frequently Asked Questions
How do you calculate LLM API costs?
LLM API costs equal (monthly input tokens / 1,000,000 × input price) plus (monthly output tokens / 1,000,000 × output price). Most providers price input and output tokens differently, with output tokens 3-5x more expensive. Prompt caching can discount cached input tokens by 90% on Anthropic and OpenAI. Multiply per-request token usage by your monthly request volume and apply the formula across all model tiers you use.
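That formula, written out as a sketch (prices per 1M tokens; the function name and example figures are illustrative, not from a real bill):

```python
def monthly_api_cost(input_tokens, output_tokens, input_price, output_price,
                     cached_fraction=0.0, cache_discount=0.90):
    """Monthly spend in dollars for one model tier; prices are per 1M tokens."""
    # Cached input tokens are billed at a 90% discount on Anthropic/OpenAI.
    effective_input = input_tokens * (1 - cached_fraction * cache_discount)
    return (effective_input / 1e6) * input_price + (output_tokens / 1e6) * output_price

# e.g. 50M input / 10M output tokens at $3 / $15 per 1M:
print(round(monthly_api_cost(50e6, 10e6, 3, 15), 2))       # 300.0
print(round(monthly_api_cost(50e6, 10e6, 3, 15, 0.6), 2))  # 219.0 with 60% cache hits
```

Run this once per model tier you use and sum the results for your total monthly bill.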
What percentage of revenue should LLM API costs be?
Keep customer-facing LLM API spend below 12% of revenue to maintain SaaS-typical 65-75% gross margins. Above 15% of revenue, gross margins compress to 50-60% and investors begin pricing the company as AI-services rather than SaaS. See our SaaS gross margin improvement guide for tactical levers.
How does prompt caching reduce LLM costs?
Prompt caching reuses processed input tokens across requests with the same prefix (system prompts, tool definitions, RAG context). Anthropic and OpenAI charge 90% less for cached input tokens, with cache TTL of roughly 5 minutes. Well-engineered systems achieve 40-80% cache hit rates, which can cut input token costs by 36-72% without changing the underlying model behavior.
What is the cheapest LLM API for production?
Among the major providers' fast-tier models, Gemini 2.5 Flash is the cheapest at $0.50/$3 per 1M input/output tokens, followed by Claude Haiku 4.5 at $1/$5. For most production workloads, the right choice is the cheapest model that meets quality requirements. Many production systems use fast models for routine work and reserve frontier models for hard reasoning tasks via a model-routing layer.
Track LLM Spend in Your Live Financials
Categorize API spend by vendor, model tier, and feature — see real-time gross margin impact alongside every other line item.
Start Free Trial