Free Finance Tool

LLM API Cost Calculator

Model monthly token spend across Claude, GPT-5, and Gemini. See your effective cost per request, monthly burn, and gross margin impact.

8 model families · Prompt cache impact · COGS % of revenue

Model & Pricing

Traffic

Optimization

Cached input tokens get a 90% discount on most providers
Monthly LLM API Spend
$5,040
$61,320 per year
Daily spend
$168
Cost per request
$0.0168
% of revenue (COGS impact)
Caution
10.1%
Under 8% maintains SaaS-typical 65-75% gross margins. Above 15% pushes you into AI-services territory.
Token usage
Monthly input tokens: 750,000,000
Monthly output tokens: 240,000,000
Effective input price: $1.92 / 1M
Output price: $15.00 / 1M
Pricing reflects April 2026 published rates and may have changed since. A 40% prompt-cache hit rate is conservative for well-engineered systems; heavy-system-prompt workloads commonly reach 60-80%.
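The calculator's headline numbers above can be reproduced with a few lines. This is a sketch assuming the example's inputs: $3.00/1M list input price (Sonnet-class), $15.00/1M output price, a 40% cache hit rate, and a 90% discount on cached input tokens.

```python
# Sketch: reproduce the calculator's headline numbers.
# Assumed inputs mirror the worked example above; prices are $/1M tokens.
input_tokens = 750_000_000
output_tokens = 240_000_000
input_price = 3.00          # uncached (list) input price
output_price = 15.00
cache_hit_rate = 0.40
cache_discount = 0.90       # cached input tokens cost 10% of list price

# Blend cached and uncached input tokens into one effective rate.
effective_input_price = input_price * (1 - cache_hit_rate * cache_discount)

monthly = (input_tokens / 1e6) * effective_input_price \
        + (output_tokens / 1e6) * output_price

print(f"${effective_input_price:.2f} / 1M effective input")  # $1.92 / 1M
print(f"${monthly:,.0f} per month")                          # $5,040 per month
```

Swapping in your own token volumes and list prices gives the same monthly figure the tool computes.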

Why LLM Costs Need Their Own Tracking

LLM API spend is the single fastest-growing line item in startup budgets — up 380% year-over-year for the median company in 2026. Unlike per-seat SaaS, API spend scales with product traffic, not headcount. A successful product launch can spike monthly LLM bills from $2,000 to $40,000 within weeks, and founders without dedicated tracking get blindsided by the bill before they understand the unit economics.

The most important question isn't "how much do we spend on AI?" — it's "what percentage of revenue does our LLM spend represent?" If your AI feature is customer-facing, every token is cost of goods sold. Keep that ratio below 12% to maintain SaaS-typical 65-75% gross margins. Above 15%, gross margins compress to 50-60% and investors begin pricing your company as AI-services rather than SaaS — a multiple haircut of 30-50%. Cross-reference with our AI tooling spend benchmark for stage-specific context.
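As a sketch, the ratio and the margin bands described above can be checked directly; the function name and revenue figure here are illustrative, not a standard metric.

```python
# Sketch: LLM spend as a % of revenue, with the thresholds quoted above.
def llm_cogs_ratio(monthly_llm_spend: float, monthly_revenue: float) -> float:
    """Return LLM API spend as a percentage of monthly revenue."""
    return 100 * monthly_llm_spend / monthly_revenue

ratio = llm_cogs_ratio(5_040, 50_000)   # e.g. $50k MRR (illustrative)
print(f"{ratio:.1f}% of revenue")       # 10.1% of revenue
if ratio < 12:
    print("SaaS-typical 65-75% gross margins remain achievable")
elif ratio <= 15:
    print("Margin pressure: prioritize caching and model mix")
else:
    print("AI-services territory: expect a valuation multiple haircut")
```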

Prompt caching is the highest-leverage optimization most teams skip. Anthropic and OpenAI charge 90% less for cached input tokens, with a cache TTL of roughly 5 minutes. Well-engineered systems with stable system prompts and shared context achieve 60-80% cache hit rates, cutting input token costs by 54-72% with zero behavior change. Use the slider above to see how cache hit rate moves your monthly bill. Pair this with our AI agent operating cost calculator if you also have agent platform fees on top of raw API spend.
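The savings figures above follow directly from the discount math: at a flat 90% discount on cached input tokens, input-cost savings equal hit rate times 0.9. A minimal sketch:

```python
# Sketch: input-token savings as a function of prompt-cache hit rate,
# assuming a flat 90% discount on cached input tokens.
CACHE_DISCOUNT = 0.90

def input_savings(hit_rate: float) -> float:
    """Fraction of input-token cost saved at a given cache hit rate."""
    return hit_rate * CACHE_DISCOUNT

for hit in (0.40, 0.60, 0.80):
    print(f"{hit:.0%} hit rate -> {input_savings(hit):.0%} cheaper input tokens")
# 40% -> 36%, 60% -> 54%, 80% -> 72%, matching the ranges quoted above.
```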

Model selection is the other major lever. The cost gap between frontier models (Opus, GPT-5) and fast models (Haiku, GPT-5 mini, Gemini Flash) is 10-30x. Most production workloads do fine on cheap models for 70-80% of requests, reserving frontier models for hard reasoning tasks. Build a model-routing layer once your LLM bill crosses $5,000 per month — payback is typically under 60 days. See how this affects your runway with our burn rate calculator.
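To see why routing pays back quickly, a blended-price sketch helps. Prices here are illustrative ($/1M input tokens, roughly Haiku-class vs Opus-class); "cheap_share" is the fraction of requests the router sends to the cheap model.

```python
# Sketch: blended input price under a two-tier model-routing layer.
# Prices are illustrative $/1M input tokens, not quotes for any provider.
def blended_price(cheap: float, frontier: float, cheap_share: float) -> float:
    """Average price per 1M tokens when cheap_share of traffic routes cheap."""
    return cheap_share * cheap + (1 - cheap_share) * frontier

blended = blended_price(cheap=1.00, frontier=15.00, cheap_share=0.75)
print(f"${blended:.2f} / 1M blended vs $15.00 frontier-only")
# Routing 75% of requests to a ~15x-cheaper model cuts this line ~3x.
```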

Frequently Asked Questions

How do you calculate LLM API costs?

LLM API costs equal (monthly input tokens / 1,000,000 × input price) plus (monthly output tokens / 1,000,000 × output price). Most providers price input and output tokens differently, with output tokens 3-5x more expensive. Prompt caching can discount cached input tokens by 90% on Anthropic and OpenAI. Multiply per-request token usage by your monthly request volume and apply the formula across all model tiers you use.

What percentage of revenue should LLM API costs be?

Keep customer-facing LLM API spend below 12% of revenue to maintain SaaS-typical 65-75% gross margins. Above 15% of revenue, gross margins compress to 50-60% and investors begin pricing the company as AI-services rather than SaaS. See our SaaS gross margin improvement guide for tactical levers.

How does prompt caching reduce LLM costs?

Prompt caching reuses processed input tokens across requests with the same prefix (system prompts, tool definitions, RAG context). Anthropic and OpenAI charge 90% less for cached input tokens, with cache TTL of roughly 5 minutes. Well-engineered systems achieve 40-80% cache hit rates, which can cut input token costs by 36-72% without changing the underlying model behavior.

What is the cheapest LLM API for production?

Among frontier models, Gemini 2.5 Flash is the cheapest at $0.50/$3 per 1M input/output tokens, followed by Claude Haiku 4.5 at $1/$5. For most production workloads, the right choice is the cheapest model that meets quality requirements. Many production systems use cheap models for routine work and reserve frontier models for hard reasoning tasks via a model-routing layer.

Track LLM Spend in Your Live Financials

Categorize API spend by vendor, model tier, and feature — see real-time gross margin impact alongside every other line item.

Start Free Trial