
AI API Pricing Comparison

This page compares pricing structures, not just headline model names. The right choice depends on input size, output length, cache reuse, latency needs, and the cost of a wrong answer.

How providers usually differ

| Provider | Pricing pattern to watch | Good fit | Budget risk |
| --- | --- | --- | --- |
| OpenAI | Clear input, cached input, and output token prices across model sizes. | General product assistants, coding, routing, and production workflows. | Premium reasoning models can become expensive when output length is not capped. |
| Claude | Strong model tiers with separate cached input pricing. | Writing, analysis, long document work, and high-quality support replies. | Long answers and document-heavy prompts need careful output and context limits. |
| Gemini | Paid Tier prices may depend on modality and context tier for some models. | Multimodal apps, long-context workflows, and cost-sensitive flash use cases. | Long-context tiering and non-text features can make simple estimates incomplete. |
| DeepSeek | Cache-hit input can be much cheaper than cache-miss input. | High-volume chat, coding assistance, and repeated-context workloads. | Mixing cache hit and cache miss assumptions can produce unrealistic budgets. |

Input vs output vs cached input

Input tokens are the prompt and context you send. Cached input tokens are repeated context that the provider may bill at a lower rate. Output tokens are the text the model generates. In many real applications, output tokens dominate the bill because they are usually priced higher per token than input tokens, and responses can run long when no maximum output length is set.
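The three token types can be combined into a single per-request estimate. The sketch below is a minimal illustration, not any provider's billing logic: the `TokenPrices` class, the `request_cost` function, and every price in it are hypothetical placeholders, and it assumes prices are quoted in USD per one million tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrices:
    """Hypothetical prices in USD per one million tokens (not real provider rates)."""
    input_per_m: float
    cached_input_per_m: float
    output_per_m: float


def request_cost(prices, input_tokens, cached_input_tokens, output_tokens):
    """Estimate the USD cost of one request.

    Cached input tokens are counted separately from fresh input tokens,
    so do not include them in input_tokens as well.
    """
    total = (
        input_tokens * prices.input_per_m
        + cached_input_tokens * prices.cached_input_per_m
        + output_tokens * prices.output_per_m
    )
    return total / 1_000_000


# Placeholder prices: output is priced 4x higher than fresh input.
example_prices = TokenPrices(input_per_m=1.00, cached_input_per_m=0.25, output_per_m=4.00)

cost = request_cost(example_prices, input_tokens=1_200, cached_input_tokens=0, output_tokens=500)
output_part = 500 * example_prices.output_per_m / 1_000_000
output_share = output_part / cost

print(f"cost per request: ${cost:.6f}")
print(f"output share of the bill: {output_share:.1%}")
```

With these placeholder prices, the 500 output tokens account for well over half of the request cost even though the request sends more than twice as many input tokens.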

| Cost driver | Product example | Control lever |
| --- | --- | --- |
| Large input | Summarizing long support tickets or documents. | Trim context, summarize history, retrieve fewer chunks. |
| Large output | Generating long reports, emails, or code files. | Set response length, ask for outlines first, stream follow-up sections only when needed. |
| Repeated context | Same policy, docs, or system prompt sent on every call. | Use provider caching where supported and track cache hit rate separately. |
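Tracking cache hit rate separately matters because the effective input price sits somewhere between the cache-miss and cache-hit prices. A minimal sketch of that blend follows; the function name `blended_input_price` and both prices are hypothetical, and it assumes hit and miss tokens are billed at two flat per-million rates.

```python
def blended_input_price(miss_price_per_m, hit_price_per_m, cache_hit_rate):
    """Effective USD price per one million input tokens at a given cache hit rate."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    return cache_hit_rate * hit_price_per_m + (1.0 - cache_hit_rate) * miss_price_per_m


# Placeholder prices, not real provider rates.
MISS_PRICE_PER_M = 1.00
HIT_PRICE_PER_M = 0.10

for rate in (0.0, 0.5, 0.9):
    price = blended_input_price(MISS_PRICE_PER_M, HIT_PRICE_PER_M, rate)
    print(f"cache hit rate {rate:.0%}: ${price:.2f} per 1M input tokens")
```

Budgeting with the cache-hit price alone corresponds to a hit rate of 1.0, which a real workload may not reach, so it is safer to plug in a measured hit rate.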

Model selection guidance

Example: support assistant choice

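Suppose a support assistant handles 40,000 messages per month, each with 900 input tokens and 350 output tokens. That is 36 million input tokens and 14 million output tokens per month. A model with cheaper output may beat a model with cheaper input because output tokens are usually priced higher per token, so a difference in output price can outweigh a difference in input price even though there are fewer output tokens. If you add a 2,000-token policy prompt that repeats on every call, that is another 80 million input tokens per month, and cached input pricing becomes important.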
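The support assistant scenario can be worked through with a short script. This is a sketch under stated assumptions: the two models and all of their prices are hypothetical placeholders in USD per one million tokens, and the cached case assumes every call hits the cache, which overstates real savings.

```python
MESSAGES_PER_MONTH = 40_000
INPUT_TOKENS = 900      # per message, excluding the policy prompt
OUTPUT_TOKENS = 350     # per message
POLICY_TOKENS = 2_000   # repeated policy prompt

# Hypothetical prices in USD per 1M tokens (not real provider rates).
MODELS = {
    "model_a (cheaper input)": {"input": 0.50, "cached_input": 0.05, "output": 4.00},
    "model_b (cheaper output)": {"input": 1.00, "cached_input": 0.25, "output": 2.00},
}


def monthly_cost(prices, policy_tokens=0, policy_cached=False):
    """Monthly USD cost; the policy prompt is billed at the cached or fresh input rate."""
    policy_rate = prices["cached_input"] if policy_cached else prices["input"]
    per_message = (
        INPUT_TOKENS * prices["input"]
        + policy_tokens * policy_rate
        + OUTPUT_TOKENS * prices["output"]
    ) / 1_000_000
    return per_message * MESSAGES_PER_MONTH


for name, prices in MODELS.items():
    base = monthly_cost(prices)
    uncached = monthly_cost(prices, POLICY_TOKENS, policy_cached=False)
    cached = monthly_cost(prices, POLICY_TOKENS, policy_cached=True)
    print(f"{name}: base ${base:.2f}, "
          f"with policy uncached ${uncached:.2f}, with policy cached ${cached:.2f}")
```

With these placeholder prices, the cheaper-output model wins the base scenario (about $64 versus $74 per month), but once the policy prompt is added the ranking flips because input volume more than triples. Caching the policy prompt narrows the gap from about $30 to about $6 per month.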

FAQ

Should I always choose the cheapest model?

No. Choose the cheapest model that meets the task quality bar. Cheap failed answers can increase retries, support tickets, or human review cost.
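One way to make this trade-off concrete is to add the expected cost of failures to the token cost. The sketch below assumes a deliberately simple model in which every failed answer goes to human review at a fixed cost; the function name `expected_cost_per_message`, the success rates, and all dollar figures are hypothetical.

```python
def expected_cost_per_message(model_cost, success_rate, review_cost):
    """Expected USD cost per message when each failed answer triggers one human review."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    return model_cost + (1.0 - success_rate) * review_cost


REVIEW_COST = 0.50  # hypothetical cost of one human review, in USD

# Placeholder figures: the cheap model fails more often than the pricier one.
cheap = expected_cost_per_message(model_cost=0.001, success_rate=0.90, review_cost=REVIEW_COST)
pricier = expected_cost_per_message(model_cost=0.004, success_rate=0.97, review_cost=REVIEW_COST)

print(f"cheap model:   ${cheap:.3f} per message")
print(f"pricier model: ${pricier:.3f} per message")
```

Under these assumptions the model with four times the token cost is cheaper overall, because review cost dominates. The comparison is only as good as the measured success rates, so they should come from evaluation on your own task.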

Why not rank all models from best to worst?

Because workloads differ. A model that is best for long writing may not be best for routing, code review, or short extraction.

Does this include every possible provider charge?

No. It focuses on token pricing. Audio, grounding, storage, batch, priority, fine-tuning, and account-specific discounts may differ by provider.