LLM Pricing Comparison

Compare input/output token prices across all major LLM providers. Filter by capability, context window, speed, and cost. Estimate your monthly API bill with the request volume calculator.

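Calculator inputs: input tokens per request, output tokens per request, and requests per day. The Cost/req and 30d cost figures in the table below are consistent with 10,000 input tokens and 2,000 output tokens per request at 1,000 requests per day.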

Provider	Model	Tags	$/1M in	$/1M out	Cost/req	30d cost	Context
Google	gemini-2.0-flash	fast, cheap	$0.10	$0.40	$0.0018	$54	1000K
Mistral	mistral-small-3.1	fast, cheap	$0.10	$0.30	$0.0016	$48	128K
OpenAI	gpt-4o-mini	fast, cheap	$0.15	$0.60	$0.0027	$81	128K
Google	gemini-2.5-flash	fast, balanced	$0.15	$0.60	$0.0027	$81	1000K
Meta	llama-3.3-70b	open-source, fast	$0.23	$0.40	$0.0031	$93	128K
DeepSeek	deepseek-v3	cheap, open-source	$0.27	$1.10	$0.0049	$147	64K
Meta	llama-4-maverick	open-source, balanced	$0.50	$0.77	$0.0065	$196	128K
DeepSeek	deepseek-r1	reasoning, cheap	$0.55	$2.19	$0.0099	$296	64K
Anthropic	claude-haiku-4-5	fast, cheap	$0.80	$4.00	$0.0160	$480	200K
OpenAI	o4-mini	reasoning, fast	$1.10	$4.40	$0.0198	$594	200K
Google	gemini-2.5-pro	flagship, long-context	$1.25	$5.00	$0.0225	$675	1000K
OpenAI	gpt-4.1	flagship, long-context	$2.00	$8.00	$0.0360	$1080	1024K
Mistral	mistral-large-2	flagship, balanced	$2.00	$6.00	$0.0320	$960	128K
OpenAI	gpt-4o	flagship, vision	$2.50	$10.00	$0.0450	$1350	128K
Cohere	command-r-plus	rag, enterprise	$2.50	$10.00	$0.0450	$1350	128K
Anthropic	claude-sonnet-4-6	balanced, long-context	$3.00	$15.00	$0.0600	$1800	200K
xAI	grok-3	flagship, balanced	$3.00	$15.00	$0.0600	$1800	131K
OpenAI	o3	reasoning, flagship	$10.00	$40.00	$0.1800	$5400	200K
Anthropic	claude-opus-4-6	flagship, long-context	$15.00	$75.00	$0.3000	$9000	200K

Prices as of early 2026. Verify with provider pricing pages before production use. The Cost/req and 30d cost columns depend on the token counts and request volume entered above.

Frequently Asked Questions

Which model is the cheapest for high-volume applications?

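For high-volume applications, Gemini 2.0 Flash ($0.10/1M input, $0.40/1M output), Mistral Small 3.1 ($0.10/1M input, $0.30/1M output), and GPT-4o-mini ($0.15/1M input, $0.60/1M output) are among the cheapest options with reasonable quality. DeepSeek-V3 ($0.27/1M input, $1.10/1M output) is also cost-effective for its quality level. Because input and output tokens are priced separately, the cheapest model for a given workload depends on its ratio of input to output tokens.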
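Which model is cheapest can change with the shape of the workload. The sketch below ranks a few models from the table by per-request cost. The `PRICES` dictionary and the `rank_by_cost` helper are hypothetical names introduced for illustration, and the two workloads are assumed examples, not recommendations.

```python
# (input $/1M tokens, output $/1M tokens), copied from the pricing table.
PRICES = {
    "gemini-2.0-flash": (0.10, 0.40),
    "mistral-small-3.1": (0.10, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
    "llama-3.3-70b": (0.23, 0.40),
    "deepseek-v3": (0.27, 1.10),
}


def rank_by_cost(input_tokens, output_tokens, prices=PRICES):
    """Return (model, cost per request in USD) pairs, cheapest first."""
    costs = {
        model: (input_tokens * p_in + output_tokens * p_out) / 1_000_000
        for model, (p_in, p_out) in prices.items()
    }
    return sorted(costs.items(), key=lambda item: item[1])


# An input-heavy workload (assumed: long prompt, short answer).
for model, cost in rank_by_cost(10_000, 2_000):
    print(f"{model:<20} ${cost:.5f}")

print()

# An output-heavy workload (assumed: short prompt, long answer).
for model, cost in rank_by_cost(1_000, 5_000):
    print(f"{model:<20} ${cost:.5f}")
```

With these prices, gpt-4o-mini is cheaper than llama-3.3-70b on the input-heavy workload, and the order flips on the output-heavy one because llama-3.3-70b has the lower output price.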

What does 'context window' mean?

The context window is the maximum number of tokens a model can process in a single request, including both input and output. Larger context windows allow you to process longer documents or maintain longer conversation histories.
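As a rough illustration of that definition, the sketch below checks whether a request fits a given window. The function name `fits_context` is hypothetical, and reading "K" in the Context column as 1,000 tokens is an assumption. Some providers also cap output length separately, which this check does not model.

```python
def fits_context(input_tokens, max_output_tokens, context_window):
    """True if the prompt plus the reserved output budget fits in the window."""
    return input_tokens + max_output_tokens <= context_window


# Window sizes as listed in the Context column, with "K" read as 1,000 tokens.
print(fits_context(100_000, 4_000, 128_000))  # True: 104,000 <= 128,000
print(fits_context(150_000, 4_000, 128_000))  # False: 154,000 > 128,000
print(fits_context(150_000, 4_000, 200_000))  # True: 154,000 <= 200,000
```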

How is monthly cost calculated?

Monthly cost = (input tokens × input price + output tokens × output price) ÷ 1,000,000 × requests per day × 30 days. Input and output prices are in US dollars per 1 million tokens, as listed by each provider, hence the division by 1,000,000.
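The calculation can be sketched in a few lines of Python. The function names are hypothetical. The example workload (10,000 input tokens, 2,000 output tokens, 1,000 requests per day) is an assumption inferred from the table, where it reproduces the gemini-2.0-flash row.

```python
def cost_per_request(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one request in USD; prices are USD per 1 million tokens."""
    total = input_tokens * in_price_per_m + output_tokens * out_price_per_m
    return total / 1_000_000


def monthly_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m,
                 requests_per_day, days=30):
    """Cost in USD over `days` days at a constant daily request volume."""
    per_request = cost_per_request(
        input_tokens, output_tokens, in_price_per_m, out_price_per_m
    )
    return per_request * requests_per_day * days


# gemini-2.0-flash: $0.10 per 1M input tokens, $0.40 per 1M output tokens.
per_req = cost_per_request(10_000, 2_000, 0.10, 0.40)
month = monthly_cost(10_000, 2_000, 0.10, 0.40, requests_per_day=1_000)
print(f"${per_req:.4f} per request, ${month:.0f} per 30 days")
# prints: $0.0018 per request, $54 per 30 days
```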