Model directory

Pricing assumptions tracked in the calculator

These reference rates feed the cost calculator. Confirm pricing with the official provider before budgeting: APIs evolve quickly, and regional hosting may introduce premiums. We refresh the catalog monthly and whenever major launches ship.
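The per-request arithmetic behind the calculator is simple. A minimal sketch, using the per-1K rates listed in this directory (the token counts below are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Estimate USD cost of one request from per-1K-token rates."""
    return (input_tokens / 1000) * input_rate_per_1k + \
           (output_tokens / 1000) * output_rate_per_1k

# Example: 3,000 input + 500 output tokens at GPT-5 list rates
# ($0.00125 / $0.0100 per 1K)
cost = request_cost(3000, 500, 0.00125, 0.0100)  # 0.00875 USD
```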

Model · Pricing · Context window · Strengths · Availability
GPT-5
OpenAI · Proprietary API
View docs

$0.0013/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • Frontier reasoning
  • Multimodal support
  • Priority search integration

OpenAI API (Standard, Flex, Batch tiers)

Standard tier pricing $1.25 (input) / $10.00 (output) per 1M tokens; cached input billed at $0.125 per 1M. Context up to 400K tokens per official pricing.
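Cached-input discounts like the one above change the effective input rate with the cache-hit ratio. A sketch of the blend; the 60% hit ratio is a hypothetical workload parameter, not an OpenAI figure:

```python
def blended_input_rate(base_per_1m: float, cached_per_1m: float,
                       cache_hit_ratio: float) -> float:
    """Effective USD per 1M input tokens when a fraction is served from cache."""
    return cache_hit_ratio * cached_per_1m + (1 - cache_hit_ratio) * base_per_1m

# GPT-5 standard tier: $1.25 base, $0.125 cached input, assumed 60% cache hits
rate = blended_input_rate(1.25, 0.125, 0.60)  # 0.575 USD per 1M input tokens
```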

GPT-5 Mini
OpenAI · Proprietary API
View docs

$0.0003/1K input · $0.0020/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • High throughput
  • Instruction following
  • Lower latency vs GPT-5

OpenAI API (Standard, Flex, Batch tiers)

Standard tier $0.25 (input) / $2.00 (output) per 1M tokens; cached input $0.025 per 1M. Ideal for productized assistants needing GPT-5 alignment at lower cost.

GPT-5 Nano
OpenAI · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 64,000 tokens

  • Classification and routing
  • Ultra-low cost completions
  • Great for cascaded agents

OpenAI API (Standard, Flex, Batch tiers)

Standard tier $0.05 (input) / $0.40 (output) per 1M tokens; cached input $0.005 per 1M.

GPT-5 Pro
OpenAI · Proprietary API
View docs

$0.0150/1K input · $0.1200/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • Maximum accuracy
  • Long-form planning
  • Advanced code synthesis

OpenAI API (Standard tier)

Premium GPT-5 tier at $15 (input) / $120 (output) per 1M tokens; best reserved for mission-critical workloads.

GPT-5 Codex
OpenAI · Proprietary API
View docs

$0.0013/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • Code generation
  • Tool-calling friendly
  • Compatible with GPT-5 codex adapters

OpenAI API

Shares GPT-5 base pricing; tuned for software engineering and agent coding workloads.

GPT-4o
OpenAI · Proprietary API
View docs

$0.0025/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Balanced reasoning
  • Multimodal support
  • High accuracy

OpenAI API, Azure OpenAI Service

Standard tier pricing $2.50 (input) / $10.00 (output) per 1M tokens as of Jan 2025.

GPT-4o (Aug 2024)
OpenAI · Proprietary API
View docs

$0.0025/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Vision-enabled workflows
  • Improved latency vs GPT-4 Turbo
  • Stable production tier

OpenAI API

Legacy GPT-4o release (Aug 2024) retained for backwards compatibility. Pricing mirrors the current GPT-4o public tier.

GPT-4o (May 2024)
OpenAI · Proprietary API
View docs

$0.0050/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Original GPT-4o launch model
  • Full multimodal support
  • Higher price tier

OpenAI API

May 2024 launch pricing $5 (input) / $15 (output) per 1M tokens. Useful for comparing pre-price-drop scenarios.

GPT-4o mini
OpenAI · Proprietary API
View docs

$0.0002/1K input · $0.0006/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Cost-efficient text generation
  • Fast responses
  • Great for prototyping

OpenAI API, Azure OpenAI Service

Standard tier pricing $0.15 (input) / $0.60 (output) per 1M tokens.

GPT-4.1
OpenAI · Proprietary API
View docs

$0.0020/1K input · $0.0080/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 8,192 tokens

  • Unified reasoning + multimodal
  • Improved tool calling
  • Lower cost than GPT-4 Turbo

OpenAI API

Standard processing tier $2.00 (input) / $8.00 (output) per 1M tokens. Cache hits bill at $0.50 per 1M input tokens.

GPT-4.1 mini
OpenAI · Proprietary API
View docs

$0.0004/1K input · $0.0016/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • High throughput
  • JSON-mode ready
  • Great for RAG + agents

OpenAI API

Standard tier $0.40 (input) / $1.60 (output) per 1M tokens; cache hits billed at $0.10 per 1M input tokens.

GPT-4.1 nano
OpenAI · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

64,000 tokens

Output max 4,096 tokens

  • Ultra-low cost completions
  • Good for classification and routing
  • Fast latency

OpenAI API

Standard tier $0.10 (input) / $0.40 (output) per 1M tokens; cache hits billed at $0.025 per 1M input tokens.

GPT-4 Turbo (Apr 2024)
OpenAI · Proprietary API
View docs

$0.0100/1K input · $0.0300/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Legacy GPT-4 tier
  • Strong reasoning
  • JSON output mode

OpenAI API

April 2024 Turbo release $10 (input) / $30 (output) per 1M tokens. Useful for historical comparisons and fallback flows.

GPT-3.5 Turbo (Jan 2025)
OpenAI · Proprietary API
View docs

$0.0005/1K input · $0.0015/1K output

Usage billed per 1,000 tokens.

16,385 tokens

Output max 4,096 tokens

  • Budget-friendly completions
  • Longstanding ecosystem support
  • Great for lightweight tasks

OpenAI API, Azure OpenAI Service

Standard tier $0.50 (input) / $1.50 (output) per 1M tokens.

Gemini 2.5 Pro
Google · Proprietary API
View docs

$0.0013/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

1,048,576 tokens

Output max 8,192 tokens

  • Ultra-long context
  • Hybrid reasoning
  • Multimodal I/O

Google AI Studio, Vertex AI

Paid tier $1.25 (input) / $10.00 (output) per 1M tokens for prompts ≤200K. Prompts above 200K tokens bill at double the rate; context caching is priced separately.
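The ≤200K / >200K tiering folds into the calculator as a threshold check. A sketch assuming, per Google's published tiering, that a prompt past the threshold bills entirely at the doubled rate:

```python
def tiered_input_cost(prompt_tokens: int, base_per_1m: float = 1.25,
                      threshold: int = 200_000) -> float:
    """Gemini 2.5 Pro-style input cost: whole prompt bills at 2x past the threshold."""
    rate = base_per_1m if prompt_tokens <= threshold else 2 * base_per_1m
    return prompt_tokens / 1_000_000 * rate

tiered_input_cost(100_000)  # 0.125 USD
tiered_input_cost(300_000)  # 0.75 USD: the whole prompt at $2.50 per 1M
```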

Gemini 2.5 Flash
Google · Proprietary API
View docs

$0.0006/1K input · $0.0050/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • High throughput
  • Low latency
  • Streaming optimized

Google AI Studio, Vertex AI

Paid tier $0.625 (input) / $5.00 (output) per 1M tokens for prompts ≤200K. Requests above 200K input tokens bill at twice the rate.

Gemini 2.5 Flash-Lite
Google · Proprietary API
View docs

$0.0002/1K input · $0.0013/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Cost-efficient bulk processing
  • Shared grounding quotas with Flash
  • Optimized for high-volume requests

Google AI Studio

Paid tier $0.15 (input) / $1.25 (output) per 1M tokens (text/image/video). Audio inputs bill at $0.50 per 1M tokens.

Gemini 2.0 Flash
Google · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Most cost-effective Flash tier
  • Solid reasoning for assistants
  • Great for at-scale automation

Google AI Studio

Paid tier $0.10 (input) / $0.40 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.30 per 1M tokens.

Gemini 2.0 Flash-Lite
Google · Proprietary API
View docs

$0.0001/1K input · $0.0002/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Lowest-cost Gemini text tier
  • Suitable for classification, routing, and summarization at scale

Google AI Studio

Paid tier $0.05 (input) / $0.20 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.15 per 1M tokens.

Gemini 1.5 Pro
Google · Proprietary API
View docs

$0.0070/1K input · $0.0210/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 4,096 tokens

  • Long-context comprehension
  • Strong reasoning
  • Supports multimodal prompts

Google AI Studio, Vertex AI

Legacy pricing $7 (input) / $21 (output) per 1M tokens. Still useful for workloads validated on the 1.5 series.

Gemini 1.5 Flash
Google · Proprietary API
View docs

$0.0003/1K input · $0.0010/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Fast responses
  • Good quality/cost trade-off
  • Useful for streaming workloads

Google AI Studio, Vertex AI

Legacy Flash pricing $0.35 (input) / $1.05 (output) per 1M tokens.

Claude Sonnet 4.5
Anthropic · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Long-form reasoning
  • Tool calling
  • High compliance guardrails

Anthropic Console, AWS Bedrock, Google Vertex AI

Input $3 / 1M tokens (≤200K) and output $15 / 1M tokens. Requests above 200K input tokens upgrade to long-context pricing.

Claude Haiku 4.5
Anthropic · Proprietary API
View docs

$0.0010/1K input · $0.0050/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Latency friendly
  • Multilingual support
  • Cost-effective guardrails

Anthropic Console, AWS Bedrock, Google Vertex AI

Entry-tier Claude model: $1 (input) / $5 (output) per 1M tokens on the on-demand plan.

Claude 3.5 Sonnet
Anthropic · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Proven enterprise workloads
  • Structured output
  • Strong safety tooling

Anthropic Console, AWS Bedrock, Google Vertex AI

Maintained for backwards compatibility; matches current Sonnet 4.5 pricing.

Claude 3.5 Haiku
Anthropic · Proprietary API
View docs

$0.0008/1K input · $0.0040/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Low latency
  • Great for customer support chatbots
  • Budget-friendly Claude tier

Anthropic Console, AWS Bedrock, Google Vertex AI

Legacy Haiku pricing $0.80 (input) / $4.00 (output) per 1M tokens.

Claude 3 Opus
Anthropic · Proprietary API
View docs

$0.0150/1K input · $0.0750/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Highest accuracy in Claude 3 era
  • Mission-critical workflows
  • Advanced safety guardrails

Anthropic Console, AWS Bedrock, Google Vertex AI

Premium Claude tier retained for historical comparisons and customers still migrating to Claude 4.x.

Qwen-Max
Alibaba Cloud · Proprietary API
View docs

$0.0016/1K input · $0.0064/1K output

Usage billed per 1,000 tokens.

32,768 tokens

Output max 8,192 tokens

  • Strong bilingual ability
  • Function calling ready
  • Agent workflows

Alibaba Cloud DashScope, Tongyi Qianwen Console

USD values converted from DashScope public rate card (¥0.011 / ¥0.044 per 1K tokens at 1 USD ≈ ¥6.9). Confirm tier-based discounts before launch.
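The FX conversion used for RMB-priced entries like this one is a one-liner; the exchange rate is the catalog's assumption, so pin it to your own treasury rate before relying on the USD figures:

```python
def cny_to_usd_per_1k(cny_per_1k: float, usd_cny: float = 6.9) -> float:
    """Convert an RMB per-1K-token rate to USD at an assumed exchange rate."""
    return cny_per_1k / usd_cny

# Qwen-Max list rates from the note above (¥0.011 input / ¥0.044 output per 1K)
round(cny_to_usd_per_1k(0.011), 4)  # 0.0016
round(cny_to_usd_per_1k(0.044), 4)  # 0.0064
```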

Qwen-Plus
Alibaba Cloud · Proprietary API
View docs

$0.0004/1K input · $0.0012/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Balanced quality and cost
  • High throughput
  • Fine-tuning support

Alibaba Cloud DashScope, Tongyi Qianwen Console

Converted from DashScope postpaid plan pricing (approx. ¥0.0028 / ¥0.0085 per 1K tokens). Long-context upgrades and bundles available on enterprise plans.

Doubao Pro 32K
ByteDance · Proprietary API
View docs

$0.0008/1K input · $0.0025/1K output

Usage billed per 1,000 tokens.

32,000 tokens

Output max 4,096 tokens

  • Marketing and operations content
  • Image-text blended prompts
  • Competitive pricing

Volcengine Doubao API

Derived from Doubao public list price ¥0.006 (input) / ¥0.018 (output) per 1K tokens at 1 USD ≈ ¥7.05. Regional packages may differ.

GLM-4 Plus
Zhipu AI · Proprietary API
View docs

$0.0021/1K input · $0.0063/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Structured output
  • Tool-call friendly
  • Chinese and English parity

Zhipu BigModel Platform, Azure Marketplace (China regions)

USD values converted from ¥0.015 / ¥0.045 per 1K tokens (exchange rate 1 USD ≈ ¥7.1). Prompt caching available for enterprise tiers.

DeepSeek V3.2 Exp
DeepSeek · Proprietary API
View docs

$0.0003/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 8,192 tokens

  • High efficiency reasoning
  • Reasoning/non-reasoning modes
  • Competitive enterprise pricing

DeepSeek API, SiliconFlow Marketplace

Official API pricing: $0.28 (input cache miss) / $0.42 (output) per 1M tokens. Cache hits billed at $0.07 per 1M input tokens.

Mixtral 8x7B Instruct
Mistral AI · Open-source (self-hosted)
View docs

$2.50/hr • 8,000 tokens/sec

Infra: RunPod NVIDIA L4 (8x7B MoE)

32,000 tokens

Output max 8,192 tokens

  • Open-weight deployment
  • Great for summarization
  • Low inference cost

Self-hosted, Mistral API, transferable to cloud GPU providers

Self-hosted pricing assumes on-demand L4 GPU at ~$2.50/hour with autoscaling.
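Hourly GPU pricing converts to a per-token figure through throughput. A sketch using the $2.50/hour and 8,000 tokens/sec assumptions above; real throughput varies with batch size and sequence length:

```python
def hourly_to_usd_per_1m_tokens(usd_per_hour: float, tokens_per_sec: float) -> float:
    """USD per 1M tokens for a GPU billed hourly at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

hourly_to_usd_per_1m_tokens(2.50, 8000)  # ~0.087 USD per 1M tokens
```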

Llama 3.1 70B Instruct
Meta · Open-source (self-hosted)
View docs

$3.20/hr • 5,500 tokens/sec

Infra: AWS g6.8xlarge (1x L4 24GB)

128,000 tokens

Output max 8,192 tokens

  • Strong reasoning for open source
  • Flexible deployment options
  • Fine-tune friendly

Self-hosted, AWS Bedrock, Azure AI Studio

Cost assumes steady-state utilization with autoscaling to avoid idle GPU time.

Llama 3.2 1B Instruct
Meta · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • Edge-friendly footprint
  • Great for on-device assistants
  • Fast inference via OpenRouter

OpenRouter hosted API

Pricing sourced from OpenRouter public rate card (Oct 2025); per-token rate converted to USD per 1K tokens.

Llama 3.2 3B Instruct
Meta · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

16,384 tokens

Output max 4,096 tokens

  • Low-latency chat agents
  • Supports longer prompts vs 1B
  • Cost-efficient hosted option

OpenRouter hosted API

OpenRouter pricing converted to USD per 1K tokens (Oct 2025).

Llama 3.2 11B Vision Instruct
Meta · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • Multimodal (image+text) support
  • Balanced model size
  • Strong tool-calling compatibility

OpenRouter hosted API

Image output incurs $0.00007948 per token equivalent; values converted from OpenRouter USD/token pricing (Oct 2025).

Llama 3.2 90B Vision Instruct
Meta · Proprietary API
View docs

$0.0003/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

32,768 tokens

Output max 8,192 tokens

  • High-quality multimodal reasoning
  • Supports complex visual analysis
  • Hosted via OpenRouter

OpenRouter hosted API

Image outputs billed at $0.0005058 equivalent per token; pricing reflects OpenRouter public rates (Oct 2025).

Llama 3.3 70B Instruct
Meta · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • High quality instruction following
  • Strong multilingual support
  • Hosted delivery with predictable latency

OpenRouter hosted API

Pricing converted from OpenRouter USD/token rates (Oct 2025).

Llama 4 Scout
Meta · Proprietary API
View docs

$0.0001/1K input · $0.0003/1K output

Usage billed per 1,000 tokens.

327,680 tokens

Output max 16,384 tokens

  • Next-gen lightweight Llama
  • Extended context (320K+)
  • Supports partial vision (image add-on)

OpenRouter hosted API

Image add-ons billed at $0.0003342 equivalent; pricing from OpenRouter (Oct 2025).

Llama 4 Maverick
Meta · Proprietary API
View docs

$0.0001/1K input · $0.0006/1K output

Usage billed per 1,000 tokens.

1,048,576 tokens

Output max 16,384 tokens

  • Ultra-long context (1M tokens)
  • Advanced reasoning and planning
  • Optional image generation

OpenRouter hosted API

Image outputs billed at $0.0006684 equivalent; pricing from OpenRouter (Oct 2025).

Mistral Large
Mistral AI · Proprietary API
View docs

$0.0080/1K input · $0.0240/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 8,192 tokens

  • Strong coding ability
  • Fast generation
  • European data residency

Mistral API, Azure AI Studio

Latency-friendly alternative to GPT-4 tier.

Cohere Command R+
Cohere · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Retrieval-augmented workflows
  • Enterprise privacy posture
  • Great for customer support

Cohere API, AWS Bedrock

Includes 1,000 free requests per month on the Cohere platform.

Grok 4
xAI · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

256,000 tokens

Output max 64,000 tokens

  • Reasoning-first architecture
  • Search-integrated agents
  • Supports long tool chains

xAI Platform, OpenRouter

Pricing derived from OpenRouter USD/token rates (Oct 2025); converted to USD per 1K tokens.

Grok 4 Fast
xAI · Proprietary API
View docs

$0.0002/1K input · $0.0005/1K output

Usage billed per 1,000 tokens.

2,000,000 tokens

Output max 32,000 tokens

  • Reduced latency Grok tier
  • Optimized for conversational agents
  • Lower cost search-augmented flows

xAI Platform, OpenRouter

Pricing sourced from OpenRouter USD/token rates (Oct 2025).

Grok Code Fast
xAI · Proprietary API
View docs

$0.0002/1K input · $0.0015/1K output

Usage billed per 1,000 tokens.

256,000 tokens

Output max 32,000 tokens

  • Code-focused Grok variant
  • Chain-of-thought reasoning
  • Supports tool-call heavy workloads

xAI Platform, OpenRouter

Pricing sourced from OpenRouter USD/token rates (Oct 2025).

ERNIE 4.5 21B A3B
Baidu · Proprietary API
View docs

$0.0001/1K input · $0.0003/1K output

Usage billed per 1,000 tokens.

120,000 tokens

Output max 8,192 tokens

  • Strong Chinese reasoning
  • Enterprise security posture
  • Tool + retrieval ready

Baidu Qianfan, OpenRouter

Pricing from OpenRouter USD/token rates (Oct 2025); aligns with Baidu Qianfan list pricing (~¥0.0005 per 1K tokens).

ERNIE 4.5 21B A3B Thinking
Baidu · Proprietary API
View docs

$0.0001/1K input · $0.0003/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • Reasoning mode enabled
  • Structured output
  • Supports deep tool-use

Baidu Qianfan, OpenRouter

Reasoning mode pricing sourced from OpenRouter (Oct 2025); matches Baidu published ¥0.0005/¥0.0024 per 1K tokens after FX conversion.

ERNIE 4.5 VL 28B A3B
Baidu · Proprietary API
View docs

$0.0001/1K input · $0.0006/1K output

Usage billed per 1,000 tokens.

30,000 tokens

Output max 6,144 tokens

  • Multimodal (vision + text)
  • Optimized for Chinese OCR and charts
  • Supports thinking mode

Baidu Qianfan, OpenRouter

OpenRouter pricing (Oct 2025) converted to USD per 1K tokens.

Hunyuan A13B Instruct
Tencent · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

32,768 tokens

Output max 4,096 tokens

  • Tencent Cloud integration
  • Chinese + English support
  • Good for chatbots and customer service

Tencent Cloud Hunyuan API, OpenRouter

Pricing derived from OpenRouter USD/token rates (Oct 2025); aligns with Tencent Cloud public tariff converted from RMB.

How we estimate open-source costs

Self-hosted numbers assume on-demand GPU rentals. We approximate throughput using benchmark tokens per second and include warmup latency (autoscaling cold starts, health checks, validation). Running reserved instances or multi-region clusters? Adjust the hourly rate and redundancy buffer in the calculator.

  • Hourly rate

    Based on popular GPU options (AWS g6, RunPod L4, Paperspace A100). Layer in orchestration overhead (Kubernetes, inference gateways) if it meaningfully impacts spend.

  • Utilization

    The calculator divides total tokens by throughput to derive compute hours. Increase redundancy if you expect queueing or batch windows that drive utilization below 60%.
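The estimate described above can be sketched end to end. Utilization and the redundancy buffer are the knobs the calculator exposes; the workload numbers below are placeholders drawn from the Llama 3.1 70B entry:

```python
def self_hosted_monthly_cost(total_tokens: int, tokens_per_sec: float,
                             usd_per_hour: float, utilization: float = 0.6,
                             redundancy: float = 1.0) -> float:
    """Tokens / throughput -> compute hours, scaled up for utilization
    below 100% and multiplied by a redundancy buffer."""
    compute_hours = total_tokens / tokens_per_sec / 3600
    billed_hours = compute_hours / utilization * redundancy
    return billed_hours * usd_per_hour

# 500M tokens/month on the Llama 3.1 70B assumptions ($3.20/hr, 5,500 tokens/sec),
# 60% utilization with a 20% redundancy buffer
self_hosted_monthly_cost(500_000_000, 5500, 3.20,
                         utilization=0.6, redundancy=1.2)  # ~161.6 USD
```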