Model directory
Pricing assumptions tracked in the calculator
These reference rates feed the cost calculator. Confirm pricing with the official provider—APIs evolve quickly and regional hosting may introduce premiums. We refresh the catalog monthly and whenever major launches ship.
| Model | Pricing (USD) | Context window | Notes | Availability |
|---|---|---|---|---|
| GPT-5 (OpenAI, proprietary API) | $0.0013 in / $0.0100 out per 1K tokens | 400,000 (output max 128,000) | Standard tier $1.25 (input) / $10.00 (output) per 1M tokens; cached input $0.125 per 1M (Jan 2025 pricing). | OpenAI API (Standard, Flex, Batch tiers) |
| GPT-5 Mini (OpenAI, proprietary API) | $0.0003 in / $0.0020 out per 1K tokens | 400,000 (output max 128,000) | Standard tier $0.25 / $2.00 per 1M; cached input $0.025 per 1M. Ideal for productized assistants needing GPT-5 alignment at lower cost. | OpenAI API (Standard, Flex, Batch tiers) |
| GPT-5 Nano (OpenAI, proprietary API) | $0.0001 in / $0.0004 out per 1K tokens | 400,000 (output max 64,000) | Standard tier $0.05 / $0.40 per 1M; cached input $0.005 per 1M. | OpenAI API (Standard, Flex, Batch tiers) |
| GPT-5 Pro (OpenAI, proprietary API) | $0.0150 in / $0.1200 out per 1K tokens | 400,000 (output max 128,000) | Premium GPT-5 tier at $15 / $120 per 1M; best reserved for mission-critical workloads. | OpenAI API (Standard tier) |
| GPT-5 Codex (OpenAI, proprietary API) | $0.0013 in / $0.0100 out per 1K tokens | 400,000 (output max 128,000) | Shares GPT-5 base pricing; tuned for software engineering and agent coding workloads. | OpenAI API |
| GPT-4o (OpenAI, proprietary API) | $0.0025 in / $0.0100 out per 1K tokens | 128,000 (output max 4,096) | Standard tier $2.50 / $10.00 per 1M as of Jan 2025. | OpenAI API, Azure OpenAI Service |
| GPT-4o, Aug 2024 snapshot (OpenAI, proprietary API) | $0.0025 in / $0.0100 out per 1K tokens | 128,000 (output max 4,096) | Legacy release retained for backwards compatibility; pricing mirrors the current GPT-4o public tier. | OpenAI API |
| GPT-4o, May 2024 snapshot (OpenAI, proprietary API) | $0.0050 in / $0.0150 out per 1K tokens | 128,000 (output max 4,096) | May 2024 launch pricing $5 / $15 per 1M; useful for comparing pre-price-drop scenarios. | OpenAI API |
| GPT-4o mini (OpenAI, proprietary API) | $0.0001 in / $0.0006 out per 1K tokens | 128,000 (output max 16,384) | Standard tier $0.15 / $0.60 per 1M. | OpenAI API, Azure OpenAI Service |
| GPT-4.1 (OpenAI, proprietary API) | $0.0020 in / $0.0080 out per 1K tokens | 128,000 (output max 8,192) | Standard tier $2.00 / $8.00 per 1M; cache hits $0.50 per 1M input tokens. | OpenAI API |
| GPT-4.1 mini (OpenAI, proprietary API) | $0.0004 in / $0.0016 out per 1K tokens | 128,000 (output max 16,384) | Standard tier $0.40 / $1.60 per 1M; cache hits $0.10 per 1M input tokens. | OpenAI API |
| GPT-4.1 nano (OpenAI, proprietary API) | $0.0001 in / $0.0004 out per 1K tokens | 64,000 (output max 4,096) | Standard tier $0.10 / $0.40 per 1M; cache hits $0.025 per 1M input tokens. | OpenAI API |
| GPT-4 Turbo, Apr 2024 (OpenAI, proprietary API) | $0.0100 in / $0.0300 out per 1K tokens | 128,000 (output max 4,096) | April 2024 Turbo release $10 / $30 per 1M; useful for historical comparisons and fallback flows. | OpenAI API |
| GPT-3.5 Turbo, Jan 2025 (OpenAI, proprietary API) | $0.0005 in / $0.0015 out per 1K tokens | 16,000 (output max 4,096) | Standard tier $0.50 / $1.50 per 1M. | OpenAI API, Azure OpenAI Service |
| Gemini 2.5 Pro (Google, proprietary API) | $0.0013 in / $0.0100 out per 1K tokens | 2,000,000 (output max 8,192) | Paid tier $1.25 / $10.00 per 1M for prompts ≤200K tokens; above 200K the rate doubles. Context caching priced separately. | Google AI Studio, Vertex AI |
| Gemini 2.5 Flash (Google, proprietary API) | $0.0006 in / $0.0050 out per 1K tokens | 1,000,000 (output max 8,192) | Paid tier $0.625 / $5.00 per 1M for prompts ≤200K tokens; requests above 200K input tokens bill at twice the rate. | Google AI Studio, Vertex AI |
| Gemini 2.5 Flash-Lite (Google, proprietary API) | $0.0001 in / $0.0013 out per 1K tokens | 1,000,000 (output max 8,192) | Paid tier $0.15 / $1.25 per 1M (text/image/video); audio inputs $0.50 per 1M. | Google AI Studio |
| Gemini 2.0 Flash (Google, proprietary API) | $0.0001 in / $0.0004 out per 1K tokens | 1,000,000 (output max 8,192) | Paid tier $0.10 / $0.40 per 1M (text/image/video); audio inputs $0.30 per 1M. | Google AI Studio |
| Gemini 2.0 Flash-Lite (Google, proprietary API) | $0.0001 in / $0.0002 out per 1K tokens | 1,000,000 (output max 8,192) | Paid tier $0.05 / $0.20 per 1M (text/image/video); audio inputs $0.15 per 1M. | Google AI Studio |
| Gemini 1.5 Pro (Google, proprietary API) | $0.0070 in / $0.0210 out per 1K tokens | 1,000,000 (output max 4,096) | Legacy pricing $7 / $21 per 1M; still useful for workloads validated on the 1.5 series. | Google AI Studio, Vertex AI |
| Gemini 1.5 Flash (Google, proprietary API) | $0.0003 in / $0.0010 out per 1K tokens | 1,000,000 (output max 8,192) | Legacy Flash pricing $0.35 / $1.05 per 1M. | Google AI Studio, Vertex AI |
| Claude Sonnet 4.5 (Anthropic, proprietary API) | $0.0030 in / $0.0150 out per 1K tokens | 200,000 (output max 4,096) | $3 / $15 per 1M for prompts ≤200K input tokens; larger requests upgrade to long-context pricing. | Anthropic Console, AWS Bedrock, Google Vertex AI |
| Claude Haiku 4.5 (Anthropic, proprietary API) | $0.0010 in / $0.0050 out per 1K tokens | 200,000 (output max 4,096) | Entry-tier Claude model: $1 / $5 per 1M on the on-demand plan. | Anthropic Console, AWS Bedrock, Google Vertex AI |
| Claude 3.5 Sonnet (Anthropic, proprietary API) | $0.0030 in / $0.0150 out per 1K tokens | 200,000 (output max 4,096) | Maintained for backwards compatibility; matches current Sonnet 4.5 pricing. | Anthropic Console, AWS Bedrock, Google Vertex AI |
| Claude 3.5 Haiku (Anthropic, proprietary API) | $0.0008 in / $0.0040 out per 1K tokens | 200,000 (output max 4,096) | Legacy Haiku pricing $0.80 / $4.00 per 1M. | Anthropic Console, AWS Bedrock, Google Vertex AI |
| Claude 3 Opus (Anthropic, proprietary API) | $0.0150 in / $0.0750 out per 1K tokens | 200,000 (output max 4,096) | Premium Claude tier retained for historical comparisons and customers still migrating to Claude 4.x. | Anthropic Console, AWS Bedrock, Google Vertex AI |
| Qwen-Max (Alibaba Cloud, proprietary API) | $0.0016 in / $0.0064 out per 1K tokens | 32,768 (output max 8,192) | Converted from the DashScope public rate card (¥0.011 / ¥0.044 per 1K at 1 USD ≈ ¥6.9); confirm tier-based discounts before launch. | Alibaba Cloud DashScope, Tongyi Qianwen Console |
| Qwen-Plus (Alibaba Cloud, proprietary API) | $0.0004 in / $0.0012 out per 1K tokens | 128,000 (output max 16,384) | Converted from DashScope postpaid pricing (≈¥0.0028 / ¥0.0085 per 1K); long-context upgrades and bundles available on enterprise plans. | Alibaba Cloud DashScope, Tongyi Qianwen Console |
| Doubao Pro 32K (ByteDance, proprietary API) | $0.0008 in / $0.0025 out per 1K tokens | 32,000 (output max 4,096) | Derived from the Doubao public list price ¥0.006 / ¥0.018 per 1K at 1 USD ≈ ¥7.05; regional packages may differ. | Volcengine Doubao API |
| GLM-4 Plus (Zhipu AI, proprietary API) | $0.0021 in / $0.0063 out per 1K tokens | 128,000 (output max 4,096) | Converted from ¥0.015 / ¥0.045 per 1K (1 USD ≈ ¥7.1); prompt caching available on enterprise tiers. | Zhipu BigModel Platform, Azure Marketplace (China regions) |
| DeepSeek V3.2 Exp (DeepSeek, proprietary API) | $0.0003 in / $0.0004 out per 1K tokens | 128,000 (output max 8,192) | Official API pricing $0.28 (input, cache miss) / $0.42 (output) per 1M; cache hits $0.07 per 1M input tokens. | DeepSeek API, SiliconFlow Marketplace |
| Mixtral 8x7B Instruct (Mistral AI, open-source, self-hosted) | $2.50/hr at 8,000 tokens/sec (RunPod NVIDIA L4) | 32,000 (output max 8,192) | Assumes an on-demand L4 GPU at ~$2.50/hour with autoscaling. | Self-hosted, Mistral API, transferable to cloud GPU providers |
| Llama 3.1 70B Instruct (Meta, open-source, self-hosted) | $3.20/hr at 5,500 tokens/sec (AWS g6.8xlarge, 1× L4 24 GB) | 128,000 (output max 8,192) | Assumes steady-state utilization with autoscaling to avoid idle GPU time. | Self-hosted, AWS Bedrock, Azure AI Studio |
| Llama 3.2 1B Instruct (Meta, hosted API) | $0.0000 in / $0.0000 out per 1K tokens | 131,072 (output max 8,192) | OpenRouter public rate card (Oct 2025); per-token rate converted to USD per 1K rounds to zero at this precision. | OpenRouter hosted API |
| Llama 3.2 3B Instruct (Meta, hosted API) | $0.0000 in / $0.0000 out per 1K tokens | 16,384 (output max 4,096) | OpenRouter pricing converted to USD per 1K tokens (Oct 2025). | OpenRouter hosted API |
| Llama 3.2 11B Vision Instruct (Meta, hosted API) | $0.0000 in / $0.0000 out per 1K tokens | 131,072 (output max 8,192) | Image output incurs $0.00007948 per token equivalent; converted from OpenRouter USD/token pricing (Oct 2025). | OpenRouter hosted API |
| Llama 3.2 90B Vision Instruct (Meta, hosted API) | $0.0003 in / $0.0004 out per 1K tokens | 32,768 (output max 8,192) | Image outputs billed at $0.0005058 equivalent per token; OpenRouter public rates (Oct 2025). | OpenRouter hosted API |
| Llama 3.3 70B Instruct (Meta, hosted API) | $0.0001 in / $0.0004 out per 1K tokens | 131,072 (output max 8,192) | Converted from OpenRouter USD/token rates (Oct 2025). | OpenRouter hosted API |
| Llama 4 Scout (Meta, hosted API) | $0.0001 in / $0.0003 out per 1K tokens | 327,680 (output max 16,384) | Image add-ons billed at $0.0003342 equivalent; OpenRouter pricing (Oct 2025). | OpenRouter hosted API |
| Llama 4 Maverick (Meta, hosted API) | $0.0001 in / $0.0006 out per 1K tokens | 1,048,576 (output max 16,384) | Image outputs billed at $0.0006684 equivalent; OpenRouter pricing (Oct 2025). | OpenRouter hosted API |
| Mistral Large (Mistral AI, proprietary API) | $0.0080 in / $0.0240 out per 1K tokens | 128,000 (output max 8,192) | Latency-friendly alternative to the GPT-4 tier. | Mistral API, Azure AI Studio |
| Cohere Command R+ (Cohere, proprietary API) | $0.0030 in / $0.0150 out per 1K tokens | 128,000 (output max 4,096) | Includes 1,000 free requests per month on the Cohere platform. | Cohere API, AWS Bedrock |
| Grok 4 (xAI, proprietary API) | $0.0030 in / $0.0150 out per 1K tokens | 2,000,000 (output max 64,000) | Derived from OpenRouter USD/token rates (Oct 2025), converted to USD per 1K. | xAI Platform, OpenRouter |
| Grok 4 Fast (xAI, proprietary API) | $0.0002 in / $0.0005 out per 1K tokens | 2,000,000 (output max 32,000) | Sourced from OpenRouter USD/token rates (Oct 2025). | xAI Platform, OpenRouter |
| Grok Code Fast (xAI, proprietary API) | $0.0002 in / $0.0015 out per 1K tokens | 2,000,000 (output max 32,000) | Sourced from OpenRouter USD/token rates (Oct 2025). | xAI Platform, OpenRouter |
| ERNIE 4.5 21B A3B (Baidu, proprietary API) | $0.0001 in / $0.0003 out per 1K tokens | 120,000 (output max 8,192) | OpenRouter rates (Oct 2025); aligns with Baidu Qianfan list pricing (~¥0.0005 per 1K). | Baidu Qianfan, OpenRouter |
| ERNIE 4.5 21B A3B Thinking (Baidu, proprietary API) | $0.0001 in / $0.0003 out per 1K tokens | 131,072 (output max 8,192) | Reasoning-mode pricing from OpenRouter (Oct 2025); matches Baidu's published ¥0.0005 / ¥0.0024 per 1K after FX conversion. | Baidu Qianfan, OpenRouter |
| ERNIE 4.5 VL 28B A3B (Baidu, proprietary API) | $0.0001 in / $0.0006 out per 1K tokens | 30,000 (output max 6,144) | OpenRouter pricing (Oct 2025) converted to USD per 1K tokens. | Baidu Qianfan, OpenRouter |
| Hunyuan A13B Instruct (Tencent, proprietary API) | $0.0000 in / $0.0000 out per 1K tokens | 32,768 (output max 4,096) | OpenRouter USD/token rates (Oct 2025); aligns with the Tencent Cloud public tariff converted from RMB. | Tencent Cloud Hunyuan API, OpenRouter |
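The arithmetic behind these rates is simple: divide token counts by one million, multiply by the per-1M rate, bill cached input at its discounted rate, and apply a tier multiplier for long-context prompts. A minimal sketch, assuming the Standard-tier GPT-5 and Gemini 2.5 Pro figures from the table; the function and its parameter names are illustrative, not part of any provider SDK:

```python
def request_cost_usd(input_tokens, output_tokens, in_rate, out_rate,
                     cached_tokens=0, cached_rate=0.0,
                     long_context_threshold=None, long_context_multiplier=2.0):
    """Estimate one request's cost from per-1M-token USD rates."""
    mult = 1.0
    if long_context_threshold is not None and input_tokens > long_context_threshold:
        mult = long_context_multiplier  # e.g. Gemini 2.5 Pro doubles above 200K
    cost = ((input_tokens - cached_tokens) / 1e6) * in_rate * mult
    cost += (cached_tokens / 1e6) * cached_rate   # cache hits bill at the discount
    cost += (output_tokens / 1e6) * out_rate * mult
    return cost

# GPT-5 Standard tier: $1.25 in / $10.00 out per 1M, cached input $0.125 per 1M.
# 10K input (4K cached) + 2K output -> 0.0075 + 0.0005 + 0.02 = $0.028
gpt5 = request_cost_usd(10_000, 2_000, 1.25, 10.00,
                        cached_tokens=4_000, cached_rate=0.125)

# Gemini 2.5 Pro: a 300K-token prompt crosses the 200K tier, doubling both rates.
gemini = request_cost_usd(300_000, 2_000, 1.25, 10.00,
                          long_context_threshold=200_000)
```

Self-hosted rows bill by GPU-hour rather than per token, so this helper does not apply to them; see the estimation notes below the table.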
How we estimate open-source costs
Self-hosted numbers assume on-demand GPU rentals. We approximate throughput using benchmark tokens per second and include warmup latency (autoscaling cold starts, health checks, validation). Running reserved instances or multi-region clusters? Adjust the hourly rate and redundancy buffer in the calculator.
Hourly rate
Based on popular GPU options (AWS g6, RunPod L4, Paperspace A100). Layer in orchestration overhead (Kubernetes, inference gateways) if it meaningfully impacts spend.
Utilization
The calculator divides total tokens by throughput to derive compute hours. Increase redundancy if you expect queueing or batch windows that drive utilization below 60%.
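The steps above can be sketched as follows, using the Llama 3.1 70B row's figures ($3.20/hr at 5,500 tokens/sec) and an assumed 60% utilization floor. The function and its parameter names are illustrative; tune the utilization, warmup, and redundancy knobs to your own deployment:

```python
def self_hosted_cost_usd(total_tokens, tokens_per_sec, hourly_rate_usd,
                         utilization=0.6, redundancy=1.0, warmup_hours=0.0):
    """Estimate GPU rental cost: tokens -> compute hours -> dollars."""
    # Ideal hours at full throughput, inflated by real-world utilization.
    compute_hours = total_tokens / (tokens_per_sec * 3600) / utilization
    # Add warmup latency (cold starts, health checks) and a redundancy buffer.
    billable_hours = (compute_hours + warmup_hours) * redundancy
    return billable_hours * hourly_rate_usd

# Llama 3.1 70B row: 100M tokens at $3.20/hr, 5,500 tokens/sec, 60% utilization
cost = self_hosted_cost_usd(100_000_000, 5_500, 3.20)  # ≈ $26.94
```

Raising `redundancy` above 1.0 models the extra standby capacity needed when queueing or batch windows push utilization below the 60% default.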