Model directory

Pricing assumptions tracked in the calculator

These reference rates feed the cost calculator. Confirm pricing with the official provider before budgeting: APIs evolve quickly, and regional hosting may introduce premiums. We refresh the catalog monthly and whenever major launches ship.
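The per-request arithmetic behind the calculator is simple. A minimal sketch, using the per-1K rates listed in this directory (the token counts below are illustrative):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Estimate USD cost of one request from per-1K-token rates."""
    return (input_tokens / 1000) * input_rate_per_1k + \
           (output_tokens / 1000) * output_rate_per_1k

# Example: 3,000 input + 500 output tokens at GPT-5 list rates
# ($0.00125 / $0.0100 per 1K)
cost = request_cost(3000, 500, 0.00125, 0.0100)  # 0.00875 USD
```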

Model · Pricing · Context window · Strengths · Availability
GPT-5
OpenAI · Proprietary API
View docs

$0.0013/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • Frontier reasoning
  • Multimodal support
  • Priority search integration

OpenAI API (Standard, Flex, Batch tiers)

Standard tier pricing $1.25 (input) / $10.00 (output) per 1M tokens; cached input billed at $0.125 per 1M. Context up to 400K tokens per official pricing.
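Cached-input discounts like the one above change the effective input rate with the cache-hit ratio. A sketch of the blend; the 60% hit ratio is a hypothetical workload parameter, not an OpenAI figure:

```python
def blended_input_rate(base_per_1m: float, cached_per_1m: float,
                       cache_hit_ratio: float) -> float:
    """Effective USD per 1M input tokens when a fraction is served from cache."""
    return cache_hit_ratio * cached_per_1m + (1 - cache_hit_ratio) * base_per_1m

# GPT-5 standard tier: $1.25 base, $0.125 cached input, assumed 60% cache hits
rate = blended_input_rate(1.25, 0.125, 0.60)  # 0.575 USD per 1M input tokens
```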

GPT-5 Mini
OpenAI · Proprietary API
View docs

$0.0003/1K input · $0.0020/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • High throughput
  • Instruction following
  • Lower latency vs GPT-5

OpenAI API (Standard, Flex, Batch tiers)

Standard tier $0.25 (input) / $2.00 (output) per 1M tokens; cached input $0.025 per 1M. Ideal for productized assistants needing GPT-5 alignment at lower cost.

GPT-5 Nano
OpenAI · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 64,000 tokens

  • Classification and routing
  • Ultra-low cost completions
  • Great for cascaded agents

OpenAI API (Standard, Flex, Batch tiers)

Standard tier $0.05 (input) / $0.40 (output) per 1M tokens; cached input $0.005 per 1M.

GPT-5 Pro
OpenAI · Proprietary API
View docs

$0.0150/1K input · $0.1200/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • Maximum accuracy
  • Long-form planning
  • Advanced code synthesis

OpenAI API (Standard tier)

Premium GPT-5 tier at $15 (input) / $120 (output) per 1M tokens; best reserved for mission-critical workloads.

GPT-5 Codex
OpenAI · Proprietary API
View docs

$0.0013/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

400,000 tokens

Output max 128,000 tokens

  • Code generation
  • Tool-calling friendly
  • Compatible with GPT-5 codex adapters

OpenAI API

Shares GPT-5 base pricing; tuned for software engineering and agent coding workloads.

GPT-4o
OpenAI · Proprietary API
View docs

$0.0025/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Balanced reasoning
  • Multimodal support
  • High accuracy

OpenAI API, Azure OpenAI Service

Standard tier pricing $2.50 (input) / $10.00 (output) per 1M tokens as of Jan 2025.

GPT-4o (Aug 2024)
OpenAI · Proprietary API
View docs

$0.0025/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Vision-enabled workflows
  • Improved latency vs GPT-4 Turbo
  • Stable production tier

OpenAI API

Legacy GPT-4o release (Aug 2024) retained for backwards compatibility. Pricing mirrors the current GPT-4o public tier.

GPT-4o (May 2024)
OpenAI · Proprietary API
View docs

$0.0050/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Original GPT-4o launch model
  • Full multimodal support
  • Higher price tier

OpenAI API

May 2024 launch pricing $5 (input) / $15 (output) per 1M tokens. Useful for comparing pre-price-drop scenarios.

GPT-4o mini
OpenAI · Proprietary API
View docs

$0.0002/1K input · $0.0006/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Cost-efficient text generation
  • Fast responses
  • Great for prototyping

OpenAI API, Azure OpenAI Service

Standard tier pricing $0.15 (input) / $0.60 (output) per 1M tokens.

GPT-4.1
OpenAI · Proprietary API
View docs

$0.0020/1K input · $0.0080/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 8,192 tokens

  • Unified reasoning + multimodal
  • Improved tool calling
  • Lower cost than GPT-4 Turbo

OpenAI API

Standard processing tier $2.00 (input) / $8.00 (output) per 1M tokens. Cache hits bill at $0.50 per 1M input tokens.

GPT-4.1 mini
OpenAI · Proprietary API
View docs

$0.0004/1K input · $0.0016/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • High throughput
  • JSON-mode ready
  • Great for RAG + agents

OpenAI API

Standard tier $0.40 (input) / $1.60 (output) per 1M tokens; cache hits billed at $0.10 per 1M input tokens.

GPT-4.1 nano
OpenAI · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

64,000 tokens

Output max 4,096 tokens

  • Ultra-low cost completions
  • Good for classification and routing
  • Fast latency

OpenAI API

Standard tier $0.10 (input) / $0.40 (output) per 1M tokens; cache hits billed at $0.025 per 1M input tokens.

GPT-4 Turbo (Apr 2024)
OpenAI · Proprietary API
View docs

$0.0100/1K input · $0.0300/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Legacy GPT-4 tier
  • Strong reasoning
  • JSON output mode

OpenAI API

April 2024 Turbo release $10 (input) / $30 (output) per 1M tokens. Useful for historical comparisons and fallback flows.

GPT-3.5 Turbo (Jan 2025)
OpenAI · Proprietary API
View docs

$0.0005/1K input · $0.0015/1K output

Usage billed per 1,000 tokens.

16,385 tokens

Output max 4,096 tokens

  • Budget-friendly completions
  • Longstanding ecosystem support
  • Great for lightweight tasks

OpenAI API, Azure OpenAI Service

Standard tier $0.50 (input) / $1.50 (output) per 1M tokens.

Gemini 2.5 Pro
Google · Proprietary API
View docs

$0.0013/1K input · $0.0100/1K output

Usage billed per 1,000 tokens.

1,048,576 tokens

Output max 8,192 tokens

  • Ultra-long context
  • Hybrid reasoning
  • Multimodal I/O

Google AI Studio, Vertex AI

Paid tier $1.25 (input) / $10.00 (output) per 1M tokens for prompts ≤200K. Prompts above 200K tokens bill at double the rate; context caching is priced separately.
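The ≤200K / >200K tiering folds into the calculator as a threshold check. A sketch assuming, per Google's published tiering, that a prompt past the threshold bills entirely at the doubled rate:

```python
def tiered_input_cost(prompt_tokens: int, base_per_1m: float = 1.25,
                      threshold: int = 200_000) -> float:
    """Gemini 2.5 Pro-style input cost: whole prompt bills at 2x past the threshold."""
    rate = base_per_1m if prompt_tokens <= threshold else 2 * base_per_1m
    return prompt_tokens / 1_000_000 * rate

tiered_input_cost(100_000)  # 0.125 USD
tiered_input_cost(300_000)  # 0.75 USD: the whole prompt at $2.50 per 1M
```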

Gemini 2.5 Flash
Google · Proprietary API
View docs

$0.0006/1K input · $0.0050/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • High throughput
  • Low latency
  • Streaming optimized

Google AI Studio, Vertex AI

Paid tier $0.625 (input) / $5.00 (output) per 1M tokens for prompts ≤200K. Requests above 200K input tokens bill at twice the rate.

Gemini 2.5 Flash-Lite
Google · Proprietary API
View docs

$0.0002/1K input · $0.0013/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Cost-efficient bulk processing
  • Shared grounding quotas with Flash
  • Optimized for high-volume requests

Google AI Studio

Paid tier $0.15 (input) / $1.25 (output) per 1M tokens (text/image/video). Audio inputs bill at $0.50 per 1M tokens.

Gemini 2.0 Flash
Google · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Most cost-effective Flash tier
  • Solid reasoning for assistants
  • Great for at-scale automation

Google AI Studio

Paid tier $0.10 (input) / $0.40 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.30 per 1M tokens.

Gemini 2.0 Flash-Lite
Google · Proprietary API
View docs

$0.0001/1K input · $0.0002/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Lowest-cost Gemini text tier
  • Suitable for classification, routing, and summarization at scale

Google AI Studio

Paid tier $0.05 (input) / $0.20 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.15 per 1M tokens.

Gemini 1.5 Pro
Google · Proprietary API
View docs

$0.0070/1K input · $0.0210/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 4,096 tokens

  • Long-context comprehension
  • Strong reasoning
  • Supports multimodal prompts

Google AI Studio, Vertex AI

Legacy pricing $7 (input) / $21 (output) per 1M tokens. Still useful for workloads validated on the 1.5 series.

Gemini 1.5 Flash
Google · Proprietary API
View docs

$0.0003/1K input · $0.0010/1K output

Usage billed per 1,000 tokens.

1,000,000 tokens

Output max 8,192 tokens

  • Fast responses
  • Good quality/cost trade-off
  • Useful for streaming workloads

Google AI Studio, Vertex AI

Legacy Flash pricing $0.35 (input) / $1.05 (output) per 1M tokens.

Claude Sonnet 4.5
Anthropic · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Long-form reasoning
  • Tool calling
  • High compliance guardrails

Anthropic Console, AWS Bedrock, Google Vertex AI

Input $3 / 1M tokens (≤200K) and output $15 / 1M tokens. Requests above 200K input tokens upgrade to long-context pricing.

Claude Haiku 4.5
Anthropic · Proprietary API
View docs

$0.0010/1K input · $0.0050/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Latency friendly
  • Multilingual support
  • Cost-effective guardrails

Anthropic Console, AWS Bedrock, Google Vertex AI

Entry-tier Claude model: $1 (input) / $5 (output) per 1M tokens on the on-demand plan.

Claude 3.5 Sonnet
Anthropic · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Proven enterprise workloads
  • Structured output
  • Strong safety tooling

Anthropic Console, AWS Bedrock, Google Vertex AI

Maintained for backwards compatibility; matches current Sonnet 4.5 pricing.

Claude 3.5 Haiku
Anthropic · Proprietary API
View docs

$0.0008/1K input · $0.0040/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Low latency
  • Great for customer support chatbots
  • Budget-friendly Claude tier

Anthropic Console, AWS Bedrock, Google Vertex AI

Legacy Haiku pricing $0.80 (input) / $4.00 (output) per 1M tokens.

Claude 3 Opus
Anthropic · Proprietary API
View docs

$0.0150/1K input · $0.0750/1K output

Usage billed per 1,000 tokens.

200,000 tokens

Output max 4,096 tokens

  • Highest accuracy in Claude 3 era
  • Mission-critical workflows
  • Advanced safety guardrails

Anthropic Console, AWS Bedrock, Google Vertex AI

Premium Claude tier retained for historical comparisons and customers still migrating to Claude 4.x.

Qwen-Max
Alibaba Cloud · Proprietary API
View docs

$0.0016/1K input · $0.0064/1K output

Usage billed per 1,000 tokens.

32,768 tokens

Output max 8,192 tokens

  • Strong bilingual ability
  • Function calling ready
  • Agent workflows

Alibaba Cloud DashScope, Tongyi Qianwen Console

USD values converted from DashScope public rate card (¥0.011 / ¥0.044 per 1K tokens at 1 USD ≈ ¥6.9). Confirm tier-based discounts before launch.
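The FX conversion used for RMB-priced entries like this one is a one-liner; the exchange rate is the catalog's assumption, so pin it to your own treasury rate before relying on the USD figures:

```python
def cny_to_usd_per_1k(cny_per_1k: float, usd_cny: float = 6.9) -> float:
    """Convert an RMB per-1K-token rate to USD at an assumed exchange rate."""
    return cny_per_1k / usd_cny

# Qwen-Max list rates from the note above (¥0.011 input / ¥0.044 output per 1K)
round(cny_to_usd_per_1k(0.011), 4)  # 0.0016
round(cny_to_usd_per_1k(0.044), 4)  # 0.0064
```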

Qwen-Plus
Alibaba Cloud · Proprietary API
View docs

$0.0004/1K input · $0.0012/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 16,384 tokens

  • Balanced quality and cost
  • High throughput
  • Fine-tuning support

Alibaba Cloud DashScope, Tongyi Qianwen Console

Converted from DashScope postpaid plan pricing (approx. ¥0.0028 / ¥0.0085 per 1K tokens). Long-context upgrades and bundles available on enterprise plans.

Doubao Pro 32K
ByteDance · Proprietary API
View docs

$0.0008/1K input · $0.0025/1K output

Usage billed per 1,000 tokens.

32,000 tokens

Output max 4,096 tokens

  • Marketing and operations content
  • Image-text blended prompts
  • Competitive pricing

Volcengine Doubao API

Derived from Doubao public list price ¥0.006 (input) / ¥0.018 (output) per 1K tokens at 1 USD ≈ ¥7.05. Regional packages may differ.

GLM-4 Plus
Zhipu AI · Proprietary API
View docs

$0.0021/1K input · $0.0063/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Structured output
  • Tool-call friendly
  • Chinese and English parity

Zhipu BigModel Platform, Azure Marketplace (China regions)

USD values converted from ¥0.015 / ¥0.045 per 1K tokens (exchange rate 1 USD ≈ ¥7.1). Prompt caching available for enterprise tiers.

DeepSeek V3.2 Exp
DeepSeek · Proprietary API
View docs

$0.0003/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 8,192 tokens

  • High efficiency reasoning
  • Reasoning/non-reasoning modes
  • Competitive enterprise pricing

DeepSeek API, SiliconFlow Marketplace

Official API pricing: $0.28 (input cache miss) / $0.42 (output) per 1M tokens. Cache hits billed at $0.07 per 1M input tokens.

Mixtral 8x7B Instruct
Mistral AI · Open-source (self-hosted)
View docs

$2.50/hr • 8,000 tokens/sec

Infra: RunPod NVIDIA L4 (8x7B MoE)

32,000 tokens

Output max 8,192 tokens

  • Open-weight deployment
  • Great for summarization
  • Low inference cost

Self-hosted, Mistral API, transferable to cloud GPU providers

Self-hosted pricing assumes on-demand L4 GPU at ~$2.50/hour with autoscaling.
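Hourly GPU pricing converts to a per-token figure through throughput. A sketch using the $2.50/hour and 8,000 tokens/sec assumptions above; real throughput varies with batch size and sequence length:

```python
def hourly_to_usd_per_1m_tokens(usd_per_hour: float, tokens_per_sec: float) -> float:
    """USD per 1M tokens for a GPU billed hourly at a sustained throughput."""
    tokens_per_hour = tokens_per_sec * 3600
    return usd_per_hour / tokens_per_hour * 1_000_000

hourly_to_usd_per_1m_tokens(2.50, 8000)  # ~0.087 USD per 1M tokens
```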

Llama 3.1 70B Instruct
Meta · Open-source (self-hosted)
View docs

$3.20/hr • 5,500 tokens/sec

Infra: AWS g6.8xlarge (1x L4 24GB)

128,000 tokens

Output max 8,192 tokens

  • Strong reasoning for open source
  • Flexible deployment options
  • Fine-tune friendly

Self-hosted, AWS Bedrock, Azure AI Studio

Cost assumes steady-state utilization with autoscaling to avoid idle GPU time.

Llama 3.2 1B Instruct
Meta · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • Edge-friendly footprint
  • Great for on-device assistants
  • Fast inference via OpenRouter

OpenRouter hosted API

Pricing sourced from OpenRouter public rate card (Oct 2025); per-token rate converted to USD per 1K tokens.

Llama 3.2 3B Instruct
Meta · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

16,384 tokens

Output max 4,096 tokens

  • Low-latency chat agents
  • Supports longer prompts vs 1B
  • Cost-efficient hosted option

OpenRouter hosted API

OpenRouter pricing converted to USD per 1K tokens (Oct 2025).

Llama 3.2 11B Vision Instruct
Meta · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • Multimodal (image+text) support
  • Balanced model size
  • Strong tool-calling compatibility

OpenRouter hosted API

Image output incurs $0.00007948 per token equivalent; values converted from OpenRouter USD/token pricing (Oct 2025).

Llama 3.2 90B Vision Instruct
Meta · Proprietary API
View docs

$0.0003/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

32,768 tokens

Output max 8,192 tokens

  • High-quality multimodal reasoning
  • Supports complex visual analysis
  • Hosted via OpenRouter

OpenRouter hosted API

Image outputs billed at $0.0005058 equivalent per token; pricing reflects OpenRouter public rates (Oct 2025).

Llama 3.3 70B Instruct
Meta · Proprietary API
View docs

$0.0001/1K input · $0.0004/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • High quality instruction following
  • Strong multilingual support
  • Hosted delivery with predictable latency

OpenRouter hosted API

Pricing converted from OpenRouter USD/token rates (Oct 2025).

Llama 4 Scout
Meta · Proprietary API
View docs

$0.0001/1K input · $0.0003/1K output

Usage billed per 1,000 tokens.

327,680 tokens

Output max 16,384 tokens

  • Next-gen lightweight Llama
  • Extended context (320K+)
  • Supports partial vision (image add-on)

OpenRouter hosted API

Image add-ons billed at $0.0003342 equivalent; pricing from OpenRouter (Oct 2025).

Llama 4 Maverick
Meta · Proprietary API
View docs

$0.0001/1K input · $0.0006/1K output

Usage billed per 1,000 tokens.

1,048,576 tokens

Output max 16,384 tokens

  • Ultra-long context (1M tokens)
  • Advanced reasoning and planning
  • Optional image generation

OpenRouter hosted API

Image outputs billed at $0.0006684 equivalent; pricing from OpenRouter (Oct 2025).

Mistral Large
Mistral AI · Proprietary API
View docs

$0.0080/1K input · $0.0240/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 8,192 tokens

  • Strong coding ability
  • Fast generation
  • European data residency

Mistral API, Azure AI Studio

Latency-friendly alternative to GPT-4 tier.

Cohere Command R+
Cohere · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

128,000 tokens

Output max 4,096 tokens

  • Retrieval-augmented workflows
  • Enterprise privacy posture
  • Great for customer support

Cohere API, AWS Bedrock

Includes 1,000 free requests per month on the Cohere platform.

Grok 4
xAI · Proprietary API
View docs

$0.0030/1K input · $0.0150/1K output

Usage billed per 1,000 tokens.

256,000 tokens

Output max 64,000 tokens

  • Reasoning-first architecture
  • Search-integrated agents
  • Supports long tool chains

xAI Platform, OpenRouter

Pricing derived from OpenRouter USD/token rates (Oct 2025); converted to USD per 1K tokens.

Grok 4 Fast
xAI · Proprietary API
View docs

$0.0002/1K input · $0.0005/1K output

Usage billed per 1,000 tokens.

2,000,000 tokens

Output max 32,000 tokens

  • Reduced latency Grok tier
  • Optimized for conversational agents
  • Lower cost search-augmented flows

xAI Platform, OpenRouter

Pricing sourced from OpenRouter USD/token rates (Oct 2025).

Grok Code Fast
xAI · Proprietary API
View docs

$0.0002/1K input · $0.0015/1K output

Usage billed per 1,000 tokens.

256,000 tokens

Output max 32,000 tokens

  • Code-focused Grok variant
  • Chain-of-thought reasoning
  • Supports tool-call heavy workloads

xAI Platform, OpenRouter

Pricing sourced from OpenRouter USD/token rates (Oct 2025).

ERNIE 4.5 21B A3B
Baidu · Proprietary API
View docs

$0.0001/1K input · $0.0003/1K output

Usage billed per 1,000 tokens.

120,000 tokens

Output max 8,192 tokens

  • Strong Chinese reasoning
  • Enterprise security posture
  • Tool + retrieval ready

Baidu Qianfan, OpenRouter

Pricing from OpenRouter USD/token rates (Oct 2025); aligns with Baidu Qianfan list pricing (~¥0.0005 per 1K tokens).

ERNIE 4.5 21B A3B Thinking
Baidu · Proprietary API
View docs

$0.0001/1K input · $0.0003/1K output

Usage billed per 1,000 tokens.

131,072 tokens

Output max 8,192 tokens

  • Reasoning mode enabled
  • Structured output
  • Supports deep tool-use

Baidu Qianfan, OpenRouter

Reasoning mode pricing sourced from OpenRouter (Oct 2025); matches Baidu published ¥0.0005/¥0.0024 per 1K tokens after FX conversion.

ERNIE 4.5 VL 28B A3B
Baidu · Proprietary API
View docs

$0.0001/1K input · $0.0006/1K output

Usage billed per 1,000 tokens.

30,000 tokens

Output max 6,144 tokens

  • Multimodal (vision + text)
  • Optimized for Chinese OCR and charts
  • Supports thinking mode

Baidu Qianfan, OpenRouter

OpenRouter pricing (Oct 2025) converted to USD per 1K tokens.

Hunyuan A13B Instruct
Tencent · Proprietary API
View docs

<$0.0001/1K input · <$0.0001/1K output

Usage billed per 1,000 tokens.

32,768 tokens

Output max 4,096 tokens

  • Tencent Cloud integration
  • Chinese + English support
  • Good for chatbots and customer service

Tencent Cloud Hunyuan API, OpenRouter

Pricing derived from OpenRouter USD/token rates (Oct 2025); aligns with Tencent Cloud public tariff converted from RMB.

How we estimate open-source costs

Self-hosted numbers assume on-demand GPU rentals. We approximate throughput using benchmark tokens per second and include warmup latency (autoscaling cold starts, health checks, validation). Running reserved instances or multi-region clusters? Adjust the hourly rate and redundancy buffer in the calculator.

  • Hourly rate

    Based on popular GPU options (AWS g6, RunPod L4, Paperspace A100). Layer in orchestration overhead (Kubernetes, inference gateways) if it meaningfully impacts spend.

  • Utilization

    The calculator divides total tokens by throughput to derive compute hours. Increase redundancy if you expect queueing or batch windows that drive utilization below 60%.
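The estimate described above can be sketched end to end. Utilization and the redundancy buffer are the knobs the calculator exposes; the workload numbers below are placeholders drawn from the Llama 3.1 70B entry:

```python
def self_hosted_monthly_cost(total_tokens: int, tokens_per_sec: float,
                             usd_per_hour: float, utilization: float = 0.6,
                             redundancy: float = 1.0) -> float:
    """Tokens / throughput -> compute hours, scaled up for utilization
    below 100% and multiplied by a redundancy buffer."""
    compute_hours = total_tokens / tokens_per_sec / 3600
    billed_hours = compute_hours / utilization * redundancy
    return billed_hours * usd_per_hour

# 500M tokens/month on the Llama 3.1 70B assumptions ($3.20/hr, 5,500 tokens/sec),
# 60% utilization with a 20% redundancy buffer
self_hosted_monthly_cost(500_000_000, 5500, 3.20,
                         utilization=0.6, redundancy=1.2)  # ~161.6 USD
```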