AI Agent Cost Planner

Model cost comparison tailored to your workload

Configure request volume, token usage, and success targets to understand the total cost of ownership for leading proprietary and open-source LLMs. Benchmark cost per successful task before you commit to scale.

Usage Assumptions

Enter the expected request volume and token usage. Adjust complexity to reflect prompt chaining or tool augmentation, and include a redundancy buffer for retries or fallbacks.
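The inputs described above can be combined into billed token totals along the following lines. This is a minimal sketch, not the planner's actual formula: the function name, the parameter names, and the way the complexity multiplier and redundancy buffer are applied are all assumptions. The sample values (18,000 requests, 450 input and 220 output tokens per request, a 10% buffer) are hypothetical, chosen only because they reproduce the 8,910,000 input and 4,356,000 output tokens quoted in the model notes.

```python
def billed_tokens(requests, input_per_request, output_per_request,
                  complexity=1.0, redundancy=0.0):
    """Return (billed_requests, input_tokens, output_tokens) for one month.

    complexity: assumed multiplier on tokens per request (prompt chaining, tools).
    redundancy: assumed fraction of extra requests for retries or fallbacks.
    """
    billed_requests = requests * (1 + redundancy)
    input_tokens = billed_requests * input_per_request * complexity
    output_tokens = billed_requests * output_per_request * complexity
    # Round to whole numbers, since requests and tokens are counted in integers.
    return round(billed_requests), round(input_tokens), round(output_tokens)


# Hypothetical inputs that happen to match the totals shown in the notes.
requests, input_tokens, output_tokens = billed_tokens(
    18_000, 450, 220, complexity=1.0, redundancy=0.10
)
print(requests, input_tokens, output_tokens)
```

Raising either the complexity multiplier or the redundancy buffer scales the billed tokens, and therefore every API-priced row, proportionally.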

Key takeaways

Compare headline metrics to understand where each model shines. Notes include utilization assumptions for self-hosted options.

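  • Llama 3.2 1B Instruct offers the lowest cost per successful task. The figure rounds to $0.0000 at four decimal places; $0.09 per month across 18,216 estimated successes works out to roughly $0.000005 per success.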
  • Llama 3.2 3B Instruct costs 201.1% more per successful task than the leader, but is suited to low-latency chat agents.
  • Self-hosted models assume GPU utilization that matches throughput. Idle time inflates cost, so verify the notes against your deployment plan.
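The two headline metrics in the takeaways can be reproduced approximately from the table. The sketch below assumes that cost per successful task is monthly cost divided by estimated successes and that the gap is a simple percentage over the leader; the function names are illustrative, not the planner's own. Because it starts from the rounded monthly costs shown in the table, it lands near the quoted 201.1% rather than on it.

```python
def cost_per_success(monthly_cost, successes):
    """Monthly cost spread across the estimated successful tasks."""
    return monthly_cost / successes


def pct_above_leader(cost, leader_cost):
    """How much more expensive `cost` is than `leader_cost`, in percent."""
    return (cost / leader_cost - 1) * 100


SUCCESSES = 18_216  # estimated successes shown for every row

leader = cost_per_success(0.09, SUCCESSES)     # Llama 3.2 1B Instruct
runner_up = cost_per_success(0.27, SUCCESSES)  # Llama 3.2 3B Instruct

print(f"leader: ${leader:.7f} per success")
print(f"runner-up gap: {pct_above_leader(runner_up, leader):.1f}%")  # roughly 200%
```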
Model | Monthly cost | Cost / request | Cost / successful task | Est. successes | Notes
Llama 3.2 1B Instruct
Meta
$0.09 | $0.0000 | $0.0000 | 18,216
  • Pricing sourced from OpenRouter public rate card (Oct 2025); per-token rate converted to USD per 1K tokens.
  • Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.2 3B Instruct
Meta
$0.27 | $0.0000 | $0.0000 | 18,216
  • OpenRouter pricing converted to USD per 1K tokens (Oct 2025).
  • Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Hunyuan A13B Instruct
Tencent
$0.40 | $0.0000 | $0.0000 | 18,216
  • Pricing derived from OpenRouter USD/token rates (Oct 2025); aligns with Tencent Cloud public tariff converted from RMB.
  • Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.2 11B Vision Instruct
Meta
$0.65 | $0.0000 | $0.0000 | 18,216
  • Image output incurs $0.00007948 per token equivalent; values converted from OpenRouter USD/token pricing (Oct 2025).
  • Token pricing: $0.0000 (input) & $0.0000 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.0 Flash-Lite
Google
$1.32 | $0.0001 | $0.0001 | 18,216
  • Paid tier $0.05 (input) / $0.20 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.15 per 1M tokens.
  • Token pricing: $0.0001 (input) & $0.0002 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
ERNIE 4.5 21B A3B
Baidu
$1.84 | $0.0001 | $0.0001 | 18,216
  • Pricing from OpenRouter USD/token rates (Oct 2025); aligns with Baidu Qianfan list pricing (~¥0.0005 per 1K tokens).
  • Token pricing: $0.0001 (input) & $0.0003 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
ERNIE 4.5 21B A3B Thinking
Baidu
$1.84 | $0.0001 | $0.0001 | 18,216
  • Reasoning mode pricing sourced from OpenRouter (Oct 2025); matches Baidu published ¥0.0005/¥0.0024 per 1K tokens after FX conversion.
  • Token pricing: $0.0001 (input) & $0.0003 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 4 Scout
Meta
$2.02 | $0.0001 | $0.0001 | 18,216
  • Image add-ons billed at $0.0003342 equivalent; pricing from OpenRouter (Oct 2025).
  • Token pricing: $0.0001 (input) & $0.0003 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Nano
OpenAI
$2.19 | $0.0001 | $0.0001 | 18,216
  • Standard tier $0.05 (input) / $0.40 (output) per 1M tokens; cached input $0.005 per 1M.
  • Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4.1 nano
OpenAI
$2.63 | $0.0001 | $0.0001 | 18,216
  • Standard tier $0.10 (input) / $0.40 (output) per 1M tokens; cache hits billed at $0.025 per 1M input tokens.
  • Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.0 Flash
Google
$2.63 | $0.0001 | $0.0001 | 18,216
  • Paid tier $0.10 (input) / $0.40 (output) per 1M tokens (text/image/video). Audio inputs billed at $0.30 per 1M tokens.
  • Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.3 70B Instruct
Meta
$2.81 | $0.0001 | $0.0002 | 18,216
  • Pricing converted from OpenRouter USD/token rates (Oct 2025).
  • Token pricing: $0.0001 (input) & $0.0004 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Mixtral 8x7B Instruct
Mistral AI
$3.03 | $0.0002 | $0.0002 | 18,216
  • Self-hosted pricing assumes on-demand L4 GPU at ~$2.50/hour with autoscaling.
  • Self-hosted: $2.50/hr on RunPod NVIDIA L4 (8x7B MoE), ~8,000 tokens/sec throughput.
  • Assumes 0.5 compute hours + 0.8 warmup/idle hours per month.
ERNIE 4.5 VL 28B A3B
Baidu
$3.69 | $0.0002 | $0.0002 | 18,216
  • OpenRouter pricing (Oct 2025) converted to USD per 1K tokens.
  • Token pricing: $0.0001 (input) & $0.0006 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o mini
OpenAI
$3.95 | $0.0002 | $0.0002 | 18,216
  • Standard tier pricing $0.15 (input) / $0.60 (output) per 1M tokens.
  • Token pricing: $0.0001 (input) & $0.0006 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 4 Maverick
Meta
$3.95 | $0.0002 | $0.0002 | 18,216
  • Image outputs billed at $0.0006684 equivalent; pricing from OpenRouter (Oct 2025).
  • Token pricing: $0.0001 (input) & $0.0006 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Grok 4 Fast
xAI
$3.96 | $0.0002 | $0.0002 | 18,216
  • Pricing sourced from OpenRouter USD/token rates (Oct 2025).
  • Token pricing: $0.0002 (input) & $0.0005 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
DeepSeek V3.2 Exp
DeepSeek
$4.32 | $0.0002 | $0.0002 | 18,216
  • Official API pricing: $0.28 (input cache miss) / $0.42 (output) per 1M tokens. Cache hits billed at $0.07 per 1M input tokens.
  • Token pricing: $0.0003 (input) & $0.0004 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.2 90B Vision Instruct
Meta
$4.86 | $0.0002 | $0.0003 | 18,216
  • Image outputs billed at $0.0005058 equivalent per token; pricing reflects OpenRouter public rates (Oct 2025).
  • Token pricing: $0.0003 (input) & $0.0004 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Llama 3.1 70B Instruct
Meta
$5.34 | $0.0003 | $0.0003 | 18,216
  • Cost assumes steady-state utilization with autoscaling to avoid idle GPU time.
  • Self-hosted: $3.20/hr on AWS g6.8xlarge (1x L4 24GB), ~5,500 tokens/sec throughput.
  • Assumes 0.7 compute hours + 1.0 warmup/idle hours per month.
Gemini 2.5 Flash-Lite
Google
$6.78 | $0.0003 | $0.0004 | 18,216
  • Paid tier $0.15 (input) / $1.25 (output) per 1M tokens (text/image/video). Audio inputs bill at $0.50 per 1M tokens.
  • Token pricing: $0.0001 (input) & $0.0013 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 1.5 Flash
Google
$7.69 | $0.0004 | $0.0004 | 18,216
  • Legacy Flash pricing $0.35 (input) / $1.05 (output) per 1M tokens.
  • Token pricing: $0.0003 (input) & $0.0010 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Grok Code Fast
xAI
$8.32 | $0.0004 | $0.0005 | 18,216
  • Pricing sourced from OpenRouter USD/token rates (Oct 2025).
  • Token pricing: $0.0002 (input) & $0.0015 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Qwen-Plus
Alibaba Cloud
$8.79 | $0.0004 | $0.0005 | 18,216
  • Converted from DashScope postpaid plan pricing (approx. ¥0.0028 / ¥0.0085 per 1K tokens). Long-context upgrades and bundles available on enterprise plans.
  • Token pricing: $0.0004 (input) & $0.0012 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4.1 mini
OpenAI
$10.53 | $0.0005 | $0.0006 | 18,216
  • Standard tier $0.40 (input) / $1.60 (output) per 1M tokens; cache hits billed at $0.10 per 1M input tokens.
  • Token pricing: $0.0004 (input) & $0.0016 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Mini
OpenAI
$10.94 | $0.0006 | $0.0006 | 18,216
  • Standard tier $0.25 (input) / $2.00 (output) per 1M tokens; cached input $0.025 per 1M. Ideal for productized assistants needing GPT-5 alignment at lower cost.
  • Token pricing: $0.0003 (input) & $0.0020 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-3.5 Turbo (Jan 2025)
OpenAI
$10.99 | $0.0006 | $0.0006 | 18,216
  • Standard tier $0.50 (input) / $1.50 (output) per 1M tokens.
  • Token pricing: $0.0005 (input) & $0.0015 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Doubao Pro 32K
ByteDance
$18.46 | $0.0009 | $0.0010 | 18,216
  • Derived from Doubao public list price ¥0.006 (input) / ¥0.018 (output) per 1K tokens at 1 USD ≈ ¥7.05. Regional packages may differ.
  • Token pricing: $0.0008 (input) & $0.0025 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude 3.5 Haiku
Anthropic
$24.55 | $0.0012 | $0.0013 | 18,216
  • Legacy Haiku pricing $0.80 (input) / $4.00 (output) per 1M tokens.
  • Token pricing: $0.0008 (input) & $0.0040 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.5 Flash
Google
$27.35 | $0.0014 | $0.0015 | 18,216
  • Paid tier $0.625 (input) / $5.00 (output) per 1M tokens for prompts ≤200K. Requests above 200K input tokens bill at twice the rate.
  • Token pricing: $0.0006 (input) & $0.0050 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude Haiku 4.5
Anthropic
$30.69 | $0.0016 | $0.0017 | 18,216
  • Entry-tier Claude model: $1 (input) / $5 (output) per 1M tokens on the on-demand plan.
  • Token pricing: $0.0010 (input) & $0.0050 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Qwen-Max
Alibaba Cloud
$42.13 | $0.0021 | $0.0023 | 18,216
  • USD values converted from DashScope public rate card (¥0.011 / ¥0.044 per 1K tokens at 1 USD ≈ ¥6.9). Confirm tier-based discounts before launch.
  • Token pricing: $0.0016 (input) & $0.0064 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GLM-4 Plus
Zhipu AI
$46.15 | $0.0023 | $0.0025 | 18,216
  • USD values converted from ¥0.015 / ¥0.045 per 1K tokens (exchange rate 1 USD ≈ ¥7.1). Prompt caching available for enterprise tiers.
  • Token pricing: $0.0021 (input) & $0.0063 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4.1
OpenAI
$52.67 | $0.0027 | $0.0029 | 18,216
  • Standard processing tier $2.00 (input) / $8.00 (output) per 1M tokens. Cache hits bill at $0.50 per 1M input tokens.
  • Token pricing: $0.0020 (input) & $0.0080 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5
OpenAI
$54.70 | $0.0028 | $0.0030 | 18,216
  • Standard tier pricing $1.25 (input) / $10.00 (output) per 1M tokens; cached input billed at $0.125 per 1M. Context up to 400K tokens per official pricing (Jan 2025).
  • Token pricing: $0.0013 (input) & $0.0100 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Codex
OpenAI
$54.70 | $0.0028 | $0.0030 | 18,216
  • Shares GPT-5 base pricing; tuned for software engineering and agent coding workloads.
  • Token pricing: $0.0013 (input) & $0.0100 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 2.5 Pro
Google
$54.70 | $0.0028 | $0.0030 | 18,216
  • Paid tier $1.25 (input) / $10.00 (output) per 1M tokens for prompts ≤200K. Prompts above 200K tokens bill at double the rate; context caching is priced separately.
  • Token pricing: $0.0013 (input) & $0.0100 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o
OpenAI
$65.84 | $0.0033 | $0.0036 | 18,216
  • Standard tier pricing $2.50 (input) / $10.00 (output) per 1M tokens as of Jan 2025. Prompts above 200K input tokens bill at the higher long-context rate.
  • Token pricing: $0.0025 (input) & $0.0100 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o (Aug 2024)
OpenAI
$65.84 | $0.0033 | $0.0036 | 18,216
  • Legacy GPT-4o release (Aug 2024) retained for backwards compatibility. Pricing mirrors the current GPT-4o public tier.
  • Token pricing: $0.0025 (input) & $0.0100 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude Sonnet 4.5
Anthropic
$92.07 | $0.0047 | $0.0051 | 18,216
  • Input $3 / 1M tokens (≤200K) and output $15 / 1M tokens. Requests above 200K input tokens upgrade to long-context pricing.
  • Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude 3.5 Sonnet
Anthropic
$92.07 | $0.0047 | $0.0051 | 18,216
  • Maintained for backwards compatibility; matches current Sonnet 4.5 pricing.
  • Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Cohere Command R+
Cohere
$92.07 | $0.0047 | $0.0051 | 18,216
  • Includes 1,000 free requests per month on the Cohere platform.
  • Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Grok 4
xAI
$92.07 | $0.0047 | $0.0051 | 18,216
  • Pricing derived from OpenRouter USD/token rates (Oct 2025); converted to USD per 1K tokens.
  • Token pricing: $0.0030 (input) & $0.0150 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4o (May 2024)
OpenAI
$109.89 | $0.0056 | $0.0060 | 18,216
  • May 2024 launch pricing $5 (input) / $15 (output) per 1M tokens. Useful for comparing pre-price-drop scenarios.
  • Token pricing: $0.0050 (input) & $0.0150 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Gemini 1.5 Pro
Google
$153.85 | $0.0078 | $0.0084 | 18,216
  • Legacy pricing $7 (input) / $21 (output) per 1M tokens. Still useful for workloads validated on the 1.5 series.
  • Token pricing: $0.0070 (input) & $0.0210 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Mistral Large
Mistral AI
$175.82 | $0.0089 | $0.0097 | 18,216
  • Latency-friendly alternative to GPT-4 tier.
  • Token pricing: $0.0080 (input) & $0.0240 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-4 Turbo (Apr 2024)
OpenAI
$219.78 | $0.0111 | $0.0121 | 18,216
  • April 2024 Turbo release $10 (input) / $30 (output) per 1M tokens. Useful for historical comparisons and fallback flows.
  • Token pricing: $0.0100 (input) & $0.0300 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
Claude 3 Opus
Anthropic
$460.35 | $0.0232 | $0.0253 | 18,216
  • Premium Claude tier retained for historical comparisons and customers still migrating to Claude 4.x.
  • Token pricing: $0.0150 (input) & $0.0750 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
GPT-5 Pro
OpenAI
$656.37 | $0.0331 | $0.0360 | 18,216
  • Premium GPT-5 tier at $15 (input) / $120 (output) per 1M tokens; best reserved for mission-critical workloads.
  • Token pricing: $0.0150 (input) & $0.1200 (output) per 1K tokens.
  • Approx. 8,910,000 input tokens & 4,356,000 output tokens billed.
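
The monthly cost column can be sanity-checked from the notes in each row. The sketch below is an assumption about how the figures are derived, not the planner's published method: API-priced rows are taken as billed tokens multiplied by the per-1M-token rates, and self-hosted rows as the hourly GPU rate multiplied by compute hours (total tokens divided by throughput) plus the stated warmup/idle hours. The function names are illustrative.

```python
def api_monthly_cost(input_tokens, output_tokens,
                     input_rate_per_1m, output_rate_per_1m):
    """Token-priced API: billed tokens times the USD rate per 1M tokens."""
    return (input_tokens / 1_000_000 * input_rate_per_1m
            + output_tokens / 1_000_000 * output_rate_per_1m)


def self_hosted_monthly_cost(total_tokens, tokens_per_second,
                             hourly_rate, idle_hours):
    """Self-hosted GPU: (compute hours + warmup/idle hours) times hourly rate."""
    compute_hours = total_tokens / tokens_per_second / 3600
    return (compute_hours + idle_hours) * hourly_rate


INPUT_TOKENS = 8_910_000   # from the row notes
OUTPUT_TOKENS = 4_356_000  # from the row notes

# GPT-4o mini: $0.15 (input) / $0.60 (output) per 1M tokens.
gpt_4o_mini = api_monthly_cost(INPUT_TOKENS, OUTPUT_TOKENS, 0.15, 0.60)

# Llama 3.1 70B Instruct: $3.20/hr, ~5,500 tokens/sec, 1.0 warmup/idle hours.
llama_70b = self_hosted_monthly_cost(
    INPUT_TOKENS + OUTPUT_TOKENS, 5_500, 3.20, 1.0
)

print(f"GPT-4o mini: ${gpt_4o_mini:.2f}")          # $3.95, matching the table
print(f"Llama 3.1 70B Instruct: ${llama_70b:.2f}")  # $5.34, matching the table
```

Rows whose notes show hours rounded to one decimal place (for example the Mixtral 8x7B row) may not reproduce to the cent from the displayed values.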