LLM API Cost Calculator

Drag the sliders to size a request, then compare the cost across 18 models from Claude, GPT, Gemini, Grok, DeepSeek, and Mistral — per request and per month, with Batch API and prompt-caching discounts.

Your usage

Input tokens / request: 2,000
Output tokens / request: 600
Cached input: 0%
Requests / month
Use Batch API (50% off, async)

Caching and batch are modeled as independent multipliers; a provider might not apply both discounts to the same request. Estimates only.
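As a rough sketch of the arithmetic behind these sliders, the Python below prices one request and scales it to a month. The `Price` class, its field names, and the rates in the usage lines are hypothetical placeholders, not real provider prices. It also assumes cached input is billed at a separate cache-read rate and that the batch discount is a flat 50% applied on top, which a real provider may not allow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per million tokens. Hypothetical structure, not a provider schema."""
    input_per_mtok: float
    output_per_mtok: float
    cache_read_per_mtok: float


def cost_per_request(price, input_tokens, output_tokens,
                     cached_fraction=0.0, batch=False):
    """Estimate the USD cost of one request.

    cached_fraction is the share of input tokens billed at the cache-read
    rate (0.0 to 1.0). batch applies a flat 50% multiplier to the total,
    mirroring the "Use Batch API" checkbox.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    usd = (fresh * price.input_per_mtok
           + cached * price.cache_read_per_mtok
           + output_tokens * price.output_per_mtok) / 1_000_000
    if batch:
        usd *= 0.5  # assumption: discount stacks with caching
    return usd


def cost_per_month(price, input_tokens, output_tokens, requests_per_month,
                   cached_fraction=0.0, batch=False):
    """Scale the per-request estimate by monthly request volume."""
    per_request = cost_per_request(price, input_tokens, output_tokens,
                                   cached_fraction, batch)
    return per_request * requests_per_month


# Placeholder rates for illustration only.
example = Price(input_per_mtok=3.0, output_per_mtok=15.0,
                cache_read_per_mtok=0.3)
per_request = cost_per_request(example, 2_000, 600)
per_month = cost_per_month(example, 2_000, 600, requests_per_month=10_000)
```

Treating the discounts as independent multipliers keeps the comparison simple, at the cost of overstating savings wherever a provider does not stack them.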

Cost comparison — 18 models


Estimated API cost per request and per month by model, filterable by provider and sortable by column.

One planned request vs. many follow-ups

Every follow-up re-sends the whole conversation as input. This shows what that costs — and how much prompt caching claws back.

Model
Context re-sent every turn: 4,000
Tokens you add per turn: 300
Tokens returned per turn: 500
Number of turns: 8
Prompt caching on (re-sent context cached)

Many follow-ups (no cache)

Many follow-ups (cached)

One planned request

Model: each follow-up turn re-sends the context plus all prior turns. The "one planned request" path sends the context once with everything asked up front, for the same total work. Caching prices the re-sent prefix at the model's cache-read rate.
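The model described above can be sketched in Python as token totals plus a pricing step. This is one possible reading, not the page's actual implementation: it assumes "prior turns" means both the tokens you added and the tokens the model returned, that turn 1 pays full price, that from turn 2 onward the whole re-sent prefix is billed at the cache-read rate, and that cache-write surcharges are ignored. All function names are hypothetical, and rates are passed in by the caller rather than taken from any provider.

```python
def follow_up_tokens(context, added_per_turn, returned_per_turn, turns,
                     cached=False):
    """Return (fresh_input, cached_input, output) token totals over all turns."""
    fresh = cached_in = output = 0
    history = 0  # tokens from prior turns: your additions plus model replies
    for turn in range(turns):
        prefix = context + history  # re-sent on every turn
        if cached and turn > 0:
            cached_in += prefix  # assumption: whole prefix is a cache read
        else:
            fresh += prefix
        fresh += added_per_turn  # new tokens are never cached
        output += returned_per_turn
        history += added_per_turn + returned_per_turn
    return fresh, cached_in, output


def planned_tokens(context, added_per_turn, returned_per_turn, turns):
    """One request: context sent once, every question asked up front."""
    return (context + turns * added_per_turn, 0, turns * returned_per_turn)


def usd(tokens, input_rate, cache_read_rate, output_rate):
    """Price a (fresh, cached, output) tuple; rates are USD per million tokens."""
    fresh, cached_in, output = tokens
    return (fresh * input_rate
            + cached_in * cache_read_rate
            + output * output_rate) / 1_000_000


# The page's default slider values.
no_cache = follow_up_tokens(4_000, 300, 500, 8)
with_cache = follow_up_tokens(4_000, 300, 500, 8, cached=True)
planned = planned_tokens(4_000, 300, 500, 8)
```

With the default values, the uncached path sends 56,800 input tokens across eight turns, while the planned request sends 6,400. Under these assumptions caching leaves the input total unchanged and moves 50,400 of those tokens to the cheaper cache-read rate.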

Token counter (estimate)

Paste text for a quick token estimate, then push it into the calculator.

0 est. tokens

0 characters

Rough heuristic (~4 chars/token). Real tokenization is model-specific — for exact Claude counts use the count_tokens API; OpenAI and Google have their own tokenizers.
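A minimal Python sketch of the heuristic, assuming the estimate is rounded up to a whole token (the page does not say how it rounds). The function name is hypothetical, and the result is only a ballpark figure, not a substitute for a model's real tokenizer.

```python
import math

CHARS_PER_TOKEN = 4  # the rough ratio quoted above


def estimate_tokens(text: str) -> int:
    """Estimate token count from character count; rounding up is an assumption."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


sample = "Paste text for a quick token estimate."
estimated = estimate_tokens(sample)
```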