Cost Optimization Guide

Strategies and best practices to reduce your LLM API costs by up to 95%

Prompt Caching
Cache repeated context (system prompts, examples, knowledge base)
Estimated savings: 75-95%

Supported Providers

OpenAI, Anthropic, Google

Best For

High-frequency requests with repeated context

Implementation

Structure prompts with static content first, variable content last

Claude's prompt cache has a 5-minute TTL by default; OpenAI and Gemini offer comparable prompt caching
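The "static content first" rule can be sketched as follows. The `cache_control` field mirrors Anthropic's Messages API convention; the model name and prompt text are illustrative, and other providers mark cacheable content differently.

```python
# Sketch: place static, cacheable content first and variable input last.
# Field names follow Anthropic's Messages API; treat specifics as assumptions.

SYSTEM_PROMPT = "You are a support assistant. [long knowledge base here...]"

def build_request(user_question: str) -> dict:
    """Static system prompt (cacheable) first, variable user input last."""
    return {
        "model": "claude-sonnet-4",  # illustrative model name
        "system": [
            {
                "type": "text",
                "text": SYSTEM_PROMPT,
                # Marks the prefix up to this block as cacheable (~5-min TTL).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_question}],
    }

req = build_request("How do I reset my password?")
```

Because the cached prefix must match byte-for-byte between requests, anything that varies (user question, timestamps, retrieved documents) belongs after the cache marker.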

Batch Processing
Submit requests in batches for a 50% discount
Estimated savings: 50%

Supported Providers

Anthropic, Google

Best For

Offline processing, analytics, bulk operations

Implementation

Use batch API endpoints for non-urgent requests

Trade latency for cost savings
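A batch submission is typically a JSONL file where each line is one request tagged with an ID so results can be matched back later. This sketch assembles such lines; the field names (`custom_id`, `params`) follow common batch-endpoint conventions but should be checked against your provider's schema.

```python
import json

# Sketch: build the JSONL payload for a batch endpoint. Field names and the
# model name are illustrative assumptions, not a definitive schema.

def build_batch_lines(prompts: list[str]) -> list[str]:
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"request-{i}",  # lets you match results to inputs
            "params": {
                "model": "claude-haiku",  # illustrative model name
                "max_tokens": 256,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return lines

batch_lines = build_batch_lines(
    ["Summarize document 1", "Summarize document 2"]
)
```

Results usually arrive within hours rather than seconds, which is the latency-for-cost trade mentioned above.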

Model Right-sizing
Use the smallest model that meets your quality requirements
Estimated savings: 80-95%

Supported Providers

All

Best For

All use cases

Implementation

Test mini/nano models before using flagship models

GPT-4o Mini vs GPT-5: 88% cost reduction
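The 88% figure falls out of simple per-token arithmetic. The prices below (USD per 1M input tokens) are illustrative assumptions used to show the calculation, not official quotes.

```python
# Sketch: compare input-token cost of a small vs flagship model.
# Prices (USD per 1M input tokens) are illustrative assumptions.
PRICES = {"gpt-4o-mini": 0.15, "gpt-5": 1.25}

def input_cost(model: str, tokens: int) -> float:
    return PRICES[model] * tokens / 1_000_000

mini = input_cost("gpt-4o-mini", 10_000_000)   # 10M tokens on the small model
flagship = input_cost("gpt-5", 10_000_000)     # same volume on the flagship
reduction = round((1 - mini / flagship) * 100)  # percent saved
```

Run the small model on a representative sample first; only escalate to the flagship where quality measurably drops.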

Optimize Chunking
Choose chunking strategy based on use case
Estimated savings: 15-40%

Supported Providers

All

Best For

Document processing tasks

Implementation

Use semantic chunking over fixed-size when possible

Avoid unnecessary overlap and reassembly costs

Context Window Tiers
Use the appropriate pricing tier for your prompt length
Estimated savings: 50%

Supported Providers

Anthropic, Google

Best For

Long-context tasks

Implementation

Keep prompts under 200K tokens when possible

Claude Sonnet: $3 vs $6 per million input tokens (below vs above the long-context threshold); Gemini Pro: $1.25 vs $2.50
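The tier jump can be sketched as a threshold on input length. Rates below use the Claude Sonnet figures above ($3/$6 per 1M input tokens around a 200K threshold); the simplification that the whole prompt is billed at the higher rate once it crosses the threshold is an assumption, and exact tier semantics vary by provider.

```python
# Sketch: tiered long-context input pricing (USD per 1M input tokens).
# Rates follow the Claude Sonnet figures in the text; semantics simplified.

def input_cost(tokens: int, base: float = 3.0, long_rate: float = 6.0,
               threshold: int = 200_000) -> float:
    # Assumption: entire prompt billed at the higher rate past the threshold.
    rate = long_rate if tokens > threshold else base
    return rate * tokens / 1_000_000

under = input_cost(190_000)  # stays in the base tier
over = input_cost(210_000)   # slightly longer, but billed at double the rate
```

A prompt just over the threshold costs more than double one just under it, which is why trimming context below 200K tokens pays off disproportionately.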

Quick Cost Estimator
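A minimal estimator combining the levers above might look like this. All prices, the cache discount, and the batch multiplier are illustrative assumptions to be replaced with your provider's current rates.

```python
# Sketch of a quick cost estimator. Prices (USD per 1M tokens), the cache
# discount, and the batch multiplier are illustrative assumptions.

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float = 3.0, output_price: float = 15.0,
                  cached_fraction: float = 0.0, cache_discount: float = 0.90,
                  batch: bool = False) -> float:
    inp = input_tokens * input_price / 1_000_000
    # The cached portion of the input is billed at a steep discount.
    inp -= inp * cached_fraction * cache_discount
    out = output_tokens * output_price / 1_000_000
    total = inp + out
    if batch:
        total *= 0.5  # batch endpoints halve the price
    return round(total, 4)

# 100K input tokens (80% cache hits) plus 5K output, submitted via batch:
est = estimate_cost(100_000, 5_000, cached_fraction=0.8, batch=True)
```

Stacking levers compounds: here caching and batching together cut the bill to roughly a fifth of the unoptimized cost.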