Cost Optimization Guide
Strategies and best practices to reduce your LLM API costs by up to 95%
Prompt Caching
Cache repeated context (system prompts, examples, knowledge base)
Supported Providers
OpenAI, Anthropic, Google
Best For
High-frequency requests with repeated context
Implementation
Structure prompts with static content first, variable content last
Anthropic's cache has a 5-minute default TTL (1-hour optional); OpenAI caches prompts over 1,024 tokens automatically; Gemini offers implicit and explicit context caching
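The "static first, variable last" rule can be sketched with Anthropic's `cache_control` content-block syntax. This builds the request payload only (no API call); the model name and knowledge-base text are placeholders.

```python
# Sketch: structure the request so the large, static system prompt is
# cached and only the short user question varies between requests.
STATIC_KNOWLEDGE_BASE = "...product docs, policies, examples..." * 100

def build_request(user_question: str) -> dict:
    return {
        "model": "claude-sonnet-4-20250514",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": STATIC_KNOWLEDGE_BASE,
                # Marks everything up to this block as cacheable; cache
                # reads are billed at a fraction of the input price.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Variable content goes last so it never invalidates the cache.
        "messages": [{"role": "user", "content": user_question}],
    }

req = build_request("What is our refund policy?")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

Any change to the cached prefix (even one character) forces a fresh cache write, which is why the static block must come first and stay byte-identical across requests.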
Batch Processing
Submit requests in batches for a 50% discount
Supported Providers
Anthropic, Google
Best For
Offline processing, analytics, bulk operations
Implementation
Use batch API endpoints for non-urgent requests; results typically return within 24 hours
Trade latency for cost savings
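The 50% discount compounds quickly at volume. A minimal cost comparison, assuming an illustrative $3 per million input tokens and the half-price batch rate:

```python
# Illustrative batch-vs-sync cost comparison; the price and token
# counts are assumptions for the arithmetic, not real quotes.
INPUT_PRICE_PER_MTOK = 3.00   # standard (synchronous) price, USD
BATCH_DISCOUNT = 0.50         # batch endpoints bill at half price

def cost_usd(input_tokens: int, batch: bool = False) -> float:
    rate = INPUT_PRICE_PER_MTOK * ((1 - BATCH_DISCOUNT) if batch else 1)
    return input_tokens / 1_000_000 * rate

# 10,000 requests x 2,000 input tokens each:
tokens = 10_000 * 2_000
print(f"sync:  ${cost_usd(tokens):.2f}")              # sync:  $60.00
print(f"batch: ${cost_usd(tokens, batch=True):.2f}")  # batch: $30.00
```

The same math applies to output tokens; the only cost of the discount is the delayed turnaround.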
Model Right-sizing
Use the smallest model that meets quality requirements
Supported Providers
All
Best For
All use cases
Implementation
Test mini/nano models before using flagship models
GPT-4o Mini vs GPT-5: 88% cost reduction
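The 88% figure falls out of input pricing. Assuming roughly $0.15 per million input tokens for GPT-4o Mini and $1.25 for GPT-5 (check current pricing before relying on these), with a hypothetical routing rule that reserves the flagship for tasks that demonstrably need it:

```python
# Illustrative input prices, USD per million tokens (assumed figures).
PRICES = {"gpt-4o-mini": 0.15, "gpt-5": 1.25}

reduction = 1 - PRICES["gpt-4o-mini"] / PRICES["gpt-5"]
print(f"{reduction:.0%}")  # 88%

def pick_model(task: str) -> str:
    # Hypothetical routing rule: only escalate to the flagship for
    # task types where the small model failed quality evaluation.
    needs_flagship = {"complex-reasoning", "code-review"}
    return "gpt-5" if task in needs_flagship else "gpt-4o-mini"

print(pick_model("summarize"))       # gpt-4o-mini
print(pick_model("code-review"))     # gpt-5
```

The key discipline is evaluating the small model first: routing everything to the flagship "to be safe" forfeits the entire saving.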
Optimize Chunking
Choose chunking strategy based on use case
Supported Providers
All
Best For
Document processing tasks
Implementation
Use semantic chunking over fixed-size when possible
Avoid unnecessary overlap and reassembly costs
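A minimal semantic-style chunker along these lines: split on paragraph boundaries (a natural semantic seam) and pack paragraphs into chunks up to a budget, with no overlap so no tokens are billed twice. Word count stands in for tokens here to keep the sketch dependency-free.

```python
# Pack whole paragraphs into chunks up to max_words, never splitting
# a paragraph and never duplicating content across chunks. An
# oversized single paragraph is kept whole in its own chunk.
def chunk_by_paragraph(text: str, max_words: int = 200) -> list[str]:
    chunks, current, count = [], [], 0
    for para in filter(None, (p.strip() for p in text.split("\n\n"))):
        words = len(para.split())
        if current and count + words > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append("\n\n".join(current))
    return chunks

doc = ("First topic paragraph.\n\nSecond topic paragraph.\n\n"
       + " ".join(["word"] * 250))
print(len(chunk_by_paragraph(doc)))  # 2
```

Fixed-size chunking with overlap re-sends the overlapped tokens in every chunk; respecting semantic boundaries removes that duplication and usually improves retrieval quality as well.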
Context Window Tiers
Use appropriate tier for prompt length
Supported Providers
Anthropic, Google
Best For
Long-context tasks
Implementation
Keep prompts under 200K tokens when possible; the per-million-token input price roughly doubles above that threshold
Claude Sonnet input: $3 vs $6 per million tokens; Gemini Pro input: $1.25 vs $2.50
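A small calculator using the tier figures above (interpreted as USD per million input tokens, with the higher rate applying above 200K prompt tokens) shows why trimming a prompt just under the threshold pays off:

```python
# Tiered input pricing, USD per million tokens, from the figures above:
# (rate at or below 200K prompt tokens, rate above it).
TIERS = {
    "claude-sonnet": (3.00, 6.00),
    "gemini-pro": (1.25, 2.50),
}
THRESHOLD = 200_000

def input_cost(model: str, prompt_tokens: int) -> float:
    low, high = TIERS[model]
    rate = high if prompt_tokens > THRESHOLD else low
    return prompt_tokens / 1_000_000 * rate

# Trimming a 210K-token prompt below 200K halves the rate:
print(f"{input_cost('claude-sonnet', 210_000):.2f}")  # 1.26
print(f"{input_cost('claude-sonnet', 190_000):.2f}")  # 0.57
```

In this sketch the whole prompt is billed at one rate depending on which tier it lands in, so a 10% trim near the threshold cuts the per-request input cost by more than half.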
