How It Works
When caching is enabled:
- The gateway generates a cache key from the request (model, messages, parameters)
- If a cached response exists and hasn’t expired, it’s returned immediately
- If not, the request goes to the provider and the response is cached
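The flow above can be sketched as a small in-memory cache with a TTL check. This is only an illustration of the hit/miss logic, not the gateway's implementation: `ResponseCache`, `handle_request`, and `call_provider` are hypothetical names, and the key used here is a simple stand-in (the request serialized as sorted JSON).

```python
import json
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ResponseCache:
    """Illustrative in-memory cache; the gateway's real storage is an unknown."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self.clock() - stored_at >= self.ttl_seconds:
            del self._entries[key]  # entry has outlived its TTL
            return None
        return response

    def put(self, key: str, response: Any) -> None:
        self._entries[key] = (self.clock(), response)


def handle_request(cache: ResponseCache, request: dict,
                   call_provider: Callable[[dict], Any]) -> Tuple[Any, bool]:
    """Return (response, was_cache_hit)."""
    key = json.dumps(request, sort_keys=True)  # stand-in cache key
    cached = cache.get(key)
    if cached is not None:
        return cached, True            # hit: provider is never called
    response = call_provider(request)  # miss: forward to the provider
    cache.put(key, response)           # store for later identical requests
    return response, False
```

With a 60-second TTL, two identical requests made within a minute would reach the provider once; a third request after the TTL has elapsed would go to the provider again.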
Enabling Caching
Caching is configured per project in the dashboard:
- Go to your project Settings
- Navigate to Gateway → Caching
- Toggle caching on
- Set your preferred TTL (time-to-live)
Cache TTL
The TTL determines how long responses are cached:
| TTL | Use case |
|---|---|
| 1 hour | Frequently changing data |
| 24 hours | Stable content, good for most use cases |
| 7 days | Static content, maximum cost savings |
Cache Keys
The cache key is generated from:
- Model name
- Messages/prompt content
- Temperature (if set)
- Other generation parameters
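One plausible way to derive a key from these inputs is to hash a canonical serialization of them. The sketch below assumes SHA-256 over sorted-key JSON; the gateway's actual hashing scheme is not documented here, and `make_cache_key` is a hypothetical helper.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def make_cache_key(model: str,
                   messages: List[Dict[str, Any]],
                   temperature: Optional[float] = None,
                   **params: Any) -> str:
    """Hash the request fields that identify a cacheable response."""
    payload: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "params": dict(params),  # other generation parameters
    }
    if temperature is not None:  # included only if it was set
        payload["temperature"] = temperature
    # Sorted keys and fixed separators make the serialization deterministic,
    # so the same logical request always produces the same key.
    canonical = json.dumps(payload, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, changing any input (a different model, one edited message, a different temperature) yields a different key, while reordering keyword parameters does not.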
Requests with temperature > 0 are still cached. Because a cached response is returned unchanged on every hit, consider a shorter TTL if you want varied responses.
Cache Headers
The gateway adds headers to indicate cache status.
Bypassing Cache
To force a fresh response, add the cache-bypass header to the request.
Cost Savings
Cached responses are free: you only pay for the original request. For applications with repeated queries, caching can significantly reduce costs.
Example savings:
- 1,000 requests, 60% cache hit rate
- Only 400 requests billed to the provider
- 60% cost reduction (assuming each request costs roughly the same)
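The arithmetic behind the example above can be checked with a few lines of Python. This is a rough estimate only: `cache_savings` is a hypothetical helper, and it assumes every request costs about the same, which will not hold when prompt and response sizes vary widely.

```python
def cache_savings(total_requests: int, hit_rate: float) -> dict:
    """Estimate billed requests and cost reduction for a cache hit rate.

    Assumes uniform cost per request (an assumption, not a gateway guarantee).
    """
    if total_requests < 0:
        raise ValueError("total_requests must be non-negative")
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    cached = round(total_requests * hit_rate)  # served from cache, not billed
    billed = total_requests - cached           # forwarded to the provider
    reduction = cached / total_requests if total_requests else 0.0
    return {"cached": cached, "billed": billed, "cost_reduction": reduction}


example = cache_savings(total_requests=1000, hit_rate=0.60)
print(example["billed"])  # 400
```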
Viewing Cache Stats
In the dashboard, you can see:
- Cache hit rate over time
- Cost savings from caching
- Most frequently cached requests