The Muxx Gateway can cache LLM responses, returning cached results for identical requests. This reduces both latency and costs.

How It Works

When caching is enabled:
  1. The gateway generates a cache key from the request (model, messages, parameters)
  2. If a cached response exists and hasn’t expired, it’s returned immediately
  3. If not, the request goes to the provider and the response is cached
Request --> Cache Hit?
            |-- Yes --> Return cached response (fast, free)
            |-- No  --> Forward to provider --> Cache response --> Return
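The flow above can be sketched as a small in-memory cache. This is a hypothetical illustration (the gateway's actual implementation is not published); `call_provider` is a stand-in for the upstream request:

```python
import hashlib
import json
import time

# Sketch only: cache key -> (expires_at, response)
cache = {}

def cache_key(model, messages, **params):
    # Deterministic key from the request contents (sorted for stability).
    payload = json.dumps({"model": model, "messages": messages, **params},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def call_provider(model, messages, **params):
    # Stand-in for forwarding the request to the LLM provider.
    return {"content": f"completion for {messages[-1]['content']}"}

def get_response(model, messages, ttl=3600, **params):
    key = cache_key(model, messages, **params)
    entry = cache.get(key)
    if entry and entry[0] > time.time():
        return entry[1], "HIT"       # cached and not expired: fast, free
    response = call_provider(model, messages, **params)
    cache[key] = (time.time() + ttl, response)
    return response, "MISS"          # forwarded to provider, now cached
```

The second identical request within the TTL returns the stored response without touching the provider.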

Enabling Caching

Caching is configured per-project in the dashboard:
  1. Go to your project Settings
  2. Navigate to Gateway > Caching
  3. Toggle caching on
  4. Set your preferred TTL (time-to-live)

Cache TTL

The TTL determines how long responses are cached:
TTL         Use case
1 hour      Frequently changing data
24 hours    Stable content, good for most use cases
7 days      Static content, maximum cost savings
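The freshness check behind these presets is simple: a cached response is served only while its age is under the configured TTL. A minimal sketch (the preset names and seconds values here just mirror the table above):

```python
import time

# TTL presets from the table above, in seconds.
TTL_PRESETS = {"1 hour": 3600, "24 hours": 86400, "7 days": 604800}

def is_fresh(cached_at, ttl_seconds, now=None):
    """True while a cached response is still within its TTL."""
    now = time.time() if now is None else now
    return now - cached_at < ttl_seconds
```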

Cache Keys

The cache key is generated from:
  • Model name
  • Messages/prompt content
  • Temperature (if set)
  • Other generation parameters
Requests with temperature > 0 are still cached. If you rely on a nonzero temperature for varied responses, consider a shorter TTL, since a cache hit returns the same completion every time.
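One way to picture the key derivation: hash only the fields that affect the completion, including temperature when it is set. This is a sketch (the exact key scheme Muxx uses is not documented):

```python
import hashlib
import json

def make_cache_key(model, messages, temperature=None, **other_params):
    # Only request fields that affect generation enter the key;
    # temperature is included only when set, per the list above.
    material = {"model": model, "messages": messages, **other_params}
    if temperature is not None:
        material["temperature"] = temperature
    payload = json.dumps(material, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

Any change to the model, messages, or generation parameters yields a different key, so the request misses the cache.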

Cache Headers

The gateway adds headers to indicate cache status:
X-Muxx-Cache: HIT    # Response from cache
X-Muxx-Cache: MISS   # Response from provider (now cached)
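To read this header in code: with the openai Python SDK, response headers are reachable via the `with_raw_response` wrapper (a sketch; check your SDK version). A small helper makes the check case-insensitive:

```python
# Sketch of header access via the openai SDK's raw-response wrapper:
#
#   raw = client.chat.completions.with_raw_response.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": "Hello"}],
#   )
#   print(raw.headers.get("X-Muxx-Cache"))  # "HIT" or "MISS"

def was_cache_hit(headers):
    """True if the gateway served this response from cache."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return lowered.get("x-muxx-cache") == "HIT"
```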

Bypassing Cache

To force a fresh response, send the X-Muxx-Cache-Control: no-cache header. Setting it as a default header bypasses the cache on every request from that client:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.muxx.dev/v1",
    default_headers={
        "X-Muxx-Api-Key": "muxx_sk_live_xxxxxxxxxxxx",
        "X-Muxx-Cache-Control": "no-cache"  # bypass the cache
    }
)
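If you only want to bypass the cache occasionally, the openai Python SDK also accepts per-call headers via `extra_headers` (sketch; verify against your SDK version), so caching stays on for everything else:

```python
# Per-request bypass (sketch):
#
#   client.chat.completions.create(
#       model="gpt-4o-mini",
#       messages=[{"role": "user", "content": "Fresh answer, please"}],
#       extra_headers={"X-Muxx-Cache-Control": "no-cache"},
#   )

BYPASS_HEADER = {"X-Muxx-Cache-Control": "no-cache"}

def with_bypass(headers):
    """Merge the no-cache directive into an existing header dict."""
    return {**headers, **BYPASS_HEADER}
```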

Cost Savings

Cached responses are free—you only pay for the original request. For applications with repeated queries, caching can significantly reduce costs. Example savings:
  • 1000 requests, 60% cache hit rate
  • Only 400 requests billed to the provider
  • 60% cost reduction
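The savings arithmetic above, as a quick sketch (the per-request cost here is an assumed example figure, not a Muxx or provider price):

```python
def billed_requests(total, hit_rate, cost_per_request):
    """Billed request count and cost when cache hits are free."""
    billed = total * (1 - hit_rate)
    return billed, billed * cost_per_request

# 1000 requests at a 60% hit rate: only 400 go to the provider.
billed, cost = billed_requests(1000, 0.60, cost_per_request=0.002)
```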

Viewing Cache Stats

In the dashboard, you can see:
  • Cache hit rate over time
  • Cost savings from caching
  • Most frequently cached requests