The Muxx Gateway can cache LLM responses, returning cached results for identical requests. This reduces both latency and costs.

How It Works

When caching is enabled:
  1. The gateway generates a cache key from the request (model, messages, parameters)
  2. If a cached response exists and hasn’t expired, it’s returned immediately
  3. If not, the request goes to the provider and the response is cached
Request --> Cache Hit?
            |-- Yes --> Return cached response (fast, free)
            |-- No  --> Forward to provider --> Cache response --> Return
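The flow above can be sketched as a minimal in-memory cache. This is purely illustrative (the names `handle_request` and `forward_to_provider` are hypothetical; the gateway's real internals are not documented):

```python
# Sketch of the gateway's hit/miss flow (hypothetical names).
# forward_to_provider stands in for the real upstream call.

def handle_request(cache: dict, key: str, forward_to_provider):
    """Return (response, cache_status) for a request with the given cache key."""
    if key in cache:
        return cache[key], "HIT"         # fast, free
    response = forward_to_provider()     # slow, billed by the provider
    cache[key] = response                # store for identical future requests
    return response, "MISS"

cache = {}
resp1, status1 = handle_request(cache, "abc", lambda: "Hello!")  # first call: forwarded
resp2, status2 = handle_request(cache, "abc", lambda: "Hello!")  # repeat: served from cache
```

The second identical request never reaches the provider, which is where the latency and cost savings come from.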

Enabling Caching

Caching is configured per-project in the dashboard:
  1. Go to your project Settings
  2. Navigate to Gateway → Caching
  3. Toggle caching on
  4. Set your preferred TTL (time-to-live)

Cache TTL

The TTL determines how long responses are cached:
  TTL        Use case
  1 hour     Frequently changing data
  24 hours   Stable content, good for most use cases
  7 days     Static content, maximum cost savings
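One way to picture TTL expiry (a sketch; the gateway's actual bookkeeping is not exposed, and `is_expired` is a hypothetical helper):

```python
# Sketch of a TTL expiry check: an entry stores the time it was cached,
# and is considered stale once more than ttl_seconds have elapsed.

def is_expired(cached_at: float, ttl_seconds: float, now: float) -> bool:
    """Return True once the entry has outlived its TTL."""
    return now - cached_at > ttl_seconds

ONE_HOUR = 3600
# Cached at t=0 with a 1-hour TTL: fresh at 30 minutes, stale after 2 hours.
fresh_check = is_expired(cached_at=0, ttl_seconds=ONE_HOUR, now=1800)
stale_check = is_expired(cached_at=0, ttl_seconds=ONE_HOUR, now=7200)
```

An expired entry behaves like a miss: the request is forwarded and the cache is refreshed.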

Cache Keys

The cache key is generated from:
  • Model name
  • Messages/prompt content
  • Temperature (if set)
  • Other generation parameters
Requests with temperature > 0 are still cached, but consider a shorter TTL: a nonzero temperature usually means you want varied responses, and a cache hit always returns the same one.
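A deterministic key can be derived by hashing a canonical serialization of those fields. The sketch below is only an illustration of the idea; the gateway's actual key scheme is not published:

```python
import hashlib
import json

def cache_key(model: str, messages: list, **params) -> str:
    """Hash a canonical JSON form of the request; identical requests hash alike."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,  # canonical key ordering so equivalent dicts match
    )
    return hashlib.sha256(payload.encode()).hexdigest()

msgs = [{"role": "user", "content": "Hi"}]
k1 = cache_key("gpt-4o", msgs, temperature=0)
k2 = cache_key("gpt-4o", msgs, temperature=0)    # same request -> same key
k3 = cache_key("gpt-4o", msgs, temperature=0.7)  # changed parameter -> new key
```

Because any change to the model, messages, or parameters produces a different key, even a one-character prompt edit results in a cache miss.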

Cache Headers

The gateway adds headers to indicate cache status:
X-Muxx-Cache: HIT    # Response from cache
X-Muxx-Cache: MISS   # Response from provider (now cached)
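Client code can branch on this header by reading it from the raw HTTP response. A small helper (hypothetical; it works on any mapping of response headers):

```python
def was_cache_hit(headers: dict) -> bool:
    """True when the gateway served the response from its cache."""
    return headers.get("X-Muxx-Cache", "").upper() == "HIT"

# With an HTTP library you would pass the response's headers,
# e.g. was_cache_hit(resp.headers) for a requests.Response.
hit = was_cache_hit({"X-Muxx-Cache": "HIT"})
miss = was_cache_hit({"X-Muxx-Cache": "MISS"})
```

This is useful for logging your effective hit rate from the client side.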

Bypassing Cache

To force a fresh response, send the X-Muxx-Cache-Control: no-cache header:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.muxx.dev/v1",
    default_headers={
        "X-Muxx-Api-Key": "muxx_sk_live_xxxxxxxxxxxx",
        "X-Muxx-Cache-Control": "no-cache"
    }
)

Cost Savings

Cached responses are free; you only pay for the original request. For applications with repeated queries, caching can significantly reduce costs. Example savings:
  • 1000 requests, 60% cache hit rate
  • Only 400 requests billed to the provider
  • 60% cost reduction
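The arithmetic above, as a quick sanity check (cache hits are free, so only misses are billed):

```python
def billed_requests(total: int, hit_rate: float) -> int:
    """Only cache misses reach the provider and incur cost."""
    return int(total * (1 - hit_rate))

billed = billed_requests(1000, 0.60)  # 400 requests billed to the provider
savings = 1 - billed / 1000           # fraction of cost saved
```

With a 60% hit rate, the cost reduction equals the hit rate, since every hit is a request you would otherwise have paid for.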

Viewing Cache Stats

In the dashboard, you can see:
  • Cache hit rate over time
  • Cost savings from caching
  • Most frequently cached requests