Configuring Rate Limits
Set rate limits in your project settings:
- Go to your project Settings
- Navigate to Gateway → Rate Limiting
- Configure your limits
Limit Types
Requests Per Minute (RPM)
Limit the total number of requests per minute.
Tokens Per Minute (TPM)
Limit the total tokens (input + output) per minute.
Daily Request Limit
Limit the total number of requests per day.
Rate Limit Headers
When rate limiting is enabled, responses include rate limit headers.
Handling Rate Limits
When a rate limit is exceeded, the gateway returns a 429 Too Many Requests response.
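To reduce how often you receive 429 responses, a client can track its own usage against the RPM, TPM, and daily limits before sending each request. The following is a minimal sketch, not the gateway's implementation. The class names are hypothetical, and it assumes rolling windows of 60 seconds for RPM and TPM and 24 hours for the daily limit; the gateway's actual windowing (for example, a daily reset at a fixed time) may differ. Because output tokens are unknown until the response arrives, the estimated_tokens argument must be your estimate of input plus output tokens.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class SlidingWindowLimit:
    """Rolling-window counter that rejects usage which would exceed limit."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.events: Deque[Tuple[float, int]] = deque()  # (timestamp, cost) pairs
        self.used = 0

    def _evict(self, now: float) -> None:
        # Forget usage recorded at least `window` seconds ago.
        while self.events and self.events[0][0] <= now - self.window:
            _, cost = self.events.popleft()
            self.used -= cost

    def would_allow(self, cost: int) -> bool:
        self._evict(self.clock())
        return self.used + cost <= self.limit

    def record(self, cost: int) -> None:
        now = self.clock()
        self._evict(now)
        self.events.append((now, cost))
        self.used += cost


class ClientThrottle:
    """Hypothetical client-side mirror of the RPM, TPM, and daily request limits."""

    def __init__(self, rpm: int, tpm: int, daily: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rpm = SlidingWindowLimit(rpm, 60, clock)
        self.tpm = SlidingWindowLimit(tpm, 60, clock)
        self.daily = SlidingWindowLimit(daily, 24 * 60 * 60, clock)  # assumed rolling 24 h

    def try_acquire(self, estimated_tokens: int) -> bool:
        """Record the request and return True only if it fits under all three limits."""
        checks = [(self.rpm, 1), (self.tpm, estimated_tokens), (self.daily, 1)]
        if not all(limit.would_allow(cost) for limit, cost in checks):
            return False  # caller should wait before sending
        for limit, cost in checks:
            limit.record(cost)
        return True


# Usage with a fake clock so the behavior is deterministic:
now = [0.0]
throttle = ClientThrottle(rpm=2, tpm=1000, daily=100, clock=lambda: now[0])
print(throttle.try_acquire(400))  # True
print(throttle.try_acquire(400))  # True
print(throttle.try_acquire(100))  # False: a third request in the same minute exceeds RPM
now[0] = 61.0
print(throttle.try_acquire(100))  # True: the first two requests have left the window
```

Local windows will not line up exactly with the gateway's, so a request that passes this check can still receive a 429; keep retry handling in place. For per-user limits, one ClientThrottle per user ID would track the same counters separately for each user.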
Implementing Retries
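A common retry strategy is exponential backoff with jitter. The sketch below retries only 429 responses. It assumes the gateway may include a standard HTTP Retry-After header (this page does not confirm that); when present, the client waits the indicated time. Otherwise it waits a random delay between 0 and min(max_delay, base_delay * 2**attempt) seconds ("full jitter"). Response and send_with_retries are hypothetical names; adapt them to your HTTP client.

```python
import random
import time
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Callable, Dict, Optional


@dataclass
class Response:
    """Hypothetical stand-in for your HTTP client's response type."""
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


def retry_after_seconds(headers: Dict[str, str]) -> Optional[float]:
    """Parse Retry-After as delay-seconds or an HTTP date; None if absent or invalid."""
    value = next((v for k, v in headers.items() if k.lower() == "retry-after"), None)
    if value is None:
        return None
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:  # "-0000" dates parse as naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    return max(0.0, (when - datetime.now(timezone.utc)).total_seconds())


def send_with_retries(send: Callable[[], Response], max_retries: int = 5,
                      base_delay: float = 1.0, max_delay: float = 60.0,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call send(), retrying 429 responses up to max_retries times."""
    attempt = 0
    while True:
        response = send()
        if response.status != 429 or attempt >= max_retries:
            return response  # success, non-retryable status, or retries exhausted
        delay = retry_after_seconds(response.headers)
        if delay is None:
            # Full jitter: random wait in [0, min(max_delay, base_delay * 2^attempt)].
            delay = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
        attempt += 1


# Usage with a scripted sequence of responses and a recording sleep:
responses = iter([Response(429, {"Retry-After": "2"}), Response(429), Response(200)])
waits = []
result = send_with_retries(lambda: next(responses), sleep=waits.append)
print(result.status)  # 200
print(waits[0])       # 2.0, taken from Retry-After
```

Randomizing the fallback delay spreads out retries from many clients that were throttled at the same moment, so they do not all retry in lockstep. Passing sleep as a parameter keeps the function testable without real waiting.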
Per-User Rate Limits
You can apply rate limits per user by including user metadata with each request.
Rate Limit Alerts
Set up alerts to notify you when approaching limits:
- Go to Settings → Alerts
- Add a rate limit alert
- Choose a threshold (e.g., 80% of the limit)
- Select a notification channel (email, Slack, or webhook)
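The threshold is a percentage of each limit: at 80%, an RPM limit of 1,000 corresponds to 800 requests within a minute. If you also track usage on your side (for example, for a custom dashboard), the same check can be sketched as below. The helper name and the at-or-above comparison are assumptions; the gateway's own alert evaluation may differ.

```python
from typing import Dict, List


def limits_over_threshold(usage: Dict[str, int], limits: Dict[str, int],
                          threshold_percent: int = 80) -> List[str]:
    """Return the names of limits whose usage is at or above threshold_percent of the limit."""
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    # Compare in integers (usage * 100 vs percent * limit) to avoid float rounding.
    return [name for name, limit in limits.items()
            if usage.get(name, 0) * 100 >= threshold_percent * limit]


limits = {"rpm": 1000, "tpm": 200_000, "daily": 50_000}
usage = {"rpm": 820, "tpm": 90_000, "daily": 12_000}
print(limits_over_threshold(usage, limits))  # ['rpm']
```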