Rate Limits
Per-endpoint rate limiting using the token bucket algorithm. Configure burst and sustained limits independently at the route level.
How it works
Endpointwise limits requests with a token bucket. Each endpoint has its own bucket with a capacity (burst) and a refill rate (sustained). Each request consumes one token; when the bucket is empty, requests receive a 429 response until tokens refill.
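To make the mechanics concrete, here is a minimal sketch of a token bucket in Python. It illustrates the general algorithm only and is not Endpointwise's actual implementation; the class name, its parameters, and the injectable `clock` are assumptions made for this example.

```python
import time


class TokenBucket:
    """Illustrative token bucket (hypothetical; not the gateway's real code)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = float(capacity)                    # burst: maximum tokens held
        self.refill_per_second = float(refill_per_second)  # sustained: tokens added per second
        self.clock = clock                                 # injectable so the sketch is testable
        self.tokens = float(capacity)                      # the bucket starts full
        self.last_refill = clock()

    def allow(self):
        """Consume one token and return True, or return False if the bucket is empty."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would answer with a 429 here
```

Under these assumptions, a bucket built with `TokenBucket(capacity=20, refill_per_second=100 / 60)` would absorb a spike of 20 requests at once and then admit roughly 100 requests per minute afterwards.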
Configuring limits
Define limits in your endpointwise.yaml config file:
routes:
  - path: /v1/invoices
    methods: [GET]
    rate_limit:
      sustained: 100/min  # refill rate
      burst: 20/s  # bucket capacity
  - path: /v1/reports
    methods: [GET]
    rate_limit:
      sustained: 5/min
      burst: 2/s
Limit types
| Type | Meaning | Example |
|---|---|---|
| sustained | Long-term average request rate | 100/min |
| burst | Peak rate allowed for short spikes | 20/s |
| daily | Hard cap on calls per 24-hour window | 10000/day |
Response headers
Every gateway response includes rate limit headers so your partners can monitor their usage:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 84
X-RateLimit-Reset: 1717200060
Retry-After: 12 (only on 429)
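A client can use these headers to decide how long to pause before retrying. The sketch below assumes that `Retry-After` carries a number of seconds (as in the example above) and that `X-RateLimit-Reset` is a Unix timestamp in seconds; neither is confirmed here beyond the sample values. The function name and the one-second fallback are hypothetical.

```python
import time


def seconds_until_retry(status, headers, now=None):
    """Return how many seconds to wait before retrying (0.0 if no wait is needed)."""
    if status != 429:
        return 0.0
    now = time.time() if now is None else now
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        # Assumption: the value is delta-seconds, not an HTTP date.
        return max(0.0, float(retry_after))
    reset = lowered.get("x-ratelimit-reset")
    if reset is not None:
        # Assumption: the value is a Unix timestamp in seconds.
        return max(0.0, float(reset) - now)
    return 1.0  # hypothetical fallback when neither header is present
```

Preferring `Retry-After` over the reset timestamp avoids depending on the client's clock being in sync with the gateway's.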
429 response body
When a request is rate limited:
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the rate limit for GET /v1/invoices.",
  "limit": 100,
  "remaining": 0,
  "reset_at": 1717200060
}
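On the client side, the body above can be turned into a typed error. This is a sketch only: `RateLimitError` and `parse_rate_limit_body` are hypothetical names, not part of any Endpointwise SDK, and the field names are taken from the sample body.

```python
import json


class RateLimitError(Exception):
    """Hypothetical client-side exception carrying the fields of a 429 body."""

    def __init__(self, message, limit, remaining, reset_at):
        super().__init__(message)
        self.limit = limit
        self.remaining = remaining
        self.reset_at = reset_at


def parse_rate_limit_body(body):
    """Return a RateLimitError for a rate-limit body, or None for any other error body."""
    data = json.loads(body)
    if data.get("error") != "rate_limit_exceeded":
        return None
    return RateLimitError(
        data["message"], data["limit"], data["remaining"], data["reset_at"]
    )
```

A caller could raise the returned error, or read `reset_at` from it to schedule a retry.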
Tip: Rate limit configuration is surfaced on the reference docs page for each endpoint. Partners can read the limit before writing code, not after hitting a wall.
Per-key limits
You can also set limits per API key. A key-level limit is typically lower than the endpoint-level limit and is enforced alongside it:
npx endpointwise keys create \
--name "Limited Partner" \
--scope read \
--rate-limit 20/min
If a key-level limit and an endpoint-level limit are both configured, the more restrictive of the two applies.
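The "more restrictive wins" rule can be sketched as follows. The example compares two sustained limits written in the `count/unit` form used on this page; the helper names are hypothetical, and the unit table covers only the units that appear in this document (`s`, `min`, `day`).

```python
from fractions import Fraction

# Assumption: only the units shown on this page are supported.
SECONDS_PER_UNIT = {"s": 1, "min": 60, "day": 86400}


def per_second(limit):
    """Convert a limit string such as '20/min' to an exact requests-per-second rate."""
    count, unit = limit.split("/")
    return Fraction(int(count), SECONDS_PER_UNIT[unit.strip()])


def effective_limit(key_limit, endpoint_limit):
    """Return whichever of the two limit strings allows fewer requests per second."""
    return min(key_limit, endpoint_limit, key=per_second)
```

For instance, `effective_limit("20/min", "100/min")` yields `"20/min"`: the key created above would be held to 20 requests per minute on `/v1/invoices`, even though the endpoint itself allows 100.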
Global limits
A global limit applies across all routes and serves as a backstop against unexpected traffic:
gateway:
  global_rate_limit:
    sustained: 1000/min