Rate Limits

Per-endpoint rate limiting using the token bucket algorithm. Configure burst and sustained limits independently at the route level.

How it works

Endpointwise uses a token bucket algorithm for rate limiting. Each endpoint has a bucket with a capacity (burst) and a refill rate (sustained). Each request consumes one token; when the bucket is empty, further requests receive a 429 response until tokens refill.
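
The behaviour described above can be sketched in a few lines of Python. This is a minimal illustration of the general token bucket technique, not Endpointwise's actual implementation; the `TokenBucket` class and its field names are hypothetical, and it assumes that a burst of 20 corresponds to a capacity of 20 tokens and a sustained limit of 100/min to a refill of 100 tokens per minute.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Illustrative token bucket (hypothetical; not the gateway's own code)."""
    capacity: float        # maximum tokens the bucket can hold (burst)
    refill_per_sec: float  # tokens added per second (sustained rate)
    tokens: float = 0.0
    last: float = 0.0      # time of the last refill, in seconds

    def __post_init__(self) -> None:
        self.tokens = self.capacity  # start with a full bucket

    def allow(self, now: float) -> bool:
        """Return True if a request at time `now` may proceed, False for a 429."""
        elapsed = max(0.0, now - self.last)
        # Refill for the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each request consumes one token
            return True
        return False


# Capacity of 20 tokens, refilling at 100 tokens per minute.
bucket = TokenBucket(capacity=20, refill_per_sec=100 / 60)
results = [bucket.allow(now=0.0) for _ in range(21)]
```

In this sketch the first 20 requests at the same instant succeed and the 21st is rejected; one second later enough tokens have refilled for one more request.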

Configuring limits

Define limits in your endpointwise.yaml config file:

routes:
  - path: /v1/invoices
    methods: [GET]
    rate_limit:
      sustained: 100/min   # refill rate
      burst: 20/s          # bucket capacity

  - path: /v1/reports
    methods: [GET]
    rate_limit:
      sustained: 5/min
      burst: 2/s

Limit types

Type	Meaning	Example
`sustained`	Long-term average request rate	`100/min`
`burst`	Peak rate allowed for short spikes	`20/s`
`daily`	Hard cap on calls per 24-hour window	`10000/day`

Response headers

Every gateway response includes rate limit headers so your partners can monitor their usage:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 84
X-RateLimit-Reset: 1717200060
Retry-After: 12   (only on 429)
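
A client can read these headers to pace its own requests. The sketch below shows one way to do that in Python; the function name `parse_rate_limit_headers` is hypothetical, and it assumes that every header value is an integer and that `Retry-After` is given in seconds, as in the example above.

```python
from typing import Mapping, Optional


def parse_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Extract the rate limit headers into a plain dict of integers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        return int(value) if value is not None else None

    return {
        "limit": as_int("x-ratelimit-limit"),
        "remaining": as_int("x-ratelimit-remaining"),
        "reset": as_int("x-ratelimit-reset"),
        "retry_after": as_int("retry-after"),  # only present on 429 responses
    }


usage = parse_rate_limit_headers({
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "84",
    "X-RateLimit-Reset": "1717200060",
})
```

Missing headers come back as `None`, so `usage["retry_after"]` is `None` for any response that is not a 429.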

429 response body

When a request is rate limited, the gateway responds with:

HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the rate limit for GET /v1/invoices.",
  "limit": 100,
  "remaining": 0,
  "reset_at": 1717200060
}
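
A client might use this body to decide how long to wait before retrying. The following Python sketch is one possible approach, not a prescribed client; the helper name `seconds_until_retry` is hypothetical, and it assumes `reset_at` is a Unix timestamp in seconds and that a `Retry-After` value, when available, takes precedence.

```python
import json
from typing import Optional


def seconds_until_retry(body: str, now: float, retry_after: Optional[str] = None) -> float:
    """Estimate how long to wait after a 429 before sending the next request."""
    if retry_after is not None:
        # Assumption: Retry-After carries a number of seconds.
        return max(0.0, float(retry_after))
    payload = json.loads(body)
    # Assumption: reset_at is a Unix timestamp in seconds.
    return max(0.0, float(payload["reset_at"]) - now)


body = """
{
  "error": "rate_limit_exceeded",
  "message": "You have exceeded the rate limit for GET /v1/invoices.",
  "limit": 100,
  "remaining": 0,
  "reset_at": 1717200060
}
"""
wait = seconds_until_retry(body, now=1717200048.0)
```

Passing `now` explicitly (for example from `time.time()`) keeps the helper easy to test; the result is clamped at zero so a reset time in the past never produces a negative wait.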

Tip: Rate limit configuration is surfaced on the reference docs page for each endpoint. Partners can read the limit before writing code, not after hitting a wall.

Per-key limits

You can also set a limit per API key, which is enforced in addition to the endpoint-level limit:

npx endpointwise keys create \
  --name "Limited Partner" \
  --scope read \
  --rate-limit 20/min

If a key-level limit and an endpoint-level limit are both configured, the more restrictive of the two applies.
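
The "more restrictive wins" rule can be illustrated with a short Python sketch. The helpers `per_second` and `effective_limit` are hypothetical and only handle the `s`, `min`, and `day` units that appear on this page; comparing limits by their per-second rate is a simplifying assumption, not a description of how the gateway evaluates them.

```python
from fractions import Fraction

# Seconds per unit, for the units used in the examples on this page.
_UNIT_SECONDS = {"s": 1, "min": 60, "day": 86400}


def per_second(limit: str) -> Fraction:
    """Convert a limit string such as '100/min' into requests per second."""
    count, unit = limit.strip().split("/")
    return Fraction(int(count), _UNIT_SECONDS[unit])


def effective_limit(endpoint_limit: str, key_limit: str) -> str:
    """Return whichever of the two limits allows fewer requests per second."""
    return min(endpoint_limit, key_limit, key=per_second)


# A 20/min key calling an endpoint limited to 100/min is held to 20/min.
invoices = effective_limit("100/min", "20/min")
# The same key calling an endpoint limited to 5/min is held to 5/min.
reports = effective_limit("5/min", "20/min")
```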

Global limits

A global limit applies across all routes and serves as a backstop against unexpected traffic:

gateway:
  global_rate_limit:
    sustained: 1000/min