Rate Management Strategy
Rate limiting is not a punitive measure — it is a traffic management system that protects platform stability for every user while preventing individual applications from consuming disproportionate resources. Configuring your client to handle rate limit responses correctly is the single most important reliability decision you will make for any application that operates at scale. The configuration details below cover every tier, every response code, and every retry strategy you need to keep your integration running smoothly.
Rate Limit Tiers & Quotas
OpenRouter structures rate limits across its Free, Standard, Growth, Business, and Enterprise account tiers, with per-minute request quotas and per-minute token budgets that scale with your usage level and payment plan.
Compare rate limit tiers →
429 Response Handling
When your application exceeds its rate limit, the platform returns HTTP 429 with a Retry-After header. Proper handling of this response prevents cascading failures and unnecessary retry storms.
Read 429 handling guide →
Exponential Backoff Strategy
Implementing exponential backoff with jitter distributes retry attempts across time and prevents the thundering herd problem that can occur when multiple clients retry simultaneously.
Learn backoff strategies →
Monitoring & Alerting
Track rate limit proximity through response headers and usage API endpoints. Set up alerts before your application approaches its limit to avoid surprise 429 responses during peak traffic.
Set up rate limit monitoring →
Rate Limit Tiers Explained
OpenRouter rate limits operate on two dimensions: requests per minute and tokens per minute. The request limit governs how many individual API calls your application can make in a sixty-second window. The token limit caps the total number of input and output tokens consumed across all requests within the same window, providing a cost-protection mechanism that prevents a misconfigured application from accumulating unexpected charges.
Free tier accounts receive 20 requests per minute and 40,000 tokens per minute — sufficient for prototyping, personal projects, and low-traffic integrations. The Standard tier increases these limits to 200 requests per minute and 400,000 tokens per minute, suitable for production applications with moderate user bases. The Growth and Business tiers raise the ceiling further, to 600 and 1,500 requests per minute respectively. Enterprise accounts negotiate custom limits based on projected workload, with many reaching thousands of requests per minute and token budgets in the millions. All tiers share the same underlying infrastructure, so the quality of model responses does not degrade at lower tiers — only the throughput ceiling changes.
Rate limits reset on a rolling sixty-second window rather than a fixed clock boundary. If your application sends 200 requests across a sixty-second period that starts at 10:03:27, the limit resets gradually — the request sent at 10:03:27 ages out at 10:04:27, making room for a new request. This rolling-window approach smooths traffic more effectively than fixed-window reset points, which can create burst patterns that stress infrastructure at the top of each minute.
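The rolling-window behavior described above can be mirrored on the client side to throttle requests before the server ever returns a 429. The sketch below is an assumption-level illustration (the class name and structure are my own, not part of any OpenRouter SDK): it keeps a deque of send timestamps and ages out entries older than sixty seconds, exactly as the rolling window does.

```python
import time
from collections import deque


class RollingWindowLimiter:
    """Client-side throttle that mirrors a rolling 60-second window.

    limit: maximum requests allowed in any 60-second span.
    """

    def __init__(self, limit, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of recently sent requests

    def try_acquire(self, now=None):
        """Return True if a request may be sent now, recording it if so."""
        now = time.monotonic() if now is None else now
        # Age out requests older than the window, just as the server's
        # rolling window gradually frees capacity.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

A request denied by `try_acquire` can simply be delayed and retried locally, which keeps the application under its quota without ever consuming a server-side 429.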
| Plan | Requests / Minute | Tokens / Minute |
|---|---|---|
| Free | 20 | 40,000 |
| Standard | 200 | 400,000 |
| Growth | 600 | 1,200,000 |
| Business | 1,500 | 3,000,000 |
| Enterprise | Custom (5,000+) | Custom (10,000,000+) |
Handling 429 Too Many Requests
A 429 response from the OpenRouter API signals that your application has exceeded its rate limit for the current window. The response includes a Retry-After header specifying the number of seconds to wait before sending another request. Your client code should respect this header exactly — sending requests before the specified duration elapses will result in additional 429 responses, wasting bandwidth and delaying recovery.
The 429 response body includes a JSON error object with a code field set to rate_limit_exceeded and a message field describing which limit was hit. If the request limit triggered the 429, the message identifies the requests-per-minute cap. If the token limit caused it, the message references the tokens-per-minute budget. Your application can parse this distinction to decide whether to reduce request frequency, switch to a model with shorter responses, or both.
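Parsing that distinction can be sketched as follows. The `Retry-After` header and the `rate_limit_exceeded` error shape follow the response format described above; the function itself is a hypothetical helper, not an SDK API, and the message-text check is a heuristic assumption.

```python
import json


def classify_429(status_code, headers, body_text):
    """For a 429 response, return (retry_after_seconds, limited_dimension).

    Returns None for any other status code. The dimension is inferred
    from the error message text, as described in the docs above.
    """
    if status_code != 429:
        return None
    retry_after = float(headers.get("Retry-After", "1"))
    error = json.loads(body_text).get("error", {})
    message = error.get("message", "")
    # Token-budget messages reference the tokens-per-minute budget;
    # otherwise assume the requests-per-minute cap was hit.
    dimension = "tokens" if "token" in message.lower() else "requests"
    return retry_after, dimension
```

An application seeing `"tokens"` repeatedly might shorten prompts or cap `max_tokens`, while `"requests"` points toward batching or reducing call frequency.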
Do not treat 429 responses as fatal errors. They are flow-control signals, not failures. A well-designed client handles them transparently — pausing, waiting, and resuming — while logging the event so operations teams can identify trends. If your application consistently hits rate limits, the correct response is not to retry more aggressively but to upgrade your account tier or optimize your request patterns to consume fewer resources per operation.
Distinguishing Rate Limits from Other 4xx Errors
Rate limit errors can be confused with authentication failures in poorly designed error handling. A 401 response means your API key is invalid or expired — retrying with the same key will never succeed. A 429 response means your key is valid but you have sent too many requests — retrying after a delay will succeed. Always check the HTTP status code before deciding on a retry strategy, and never retry 401 errors with the same credentials.
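That branching logic is small enough to express directly. The function name and policy labels below are my own illustrative choices, not part of any SDK:

```python
def retry_decision(status_code):
    """Map a status code to a retry policy per the guidance above:
    429 is a flow-control signal (retry after the indicated delay),
    while 401 means the credentials themselves are bad and retrying
    with the same key will never succeed.
    """
    if status_code == 429:
        return "retry_after_delay"
    if status_code == 401:
        return "fix_credentials"
    return "no_retry"
```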
Implementing Exponential Backoff
Exponential backoff is the standard retry strategy for rate-limited APIs. When a request returns 429, your client waits for the duration specified in the Retry-After header, retries the request, and if another 429 arrives, doubles the wait time before the next attempt. Adding random jitter — a small, random variation in the wait time — prevents multiple client instances from synchronizing their retry attempts and creating a thundering herd that overwhelms the rate limiter again.
The OpenRouter SDKs implement backoff with jitter automatically. In the Python SDK, configure retry behavior through the client initialization options: max_retries=3, backoff_factor=0.5, and jitter=True. The JavaScript SDK accepts similar configuration via retryConfig. The Go SDK uses the standard retry package pattern with functional options. If you are writing a custom HTTP client rather than using an SDK, implement backoff manually using the algorithm: wait = min(retry_after * (2 ^ attempt) + random_jitter, max_wait_seconds).
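For custom clients, the backoff formula above translates into a short helper. This is a sketch under stated assumptions: the function name and the `jitter_ratio` knob are my own, while the `retry_after * 2**attempt` shape and the cap come directly from the algorithm quoted above.

```python
import random


def backoff_delay(retry_after, attempt, max_wait_seconds=60.0, jitter_ratio=0.1):
    """Compute the wait before retry number `attempt` (0-based), following
    wait = min(retry_after * 2**attempt + jitter, max_wait_seconds).

    Jitter is drawn uniformly from [0, jitter_ratio * retry_after] so that
    parallel clients do not synchronize their retries.
    """
    jitter = random.uniform(0.0, jitter_ratio * retry_after)
    return min(retry_after * (2 ** attempt) + jitter, max_wait_seconds)
```

The uniform jitter term is what breaks up the thundering herd: two clients that receive the same `Retry-After` value will still wake at slightly different moments.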
For broader guidance on building resilient API integrations, standard distributed-systems design patterns such as timeouts, bounded retries with backoff, and circuit breakers apply directly to AI service clients and complement the rate-management techniques described here.
Monitoring Rate Limit Usage
Every API response from OpenRouter includes rate limit headers that your application can use to track proximity to its limits without querying a separate endpoint. The X-RateLimit-Remaining header shows how many requests remain in the current sixty-second window. X-RateLimit-Reset provides a Unix timestamp indicating when the window resets. Monitoring these headers in your application's observability stack — whether Prometheus, Datadog, or CloudWatch — gives operations teams visibility into rate limit consumption trends before they cause 429 responses.
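Extracting that telemetry is a one-step transformation on the response headers. The header names (`X-RateLimit-Remaining`, `X-RateLimit-Reset`) come from the description above; the function itself is a hypothetical helper you would feed into your metrics pipeline, not an official API.

```python
import time


def rate_limit_snapshot(headers, now=None):
    """Return (remaining_requests, seconds_until_reset) from response headers.

    Either value is None when the corresponding header is absent.
    X-RateLimit-Reset is treated as a Unix timestamp, per the docs above.
    """
    now = time.time() if now is None else now
    raw_remaining = headers.get("X-RateLimit-Remaining")
    raw_reset = headers.get("X-RateLimit-Reset")
    remaining = int(raw_remaining) if raw_remaining is not None else None
    until_reset = max(0.0, float(raw_reset) - now) if raw_reset is not None else None
    return remaining, until_reset
```

Emitting both values as gauges on every response gives dashboards a continuous view of headroom without any extra API calls.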
For programmatic access to historical usage data, the /api/v1/usage endpoint returns token consumption and request counts filtered by time range. Build a dashboard query that compares recent usage against your tier's limits as a percentage. Set an alert threshold at 80% of your limit so your team can investigate or request a tier upgrade before the application starts receiving 429 responses during production traffic.
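The 80% alert threshold can be sketched as a simple check over both dimensions. The function name and argument layout are my own; the threshold value and the idea of comparing usage against tier limits as a percentage come from the guidance above.

```python
def usage_alert(requests_used, request_limit, tokens_used, token_limit, threshold=0.8):
    """Return the list of rate-limit dimensions at or above the alert
    threshold (80% by default), checked independently because either
    limit can trigger a 429 on its own.
    """
    alerts = []
    if requests_used / request_limit >= threshold:
        alerts.append("requests")
    if tokens_used / token_limit >= threshold:
        alerts.append("tokens")
    return alerts
```

Run against the Standard tier limits, for example, 170 of 200 requests used would flag `"requests"` while 100,000 of 400,000 tokens would stay quiet.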
Understanding the rate limit architecture before we launched saved us from an embarrassing production incident. We configured our client with exponential backoff and jitter based on the documentation, ran load tests against the Standard tier limits, and went live knowing exactly what would happen when we hit our ceiling. When Black Friday traffic tripled our normal request volume, the system handled 429 responses gracefully — users saw slightly slower responses instead of errors.
Aisha Rahman — Director of Engineering, Celestial Networks (Dallas, TX)
Frequently Asked Questions About Rate Limits
What are the rate limits for each OpenRouter account tier?
Free accounts receive 20 requests per minute with a 40,000 token budget. Standard accounts get 200 requests and 400,000 tokens per minute. Growth and Business tiers provide 600 and 1,500 requests per minute respectively. Enterprise accounts negotiate custom limits based on projected workload requirements.
How should my application handle a 429 response?
Read the Retry-After header from the 429 response, wait the specified number of seconds, and retry the request. If subsequent requests also return 429, double the wait time with jitter up to a maximum of 60 seconds. The OpenRouter SDKs implement this retry logic automatically — no custom code required.
Can I increase my rate limit mid-project without downtime?
Yes — upgrading from Free to Standard takes effect immediately through the billing dashboard with no service interruption. Enterprise custom limits are configured by OpenRouter support after a brief consultation. Rate limit changes apply instantly to all existing API keys on your account.
Are token limits and request limits enforced independently?
They are enforced simultaneously — either limit can trigger a 429 response. A burst of short requests might hit the request limit while staying under the token budget, while a few requests with very long responses could exhaust the token limit without approaching the request cap. Monitor both dimensions in your observability stack.
Do rate limits affect streaming responses differently?
Streaming requests count as a single request against your request limit regardless of how many tokens the stream delivers. However, the tokens delivered through the stream still count against your tokens-per-minute budget, so applications running long streams on Free tier accounts should watch the 40,000-token ceiling.
Ready to Scale Your API Usage?
Upgrade your account tier for higher rate limits, or start with the Free tier to test your integration at no cost.
Get Started Now