Router API Guide

Understand the core routing engine that powers OpenRouter — request flow, provider selection logic, fallback configuration, and the parameters that give you fine-grained control over how your AI requests are served.

Core Routing Mechanics

The Router API is the request-handling backbone of the platform. It accepts your application's API calls, evaluates routing parameters to determine the optimal provider for each request, manages fallback when primary providers fail, and returns uniformly formatted responses — all while your client code interacts with a single stable endpoint.

How the Router API Works

Every API request sent to OpenRouter passes through the routing layer before reaching a model provider. This is the engine that transforms a multi-provider AI ecosystem into a single coherent API surface. When your application sends a chat completions request, the router evaluates several factors: which model you requested, which providers offer that model, current provider health and latency metrics, any routing preferences you specified, and your configured fallback strategy. Based on this evaluation, the router selects a provider, formats the request appropriately, forwards it, and returns the response in a consistent structure regardless of which provider handled it.

This abstraction is the platform's core value proposition. Without it, your application would need to maintain separate client configurations for each provider, handle unique error responses from each API, and implement its own load balancing and fallback logic. With the Router API, your code sends one request format and receives one response format — the complexity of the multi-provider landscape stays on the platform side.
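To make the "one request format" point concrete, here is a minimal sketch of the single request shape the router accepts regardless of which provider ultimately serves it. The endpoint URL and exact wire format are assumptions modeled on OpenAI-compatible conventions; consult the platform reference for authoritative values.

```python
import json

# Assumed endpoint — shown for illustration, not confirmed by this guide.
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, user_message: str) -> dict:
    """Build the one request shape the router accepts for any provider."""
    return {
        "model": model,  # e.g. "openai/gpt-4o" or "anthropic/claude-sonnet"
        "messages": [{"role": "user", "content": user_message}],
    }

payload = build_request("openai/gpt-4o", "Summarize the routing flow.")
print(json.dumps(payload))
```

Whichever provider handles the call, the client builds the same payload; the router performs any provider-specific translation.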

Request Flow Through the Router

Each request follows a consistent path from your application through the routing layer to the selected provider and back.

When a request arrives at the chat completions endpoint, the router first authenticates your API key and checks that your account has sufficient credits for the requested model. It then evaluates the model parameter to identify candidate providers. If you specified a particular provider preference using the provider routing parameters, the router prioritizes that provider. If the primary provider is unavailable — due to an outage, rate limiting, or elevated latency — the router consults your fallback configuration to determine the next provider or model to try. Once a suitable provider is selected, the router formats the request, sends it, and streams or collects the response before returning it to your application with standardized metadata.

This entire flow adds only milliseconds beyond the model's generation time. The routing overhead stays low because provider availability and latency data are cached and refreshed continuously in the background rather than checked synchronously on every request. The NIST AI standards program provides frameworks for evaluating the reliability of AI system components — principles the router's fallback architecture puts into practice by ensuring that no single provider dependency becomes a failure mode for dependent applications.


Provider Selection Logic

The routing layer decides which provider handles your request based on availability, performance, and your optional preferences.

By default, when you specify a model without additional routing parameters, the router selects the provider that offers the best combination of availability and latency for that model. This default behavior is optimized for reliability — the router aims to return a response rather than an error. You can override this behavior with routing parameters that give you explicit control over provider selection.

Provider preference ordering lets you specify a ranked list of providers for a given model. If your application has observed better performance or pricing from a specific provider for your use case, you can instruct the router to always try that provider first. Cost optimization parameters direct the router to prefer the least expensive provider offering the requested model. Quality threshold parameters let you set minimum benchmarks that the selected provider must meet before the router accepts the response.
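A request that exercises these overrides might look like the sketch below. The parameter names come from this guide; the internal structure of the provider preference object (an "order" key, the provider identifiers themselves) is a hypothetical illustration, since the guide only describes it as ordered identifiers with optional weights.

```python
# Sketch of a request overriding default provider selection.
# "azure" and "openai" are stand-in provider identifiers.
request = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}],
    "provider_preference": {"order": ["azure", "openai"]},  # try azure first
    "cost_optimization": False,  # explicit ordering, not cheapest-first
    "quality_threshold": 0.8,    # reject providers scoring below 0.8
}
print(sorted(request))
```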

Fallback Configuration

Fallback routing is the router's most powerful reliability feature for production applications.

You can configure fallback at two levels: provider-level fallback and model-level fallback. Provider-level fallback tells the router to try alternative providers when the primary provider for your chosen model is unavailable. Model-level fallback tells the router to switch to an entirely different model if the requested model is unavailable across all providers. A common production configuration specifies a primary model with a primary provider, a ranked list of fallback providers for that model, and then a ranked list of fallback models — each potentially with their own provider preferences.

The router processes this fallback chain automatically. Your application sends a single request; the router may internally attempt multiple providers or models before returning a response. This internal retry logic is invisible to your client code, which simply receives the eventual response along with metadata indicating which provider and model ultimately served it. That metadata lets you monitor fallback frequency and adjust your configuration if particular providers or models trigger fallback more often than expected.
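The internal retry loop can be modeled as walking an ordered list of (model, provider) pairs and reporting which pair served the request — the same information the router surfaces as response metadata. The pairs and the availability set below are stand-ins for the router's cached state, not real configuration.

```python
def resolve(chain: list[tuple[str, str]], available: set[tuple[str, str]]) -> dict:
    """Walk the fallback chain; return the (model, provider) that served it."""
    for model, provider in chain:
        if (model, provider) in available:
            return {"model": model, "provider": provider}
    raise RuntimeError("all fallback options exhausted")

# Primary model on its primary provider, then a fallback provider for the
# same model, then a fallback model — the configuration pattern above.
chain = [
    ("openai/gpt-4o", "openai"),
    ("openai/gpt-4o", "azure"),
    ("anthropic/claude-sonnet", "anthropic"),
]
served = resolve(chain, available={("anthropic/claude-sonnet", "anthropic")})
print(served)  # both gpt-4o options were down, so the fallback model served it
```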

Routing Parameters Reference

The table below describes the key routing parameters available through the API and their effects on request handling.

| Parameter | Description | Values |
| --- | --- | --- |
| model | The primary model identifier for the request | String: e.g. "openai/gpt-4o", "anthropic/claude-sonnet" |
| provider_preference | Ranked list of preferred providers for the selected model | Object: ordered provider identifiers with optional weights |
| fallback_models | Alternative models to try if the primary is unavailable | Array: ordered list of model identifiers |
| cost_optimization | Direct the router to prefer the least expensive provider for the model | Boolean: true or false |
| quality_threshold | Minimum quality score the provider must meet | Number: 0.0 to 1.0, based on benchmark aggregation |
| max_latency_ms | Maximum acceptable response latency for provider selection | Number: milliseconds |
| stream | Enable server-sent events streaming for the response | Boolean: true or false |
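Combining several parameters from the table above, a latency-sensitive streaming request with model-level fallback might look like this sketch. The field names match the table; the overall wire format is an assumption based on this guide.

```python
# Illustrative request body: latency bound, streaming, and model fallback.
request = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Draft a status update."}],
    "fallback_models": ["anthropic/claude-sonnet"],
    "max_latency_ms": 2000,  # skip providers expected to exceed 2 seconds
    "stream": True,          # deliver tokens via server-sent events
}
print(len(request))
```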

Response Metadata and Observability

Every API response includes routing metadata that tells you exactly how the request was handled. The response headers and body include the model that served the request, the provider that handled it, the token counts for input and output, the cost incurred, and the latency breakdown. This metadata is essential for monitoring application behavior in production — it lets you verify that routing preferences are being respected, track fallback frequency, and correlate user-facing issues with specific providers or models.
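A monitoring hook might read that metadata as follows. The response body here is fabricated for illustration, and the field names (provider, usage, latency_ms) are assumptions modeled on OpenAI-style usage reporting plus the fields this section describes; check the platform reference for the real schema.

```python
import json

# Hypothetical response body with routing metadata.
raw = json.dumps({
    "model": "openai/gpt-4o",
    "provider": "azure",
    "usage": {"prompt_tokens": 52, "completion_tokens": 118, "total_cost": 0.0021},
    "latency_ms": {"routing": 4, "generation": 930},
})

body = json.loads(raw)
# Verify routing preferences were respected and flag fallback occurrences:
fell_back = body["provider"] != "openai"  # assuming "openai" was preferred
print(body["model"], body["provider"], fell_back)
```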

The analytics dashboard aggregates this response metadata across all your requests, giving you a platform-wide view of routing behavior. You can see which providers handle what percentage of your traffic, how often fallback is triggered, and what the cost implications are of different routing configurations. This observability transforms routing from a black-box operation into a tunable system that you can optimize based on real data about your application's behavior and costs.

Streaming and the Router API

Streaming responses follow the same routing logic as non-streaming requests with one important difference: fallback decisions are made before the response stream begins. Once the router has selected a provider and started streaming tokens, it cannot switch providers mid-stream if that provider becomes slow or unavailable. For applications where uninterrupted streaming is critical, the router supports pre-flight provider health checks that validate availability before beginning the stream — this adds a small latency cost to the first token but reduces the risk of mid-stream failures.

The streaming response format is compatible with the OpenAI streaming specification, so existing client libraries and UI components that handle SSE streams work without modification. The router adds routing metadata at the end of each stream, allowing your application to capture provider and model information even for streamed responses where the full metadata is not available until the stream completes.
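A client consuming such a stream accumulates delta tokens and captures the trailing metadata chunk. The delta chunk shape follows the OpenAI streaming convention; the shape of the final metadata chunk is an assumption based on this guide, and "providerB" is a stand-in identifier.

```python
import json

# Simulated SSE lines as they would arrive over the wire.
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    'data: {"provider":"providerB","model":"openai/gpt-4o"}',  # trailing metadata
    "data: [DONE]",
]

text, metadata = "", {}
for line in stream:
    payload = line.removeprefix("data: ")
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    if "choices" in chunk:
        text += chunk["choices"][0]["delta"].get("content", "")
    else:
        metadata = chunk  # routing metadata arrives at the end of the stream

print(text, metadata.get("provider"))  # Hello providerB
```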

Frequently Asked Questions

What does the Router API do?

The Router API is the request-handling layer that receives API calls, evaluates routing parameters to select the optimal provider and model, manages fallback when primary choices fail, and returns responses in a consistent format. Your application interacts with one stable endpoint while the router handles the complexity of the multi-provider landscape.

How does provider selection work?

When you specify a model, the router selects a provider based on availability, latency, and your optional routing preferences including provider ordering, cost optimization flags, and quality thresholds. The default behavior prioritizes availability and response quality without requiring application-level orchestration.

What fallback options are available?

Provider-level fallback tries alternative providers for the same model. Model-level fallback switches to entirely different models. Both can be configured as ranked lists, and the router processes them automatically — your client code sends one request and receives one response regardless of how many fallback attempts occur internally.

How does streaming work with the Router API?

Streaming uses the standard SSE protocol, compatible with OpenAI's format. The router proxies tokens from the selected provider. Fallback decisions are made before streaming begins to avoid mid-stream provider switches, with optional pre-flight health checks for applications that need maximum stream reliability.