Essential Technical Context
Every AI model accessible through the OpenRouter API accepts a standard set of generation parameters that control output behavior. Understanding these parameters — what each one does, how they interact, and when to adjust them — is fundamental to getting consistent, production-quality results from language models. This reference covers every parameter the API supports, with practical guidance for common configuration scenarios.
Understanding Model Parameters: The Full Reference
Language models generate text by predicting tokens one at a time based on the input prompt and the parameters you supply. These parameters act as control knobs that shape how the model selects each successive token. The default settings produce reasonable output for general-purpose use, but production applications almost always require parameter tuning specific to their use case. A customer support chatbot needs different sampling settings than a creative writing assistant. A code generation tool requires different stop sequence configuration than a summarization pipeline. The parameter table below provides the complete reference, followed by detailed guidance on configuring each parameter for specific workloads.
Before adjusting any parameter, it is worth understanding the cost of getting parameters wrong. An overly high temperature setting in a factual Q&A system can produce hallucinated answers that erode user trust. A max_tokens value set too low truncates responses mid-sentence, creating a broken user experience. Missing stop sequences in a structured data extraction workflow can cause the model to generate irrelevant text beyond the target output, increasing token costs and complicating downstream parsing. The time spent understanding these parameters during development pays for itself many times over in production reliability. For a deeper framework on responsible parameter configuration, the NIST AI Risk Management Framework provides guidance on systematic evaluation of model behavior under varied parameter settings.
How Sampling Parameters Shape Output Behavior
Sampling parameters — temperature, top_p, and top_k — control the randomness of token selection during response generation. They determine whether the model produces consistent, predictable output (low randomness) or varied, creative output (high randomness). These parameters are interdependent: adjusting temperature changes the shape of the entire probability distribution, while top_p truncates the distribution to the most likely tokens. Teams new to LLM integration often adjust both parameters simultaneously and then struggle to attribute output changes to the correct control. The recommended approach is to set top_p to its default of 1.0 and adjust temperature alone until the desired randomness profile is achieved.
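As a concrete starting point, the minimal sketch below sends a single chat completion request through the OpenRouter HTTP API with temperature set explicitly and top_p left at its default. The model slug, environment variable name, and prompt are placeholder assumptions; substitute values appropriate for your own integration.

```python
import os
import requests

# Minimal sketch: one chat completion with an explicit temperature and default top_p.
# The model slug and OPENROUTER_API_KEY variable name are assumptions for illustration.
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "openai/gpt-4o-mini",  # example slug; pick your target model
        "messages": [
            {"role": "user", "content": "Explain nucleus sampling in one sentence."}
        ],
        "temperature": 0.4,  # tune this first
        "top_p": 1.0,        # leave at the default while tuning temperature
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```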
Complete Parameter Reference
The table below documents every generation parameter accepted by the OpenRouter API. Default values shown are platform defaults; individual models may override certain defaults based on provider specifications. Parameters marked as provider-specific may not be supported by all models in the catalog — verify support for your target model before depending on these parameters in production code.
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| temperature | float | 0.0 – 2.0 | 1.0 | Controls output randomness; lower values produce more deterministic responses |
| top_p | float | 0.0 – 1.0 | 1.0 | Nucleus sampling threshold; limits token selection to cumulative probability mass |
| max_tokens | integer | 1 – context limit | varies | Maximum tokens the model can generate in a single response |
| stop | string / array | any string(s) | none | Sequence(s) at which the model stops generating further tokens |
| presence_penalty | float | -2.0 – 2.0 | 0.0 | Penalizes tokens that have appeared in the text so far, reducing repetition |
| frequency_penalty | float | -2.0 – 2.0 | 0.0 | Penalizes tokens proportional to their existing frequency in the text |
| logit_bias | object | -100 – 100 | none | Per-token probability adjustment; positive values increase likelihood |
| seed | integer | any integer | none | Deterministic sampling seed for reproducible outputs |
| response_format | object | text / json_object | text | Structured output mode; forces valid JSON when set to json_object |
| top_k | integer | 1+ | varies | Limits sampling to the k most likely next tokens (provider-specific) |
Parameter combinations can produce behaviors that neither parameter produces in isolation. For instance, setting temperature to 0 effectively forces greedy decoding: the model always selects the highest-probability token, so any top_p value becomes redundant, because nucleus sampling always retains at least that top token. Conversely, a high temperature flattens the probability distribution, so a restrictive top_p such as 0.1 may admit far more candidate tokens than it would at the default temperature, producing more erratic output than either setting suggests on its own. This is another reason to adjust one sampling parameter at a time.
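The toy calculation below makes this interaction concrete. It applies temperature scaling followed by nucleus (top_p) truncation to a fixed set of logits, which is the commonly described ordering; it is a simplified illustration rather than a reproduction of any provider's sampling internals.

```python
import math

def nucleus_pool(logits, temperature, top_p):
    """Return the candidate-token probabilities that survive top_p truncation
    after temperature scaling. Simplified illustration only."""
    scaled = [l / max(temperature, 1e-6) for l in logits]  # low T sharpens, high T flattens
    total = sum(math.exp(s) for s in scaled)
    probs = sorted((math.exp(s) / total for s in scaled), reverse=True)

    pool, cumulative = [], 0.0
    for p in probs:  # keep the smallest prefix whose cumulative mass reaches top_p
        pool.append(p)
        cumulative += p
        if cumulative >= top_p:
            break
    return pool

logits = [4.0, 3.5, 2.0, 1.0, 0.5]
print(len(nucleus_pool(logits, temperature=0.2, top_p=0.9)))  # sharp distribution: pool of 1
print(len(nucleus_pool(logits, temperature=1.5, top_p=0.9)))  # flat distribution: pool of 4
```

Running the same top_p at two temperatures shows how a flattened distribution lets many more tokens into the nucleus, which is why the two controls are best tuned one at a time.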
Practical Parameter Configuration Patterns
The following configuration patterns represent starting points that teams have found effective across the most common production use cases. These are not universal prescriptions — every application should validate its parameter settings against actual workload data — but they provide reasonable defaults that reduce the experimentation surface for new integrations.
Factual Q&A and Knowledge Retrieval
Applications that answer factual questions from a knowledge base benefit from low-temperature, high-determinism configurations. Set temperature between 0.0 and 0.3, top_p at 1.0, and presence_penalty at 0.0. These settings encourage the model to stay anchored to the provided context rather than inventing plausible-sounding but unsupported claims. Set max_tokens based on expected answer length; 256 to 512 tokens covers most Q&A responses without waste.
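A hedged starting configuration for this profile might look like the fragment below; the exact values should be validated against your own evaluation data.

```python
# Starting point for retrieval-backed factual Q&A; validate against real workload data.
qa_params = {
    "temperature": 0.2,       # low randomness keeps answers anchored to the supplied context
    "top_p": 1.0,             # default nucleus threshold while temperature is tuned
    "presence_penalty": 0.0,  # repetition of source terminology is acceptable here
    "max_tokens": 512,        # enough for typical answers without paying for unused headroom
}
```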
Code Generation and Technical Content
Code generation requires a careful balance. Setting temperature too low produces repetitive, pattern-matched output that may fail to solve novel problems. Setting it too high introduces syntax errors and hallucinated API calls. A temperature of 0.2 to 0.5 with top_p at 0.95 works well for most code generation tasks. Stop sequences are particularly important here: set stop tokens to terminate generation after a code block closure or function end to prevent the model from generating irrelevant commentary beyond the requested code.
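The fragment below sketches one such configuration. It assumes the prompt asks the model to answer inside a fenced code block that the prompt itself opens, so the closing fence works as a natural stop sequence; adapt the stop strings to whatever wrapping your prompt actually requests.

```python
# Starting point for code generation; the stop string assumes the prompt opens a ``` fence,
# so the model's closing fence terminates the response before any trailing commentary.
codegen_params = {
    "temperature": 0.3,
    "top_p": 0.95,
    "stop": ["```"],
    "max_tokens": 1024,
}
```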
Creative Writing and Content Generation
For applications where output diversity is desirable — marketing copy, creative storytelling, ideation support — higher temperature settings between 0.7 and 0.9 with top_p at 0.9 produce more varied and interesting output. Raising presence_penalty to 0.3 to 0.6 discourages repetitive phrasing, and a modest frequency_penalty of 0.2 reduces the tendency for models to loop on favored phrases without making output feel forced or unnatural.
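A sketch of this profile, again as a starting point rather than a prescription:

```python
# Starting point for creative or marketing content; raise the penalties further only
# if repetition is still visible in sampled outputs.
creative_params = {
    "temperature": 0.8,
    "top_p": 0.9,
    "presence_penalty": 0.4,   # discourages circling back to topics already mentioned
    "frequency_penalty": 0.2,  # dampens literal repetition of favored phrases
}
```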
Structured Data Extraction
When the goal is extracting structured JSON from unstructured text, the response_format parameter becomes the most important control. Set it to json_object to force syntactically valid JSON, and describe the expected output shape in the prompt (or supply a full JSON schema where the model's structured output mode supports one). Temperature should be set low (0.0 to 0.2) to maximize consistency. Set max_tokens generously enough to accommodate the largest expected JSON output plus a safety margin. If the model tends to append commentary after the JSON, a distinctive stop sequence can truncate it; avoid using the closing brace alone, since generation halts at the first occurrence of a stop sequence and the matched text is not returned.
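A sketch of the extraction profile is below. It assumes the target model supports the json_object response format (see the supported_parameters discussion later in this section) and that the expected fields are described in the prompt.

```python
# Starting point for structured extraction; assumes the model supports json_object mode
# and that the prompt describes the expected fields.
extraction_params = {
    "temperature": 0.1,
    "max_tokens": 2048,  # sized for the largest expected JSON payload plus a margin
    "response_format": {"type": "json_object"},
}
```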
"Getting the parameter configuration right eliminated about 80% of the post-processing code in our deployment pipeline. We spent two days tuning temperature and stop sequences for our document extraction workflow and the result was output so consistent that we removed an entire validation layer. The structured output mode alone saved us from a parsing error rate that had been hovering around 3% across millions of documents."
— Kwame Osei, DevOps Lead, Ascend Tech (Nashville, TN)
Frequently Asked Questions About Model Parameters
Why does setting temperature to 0 not always produce identical output?
Even at temperature 0, floating-point arithmetic differences across GPU hardware and provider backend implementations can introduce minor output variation. For applications that require reproducible output, additionally set the seed parameter to a fixed integer value. Even with both temperature at 0 and a fixed seed, subtle provider-side differences in tokenization or model serving infrastructure may produce small variations. Treat deterministic parameters as strong preferences rather than absolute guarantees.
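A minimal reproducibility setup, assuming the target model honors the seed parameter, looks like this:

```python
# Best-effort reproducibility: fixed seed plus temperature 0. Providers may still
# introduce small variations, so treat identical outputs as likely rather than guaranteed.
reproducible_params = {
    "temperature": 0.0,
    "seed": 42,
}
```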
How should I configure stop sequences for multi-turn conversations?
For chat applications, stop sequences should include tokens that indicate the model has completed a coherent response and that the next turn should begin. Common stop sequences include double newlines, the user role prefix, or a custom delimiter token. Stop sequences prevent the model from generating beyond its turn into content that should come from the user or the next conversational step. Test stop sequence configurations with a diverse set of conversation flows before deploying to production.
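The fragment below shows one way to express this, assuming a prompt format that labels turns with "User:" and "Assistant:" prefixes; the exact stop strings must match whatever turn delimiters your prompt template actually uses.

```python
# Stop string for a chat template that labels turns "User:" / "Assistant:".
# Generation halts before the model starts writing the user's next turn.
chat_params = {
    "temperature": 0.7,
    "stop": ["\nUser:"],
}
```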
Do all models on OpenRouter support the same parameters?
Most models support the standard set of temperature, top_p, max_tokens, stop, presence_penalty, and frequency_penalty. Advanced parameters like logit_bias and structured output formatting may have provider-specific support. The OpenRouter API response includes a supported_parameters field for each model that enumerates exactly which parameters are accepted, allowing applications to conditionally enable features based on model capabilities.
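Based on the supported_parameters field described above, an application can check a model's advertised parameter support before sending a request. The sketch below assumes the models listing endpoint and the field name shown here; verify both against the current API documentation.

```python
import requests

# Sketch: consult the catalog's supported_parameters field (described above)
# before relying on a provider-specific parameter such as logit_bias.
models = requests.get("https://openrouter.ai/api/v1/models", timeout=30).json()["data"]
catalog = {m["id"]: m for m in models}

target = "openai/gpt-4o-mini"  # example slug
supported = set(catalog[target].get("supported_parameters", []))
if "logit_bias" not in supported:
    print(f"{target} does not advertise logit_bias support; omit that parameter.")
```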
What parameter settings minimize hallucination risk?
No parameter setting eliminates hallucination entirely, but a combination of low temperature (0.0 to 0.2), top_p at 1.0, and presence_penalty at 0.0 reduces the model's tendency to generate novel claims unsupported by the input context. More effective than parameter tuning alone is pairing these settings with system prompts that explicitly instruct the model to state when information is unavailable rather than fabricating answers, and implementing retrieval-augmented generation that grounds responses in verified source material.
Experiment With Parameters in Real Time
Test parameter configurations across multiple models simultaneously in the OpenRouter Playground and observe how each setting affects output.
Open the Playground