Model
Overview
Llumen allows you to configure individual models with specific capabilities and parameters. This enables fine-tuned control over how each AI model behaves.
Configuration Format
Model configurations are stored in TOML format. Each model has:
- Display name - Human-readable name shown in UI
- Model ID - Identifier used by the API provider
- Capabilities - What the model supports (optional)
- Parameters - Inference settings (optional)
Basic Configuration
Minimal Example
display_name = "GPT-OSS 20B"
# OpenRouter suffixes are supported
model_id = "openai/gpt-oss-20b:nitro"
Complete Example
display_name = "Claude 4.5 Sonnet"
model_id = "anthropic/claude-4.5-sonnet"
[capability]
image = true # upload image
audio = false # upload audio
ocr = "Mistral"
tool = true
json = true
reasoning = true
[parameter]
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1
Model Capabilities
Configure which features the model supports. Image generation is always auto-detected (OpenRouter only).
note
You don't need this section if you are using OpenRouter. The OpenRouter API lets Llumen detect capabilities for you.
Input/Upload
Set image or audio to true to override auto-detection.
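For example, an illustrative override using the same keys as the complete example above:

```toml
[capability]
# Force-enable image uploads even if auto-detection misses them
image = true
# Keep audio uploads disabled
audio = false
```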
Structured output
- When set to true: deep research mode will be more accurate, eliminating errors like "Here is research..." is not a valid plan.
- When set to false: deep research mode will retry once on error.
- When not set: Llumen will guess its support
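A minimal fragment declaring structured-output support explicitly, so Llumen doesn't have to guess:

```toml
[capability]
# Model supports structured (JSON) output; improves deep research accuracy
json = true
```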
OCR Engine
[capability]
ocr = "Mistral" # Options: "Native", "Text", "Mistral", "Disabled"
Tool Use
- When set to true: search and deep research modes will be enabled.
- When set to false: normal mode only.
- When not set: Llumen will guess its support
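For example, declaring tool-calling support explicitly rather than relying on auto-detection:

```toml
[capability]
# Model supports tool use; enables search and deep research modes
tool = true
```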
Reasoning
- When set to true: enables reasoning; the model generates nothing if it doesn't support reasoning.
- When set to false: no reasoning.
- When not set: Llumen will guess its support and enable it if supported.
note
Llumen supports interleaved thinking (if the model supports it).
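An explicit reasoning declaration looks like:

```toml
[capability]
# Force reasoning on; omit this key to let Llumen auto-detect
reasoning = true
```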
Model Parameters
Fine-tune inference behavior:
Temperature
[parameter]
temperature = 0.7 # Range: 0.0 - 2.0
Examples:
# For coding assistance
temperature = 0.2
# For creative writing
temperature = 0.9
# For balanced chat
temperature = 0.7
Top P (Nucleus Sampling)
[parameter]
top_p = 0.9 # Range: 0.0 - 1.0
Controls diversity by limiting token selection:
- 0.1 - Very focused, predictable
- 0.5 - Moderate diversity
- 0.9 - High diversity (recommended)
- 1.0 - All tokens considered
note
Use either temperature OR top_p, not both. OpenRouter recommends top_p.
Top K
[parameter]
top_k = 40 # Range: 1 - 100+
Limits sampling to the K most likely tokens; lower values make output more focused.
Repeat Penalty
[parameter]
repeat_penalty = 1.1 # Range: 1.0 - 2.0
Reduces repetition in responses:
- 1.0 - No penalty (may be repetitive)
- 1.1 - Light penalty (recommended)
- 1.2-1.3 - Moderate penalty
- 1.5+ - Strong penalty (may affect quality)
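Putting it together, an illustrative parameter block for a coding-focused model (the specific values are examples, not official recommendations; top_p is omitted since temperature is set):

```toml
[parameter]
# Low temperature for deterministic code completions
temperature = 0.2
top_k = 40
# Light penalty to curb repeated boilerplate
repeat_penalty = 1.1
```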
Configuring Models in Llumen
Via Web Interface
- Log in to Llumen
- Go to Settings -> OpenRouter
- Add or edit model configurations
- Save changes