Model
Overview
Llumen allows you to configure individual models with specific capabilities and parameters. This enables fine-tuned control over how each AI model behaves.
Configuration Format
Model configurations are stored in TOML format. Each model has:
- Display name - Human-readable name shown in UI
- Model ID - Identifier used by the API provider
- Capabilities - What the model supports (optional)
- Parameters - Inference settings (optional)
Basic Configuration
Minimal Example
display_name = "GPT-OSS 20B"
# OpenRouter suffixes are supported
model_id = "openai/gpt-oss-20b:nitro"
Complete Example
display_name = "Claude 4.5 Sonnet"
model_id = "anthropic/claude-4.5-sonnet"
[capability]
image = true # upload image
audio = false # upload audio
ocr = "Mistral"
tool = true
json = true
reasoning = true
[parameter]
temperature = 0.7
top_p = 0.9
top_k = 40
repeat_penalty = 1.1
Model Capabilities
Configure which features the model supports. Image generation is always auto-detected (OpenRouter only).
note
You don't need this section if you are using OpenRouter. The OpenRouter API lets Llumen detect capabilities for you.
Input/Upload
Set image or audio to true to override auto-detection.
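For example, an illustrative override using the same keys as the complete example above:

```toml
[capability]
# Force-enable image uploads even if auto-detection misses them
image = true
# Keep audio uploads disabled
audio = false
```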
Structured output
- When set to true: deep research mode will be more accurate, eliminating errors like "Here is research..." is not a valid plan.
- When set to false: deep research mode will retry once on error.
- When not set: Llumen will guess its support
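A minimal fragment declaring structured-output support explicitly, so Llumen doesn't have to guess:

```toml
[capability]
# Model supports structured (JSON) output; improves deep research accuracy
json = true
```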
OCR Engine
[capability]
ocr = "Mistral" # Options: "Native", "Text", "Mistral", "Disabled"
Tool Use
- When set to true: search and deep research modes will be enabled.
- When set to false: normal mode only.
- When not set: Llumen will guess its support
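For example, declaring tool-calling support explicitly rather than relying on auto-detection:

```toml
[capability]
# Model supports tool use; enables search and deep research modes
tool = true
```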
Reasoning
- When set to true: enables reasoning; the model generates nothing if it doesn't support reasoning.
- When set to false: no reasoning.
- When not set: Llumen will guess its support and enable it if supported.
note
Llumen supports interleaved thinking (if the model supports it).
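An explicit reasoning declaration looks like:

```toml
[capability]
# Force reasoning on; omit this key to let Llumen auto-detect
reasoning = true
```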
Model Parameters
Fine-tune inference behavior:
Temperature
[parameter]
temperature = 0.7 # Range: 0.0 - 2.0
Examples:
# For coding assistance
temperature = 0.2
# For creative writing
temperature = 0.9
# For balanced chat
temperature = 0.7
Top P (Nucleus Sampling)
[parameter]
top_p = 0.9 # Range: 0.0 - 1.0
Controls diversity by limiting token selection:
- 0.1 - Very focused, predictable
- 0.5 - Moderate diversity
- 0.9 - High diversity (recommended)
- 1.0 - All tokens considered
note
Use either temperature OR top_p, not both. OpenRouter recommends top_p.
Top K
[parameter]
top_k = 40 # Range: 1 - 100+
Limits sampling to the K most likely tokens; lower values make output more focused.
Repeat Penalty
[parameter]
repeat_penalty = 1.1 # Range: 1.0 - 2.0
Reduces repetition in responses:
- 1.0 - No penalty (may be repetitive)
- 1.1 - Light penalty (recommended)
- 1.2-1.3 - Moderate penalty
- 1.5+ - Strong penalty (may affect quality)
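Putting it together, an illustrative parameter block for a coding-focused model (the specific values are examples, not official recommendations; top_p is omitted since temperature is set):

```toml
[parameter]
# Low temperature for deterministic code completions
temperature = 0.2
top_k = 40
# Light penalty to curb repeated boilerplate
repeat_penalty = 1.1
```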
Configuring Models in Llumen
Via Web Interface
- Log in to Llumen
- Go to Settings -> OpenRouter
- Add or edit model configurations
- Save changes