Models
How Sonzai picks a model for chat and a (different) model for the background post-processing — providers, BYOK, and the scope cascade.
Every Sonzai chat turn fans out into two model calls:
- Chat completion — what the user sees, streamed back live. Pick this for personality and quality.
- Post-processing — the latency-insensitive batch work that runs after the reply ships: fact extraction, deduplication, mood updates, personality drift, summarisation, diary, constellation. Pick this for cost and throughput.
The two are configured independently. A frontier chat model can pair with a cheap flash-lite extractor, and Sonzai resolves both per call through a five-layer cascade that lets you override at agent, project, account (tenant), and session scope.
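The split can be pictured as two independent model slots per agent. A minimal sketch, assuming a simple dataclass shape (the field names here are illustrative, not Sonzai's real config schema):

```python
from dataclasses import dataclass


@dataclass
class ModelPair:
    """Illustrative only: the two slots Sonzai resolves per chat turn."""
    chat_model: str              # streamed to the user — optimise for quality
    post_processing_model: str   # background batch work — optimise for cost


# A frontier chat model paired with a cheap flash-lite extractor.
pair = ModelPair(
    chat_model="gpt-5.5",
    post_processing_model="gemini-3.1-flash-lite-preview",
)
```

Because the two fields are independent, changing the chat model never touches the post-processing pipeline, and vice versa.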
Supported providers
| ID | Provider | Default model | Notes |
|---|---|---|---|
| gemini | Google Gemini | gemini-3.1-flash-lite-preview | Platform default — also the fallback wildcard for post-processing |
| openai | OpenAI | gpt-5.5 | 5.4 / 5 / mini / nano in the same family for fallback |
| xai | xAI (Grok) | grok-4-1-fast-non-reasoning | Reasoning + non-reasoning Grok 4 / 4.20 variants |
| custom | Bring-your-own LLM | — | Point Sonzai at any OpenAI-compatible endpoint — see Custom LLM |
The sonzai.providers module exports these IDs as constants — import
them rather than hand-typing strings, so the IDs stay in sync as the
catalog evolves. client.list_models() returns the live set enabled on
your tenant for runtime model-picker UIs.
Internal fallback
The platform also speaks openrouter for its own internal failover
paths. Customers don't pick openrouter directly today; Sonzai handles
failover on its side when the primary provider quota is exhausted.
BYOK — bring your own key
Use Sonzai's hosted infrastructure but bill provider tokens to your own account. Drop a key per provider against your project; subsequent requests on that project route through your key for the matching provider. Keys are encrypted at rest and never echoed back through the API.
BYOK setup → /docs/en/models/byok
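The routing rule above ("a key per provider against your project; matching requests use it") can be sketched as a two-level lookup. This is an illustration of the behaviour, not Sonzai's implementation; the key-store shape is assumed:

```python
# Illustrative BYOK routing: a key stored per (project, provider) wins;
# anything else falls back to the platform-managed key for that provider.
# The storage shape is an assumption for the sketch.
project_keys: dict[tuple[str, str], str] = {
    ("proj_123", "openai"): "sk-your-own-key",
}

PLATFORM_KEY = "platform-managed"


def resolve_api_key(project_id: str, provider: str) -> str:
    return project_keys.get((project_id, provider), PLATFORM_KEY)
```

Note the per-provider scoping: the same project can BYOK for openai while its gemini traffic stays on the platform key.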
How a model gets picked
For both chat and post-processing, Sonzai walks a five-layer cascade. First non-empty hit wins.
- Per-call — `provider`/`model` on `agents.chat`, `agents.process`, or `sessions.start`
- Per-agent — `AgentProfile.ModelConfig` (chat) and `AgentProfile.PostProcessingProvider`/`Model` (post-processing)
- Per-project — `project_config.post_processing_model_map` (post-processing); chat defaults are typically agent-level
- Per-account / tenant — `account_config.post_processing_model_map` (post-processing)
- System default — `gemini-3.1-flash-lite-preview` for both
Read the full layer-by-layer rules at Model scope.