
Models

How Sonzai picks a model for chat and a (different) model for the background post-processing — providers, BYOK, and the scope cascade.

Every Sonzai chat turn fans out into two model calls:

  • Chat completion — what the user sees, streamed back live. Pick this for personality and quality.
  • Post-processing — the latency-insensitive batch work that runs after the reply ships: fact extraction, deduplication, mood updates, personality drift, summarisation, diary, constellation. Pick this for cost and throughput.

The two are configured independently. A frontier chat model can pair with a cheap flash-lite extractor, and Sonzai resolves both per call through a five-layer cascade that lets you override at agent, project, account (tenant), and session scope.
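
As a concrete illustration, here is a minimal Python sketch of that pairing, assuming a client-style SDK. The Client constructor, the agents.create call, and the exact field spellings are illustrative assumptions rather than the documented API; the provider and model IDs come from the table below.

```python
from sonzai import Client  # hypothetical client import

client = Client(api_key="sz_...")

agent = client.agents.create(  # assumed creation call
    name="concierge",
    # Chat completion: streamed to the user, chosen for personality and quality.
    model_config={"provider": "openai", "model": "gpt-5.5"},
    # Post-processing: fact extraction, mood updates, summaries, etc.
    # Latency-insensitive, so chosen for cost and throughput.
    post_processing_provider="gemini",
    post_processing_model="gemini-3.1-flash-lite-preview",
)
```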

Supported providers

| ID | Provider | Default model | Notes |
| --- | --- | --- | --- |
| gemini | Google Gemini | gemini-3.1-flash-lite-preview | Platform default — also the fallback wildcard for post-processing |
| openai | OpenAI | gpt-5.5 | 5.4 / 5 / mini / nano in the same family for fallback |
| xai | xAI (Grok) | grok-4-1-fast-non-reasoning | Reasoning + non-reasoning Grok 4 / 4.20 variants |
| custom | Bring-your-own LLM | — | Point Sonzai at any OpenAI-compatible endpoint — see Custom LLM |

The sonzai.providers module exports these IDs as constants — import them rather than hand-typing strings, so the IDs stay in sync as the catalog evolves. client.list_models() returns the live set enabled on your tenant for runtime model-picker UIs.
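
A short sketch of both calls together. The module path and list_models() are taken from this page; the constant names GEMINI, OPENAI, XAI, and CUSTOM are assumed spellings.

```python
from sonzai import Client
from sonzai.providers import GEMINI, OPENAI, XAI, CUSTOM  # assumed constant names

client = Client(api_key="sz_...")

# Live set of models enabled on this tenant, suitable for feeding a
# runtime model-picker UI instead of hard-coding the catalog.
for model in client.list_models():
    print(model)
```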

Internal fallback

The platform also speaks openrouter for its own internal failover paths. Customers don't pick openrouter directly today; Sonzai handles failover on its side when the primary provider quota is exhausted.

BYOK — bring your own key

Use Sonzai's hosted infrastructure but bill provider tokens to your own account. Drop a key per provider against your project; subsequent requests on that project route through your key for the matching provider. Keys are encrypted at rest and never echoed back through the API.

BYOK setup → /docs/en/models/byok
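
For orientation only, a hedged sketch of what dropping a key could look like from the Python client. The set_provider_key method and its parameters are assumptions; follow the BYOK guide linked above for the real flow.

```python
import os
from sonzai import Client

client = Client(api_key="sz_...")

# Attach your own OpenAI key to a project. Later requests on this project
# that resolve to the openai provider bill your account; the key is stored
# encrypted and is never echoed back through the API.
client.projects.set_provider_key(  # hypothetical method name
    project_id="proj_123",
    provider="openai",
    api_key=os.environ["OPENAI_API_KEY"],
)
```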

How a model gets picked

For both chat and post-processing, Sonzai walks a five-layer cascade. First non-empty hit wins.

  1. Per-call: provider / model on agents.chat, agents.process, or sessions.start
  2. Per-agent: AgentProfile.ModelConfig (chat) and AgentProfile.PostProcessingProvider/Model (post-processing)
  3. Per-project: project_config.post_processing_model_map (post-processing); chat defaults are typically agent-level
  4. Per-account / tenant: account_config.post_processing_model_map (post-processing)
  5. System default: gemini-3.1-flash-lite-preview for both
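
In pure-logic terms the rule is simply "walk the layers top-down and return the first non-empty value". The sketch below is an illustration of that rule, not Sonzai source; the layers are passed in as plain arguments rather than read from real config objects.

```python
SYSTEM_DEFAULT = "gemini-3.1-flash-lite-preview"

def resolve_model(per_call, per_agent, per_project, per_account):
    """Walk the cascade top-down; the first non-empty layer wins."""
    for candidate in (per_call, per_agent, per_project, per_account):
        if candidate:
            return candidate
    return SYSTEM_DEFAULT

# Example: no per-call or per-agent override, so the project-level
# post_processing_model_map entry wins over the account and system layers.
model = resolve_model(
    per_call=None,
    per_agent=None,
    per_project="grok-4-1-fast-non-reasoning",
    per_account=None,
)
```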

Read the full layer-by-layer rules at Model scope.
