Models
How Sonzai picks a model for chat and a (different) model for the background post-processing — providers, BYOK, and the scope cascade.
Every Sonzai chat turn fans out into two model calls:
- Chat completion — what the user sees, streamed back live. Pick this for personality and quality.
- Post-processing — the latency-insensitive batch work that runs after the reply ships: fact extraction, deduplication, mood updates, personality drift, summarisation, diary, constellation. Pick this for cost and throughput.
The two are configured independently. A frontier chat model can pair with a cheap flash-lite extractor, and Sonzai resolves both per call through a five-layer cascade that lets you override at agent, project, account (tenant), and session scope.
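The split can be pictured as two independent model slots per agent. A minimal sketch, assuming a simple dataclass shape (the field names here are illustrative, not Sonzai's real config schema):

```python
from dataclasses import dataclass


@dataclass
class ModelPair:
    """Illustrative only: the two slots Sonzai resolves per chat turn."""
    chat_model: str              # streamed to the user — optimise for quality
    post_processing_model: str   # background batch work — optimise for cost


# A frontier chat model paired with a cheap flash-lite extractor.
pair = ModelPair(
    chat_model="gpt-5.5",
    post_processing_model="gemini-3.1-flash-lite-preview",
)
```

Because the two fields are independent, changing the chat model never touches the post-processing pipeline, and vice versa.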
Supported providers
| ID | Provider | Default model | Notes |
|---|---|---|---|
| gemini | Google Gemini | gemini-3.1-flash-lite-preview | Platform default — also the fallback wildcard for post-processing |
| openai | OpenAI | gpt-5.5 | 5.4 / 5 / mini / nano in the same family for fallback |
| xai | xAI (Grok) | grok-4-1-fast-non-reasoning | Reasoning + non-reasoning Grok 4 / 4.20 variants |
| custom | Bring-your-own LLM | — | Point Sonzai at any OpenAI-compatible endpoint — see Custom LLM |
The sonzai.providers module exports these IDs as constants — import
them rather than hand-typing strings, so the IDs stay in sync as the
catalog evolves. client.list_models() returns the live set enabled on
your tenant for runtime model-picker UIs.
Internal fallback
The platform also speaks openrouter for its own internal failover
paths. Customers don't pick openrouter directly today; Sonzai handles
failover on its side when the primary provider quota is exhausted.
BYOK — bring your own key
Use Sonzai's hosted infrastructure but bill provider tokens to your own account. Drop a key per provider against your project; subsequent requests on that project route through your key for the matching provider. Keys are encrypted at rest and never echoed back through the API.
BYOK setup → /docs/en/models/byok
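The routing rule above ("a key per provider against your project; matching requests use it") can be sketched as a two-level lookup. This is an illustration of the behaviour, not Sonzai's implementation; the key-store shape is assumed:

```python
# Illustrative BYOK routing: a key stored per (project, provider) wins;
# anything else falls back to the platform-managed key for that provider.
# The storage shape is an assumption for the sketch.
project_keys: dict[tuple[str, str], str] = {
    ("proj_123", "openai"): "sk-your-own-key",
}

PLATFORM_KEY = "platform-managed"


def resolve_api_key(project_id: str, provider: str) -> str:
    return project_keys.get((project_id, provider), PLATFORM_KEY)
```

Note the per-provider scoping: the same project can BYOK for openai while its gemini traffic stays on the platform key.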
How a model gets picked
For both chat and post-processing, Sonzai walks a five-layer cascade. First non-empty hit wins.
- Per-call — `provider`/`model` on `agents.chat`, `agents.process`, or `sessions.start`
- Per-agent — `AgentProfile.ModelConfig` (chat) and `AgentProfile.PostProcessingProvider`/`Model` (post-processing)
- Per-project — `project_config.post_processing_model_map` (post-processing); chat defaults are typically agent-level
- Per-account / tenant — `account_config.post_processing_model_map` (post-processing)
- System default — `gemini-3.1-flash-lite-preview` for both
Read the full layer-by-layer rules at Model scope.