Model scope
Where the chat model and the post-processing model are configured — tenant, project, agent, session, and per-call — and which layer wins.
A Sonzai chat turn picks two models: the chat-completion model the user sees, and the post-processing model that runs the background work afterwards. Each goes through its own resolver cascade. The cascades share the same scope hierarchy:
1. per-call (highest precedence — passed to agents.chat / sessions.start / agents.process)
2. per-agent (AgentProfile fields)
3. per-project (project_config rows in CockroachDB)
4. per-account/tenant (account_config rows in CockroachDB)
5. system default (Go constant compiled into the binary)

First non-empty layer wins. Layer 5 always exists, so resolution always produces a concrete answer.
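The "first non-empty layer wins" rule can be sketched in a few lines. This is illustrative only — the real resolver is Go code inside the Sonzai binary, and `resolveModel` / `SYSTEM_DEFAULT` are hypothetical names — but the default model value matches the documented `providers.DEFAULT_MODEL`:

```typescript
// Hypothetical sketch of the cascade; the real resolver is server-side Go.
type ModelRef = { provider: string; model: string } | undefined;

// Layer 5: always present, so resolution always yields a concrete answer.
const SYSTEM_DEFAULT = { provider: "gemini", model: "gemini-3.1-flash-lite-preview" };

// Layers are passed highest-precedence first:
// [perCall, perAgent, perProject, perAccount]
function resolveModel(layers: ModelRef[]): { provider: string; model: string } {
  for (const layer of layers) {
    // A layer only counts if it is fully specified.
    if (layer?.provider && layer?.model) return layer;
  }
  return SYSTEM_DEFAULT;
}
```

An unset layer is simply skipped, which is why pinning a model per-call never requires clearing the agent- or project-level defaults first.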
Chat model
What the user sees. Resolved per chat turn.
| Layer | Where it lives | Set with |
|---|---|---|
| Per-call | provider / model arg on agents.chat, agents.chat_stream, agents.process, sessions.start | the SDK call itself |
| Per-agent | AgentProfile.ModelConfig.{provider,model} | client.agents.update(agent_id, model_config={...}) |
| Per-project | Default model for an unconfigured agent in a project | Project settings on the dashboard or client.providers.set(project_id, ...) |
| Per-account / tenant | Org-wide default | (admin endpoint, see Reference) |
| System default | gemini-3.1-flash-lite-preview (providers.DEFAULT_MODEL) | constant, not configurable |
Setting at each layer
```typescript
// Per-call: pin a single chat call
await client.agents.chat({
  agent: "agent_abc",
  provider: "openai", model: "gpt-5.5",
  messages: [{ role: "user", content: "Hello" }],
});

// Per-session: set defaults that every session.turn() inherits
const session = await client.agents.sessions.start("agent_abc", {
  userId: "user_123",
  sessionId: "session_abc",
  provider: "xai",
  model: "grok-4-1-fast-non-reasoning",
});

// Per-agent: persist on the AgentProfile
await client.agents.update("agent_abc", {
  modelConfig: { provider: "gemini", model: "gemini-3.1-pro-preview" },
});

// Per-project: chat default for unpinned agents in this project
await client.providers.set("project_xyz", {
  provider: "gemini",
  model: "gemini-3.1-flash-lite-preview",
});
```

Post-processing model
The cheaper-model fleet that runs the batch work behind every turn: fact extraction, dedup, mood updates, personality drift, summarisation, diary, constellation. Resolved per task, per turn, independently of the chat model.
The cascade is documented exhaustively at Post-processing model map. The short version:
| Layer | Where it lives |
|---|---|
| Per-agent | AgentProfile.PostProcessingProvider + PostProcessingModel (direct override; bypasses the cascade entirely when both set) |
| Per-project | project_config.post_processing_model_map JSONB — chat_model → {provider, model} map |
| Per-account / tenant | account_config.post_processing_model_map JSONB — same shape |
| System default | Go constant, ships with the binary |
| Wildcard fallback | gemini-3.1-flash-lite-preview |
The map keys are chat-completion model IDs (or * for wildcard), so
post-processing routing depends on what chat model just ran.
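Since routing keys on the chat model that just ran, the lookup itself reduces to: exact chat-model key first, then the `*` wildcard. A minimal sketch with hypothetical names (the real lookup runs server-side against the JSONB map):

```typescript
// Illustrative only; mirrors the documented map shape, not the server code.
type Target = { provider: string; model: string };
type PostProcessingMap = Record<string, Target>;

// Exact chat-model key wins; otherwise fall through to the "*" wildcard.
function lookupPostProcessing(map: PostProcessingMap, chatModel: string): Target | undefined {
  return map[chatModel] ?? map["*"];
}
```

A map without a `*` entry can return nothing for unlisted chat models, which is where the account-level map and the compiled-in system default take over.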
Setting at each layer
```typescript
// Per-agent — direct override that bypasses the rest of the cascade
await client.agents.updatePostProcessingModel("agent_abc", {
  post_processing_provider: "gemini",
  post_processing_model: "gemini-3.1-flash-lite-preview",
});

// Per-project — full chat→post map (write-through replacement)
await client.projectConfig.setPostProcessingModelMap("project_xyz", {
  "claude-opus-4.6": { provider: "openrouter", model: "anthropic/claude-haiku-4.5" },
  "*": { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
});

// Per-account/tenant
await client.accountConfig.setPostProcessingModelMap({
  "*": { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
});
```

Previewing the resolved model
For UI purposes ("which model would run my diary tonight?") you can ask the resolver what it would pick, without firing any inference:
```typescript
const effective = await client.agents.effectivePostProcessingModel("agent_abc", {
  chatModel: "claude-opus-4.6",
  taskType: "fact_extraction",
});
console.log(effective.provider, effective.model);
```

Common patterns
- One frontier model per agent, one cheap extractor per project. Set the agent's ModelConfig to your premium model; set the project post-processing map's * wildcard to gemini/gemini-3.1-flash-lite-preview.
- A/B test extractors. Two projects, same agents, different project_config.post_processing_model_map entries — compare quality on the same traffic.
- Per-tenant pricing tiers. Free tier defaults the post-processing map to flash-lite at the tenant level; paid tier overrides per-project to a stronger extractor.
- One-off override. Pass provider/model on a single agents.chat call without persisting anything.
Reference
- Providers — provider IDs and model lists.
- BYOK — bring your own provider key per project.
- Post-processing — full cascade rules + the system-default map.
- Reference → API — REST shape for every endpoint above.