Model scope

Where the chat model and the post-processing model are configured — tenant, project, agent, session, and per-call — and which layer wins.

A Sonzai chat turn picks two models: the chat-completion model the user sees, and the post-processing model that runs the background work afterwards. Each goes through its own resolver cascade. The cascades share the same scope hierarchy:

1. per-call            (highest precedence — passed to agents.chat / sessions.start / agents.process)
2. per-agent           (AgentProfile fields)
3. per-project         (project_config rows in CockroachDB)
4. per-account/tenant  (account_config rows in CockroachDB)
5. system default      (Go constant compiled into the binary)

First non-empty layer wins. Layer 5 always exists, so resolution always produces a concrete answer.
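
As a mental model, each cascade is just a first-non-empty lookup over those five layers. A minimal sketch of that rule (the resolveModel helper and its argument shape are illustrative, not part of the Sonzai SDK):

// A sketch of "first non-empty layer wins" — resolveModel is hypothetical.
type ModelRef = { provider: string; model: string };

function resolveModel(layers: (ModelRef | undefined)[]): ModelRef {
  // Layers ordered highest precedence first: per-call, per-agent, per-project,
  // per-account/tenant, system default. The last entry is always present.
  for (const layer of layers) {
    if (layer) return layer;
  }
  throw new Error("unreachable: the system default layer is always set");
}

// No per-call or per-agent value, so the per-project default wins.
resolveModel([
  undefined,                                               // per-call
  undefined,                                               // per-agent
  { provider: "gemini", model: "gemini-3.1-pro-preview" }, // per-project
  undefined,                                               // per-account/tenant
  { provider: "gemini", model: "gemini-3.1-flash-lite-preview" }, // system default
]);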

Chat model

What the user sees. Resolved per chat turn.

| Layer | Where it lives | Set with |
| --- | --- | --- |
| Per-call | provider / model arg on agents.chat, agents.chat_stream, agents.process, sessions.start | the SDK call itself |
| Per-agent | AgentProfile.ModelConfig.{provider,model} | client.agents.update(agent_id, model_config={...}) |
| Per-project | Default model for an unconfigured agent in a project | Project settings on the dashboard or client.providers.set(project_id, ...) |
| Per-account / tenant | Org-wide default | (admin endpoint, see Reference) |
| System default | gemini-3.1-flash-lite-preview (providers.DEFAULT_MODEL) | constant, not configurable |

Setting at each layer

// Per-call: pin a single chat call
await client.agents.chat({
  agent:    "agent_abc",
  provider: "openai", model: "gpt-5.5",
  messages: [{ role: "user", content: "Hello" }],
});

// Per-session: set defaults that every session.turn() inherits
const session = await client.agents.sessions.start("agent_abc", {
  userId:    "user_123",
  sessionId: "session_abc",
  provider:  "xai",
  model:     "grok-4-1-fast-non-reasoning",
});

// Per-agent: persist on the AgentProfile
await client.agents.update("agent_abc", {
  modelConfig: { provider: "gemini", model: "gemini-3.1-pro-preview" },
});

// Per-project: chat default for unpinned agents in this project
await client.providers.set("project_xyz", {
  provider: "gemini",
  model:    "gemini-3.1-flash-lite-preview",
});

Post-processing model

The cheaper-model fleet that runs the batch work behind every turn: fact extraction, dedup, mood updates, personality drift, summarisation, diary, constellation. Resolved per task, per turn, independently of the chat model.

The cascade is documented exhaustively at Post-processing model map. The short version:

| Layer | Where it lives |
| --- | --- |
| Per-agent | AgentProfile.PostProcessingProvider + PostProcessingModel (direct override; bypasses the cascade entirely when both are set) |
| Per-project | project_config.post_processing_model_map JSONB — chat_model → {provider, model} map |
| Per-account / tenant | account_config.post_processing_model_map JSONB — same shape |
| System default | Go constant, ships with the binary |
| Wildcard fallback | gemini-3.1-flash-lite-preview |

The map keys are chat-completion model IDs (or * for wildcard), so post-processing routing depends on what chat model just ran.
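
In other words, resolving a layer's map is an exact-key lookup on the chat model ID, falling back to the wildcard entry. A sketch of that lookup (the lookupPostProcessing helper is illustrative, not SDK code):

// A sketch of the per-layer map lookup — lookupPostProcessing is hypothetical.
type PostProcessingMap = Record<string, { provider: string; model: string }>;

function lookupPostProcessing(map: PostProcessingMap, chatModel: string) {
  // Exact chat-model key first, then the "*" wildcard entry.
  return map[chatModel] ?? map["*"];
}

const projectMap: PostProcessingMap = {
  "claude-opus-4.6": { provider: "openrouter", model: "anthropic/claude-haiku-4.5" },
  "*":               { provider: "gemini",     model: "gemini-3.1-flash-lite-preview" },
};

lookupPostProcessing(projectMap, "claude-opus-4.6"); // exact key → claude-haiku-4.5
lookupPostProcessing(projectMap, "gpt-5.5");         // no exact key → "*" fallback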

Setting at each layer

// Per-agent — direct override that bypasses the rest of the cascade
await client.agents.updatePostProcessingModel("agent_abc", {
  post_processing_provider: "gemini",
  post_processing_model:    "gemini-3.1-flash-lite-preview",
});

// Per-project — full chat→post map (write-through replacement)
await client.projectConfig.setPostProcessingModelMap("project_xyz", {
  "claude-opus-4.6": { provider: "openrouter", model: "anthropic/claude-haiku-4.5" },
  "*":               { provider: "gemini",     model: "gemini-3.1-flash-lite-preview" },
});

// Per-account/tenant
await client.accountConfig.setPostProcessingModelMap({
  "*": { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
});

Previewing the resolved model

For UI purposes ("which model would run my diary tonight?"), you can ask the resolver which model it would pick without firing any inference:

const effective = await client.agents.effectivePostProcessingModel("agent_abc", {
  chatModel: "claude-opus-4.6",
  taskType:  "fact_extraction",
});
console.log(effective.provider, effective.model);

Common patterns

  • One frontier model per agent, one cheap extractor per project. Set the agent's ModelConfig to your premium model and set the project post-processing map's * wildcard to gemini/gemini-3.1-flash-lite-preview (see the sketch after this list).
  • A/B test extractors. Two projects, same agents, different account_config.post_processing_model_map entries — compare quality on the same traffic.
  • Per-tenant pricing tiers. Free tier defaults the post-processing map to flash-lite at the tenant level; paid tier overrides per-project to a stronger extractor.
  • One-off override. Pass provider/model on a single agents.chat call without persisting anything.
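
The first pattern, for instance, is just two of the calls shown above; the model choices here are illustrative:

// Premium chat model pinned on the agent...
await client.agents.update("agent_abc", {
  modelConfig: { provider: "openai", model: "gpt-5.5" },
});

// ...cheap extractor for every turn in the project via the "*" wildcard.
await client.projectConfig.setPostProcessingModelMap("project_xyz", {
  "*": { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
});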
