Model scope

Where the chat model and the post-processing model are configured — tenant, project, agent, session, and per-call — and which layer wins.

A Sonzai chat turn picks two models: the chat-completion model the user sees, and the post-processing model that runs the background work afterwards. Each goes through its own resolver cascade. The cascades share the same scope hierarchy:

1. per-call            (highest precedence — passed to agents.chat / sessions.start / agents.process)
2. per-agent           (AgentProfile fields)
3. per-project         (project_config rows in CockroachDB)
4. per-account/tenant  (account_config rows in CockroachDB)
5. system default      (Go constant compiled into the binary)

First non-empty layer wins. Layer 5 always exists, so resolution always produces a concrete answer.
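
As a mental model, each cascade is just a first-non-empty lookup over those five layers. A minimal sketch of that rule (the resolveModel helper and its argument shape are illustrative, not part of the Sonzai SDK):

// A sketch of "first non-empty layer wins" — resolveModel is hypothetical.
type ModelRef = { provider: string; model: string };

function resolveModel(layers: (ModelRef | undefined)[]): ModelRef {
  // Layers ordered highest precedence first: per-call, per-agent, per-project,
  // per-account/tenant, system default. The last entry is always present.
  for (const layer of layers) {
    if (layer) return layer;
  }
  throw new Error("unreachable: the system default layer is always set");
}

// No per-call or per-agent value, so the per-project default wins.
resolveModel([
  undefined,                                               // per-call
  undefined,                                               // per-agent
  { provider: "gemini", model: "gemini-3.1-pro-preview" }, // per-project
  undefined,                                               // per-account/tenant
  { provider: "gemini", model: "gemini-3.1-flash-lite-preview" }, // system default
]);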

Chat model

What the user sees. Resolved per chat turn.

| Layer | Where it lives | Set with |
| --- | --- | --- |
| Per-call | provider / model arg on agents.chat, agents.chat_stream, agents.process, sessions.start | the SDK call itself |
| Per-agent | AgentProfile.ModelConfig.{provider,model} | client.agents.update(agent_id, model_config={...}) |
| Per-project | Default model for an unconfigured agent in a project | Project settings on the dashboard or client.providers.set(project_id, ...) |
| Per-account / tenant | Org-wide default | (admin endpoint, see Reference) |
| System default | gemini-3.1-flash-lite-preview (providers.DEFAULT_MODEL) | constant, not configurable |

Setting at each layer

// Per-call: pin a single chat call
await client.agents.chat({
  agent:    "agent_abc",
  provider: "openai", model: "gpt-5.5",
  messages: [{ role: "user", content: "Hello" }],
});

// Per-session: set defaults that every session.turn() inherits
const session = await client.agents.sessions.start("agent_abc", {
  userId:    "user_123",
  sessionId: "session_abc",
  provider:  "xai",
  model:     "grok-4-1-fast-non-reasoning",
});

// Per-agent: persist on the AgentProfile
await client.agents.update("agent_abc", {
  modelConfig: { provider: "gemini", model: "gemini-3.1-pro-preview" },
});

// Per-project: chat default for unpinned agents in this project
await client.providers.set("project_xyz", {
  provider: "gemini",
  model:    "gemini-3.1-flash-lite-preview",
});

Post-processing model

The cheaper-model fleet that runs the batch work behind every turn: fact extraction, dedup, mood updates, personality drift, summarisation, diary, constellation. Resolved per task, per turn, independently of the chat model.

The cascade is documented exhaustively at Post-processing model map. The short version:

| Layer | Where it lives |
| --- | --- |
| Per-agent | AgentProfile.PostProcessingProvider + PostProcessingModel (direct override; bypasses the cascade entirely when both are set) |
| Per-project | project_config.post_processing_model_map JSONB — chat_model → {provider, model} map |
| Per-account / tenant | account_config.post_processing_model_map JSONB — same shape |
| System default | Go constant, ships with the binary |
| Wildcard fallback | gemini-3.1-flash-lite-preview |

The map keys are chat-completion model IDs (or * for wildcard), so post-processing routing depends on what chat model just ran.
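
In other words, resolving a layer's map is an exact-key lookup on the chat model ID, falling back to the wildcard entry. A sketch of that lookup (the lookupPostProcessing helper is illustrative, not SDK code):

// A sketch of the per-layer map lookup — lookupPostProcessing is hypothetical.
type PostProcessingMap = Record<string, { provider: string; model: string }>;

function lookupPostProcessing(map: PostProcessingMap, chatModel: string) {
  // Exact chat-model key first, then the "*" wildcard entry.
  return map[chatModel] ?? map["*"];
}

const projectMap: PostProcessingMap = {
  "claude-opus-4.6": { provider: "openrouter", model: "anthropic/claude-haiku-4.5" },
  "*":               { provider: "gemini",     model: "gemini-3.1-flash-lite-preview" },
};

lookupPostProcessing(projectMap, "claude-opus-4.6"); // exact key → claude-haiku-4.5
lookupPostProcessing(projectMap, "gpt-5.5");         // no exact key → "*" fallback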

Setting at each layer

// Per-agent — direct override that bypasses the rest of the cascade
await client.agents.updatePostProcessingModel("agent_abc", {
  post_processing_provider: "gemini",
  post_processing_model:    "gemini-3.1-flash-lite-preview",
});

// Per-project — full chat→post map (write-through replacement)
await client.projectConfig.setPostProcessingModelMap("project_xyz", {
  "claude-opus-4.6": { provider: "openrouter", model: "anthropic/claude-haiku-4.5" },
  "*":               { provider: "gemini",     model: "gemini-3.1-flash-lite-preview" },
});

// Per-account/tenant
await client.accountConfig.setPostProcessingModelMap({
  "*": { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
});

Previewing the resolved model

For UI purposes ("which model would run my diary tonight?"), you can ask the resolver which model it would pick without firing any inference:

const effective = await client.agents.effectivePostProcessingModel("agent_abc", {
  chatModel: "claude-opus-4.6",
  taskType:  "fact_extraction",
});
console.log(effective.provider, effective.model);

Common patterns

  • One frontier model per agent, one cheap extractor per project. Set the agent's ModelConfig to your premium model and set the project post-processing map's * wildcard to gemini/gemini-3.1-flash-lite-preview (see the sketch after this list).
  • A/B test extractors. Two projects, same agents, different account_config.post_processing_model_map entries — compare quality on the same traffic.
  • Per-tenant pricing tiers. Free tier defaults the post-processing map to flash-lite at the tenant level; paid tier overrides per-project to a stronger extractor.
  • One-off override. Pass provider/model on a single agents.chat call without persisting anything.
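
The first pattern, for instance, is just two of the calls shown above; the model choices here are illustrative:

// Premium chat model pinned on the agent...
await client.agents.update("agent_abc", {
  modelConfig: { provider: "openai", model: "gpt-5.5" },
});

// ...cheap extractor for every turn in the project via the "*" wildcard.
await client.projectConfig.setPostProcessingModelMap("project_xyz", {
  "*": { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
});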
