Providers
The four supported chat-completion providers — gemini, openai, xai, custom — with model IDs, context windows, and how to pick one at runtime.
Sonzai routes chat completions through one of four providers. The IDs
are exported as constants from the sonzai.providers module in the
SDKs — import those rather than hand-typing strings, so they stay in
sync as the catalog evolves. Use client.list_models() for the live
set enabled on your tenant at runtime.
gemini — Google Gemini (default)
The platform default. gemini-3.1-flash-lite-preview is providers.DEFAULT_MODEL,
and is also the wildcard fallback for the post-processing
cascade.
| Model | Context window | Notes |
|---|---|---|
| gemini-3.1-flash-lite-preview | 1M | Default. Vision + tools + JSON mode + streaming. Compaction at 450k / 500k. |
| gemini-3-flash-preview | 2M | Fallback on 429. Same feature set. |
| gemini-3.1-pro-preview | 2M | Fallback on 429. Strongest Gemini model — pair with a cheaper post-processing entry. |
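The compaction figures in the default row are easiest to read as thresholds. A minimal sketch of one plausible reading (450k triggers compaction, 500k is the budget it works back under; the constants mirror the table, but the function itself is illustrative, not SDK API):

```typescript
// Illustrative only: Sonzai performs compaction server-side. These
// thresholds mirror the gemini-3.1-flash-lite-preview row above.
const COMPACTION_TRIGGER = 450_000; // start compacting past this point
const COMPACTION_BUDGET = 500_000;  // budget compaction aims to stay under

// True once the conversation should be compacted.
function shouldCompact(promptTokens: number): boolean {
  return promptTokens >= COMPACTION_TRIGGER;
}
```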
openai — OpenAI
The default is gpt-5.5; the 5.4 family is the cheaper workhorse, and 5 / 5-mini /
5-nano cover even cheaper or smaller-context tiers. The fallback chain on
quota exhaustion is gpt-5.5 → gpt-5.4 → gpt-5.4-mini → gpt-5.
| Model | Context window | Use it when |
|---|---|---|
| gpt-5.5 | 1.05M | Default. The current OpenAI frontier — vision + tools + streaming + JSON mode. |
| gpt-5.4 | 1.05M | Cheaper than 5.5, same context window. |
| gpt-5.4-mini | 1.05M | The cheap workhorse. Recommended for high-throughput tenants. |
| gpt-5 | 400k | Frozen Aug-2025 snapshot. Kept for tenants pinned to it; new agents should default to 5.5. |
| gpt-5-mini / gpt-5-nano | 400k | Smaller-context tiers; same generation as gpt-5. |
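The quota-exhaustion chain above can be sketched as a simple lookup. Sonzai applies this fallback server-side on 429s; the helper below is only an illustration of the order, not SDK API:

```typescript
// Fallback order on quota exhaustion (429), per the chain above.
const OPENAI_FALLBACKS: Record<string, string | undefined> = {
  "gpt-5.5": "gpt-5.4",
  "gpt-5.4": "gpt-5.4-mini",
  "gpt-5.4-mini": "gpt-5",
};

// Returns the next model to try, or undefined when the chain is exhausted.
function nextOnQuotaError(model: string): string | undefined {
  return OPENAI_FALLBACKS[model];
}
```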
xai — xAI (Grok)
Reasoning and non-reasoning variants in the Grok 4 family.
grok-4-1-fast-non-reasoning is the default; reasoning models are
opt-in for tasks that benefit from deeper chain-of-thought.
| Model | Context window | Reasoning |
|---|---|---|
| grok-4-1-fast-non-reasoning | 2M | No |
| grok-4-1-fast-reasoning | 2M | Yes |
| grok-4.20-0309-non-reasoning | 2M | No |
| grok-4.20-0309-reasoning | 2M | Yes |
All Grok 4 entries support streaming, tools, and JSON mode. None support vision today.
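Opting into a reasoning variant per task could look like the sketch below. The helper and its boolean flag are illustrative, not part of the SDK; the model IDs come from the table above:

```typescript
// Illustrative helper: pick a Grok 4 model based on whether the task
// benefits from deeper chain-of-thought.
function pickGrokModel(needsReasoning: boolean): string {
  return needsReasoning
    ? "grok-4-1-fast-reasoning"
    : "grok-4-1-fast-non-reasoning"; // the xai default
}
```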
custom — bring-your-own-LLM (BYOM)
Point Sonzai at any OpenAI-compatible chat-completions endpoint. The Mind Layer keeps owning memory, personality, mood, and post-processing — only the chat-completion call gets routed through your endpoint.
See Custom LLM for the full setup. This is distinct from BYOK — BYOK uses Sonzai's provider integrations but with your billing key; BYOM uses your own inference stack entirely.
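In practice a BYOM setup amounts to telling Sonzai where your OpenAI-compatible endpoint lives. The shape below is purely hypothetical (the field names and the example URL are invented for illustration); the real configuration is documented in the Custom LLM guide:

```typescript
// Hypothetical shape — see the Custom LLM guide for the actual fields.
interface CustomLlmConfig {
  provider: "custom";
  baseUrl: string; // any OpenAI-compatible /chat/completions endpoint
  model: string;   // whatever your inference stack serves under that name
}

const byom: CustomLlmConfig = {
  provider: "custom",
  baseUrl: "https://llm.internal.example.com/v1", // illustrative URL
  model: "my-finetune",
};
```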
Picking a provider in code
Pass provider and model on the chat call. Both are optional — omit
them and Sonzai uses the agent's default, falling back through the
scope cascade.
```typescript
import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

await client.agents.chat({
  agent: "agent_abc",
  messages: [{ role: "user", content: "Hello" }],
  provider: "openai",
  model: "gpt-5.5",
});
```

Listing what's available at runtime
client.list_models() (Python / TS / Go expose the same shape) returns
the live set of providers and models enabled on your tenant — useful for
building a model-picker UI or for asserting that a provider you depend on
is wired up before a deploy.
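For the pre-deploy assertion, a small predicate over the result shape is enough. The helper below is a sketch, not SDK API; it only assumes the providers/models shape described here:

```typescript
// Minimal shape of the listModels result used by this check.
interface ModelsResult {
  providers: { provider: string; models: { id: string }[] }[];
}

// True when the given provider is enabled on the tenant.
function hasProvider(result: ModelsResult, id: string): boolean {
  return result.providers.some((p) => p.provider === id);
}
```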
```typescript
const result = await client.listModels();

for (const p of result.providers) {
  console.log(p.provider, p.models.map((m) => m.id));
}
```

Reference
- BYOK — drop your own provider keys per project.
- Custom LLM — point Sonzai at your own endpoint entirely.
- Model scope — how provider/model is resolved per call.
- Post-processing — what runs in the background, and on what model.