Providers

The four supported chat-completion providers — gemini, openai, xai, custom — with model IDs, context windows, and how to pick one at runtime.

Sonzai routes chat completions through one of four providers. The IDs are exported as constants from the sonzai.providers module in the SDKs — import those rather than hand-typing strings, so they stay in sync as the catalog evolves. Use client.list_models() for the live set enabled on your tenant at runtime.
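As a sketch of the shape of that constants module (the names below are assumptions based on this page — in real code you would import them from the SDK rather than redefining them):

```typescript
// Hypothetical mirror of what the SDK's `providers` module exports.
// These names are illustrative; import the real constants from the SDK
// so they stay in sync as the catalog evolves.
const providers = {
  GEMINI: "gemini",
  OPENAI: "openai",
  XAI: "xai",
  CUSTOM: "custom",
  DEFAULT_MODEL: "gemini-3.1-flash-lite-preview", // the platform default
} as const;

console.log(providers.DEFAULT_MODEL);
```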

gemini — Google Gemini (default)

The platform default. gemini-3.1-flash-lite-preview is providers.DEFAULT_MODEL, and is also the wildcard fallback for the post-processing cascade.

| Model | Context window | Notes |
| --- | --- | --- |
| gemini-3.1-flash-lite-preview | 1M | Default. Vision + tools + JSON mode + streaming. Compaction at 450k / 500k. |
| gemini-3-flash-preview | 2M | Fallback on 429. Same feature set. |
| gemini-3.1-pro-preview | 2M | Fallback on 429. Strongest Gemini model — pair with a cheaper post-processing entry. |

openai — OpenAI

The default is gpt-5.5; the 5.4 family is the cheaper workhorse, and gpt-5 / gpt-5-mini / gpt-5-nano cover the cheapest, smaller-context tiers. The fallback chain on quota exhaustion is gpt-5.5 → gpt-5.4 → gpt-5.4-mini → gpt-5.

| Model | Context window | Use it when |
| --- | --- | --- |
| gpt-5.5 | 1.05M | Default. The current OpenAI frontier — vision + tools + streaming + JSON mode. |
| gpt-5.4 | 1.05M | Cheaper than 5.5, same context window. |
| gpt-5.4-mini | 1.05M | The cheap workhorse. Recommended for high-throughput tenants. |
| gpt-5 | 400k | Frozen Aug-2025 snapshot. Kept for tenants pinned to it; new agents should default to 5.5. |
| gpt-5-mini / gpt-5-nano | 400k | Smaller-context tiers; same generation as gpt-5. |
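The quota-exhaustion chain above can be sketched as a walk down an ordered list. This is illustrative only — the platform applies the fallback for you; the helper below is not part of the SDK:

```typescript
// gpt-5.5 → gpt-5.4 → gpt-5.4-mini → gpt-5, per the chain described above.
const OPENAI_FALLBACK_CHAIN = ["gpt-5.5", "gpt-5.4", "gpt-5.4-mini", "gpt-5"];

// Next model to try once `model` exhausts its quota; null when the chain ends
// or the model is not part of the chain.
function nextOnQuotaExhausted(model: string): string | null {
  const i = OPENAI_FALLBACK_CHAIN.indexOf(model);
  if (i === -1 || i === OPENAI_FALLBACK_CHAIN.length - 1) return null;
  return OPENAI_FALLBACK_CHAIN[i + 1];
}

console.log(nextOnQuotaExhausted("gpt-5.5")); // "gpt-5.4"
```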

xai — xAI (Grok)

Reasoning and non-reasoning variants in the Grok 4 family. grok-4-1-fast-non-reasoning is the default; reasoning models are opt-in for tasks that benefit from deeper chain-of-thought.

| Model | Context window | Reasoning |
| --- | --- | --- |
| grok-4-1-fast-non-reasoning | 2M | No |
| grok-4-1-fast-reasoning | 2M | Yes |
| grok-4.20-0309-non-reasoning | 2M | No |
| grok-4.20-0309-reasoning | 2M | Yes |

All Grok 4 entries support streaming, tools, and JSON mode. None support vision today.
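Since reasoning is opt-in, a caller might select the variant per task. A minimal sketch — the helper and its flag are hypothetical, not part of the SDK:

```typescript
// Pick a Grok 4.1 fast variant: reasoning models are opt-in for tasks that
// benefit from deeper chain-of-thought; otherwise use the provider default.
function pickGrokModel(needsDeepReasoning: boolean): string {
  return needsDeepReasoning
    ? "grok-4-1-fast-reasoning"
    : "grok-4-1-fast-non-reasoning"; // the xai default
}
```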

custom — bring-your-own-LLM (BYOM)

Point Sonzai at any OpenAI-compatible chat-completions endpoint. The Mind Layer still owns memory, personality, mood, and post-processing — only the chat-completion call itself is routed through your endpoint.

See Custom LLM for the full setup. This is distinct from BYOK — BYOK uses Sonzai's provider integrations but with your billing key; BYOM uses your own inference stack entirely.
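As a sketch of what a BYOM configuration might look like: the only requirement this page states is an OpenAI-compatible endpoint, so every field name, URL, and env var below is hypothetical — see the Custom LLM page for the real configuration surface:

```typescript
// Hypothetical BYOM configuration. Field names are illustrative only;
// the endpoint just needs to speak the OpenAI chat-completions protocol.
const customProviderConfig = {
  provider: "custom",
  baseUrl: "https://llm.internal.example.com/v1", // your OpenAI-compatible endpoint
  apiKey: process.env.MY_LLM_API_KEY,             // your own inference stack's key
  model: "my-model",                              // whatever your stack serves
};

console.log(customProviderConfig.provider);
```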

Picking a provider in code

Pass provider and model on the chat call. Both are optional — omit them and Sonzai uses the agent's default, falling back through the scope cascade.

```typescript
import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

await client.agents.chat({
  agent:    "agent_abc",
  messages: [{ role: "user", content: "Hello" }],
  provider: "openai",
  model:    "gpt-5.5",
});
```

Listing what's available at runtime

client.listModels() (list_models in Python; the Go SDK exposes the same shape) returns the live set of providers and models enabled on your tenant — useful for building a model-picker UI or for asserting that a provider you depend on is wired up before a deploy.

```typescript
const result = await client.listModels();
for (const p of result.providers) {
  console.log(p.provider, p.models.map((m) => m.id));
}
```
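A pre-deploy assertion then reduces to a scan over that result. A sketch, assuming the `providers[].provider` / `models[].id` shape shown above; the helper itself is not part of the SDK:

```typescript
type ModelsResult = {
  providers: { provider: string; models: { id: string }[] }[];
};

// Throw during a pre-deploy check if a provider (and optionally a specific
// model) is not enabled on this tenant.
function assertModelEnabled(result: ModelsResult, provider: string, model?: string): void {
  const p = result.providers.find((x) => x.provider === provider);
  if (!p) throw new Error(`provider ${provider} is not enabled on this tenant`);
  if (model && !p.models.some((m) => m.id === model)) {
    throw new Error(`model ${model} is not enabled for ${provider}`);
  }
}
```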

Reference

  • BYOK — drop your own provider keys per project.
  • Custom LLM — point Sonzai at your own endpoint entirely.
  • Model scope — how provider / model is resolved per call.
  • Post-processing — what runs in the background, on what model.
