Advance Time compresses real-world time into simulated time for an agent. It is useful for character AI that needs in-game time to pass faster than real time, for game loops that simulate days of agent state in seconds, and anywhere you want to see what the agent would be like after a period of elapsed time without actually waiting for it.
Character AI / visual novel time skips — the protagonist sleeps for 8 hours; advance agent time by 8 hours and get the diary entry and mood changes that would have happened overnight
Tamagotchi and life-sim game loops — in-game days pass faster than real time; call advanceTime each tick to keep agent state (mood, memory, habits) in sync with the game clock
Tutorial onboarding — show a new user what their companion will "remember" after a week by fast-forwarding through a sample history before they send their first real message
Deterministic replay — reproduce the exact agent state after X hours at any time, for debugging, snapshotting, or building a save/load system
Eval and benchmarking — compress long-running scenarios into fast test runs (see Also useful for evaluation below)
A single advanceTime call runs the full production background worker fleet for each complete 24-hour day in the window, then resolves any proactive wakeups due within it. Concretely:
Diary generation — one diary entry per simulated day, written from the agent's perspective
Mood decay — emotional state drifts toward the agent's baseline at the rate it would in real time
Memory consolidation — facts, events, and commitments are consolidated and deduplicated as they normally would be overnight
Constellation extraction — personality signals extracted from conversation history are processed on schedule
Scheduled wakeups — any wakeup whose scheduled_at falls inside the advance window fires with its intent
Pass simulatedHours: 25 (one day plus a sliver) when you need the weekly consolidation gate to tick over.
Given the same agent state at the start, the same advanceTime call produces the same output. There is no randomness seeded from wall-clock time. This makes Advance Time suitable for save/load, replay, and regression testing.
For advances that would exceed a proxy read timeout (Cloudflare's limit is ~100 s, which corresponds to roughly 4–5 simulated days depending on agent complexity), pass runAsync: true. The API returns immediately with a job descriptor; poll getAdvanceTimeJob until the status is terminal.
// Kick off a long advance asynchronously
const job = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 168, // one week
  runAsync: true,
}) as { job_id: string; status: string };
console.log(job.job_id, job.status); // "job_01HX...", "running"

// Poll until done (30-minute TTL in Redis)
let state = await client.workbench.getAdvanceTimeJob(job.job_id);
while (state.status === "running") {
  await new Promise(r => setTimeout(r, 2000));
  state = await client.workbench.getAdvanceTimeJob(job.job_id);
}
console.log(state.status); // "succeeded"
console.log(state.result); // full AdvanceTimeResponse
The smallest meaningful unit is one full 24-hour simulated day. Background jobs (diary, consolidation, constellation) run once per day. Sub-day advances (e.g. simulatedHours: 8) still process wakeups and mood decay but will not generate a diary entry unless a full day boundary is crossed.
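The full-day rule and the proxy-timeout guidance can be combined into a small planning step before calling advanceTime. The sketch below is illustrative only: `planAdvance`, `AdvancePlan`, and the 96-hour cutoff are assumptions (the cutoff is the low end of the rough 4 to 5 day estimate), and it assumes full days are counted as complete 24-hour blocks from the start of the window.

```typescript
// Hypothetical planning helper; none of these names come from the SDK.
interface AdvancePlan {
  fullDays: number;       // complete 24-hour days: one diary/consolidation pass each
  remainderHours: number; // leftover hours: wakeups and mood decay only
  runAsync: boolean;      // whether passing runAsync: true looks advisable
}

// Assumption: ~4 simulated days is the conservative end of the ~100 s proxy limit.
const ASYNC_THRESHOLD_HOURS = 4 * 24;

function planAdvance(simulatedHours: number): AdvancePlan {
  if (!Number.isFinite(simulatedHours) || simulatedHours <= 0) {
    throw new RangeError("simulatedHours must be a positive, finite number");
  }
  const fullDays = Math.floor(simulatedHours / 24);
  return {
    fullDays,
    remainderHours: simulatedHours - fullDays * 24,
    runAsync: simulatedHours >= ASYNC_THRESHOLD_HOURS,
  };
}

console.log(planAdvance(8));   // { fullDays: 0, remainderHours: 8, runAsync: false }
console.log(planAdvance(25));  // { fullDays: 1, remainderHours: 1, runAsync: false }
console.log(planAdvance(168)); // { fullDays: 7, remainderHours: 0, runAsync: true }
```

Tune the cutoff to your own agents; heavier agents may need the asynchronous path sooner.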
Any schedule whose next_fire_at falls within the advance window fires automatically. Advance 48 hours and two daily reminders will have fired — their intents processed, messages generated, and state updated — exactly as if real time had passed.
// Create a daily 09:00 reminder
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["09:00"] }, timezone: "UTC" },
  intent: "check in on how the user is feeling",
  check_type: "reminder",
});

// Advance 48 hours: both 09:00 fires trigger inside the window
const result = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 48,
});
console.log(result.wakeups_fired); // 2
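To predict how many fires to expect before running an advance, the window arithmetic can be sketched locally. This is a simplified model, not the platform's scheduler: `countDailyFires` is a hypothetical helper, it handles only a single daily UTC time, and it assumes the window includes its start instant and excludes its end.

```typescript
// Count how many times a daily "HH:MM" (UTC) schedule fires in
// [start, start + simulatedHours). Illustrative only: the real scheduler
// also handles time zones and other cadences.
const HOUR_MS = 60 * 60 * 1000;
const DAY_MS = 24 * HOUR_MS;

function countDailyFires(start: Date, simulatedHours: number, timeUtc: string): number {
  const [hh, mm] = timeUtc.split(":").map(Number);
  const windowStart = start.getTime();
  const windowEnd = windowStart + simulatedHours * HOUR_MS;

  // First candidate: the fire time on the same UTC calendar day as `start`.
  let fire = Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate(), hh, mm);
  if (fire < windowStart) fire += DAY_MS; // already passed today, so begin tomorrow

  let count = 0;
  for (; fire < windowEnd; fire += DAY_MS) count++;
  return count;
}

const midnight = new Date("2025-01-01T00:00:00Z");
console.log(countDailyFires(midnight, 48, "09:00")); // 2
console.log(countDailyFires(midnight, 8, "09:00"));  // 0
```

Comparing this estimate with wakeups_fired is one way to catch a schedule whose time zone or start time is not what you expected.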
When time advances, a diary entry is generated for each simulated day. The agent "remembers" what happened during the gap — emotional tone, recurring themes, relationship developments — the same way it would after real days of conversation. Use this to give a new user a companion that already feels lived-in, or to let a character "grow" between chapters of a story.
Any wakeup scheduled with a scheduled_at inside the advance window fires during the advance, including its LLM-generated proactive message. This lets you test wakeup copy and timing without waiting for the real clock to reach the fire time.
Advance Time is a primitive that chains with scheduled reminders, wakeups, and memory. There is no standalone end-to-end tutorial yet. See the linked Mind Layer pages below for how it combines with other features.
If you are running a benchmark suite, advanceTime lets you compress long-running scenarios into fast test runs. Advance a simulated week in seconds, inspect the diary entries and mood state, then score the result. Pair with the evaluation workflow to measure agent behavior quality after arbitrary amounts of simulated elapsed time.
Agent Insights
As the agent talks to a user over time, it builds up a derived view of who they are — what they care about, what they're working toward, who's in their life, and how their mood trends. Agent Insights exposes that derived state as readable (and for some signals, writable) endpoints. These are not things you author; the context engine extracts them automatically from conversations.
Automatic — no setup required
All insight signals are produced by the context engine during and after each conversation. You do not need to call any write endpoint to populate them — they fill in on their own. The read endpoints on this page let you surface what the agent has learned.
Derived, not authored. These signals are extracted from conversation text by the context engine. You do not push them in; the agent surfaces them automatically as it talks.
Per-instance scoping. Pass instanceId (TS/Python) or instanceID (Go) to filter results to a specific agent instance — useful when an agent is deployed in multiple scenarios or chat contexts for the same user.
Write endpoints for some signals. Goals and habits can be explicitly created, updated, or deleted when your application needs to drive a specific state (e.g., seeding a goal when a user starts onboarding, or marking a goal achieved after a purchase event). Interests, relationships, diary, constellation, and breakthroughs are read-only.
Read latency. Derived signals update at conversation turn-end, not in real time during a turn. Reads immediately after a chat call may not yet reflect the latest turn.
Habits are recurring behaviors the context engine detects across conversations — things like "user meditates in the morning" or "user reviews their tasks every Sunday." Each habit has a strength (0-1) that rises with observations and a formed flag that is set once the habit is considered stable.
const habits = await client.agents.listHabits("agent_abc", {
  userId: "user_123",
});
for (const h of habits.habits) {
  console.log(h.name, h.category, h.strength, h.formed);
}
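Once habits are fetched, a common next step is to surface only the stable ones. The sketch below works on plain objects with the four fields printed above; the `Habit` interface, the `stableHabits` helper, the 0.7 cutoff, and the sample data are all assumptions for illustration, not SDK types or recommended values.

```typescript
// Field names mirror the listing above; the interface itself is an assumption.
interface Habit {
  name: string;
  category: string;
  strength: number; // 0-1
  formed: boolean;
}

// Keep habits that are formed, or at least `minStrength` strong, strongest first.
function stableHabits(habits: Habit[], minStrength = 0.7): Habit[] {
  return habits
    .filter((h) => h.formed || h.strength >= minStrength)
    .sort((a, b) => b.strength - a.strength);
}

// Made-up sample data.
const sampleHabits: Habit[] = [
  { name: "Sunday task review", category: "productivity", strength: 0.4, formed: false },
  { name: "evening walk", category: "wellness", strength: 0.75, formed: false },
  { name: "morning meditation", category: "wellness", strength: 0.9, formed: true },
];
console.log(stableHabits(sampleHabits).map((h) => h.name));
// [ 'morning meditation', 'evening walk' ]
```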
Goals represent what the user is working toward. They are extracted automatically from conversation intent — "I want to run a 5K by June" becomes a goal with a type, title, and priority. Goals have a status field: active, achieved, or abandoned.
// Read
const goals = await client.agents.listGoals("agent_abc", { userId: "user_123" });
for (const g of goals.goals) {
  console.log(g.title, g.status, g.priority);
}

// Seed a goal for a new workflow
const goal = await client.agents.createGoal("agent_abc", {
  userId: "user_123",
  title: "Complete onboarding",
  description: "Finish all onboarding steps",
  type: "task",
  priority: 1,
});

// Mark achieved after a business event
await client.agents.updateGoal("agent_abc", goal.goal_id, {
  userId: "user_123",
  status: "achieved",
});
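For dashboards it is often handy to bucket goals by the three status values described above. This is a minimal sketch over plain objects; the `Goal` type and `groupByStatus` helper are hypothetical and only assume the title, status, and priority fields shown in the read example.

```typescript
type GoalStatus = "active" | "achieved" | "abandoned";

// Assumed shape: only the fields printed in the read example.
interface Goal {
  title: string;
  status: GoalStatus;
  priority: number;
}

// Bucket goals by lifecycle status, preserving input order within each bucket.
function groupByStatus(goals: Goal[]): Record<GoalStatus, Goal[]> {
  const groups: Record<GoalStatus, Goal[]> = { active: [], achieved: [], abandoned: [] };
  for (const g of goals) groups[g.status].push(g);
  return groups;
}

// Made-up sample data.
const grouped = groupByStatus([
  { title: "Run a 5K by June", status: "active", priority: 1 },
  { title: "Complete onboarding", status: "achieved", priority: 1 },
  { title: "Learn to juggle", status: "abandoned", priority: 3 },
  { title: "Read 12 books", status: "active", priority: 2 },
]);
console.log(grouped.active.map((g) => g.title)); // [ 'Run a 5K by June', 'Read 12 books' ]
```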
Interests are topics and themes the context engine identifies as meaningful to the user — things like "machine learning", "hiking", or "Italian cooking." Unlike goals, interests have no lifecycle status; they accumulate over time.
const interests = await client.agents.getInterests("agent_abc", {
  userId: "user_123",
});
for (const i of interests.interests) {
  console.log(i.topic, i.category);
}
Relationships are the people the user mentions across conversations — friends, family, colleagues, and others the agent has learned about. Each entry includes the person's name, their relationship to the user, and any context the agent has collected.
const rel = await client.agents.getRelationships("agent_abc", {
  userId: "user_123",
});
for (const r of rel.relationships) {
  console.log(r.name, r.relationship_type, r.context);
}
The diary contains agent-authored entries written at session end — reflections on what happened, what was learned, and how the relationship is evolving. Each entry is anchored to a session and a timestamp. Diary entries are the richest narrative signal available.
const diary = await client.agents.getDiary("agent_abc", {
  userId: "user_123",
});
for (const entry of diary.entries) {
  console.log(entry.created_at, entry.content);
}
The constellation is the agent's knowledge graph for a user — a set of nodes (concepts, people, themes) and edges (relationships between them) that the context engine builds from recurring patterns across memory. Nodes have a significance score and a node_type.
const c = await client.agents.getConstellation("agent_abc", {
  userId: "user_123",
});
for (const node of c.nodes) {
  console.log(node.label, node.node_type, node.significance);
}
Breakthroughs are significant relationship or emotional milestones detected by the platform — moments where the agent's understanding of the user meaningfully deepened, or where a notable shift in the relationship dynamic was recorded.
const bt = await client.agents.listBreakthroughs("agent_abc", {
  userId: "user_123",
});
for (const b of bt.items) {
  console.log(b.type, b.description, b.timestamp);
}
With Memory — insights are summaries over raw facts
Insight signals are derived summaries; the underlying evidence lives in memory. Fetch habits to learn what patterns exist, then use memory.search to pull the raw conversation facts behind one of them.
const habits = await client.agents.listHabits("agent_abc", { userId: "user_123" });
const topHabit = habits.habits[0];

// Find the raw memories that support this habit
const facts = await client.agents.memory.search("agent_abc", {
  userId: "user_123",
  query: topHabit.name,
  limit: 10,
});
console.log(`Found ${facts.results.length} facts supporting "${topHabit.name}"`);
With Emotions — mood + insights for a full user picture
getMood and these insight endpoints together form the agent's complete understanding of a user at a point in time. Fetch both to power a user-facing "how the agent sees you" view or a support dashboard.
Advance Time fast-forwards the context engine's processing — generating new diary entries, decaying mood, and updating derived signals — without waiting real time. This is useful for simulating what the agent would know after a period of elapsed time, and for testing insight endpoints against a populated state.
// Advance 7 days to populate diary entries and update insights
const result = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 168,
});

// Now read the insights that formed during that window
const diary = await client.agents.getDiary("agent_abc", { userId: "user_123" });
console.log("Diary entries after 7d:", diary.entries.length);
Memory — the raw facts behind insights; use memory.search to drill into any signal
Emotions — mood, mood history, and aggregate mood statistics
Personality — Big5 traits and personality evolution (a different kind of derived state)
Advance Time — fast-forward the agent's processing to simulate elapsed time
Architecture
The Mind Layer architecture separates agent intelligence from your application logic: your backend keeps owning auth, business state, and user data, while Sonzai owns personality, memory, mood, habits, and relationships behind a REST API. A single chat call assembles context, streams the AI response, and updates every internal state automatically — no extra orchestration calls. The most load-bearing thing to know: post-chat learning runs on Sonzai's side, so your backend never schedules consolidation, mood decay, or fact extraction.
The Mind Layer is a standalone platform that separates agent intelligence (personality, memory, mood) from your application logic. Any backend integrates via REST API or the official SDKs.
Your Backend               Mind Layer Platform
|                                  |
|--- Create Agent ---------------->|
|<-- Agent ID + Profile -----------|
|                                  |
|--- Chat (SSE streaming) -------->|
|    (messages + app context)      |-- Build context
|<-- Streaming AI response --------|-- Stream AI response
|                                  |-- Update memory, mood, personality
|<-- Proactive notifications ------|   (automatic, no extra calls)
User-facing application. Sends messages to your backend and renders agent responses. Examples: React, Next.js, Vue, mobile app.
Your Backend
Handles auth, application state, user sessions, and business logic. Calls the Mind Layer via SDK, REST API, MCP, or OpenClaw plugin for AI interactions. Examples: Express, Django, Go, OpenClaw.
Sonzai Mind Layer
Owns agent intelligence: personality, memory, mood, habits, goals, and relationships. A single chat call handles context assembly, AI streaming, and post-chat learning. Examples: api.sonz.ai.
On each chat call, the platform automatically assembles relevant context from personality, memory, mood, and relationship data before generating the AI response. Post-chat state updates happen automatically — no extra API calls needed.
Context Assembly
Personality, mood, memories, relationship narrative, and application state — all assembled per request.
Memory Extraction
Facts, events, and commitments are extracted from each conversation and stored automatically.
Mood & Personality Evolution
Mood and Big5 personality drift naturally based on interaction patterns.
Proactive Notifications
Agents can schedule proactive outreach between sessions. Deliver via polling or webhook.
The platform doesn't just respond to chat calls — it runs a continuous background pipeline that keeps memory accurate, behavioral state coherent, and retrieval quality climbing over time. Every loop runs automatically; nothing for you to schedule or wire.
For a complete walk-through of every mechanism — including consolidation, reversible deduplication, boundary detection, personality drift safety caps, breakthroughs, and the cautious-rollout system — see How Agents Improve Over Time.
Custom Tools: create, list, delete (agent-level and session-level)
Notifications: list, consume, history
User Priming: primeUser, batchImport, getMetadata, updateMetadata
Pattern 1: Managed Agent Runtime
You point your app at client.agents.chat (or open an explicit session with
sessions.start → chat turns → sessions.end). Sonzai assembles the system
prompt from the agent's identity, recalls relevant memories, runs the LLM,
streams tokens back, executes any registered tools, and updates state — all
in a single call. You write the least code in this pattern. It is the
right default for chat companions, support agents, and anything where Sonzai
owning the full agent loop is acceptable.
The simplest complete flow: open an explicit session, drive a streaming
chat, end the session.
import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const AGENT_ID = "agent-uuid";
const USER_ID = "user-123";
const SESSION_ID = crypto.randomUUID();

// 1. Start an explicit session (optional: agents.chat will auto-create one
//    if you don't, but explicit sessions let you scope tools and lifecycle).
await sonzai.agents.sessions.start(AGENT_ID, {
  userId: USER_ID,
  sessionId: SESSION_ID,
});

// 2. Drive turns. Sonzai owns context assembly, the LLM call, tool exec,
//    and writeback. You stream the reply straight to your UI.
for await (const event of sonzai.agents.chatStream({
  agent: AGENT_ID,
  sessionId: SESSION_ID,
  userId: USER_ID,
  messages: [{ role: "user", content: "Hi! How's your day going?" }],
  language: "en",
})) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}

// 3. End the session. This triggers fact extraction + consolidation.
await sonzai.agents.sessions.end(AGENT_ID, {
  userId: USER_ID,
  sessionId: SESSION_ID,
  totalMessages: 2,
});
Skip the explicit session
If you don't call sessions.start, Sonzai opens one on the first
agents.chat call and closes it on idle. The session ID still flows
through to extracted facts. Use the explicit lifecycle when you need
session-scoped tools, predictable boundaries, or replay semantics.
The Mind Layer ships a hosted Streamable HTTP MCP endpoint at
https://api.sonz.ai/mcp/memory/{agent_id}. Point any MCP-compatible
client at it with your Sonzai API key — 34 tools, 4 resources, and
3 guided prompts. No local binary, no SSE port, no Go toolchain.
You need a project API key from your
dashboard and an
agent ID. Pick your client below — pasting the snippet is the entire
setup.
# One-liner — registers the hosted MCP server with Claude Code:
claude mcp add --transport http sonzai \
  https://api.sonz.ai/mcp/memory/AGENT_ID \
  --header "Authorization: Bearer $SONZAI_API_KEY"
# Then from any Claude Code session you can say:
# "Chat with agent 'Luna' and say 'I had a great day hiking today!'"
# "Search Luna's memories about hiking adventures"
# "Use mind-layer-setup with assistant_name 'Aria' …"
Streamable HTTP, not SSE
The MCP specification has defined Streamable HTTP as its standard remote
transport since the 2025-03-26 revision, replacing the older HTTP+SSE
transport. SSE is on a deprecation path across major clients, so prefer
HTTP for any new integration.
Treat the API key like a password
The Bearer token is your project API key — it grants full access to
every agent in the project. Don't commit it to public repos; use
per-developer scopes when collaborating.
OpenClaw is an open-source
framework for building conversational AI agents through a slot-based
plugin system. The slot that decides what context goes into the system
prompt is called contextEngine. Installing
@sonzai-labs/openclaw-context registers the Sonzai context engine
under the name "sonzai" — assign it to the slot in openclaw.json and
every conversation flows through the Mind Layer with zero additional
code.
The OpenClaw plugin is JavaScript-only (OpenClaw itself is JS). The
Python and Go branches show the equivalent B2B provisioning flow:
deterministically derive an agent UUID and write the OpenClaw config —
the runtime that consumes it stays JS.
// 1. Install:
//      openclaw plugins install @sonzai-labs/openclaw-context
//      # or: npm install @sonzai-labs/openclaw-context
//
// 2. Run the setup wizard (interactive; asks for API key, agent name):
//      npx @sonzai-labs/openclaw-context setup
//
// 3. The wizard writes openclaw.json:
//      {
//        "plugins": {
//          "slots": { "contextEngine": "sonzai" },
//          "entries": {
//            "sonzai": {
//              "enabled": true,
//              "apiKey": "sk_your_api_key",
//              "agentId": "a1b2c3d4-..."
//            }
//          }
//        }
//      }
//
// 4. Start chatting. Sonzai is now the contextEngine:
//      openclaw chat
//
// For programmatic / B2B provisioning use the exported setup() helper:
import { setup } from "@sonzai-labs/openclaw-context";

const result = await setup({
  apiKey: "sk_your_api_key",
  agentName: "customer-support-bot",
  configPath: "/path/to/openclaw.json",
});
console.log(result.agentId); // deterministic UUID, safe to re-run
console.log(result.written); // true: config file updated
Idempotent provisioning
Agent IDs are derived from SHA1(tenantID + agentName). Calling
setup() (or the Python/Go equivalent) multiple times for the same
tenant + name returns the same agent — safe to re-run on every
deploy.
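The idea behind hash-derived IDs can be sketched in a few lines. The code below is not the platform's implementation: `deriveAgentId` is a hypothetical helper, and the UUID-style layout (first 128 bits of the digest, formatted 8-4-4-4-12) is an assumption, since the exact byte layout is not specified here. It only shows why the same tenant and name always map to the same ID.

```typescript
import { createHash } from "node:crypto";

// Illustrative only: derives a UUID-shaped ID from SHA-1(tenantId + agentName).
// The real platform's formatting may differ.
function deriveAgentId(tenantId: string, agentName: string): string {
  const hex = createHash("sha1").update(tenantId + agentName, "utf8").digest("hex");
  // Use the first 128 bits (32 hex chars) and format them as 8-4-4-4-12.
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}

const first = deriveAgentId("tenant-1", "customer-support-bot");
const second = deriveAgentId("tenant-1", "customer-support-bot");
console.log(first === second); // true
console.log(first === deriveAgentId("tenant-2", "customer-support-bot")); // false
```

One caveat of plain concatenation: the pairs ("ab", "c") and ("a", "bc") hash to the same value, so an implementation of this kind would normally insert a separator or length prefix between the two parts.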
You own the entire conversation. Sonzai never sees it in real time.
When the conversation ends — call wraps, support case closes, journaling
session finishes — you POST the transcript once and Sonzai's extractor
turns it into facts, mood updates, personality drift, habit detection,
and proactive-outreach signal. Best for tutoring, fitness, CRM, voice
calls, journaling, and any flow where Sonzai in the hot path is
undesirable or impossible.
The simplest path is /process: one call, auto-creates the session,
returns the generated session_id for correlation. Use the explicit
sessions.start → end({ messages }) lifecycle when you need
session-scoped tools, durations, or async polling.
import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

async function ingestTranscript(
  agentId: string,
  userId: string,
  transcript: { role: "user" | "assistant" | "tool"; content: string; tool_calls?: any[] }[],
) {
  // One call. Auto-creates a session. Tool messages allowed.
  const result = await sonzai.agents.process(agentId, {
    userId,
    messages: transcript,
    provider: "gemini", // optional override
    model: "gemini-3.1-flash-lite-preview", // optional override
  });

  // result.session_id is the auto-created session id.
  // Pull extractions from the read endpoints when ready:
  const memory = await sonzai.agents.memory.list(agentId, { userId });
  const mood = await sonzai.agents.getMood(agentId, { userId });
  return { sessionId: result.session_id, memory, mood };
}
Pick one trigger, not both
/process and sessions.end({ messages }) are functionally equivalent
for batch ingest — both extract facts and side effects from the full
transcript inline. Don't do both for the same transcript or
extraction runs twice. Use /process for the simple one-call shape.
Use sessions.start + sessions.end({ messages }) when you want
explicit lifecycle, async polling, or session-scoped tools.
What runs when
/process and sessions.end are intentionally lightweight: extract
facts and a session summary inline (one LLM call per chunk). The
expensive cross-session work (dedup, clustering, diary, decay) is
scheduled automatically by the platform — you don't pay for it on
every call.
You keep your existing chat loop. Before each LLM call, you ask Sonzai
for the enriched context for the user's message; after the LLM replies,
you submit just that exchange via session.turn(). Mood lands inline
(~300–500 ms). Deeper extraction — facts, personality drift, habit
detection, goal updates — runs asynchronously 5–15 seconds later in the
background. Sonzai never sees your tool execution and never picks
your model.
This is the right shape for chat companions, voice agents, agent
frameworks (OpenAI Agents SDK, LangChain, LiveKit), and anywhere you
already had a working LLM loop in production before adopting Sonzai.
The minimum viable loop with a real harness. The OpenAI Agents SDK owns
conversation state, model selection, and tool dispatch. Sonzai sits
outside that loop: it supplies the system prompt via
session.context() before the run, and ingests the finished exchange
via session.turn() after. No OPENAI_API_KEY needed — the Agents SDK
is pointed at Gemini's OpenAI-compat endpoint.
import os, uuid
from openai import AsyncOpenAI
from agents import Agent, Runner, OpenAIChatCompletionsModel, function_tool, set_tracing_disabled
from sonzai import Sonzai

set_tracing_disabled(True)  # Agents SDK tries to ship traces to OpenAI; we don't have a key.

# Your LLM harness: owns history, tool dispatch, multi-step reasoning.
gemini = AsyncOpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
model = OpenAIChatCompletionsModel(model="gemini-3.1-flash-lite-preview", openai_client=gemini)

@function_tool
def get_current_time() -> str:
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

# Sonzai = memory layer. Never sees the LLM client.
sonzai = Sonzai(api_key=os.environ["SONZAI_API_KEY"])

def run_conversation(agent_id: str, user_id: str):
    session = sonzai.agents.sessions.start(
        agent_id,
        user_id=user_id,
        session_id=f"session-{uuid.uuid4().hex[:8]}",
        provider="gemini",  # default for the deferred-extraction LLM
        model="gemini-3.1-flash-lite-preview",
    )

    def turn(user_message: str) -> str:
        # 1. Fresh, query-relevant context BEFORE the LLM call.
        ctx = session.context(query=user_message)

        # 2. Your harness runs the LLM + your tools. Sonzai is OUT of the loop.
        agent = Agent(
            name="Companion",
            instructions=build_system_prompt(ctx),
            model=model,
            tools=[get_current_time],
        )
        result = Runner.run_sync(agent, user_message)
        send_to_user(result.final_output)

        # 3. Convert the run's items (assistant text + ToolCallItem +
        #    ToolCallOutputItem) into Sonzai's tool-aware shape so
        #    extraction can pick up facts from tool outputs too.
        sonzai_messages = run_result_to_sonzai_messages(user_message, result)

        # 4. Submit. Sync mood ~300ms; deferred extraction 5–15s later.
        session.turn(messages=sonzai_messages)
        return result.final_output

    return turn, session.end

# /context returns a flat dict: read what you need, drop the rest.
def build_system_prompt(ctx: dict) -> str:
    facts = "\n".join(f"- {f.get('atomic_text', '')}" for f in (ctx.get("loaded_facts") or []))
    parts = [
        ctx.get("personality_prompt", "You are a helpful AI companion."),
        f"Personality (Big5): {ctx.get('big5', {})}",
        f"Current mood: {ctx.get('current_mood', {})}",
    ]
    if facts:
        parts.append(f"Relevant memories:\n{facts}")
    return "\n\n".join(parts)
The load-bearing habit
Always call session.context(query=user_msg) before the LLM call, on
every turn. That's the closing-the-loop step. Skipping it means the
LLM works from stale state and the value of a memory layer collapses.
Save a roundtrip with fetchNextContext
session.turn() accepts fetchNextContext: { query: nextMessage }
(Python: fetch_next_context={"query": ...}). When set, the response
carries the next /context payload under next_context, so the
client already has turn N+1's context by the time turn N finishes.
Sonzai's /turn accepts OpenAI/Anthropic-style tool messages: tool_calls
on assistant messages and role: "tool" results. Forward the full
exchange and the extractor can capture facts that only surfaced inside a
tool output (e.g. "user's last order shipped from Tokyo" from an
order-lookup tool).
session.turn(messages=[
    {"role": "user", "content": "Where did my last order ship from?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "order-lookup", "arguments": "{}"},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1",
     "content": '{"order_id":"42","origin":"Tokyo","carrier":"DHL"}'},
    {"role": "assistant", "content": "Your last order shipped from Tokyo via DHL."},
])
Sonzai never executes a tool — that's your harness's job. It just reads
the messages you submit. If you're on the OpenAI Agents SDK, see the
demo's run_result_to_sonzai_messages
helper — it converts a Runner result's MessageOutputItem /
ToolCallItem / ToolCallOutputItem items into this shape.
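Before submitting a tool-aware exchange, it can help to check that every tool call has a matching result. The sketch below follows the OpenAI-style convention shown above (tool_calls on assistant messages, tool_call_id on role: "tool" messages). The `TurnMessage` type and `findUnpairedToolCalls` helper are hypothetical, and whether /turn itself rejects unpaired messages is not stated here; treat this as a local sanity check only.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// Assumed message shape, modelled on the example exchange above.
interface TurnMessage {
  role: "user" | "assistant" | "tool";
  content: string | null;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// Returns ids of tool calls that never got a result, followed by
// tool results that reference an id no assistant message announced.
function findUnpairedToolCalls(messages: TurnMessage[]): string[] {
  const pending = new Set<string>();
  const orphans: string[] = [];
  for (const m of messages) {
    if (m.role === "assistant") {
      for (const call of m.tool_calls ?? []) pending.add(call.id);
    } else if (m.role === "tool") {
      if (m.tool_call_id !== undefined && pending.has(m.tool_call_id)) {
        pending.delete(m.tool_call_id);
      } else {
        orphans.push(m.tool_call_id ?? "(missing tool_call_id)");
      }
    }
  }
  return Array.from(pending).concat(orphans);
}

const exchange: TurnMessage[] = [
  { role: "user", content: "Where did my last order ship from?" },
  {
    role: "assistant",
    content: null,
    tool_calls: [{ id: "call_1", type: "function", function: { name: "order-lookup", arguments: "{}" } }],
  },
  { role: "tool", tool_call_id: "call_1", content: '{"origin":"Tokyo"}' },
  { role: "assistant", content: "Your last order shipped from Tokyo." },
];
console.log(findUnpairedToolCalls(exchange)); // []
```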
/turn accepts text content only. This is intentional, not a
limitation. Memory is a layer of semantic understanding — the
question Sonzai needs to answer later is "what does this agent know
about this user?", not "what bytes did the LLM see?". Your vision-capable
LLM has already understood the image; pass that understanding to Sonzai
as text, and the memory pipeline can extract facts, habits, and
inventory items from it like any other turn.
The recommended pattern: have your same multimodal LLM produce a short
factual description alongside its warm reply, and embed that
description in the user message you submit to session.turn().
# Your harness: Gemini sees the actual image via an image_url content part.
result = await gemini.chat.completions.create(
    model="gemini-3.1-flash-lite-preview",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT_IMAGE_AWARE},  # see below
        {"role": "user", "content": [
            {"type": "text", "text": user_msg},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]},
    ],
)
raw = result.choices[0].message.content

# Dual-output: split the reply (shown to user) from the [MEMORY: ...] note.
memory_note, reply = split_memory_note(raw)  # your tiny parser
send_to_user(reply)

# Sonzai sees: the original user text + a description of the image.
# It will extract facts like "user goes to the gym", "wore a black tank top".
session.turn(messages=[
    {"role": "user", "content": f"{user_msg}\n\n[Image attached: {memory_note}, URL: {image_url}]"},
    {"role": "assistant", "content": reply},
])
The SYSTEM_PROMPT_IMAGE_AWARE instruction is what makes this work —
it asks the LLM to emit a hidden line like [MEMORY: <factual description>] after its warm reply. Same LLM call, no extra cost or
latency, no second roundtrip. The same pattern works for audio
(send the transcript) and assistant-generated images (describe what
you generated). For the full pattern with all three SDKs, see the
deep guide's multimodal section.
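The "tiny parser" that separates the hidden note from the visible reply can be as small as one regular expression. Here is a TypeScript sketch of that step; `splitMemoryNote` is a hypothetical counterpart of the `split_memory_note` helper named in the snippet above, and it assumes the model emits at most one `[MEMORY: ...]` marker whose description contains no closing bracket.

```typescript
// Hypothetical parser for replies that carry a "[MEMORY: ...]" line.
function splitMemoryNote(raw: string): { memoryNote: string; reply: string } {
  const match = raw.match(/\[MEMORY:\s*([^\]]*)\]/);
  if (!match) {
    // No marker: show the whole text to the user and store no note.
    return { memoryNote: "", reply: raw.trim() };
  }
  const start = match.index ?? 0;
  // Remove the marker from the text that is shown to the user.
  const reply = (raw.slice(0, start) + raw.slice(start + match[0].length)).trim();
  return { memoryNote: (match[1] ?? "").trim(), reply };
}

const parsed = splitMemoryNote(
  "Love the gym selfie!\n[MEMORY: user is at the gym wearing a black tank top]",
);
console.log(parsed.reply);      // Love the gym selfie!
console.log(parsed.memoryNote); // user is at the gym wearing a black tank top
```

Falling back to an empty note when the marker is missing keeps the user-facing reply intact even if the model ignores the instruction.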
Tool outputs are multimodal too
If a tool returns a screenshot, file blob, or any non-text payload,
apply the same rule: have your harness summarize what the tool
returned in a one-line text result before forwarding the
role: "tool" message to session.turn().
If your harness already keeps a message log (most do — Agents SDK,
LangChain, etc.), use that. If you'd rather not maintain one, every
/context response carries recent_turns — the raw messages buffered
by /turn for the current session, in chronological order. Read them
straight off ctx.recent_turns and feed them to your LLM:
ctx = session.context(query=user_message)
history = [
    {"role": t["role"], "content": t["content"]}
    for t in (ctx.get("recent_turns") or [])
]
reply = your_llm.chat(
    system=build_system_prompt(ctx),
    messages=[*history, {"role": "user", "content": user_message}],
)
The buffer is per-session and text-only — no tool calls, no images,
no system prompts. It's the right shape for a simple chat loop where
Sonzai is the source of truth; if you need richer message structure,
keep your own.
Every feature in the Sonzai platform flows through the conversation loop. Each turn sends messages to the agent, streams back a response, and automatically updates memory, mood, relationships, personality, and goals — no separate API calls required.
Stream tokens as they arrive for a more responsive experience. The platform sends OpenAI-compatible SSE chunks; each line starts with data: and the stream closes with data: [DONE].
for await (const event of client.agents.chatStream({
  agent: "agent-id",
  messages: [{ role: "user", content: "Tell me a story" }],
  userId: "user-123",
  language: "en",
  timezone: "America/New_York",
})) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
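If you consume the stream without the SDK, the wire format described above can be handled with a small line parser. The sketch below assumes the lines have already been split out of the response body and that each chunk follows the OpenAI-style `choices[0].delta.content` shape used in the example; `collectSseText` is a hypothetical helper name and the sample lines are made up.

```typescript
// Extract text deltas from OpenAI-compatible SSE lines; stop at [DONE].
function collectSseText(lines: string[]): string {
  let text = "";
  for (const line of lines) {
    if (!line.startsWith("data:")) continue; // skip blank separator lines
    const payload = line.slice("data:".length).trim();
    if (payload === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(payload) as {
      choices?: { delta?: { content?: string } }[];
    };
    text += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return text;
}

const sseLines = [
  'data: {"choices":[{"delta":{"content":"Once "}}]}',
  "",
  'data: {"choices":[{"delta":{"content":"upon a time"}}]}',
  "data: [DONE]",
  'data: {"choices":[{"delta":{"content":"ignored"}}]}',
];
console.log(collectSseText(sseLines)); // Once upon a time
```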
Chat aggregates the full response before returning — simpler to wire into job queues and one-shot automation. ChatStream (and ChatStreamChannel in Go) streams tokens as they arrive, which keeps UI latency low and lets you render responses progressively. Both call the same underlying SSE endpoint; Chat just buffers the events internally.
Pass sessionId to group messages into a traceable conversation. If you omit it, the platform assigns one automatically. Use your own session ID convention (e.g. case-<id>) so you can join conversation logs to your internal systems.
Built-in tools (web_search, image_generation, remember_name, inventory) are opt-in. Pass toolCapabilities on each chat call to enable them for that request. Custom tools defined on the agent are always available; platform-managed tools (memory, state) run automatically without configuration.
instanceId isolates memory and personality evolution to a specific context — e.g. per-workspace or per-game-session. Without it, state is scoped per user. Set it consistently across calls that belong to the same scenario.
Use "user" for the end-user's turn and "assistant" for prior agent replies you want to include as history. The platform appends the new assistant turn automatically after each successful chat call — you don't need to re-inject it.
Set maxTurns to control how many assistant messages the agent produces in a single call. Default is 1. Raise it for companions that send a few short messages in a row; keep it at 1 for task agents that should give a single focused reply.
With Wakeups — proactive messages appear in the same chat stream
Wakeups let the agent initiate contact outside the normal request-response cycle. When you poll for or receive a wakeup delivery, feed it into your chat UI as an agent-initiated message so the user sees it inline.
// 1. Schedule a one-off wakeup
await client.agents.wakeups.create("agent-id", {
  userId: "user-123",
  checkType: "interest_check",
  intent: "follow up on the project the user mentioned yesterday",
  delayHours: 24,
});
// 2. Poll for pending proactive messages and surface them in the chat UI
const pending = await client.agents.notifications.list("agent-id", {
  userId: "user-123",
});
for (const msg of pending.messages) {
  renderAgentBubble(msg.content); // proactive message lands inline
}
Scheduled reminders fire on a cadence and deliver proactive agent messages. Poll notifications.list after each reminder fires to pull the generated message into your chat UI, creating a continuous conversation thread that spans both reactive and proactive turns.
// 1. Create a daily check-in schedule
await client.schedules.create("agent-id", "user-123", {
  cadence: {
    simple: { frequency: "daily", times: ["09:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "morning check-in on mood and energy",
  checkType: "reminder",
});
// 2. At fire time, the platform generates a proactive message; fetch it
const pending = await client.agents.notifications.list("agent-id", {
  userId: "user-123",
});
pending.messages.forEach(msg => renderAgentBubble(msg.content));
Register custom tools on the agent. Once registered, they are available on every chat call; toolCapabilities is only needed to enable the built-in tools. When the agent decides to invoke a tool, the SSE stream emits a tool_call event. Your client executes the function and can feed the result back in a follow-up turn.
// 1. Register a custom tool on the agent (one-time setup)
await client.agents.createCustomTool("agent-id", {
  name: "get_order_status",
  description: "Returns the current status of a customer order",
  parameters: {
    type: "object",
    properties: { order_id: { type: "string" } },
    required: ["order_id"],
  },
});
// 2. Chat as usual; the agent can now invoke the tool
for await (const event of client.agents.chatStream({
  agent: "agent-id",
  messages: [{ role: "user", content: "Where's my order #4521?" }],
  userId: "user-123",
})) {
  if (event.type === "tool_call") {
    const result = await getOrderStatus(event.toolCall.parameters.order_id);
    console.log("Tool result:", result); // feed back in next turn
  } else {
    process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
  }
}
With Memory — every turn writes memory, search retrieves it
The platform extracts facts from every conversation turn automatically. You can query memory immediately after a chat call to confirm capture, or use memory.search to build a context widget showing what the agent remembers about the user.
// Chat turn: the platform captures facts automatically
const response = await client.agents.chat({
  agent: "agent-id",
  messages: [{ role: "user", content: "I just signed up for a marathon in June." }],
  userId: "user-123",
});
// Memory search: retrieves facts just captured (and prior ones)
const memories = await client.agents.memory.search("agent-id", {
  query: "marathon running plans",
  userId: "user-123",
  limit: 5,
});
for (const result of memories.results) {
  console.log(result.content, result.score);
  // "User signed up for a marathon in June" 0.92
}
Custom State is simple structured per-user data the agent can read and modify during conversations. Use it for counters, flags, or any state your product tracks per user. Unlike memory (which the platform extracts from conversation text), Custom State is data you write explicitly from your backend — and the agent sees it immediately.
Global State
Per Instance: shared across all users in an instance. Use for environment configuration, agent status, or global event flags.
Per-User State
Per Instance + User: scoped to one user. Use for energy, currency, progress, preferences, and any per-player data.
Instances
All states are scoped to an instanceId — one deployment context of your agent (e.g. a workspace or game world). Omit instanceId to use the default instance. See Instances for details.
When the agent has access to custom states, it reads current state at the start of each conversation via the get_custom_state tool — no prompt injection required. The agent can also update state during a conversation if you define a Custom Tool that calls your backend.
Use Custom State for primitives and simple objects. Reach for Inventory when items have their own identity, multiple typed fields, and a shared schema across users.
Upsert creates the state if the key doesn't exist, or replaces the value if it does. Idempotent — safe to call on every update cycle from your backend.
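To make the create-or-replace semantics concrete, here is a small in-memory model. It is a sketch of the behaviour described above, not the SDK: StateStore and its methods are hypothetical stand-ins, and real calls go over the network.

```typescript
// Values this sketch allows; the platform's accepted types may differ.
type StateValue = string | number | boolean | Record<string, unknown>;

// Hypothetical in-memory stand-in for the custom state store.
class StateStore {
  private states = new Map<string, StateValue>();

  // Create the key if it is missing, otherwise replace its value.
  upsert(key: string, value: StateValue): void {
    this.states.set(key, value);
  }

  get(key: string): StateValue | undefined {
    return this.states.get(key);
  }

  count(): number {
    return this.states.size;
  }
}

const store = new StateStore();
store.upsert("energy", 80); // creates the key
store.upsert("energy", 80); // repeating the call changes nothing
store.upsert("energy", 70); // replaces the value

console.log(store.get("energy"), store.count()); // 70 1
```

Because repeating a call leaves the store in the same state, a backend can re-send its current values on every update cycle without tracking what it has already written.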
Return all states for an agent, optionally filtered by scope or user.
// All global states for an instance
const globals = await client.agents.customStates.list("agent-id", {
scope: "global",
instanceId: "workspace-1",
});
// All per-user states for a specific user
const userStates = await client.agents.customStates.list("agent-id", {
scope: "user",
userId: "user-123",
});
Define a tool that lets the agent trigger a state change from inside a conversation. Your backend executes the tool call and calls upsert to apply the new value.
await client.agents.sessions.setTools("agent-id", "session-id", [
  {
    name: "spend_energy",
    description: "Deduct energy from the user. Call when the user takes an action that costs energy.",
    parameters: {
      type: "object",
      properties: {
        amount: { type: "number", description: "Energy to deduct (1–50)" },
      },
      required: ["amount"],
    },
  },
]);
// In your tool handler:
// 1. Receive externalToolCall { name: "spend_energy", arguments: { amount: 10 } }
// 2. Read current energy with getByKey
// 3. Upsert the new value
// 4. Return the result in the next chat message
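The decision at the heart of that handler can be sketched as a pure function. This is an illustration under assumptions: applySpendEnergy and SpendResult are hypothetical names, the 1 to 50 range comes from the tool description above, and rejecting an overspend is one possible policy rather than platform behaviour.

```typescript
interface SpendResult {
  ok: boolean;
  energy: number;  // energy after the call (unchanged when rejected)
  message: string; // text to hand back to the agent in the next chat message
}

// Pure core of the handler: validate the amount, then compute the new value.
function applySpendEnergy(current: number, amount: number): SpendResult {
  if (!Number.isFinite(amount) || amount < 1 || amount > 50) {
    return { ok: false, energy: current, message: "Rejected: amount must be between 1 and 50." };
  }
  if (amount > current) {
    return { ok: false, energy: current, message: `Rejected: only ${current} energy left.` };
  }
  const energy = current - amount;
  return { ok: true, energy, message: `Spent ${amount} energy, ${energy} remaining.` };
}

const spend = applySpendEnergy(80, 10);
console.log(spend.message); // "Spent 10 energy, 70 remaining."
```

In a real handler, the value read with getByKey would be passed in as current, the upsert would run only when ok is true, and message would be returned to the agent on the next turn.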
With Inventory — when state is structured, use inventory
Custom State is the right tool for primitive values and simple flat objects: energy: 80, tier: "gold", onboarding_complete: true. When a piece of data has its own identity, multiple typed properties, and a shared schema across users — a medication, a stock holding, a pet — use Inventory instead.
| Situation | Use |
| --- | --- |
| Single number or string per key | Custom State |
| A flag that is true/false | Custom State |
| A flat object with a few fields | Custom State |
| An item with a schema defined in the Knowledge Base | Inventory |
Custom State is persistent by default — it survives across sessions and is visible in every future conversation. If you need state that only exists for the duration of one conversation (a temporary form-fill context, a one-time confirmation token), scope it at the session level instead by passing it in the chat request's context fields rather than writing it as a Custom State.
Custom Tools let the LLM invoke functions during inference. Sonzai handles sonzai_-prefixed built-in tools automatically. Custom tools are defined by you and executed by your backend — Sonzai surfaces the call as a side effect in the SSE stream.
Using your own LLM?
If you use standalone memory mode (BYO-LLM), Sonzai exposes tool schemas you can wire into your agent framework (LangChain, Vercel AI SDK, Gemini function calling, etc.). See the Tool Integration guide for details.
AgentCapabilities includes a customTools field — a snapshot of the agent-level custom tools currently registered. Use get_capabilities() to read them, or use the dedicated list_custom_tools() / createCustomTool() methods (shown in the Full API section below) to manage them.
// Read agent capabilities — includes current custom tools
const caps = await client.agents.getCapabilities("agent-id");
console.log(caps.customTools); // CustomToolDefinition[] | null
// Register a new agent-level custom tool
await client.agents.createCustomTool("agent-id", {
name: "lookup_order",
description: "Look up an order by ID and return its status.",
parameters: {
type: "object",
properties: {
order_id: { type: "string" },
},
required: ["order_id"],
},
});
Inject tools dynamically for a specific session. Session tools merge with agent-level tools — same-name session tools take precedence. Discarded when the session ends.
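The merge rule can be pictured with a short sketch. The platform performs this merge server-side; mergeTools and ToolDefinition are hypothetical names used only to illustrate the precedence described above.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Agent-level tools first, then session tools; a session tool with the
// same name overwrites the agent-level entry.
function mergeTools(agentTools: ToolDefinition[], sessionTools: ToolDefinition[]): ToolDefinition[] {
  const byName = new Map<string, ToolDefinition>();
  for (const tool of agentTools) byName.set(tool.name, tool);
  for (const tool of sessionTools) byName.set(tool.name, tool);
  return Array.from(byName.values());
}

const agentLevel: ToolDefinition[] = [
  { name: "lookup_order", description: "agent-level lookup", parameters: {} },
  { name: "cancel_order", description: "agent-level cancel", parameters: {} },
];
const sessionLevel: ToolDefinition[] = [
  { name: "lookup_order", description: "session override", parameters: {} },
  { name: "change_scene", description: "session only", parameters: {} },
];

const merged = mergeTools(agentLevel, sessionLevel);
console.log(merged.map(t => `${t.name}: ${t.description}`));
// ["lookup_order: session override", "cancel_order: agent-level cancel", "change_scene: session only"]
```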
When the LLM decides to call a custom tool, it appears as a side effect in the SSE stream. Your backend executes the tool and returns the result in the next message.
What you expose as tools differs sharply by use case — keep descriptions vivid and tightly scoped so the LLM invokes them naturally.
Tools are expressive actions. Things the character can DO in your app — emote, change outfit, move to a different scene, give a gift. Keep descriptions vivid so the LLM invokes them naturally.
await client.agents.sessions.setTools("agent-id", "session-id", [
  {
    name: "change_scene",
    description: "Move to a new location in the story. Use when the scene has run its course or a new chapter begins.",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } },
      required: ["location"],
    },
  },
]);
Don't include a handoff tool. Companions should never punt to a human — the relationship IS the product.
Define a tool that lets the agent trigger a state change from inside a conversation. Your backend executes the tool call and calls upsert to apply the new value.
await client.agents.sessions.setTools("agent-id", "session-id", [
  {
    name: "spend_energy",
    description: "Deduct energy from the user. Call when the user takes an action that costs energy.",
    parameters: {
      type: "object",
      properties: {
        amount: { type: "number", description: "Energy to deduct (1–50)" },
      },
      required: ["amount"],
    },
  },
]);
// In your tool handler:
// 1. Receive externalToolCall { name: "spend_energy", arguments: { amount: 10 } }
// 2. Read current energy with getByKey
// 3. Upsert the new value
// 4. Return the result in the next chat message
Agent-level tools persist across all sessions. Session-level tools are injected at runtime and discarded when the session ends — use them when the available tool set depends on the current screen, user role, or conversation context.
Mood is a four-dimensional value (happiness, energy, calmness, affection) that the context engine maintains automatically for every agent-user pair. Every conversation, application event, and time-based decay is processed without any code on your side. The APIs on this page are for reading that state (dashboards, UI, analytics) or time-traveling to understand what it looked like at a past moment.
Automatic — no setup required
Mood, emotions, and goals are all managed automatically by the context engine. You do not push deltas or set mood manually — you read what the engine has already computed.
Mood-aware UI — show the agent's current mood label and dimension values so users can read emotional state at a glance
Mood history graphs — plot happiness, energy, calmness, or affection over time to surface relationship phases
Mood-influenced response tuning — the engine already bakes mood into every reply; you can also use the live signal to adjust your own UI (avatar expression, ambient sound, tint)
Aggregate mood over time for cohort analysis — roll up mood across all users for a given agent to track product-level sentiment health
Time-machine replay — fetch mood as it stood at any past timestamp for audit trails or narrative moments ("we were in a tough place three weeks ago")
Each agent-user pair carries a mood state with four independent dimensions, each on a 0–100 scale:
| Dimension | Low end | High end |
| --- | --- | --- |
| Happiness | Sad / distressed | Joyful / blissful |
| Energy | Lethargic / flat | Active / enthusiastic |
| Calmness | Anxious / unsettled | Peaceful / at ease |
| Affection | Distant / reserved | Warm / affectionate |
The overall mood label is derived from the combined dimensions: Blissful (80–100), Content (60–79), Neutral (40–59), Melancholy (20–39), Troubled (0–19).
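If you want to reproduce the label client-side (for example, to colour a history graph), a sketch like the following may help. The band boundaries come from the text above, but how the engine combines the four dimensions is not specified here; the unweighted mean below is an assumption, and moodLabel is a hypothetical helper. Prefer the label returned by the API when it is available.

```typescript
interface Mood {
  happiness: number;
  energy: number;
  calmness: number;
  affection: number;
}

// ASSUMPTION: the combined score is the unweighted mean of the four dimensions.
function moodLabel(mood: Mood): string {
  const score = (mood.happiness + mood.energy + mood.calmness + mood.affection) / 4;
  if (score >= 80) return "Blissful";
  if (score >= 60) return "Content";
  if (score >= 40) return "Neutral";
  if (score >= 20) return "Melancholy";
  return "Troubled";
}

console.log(moodLabel({ happiness: 90, energy: 85, calmness: 80, affection: 85 })); // "Blissful"
console.log(moodLabel({ happiness: 50, energy: 50, calmness: 50, affection: 50 })); // "Neutral"
```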
GetTimeMachine returns the agent's full state — mood, personality, and evolution events — as it stood at any past UTC timestamp. The response carries mood_at (the mood state at that moment), personality_at (personality state then), current_personality (today's state for comparison), evolution_events (what changed in between), and requested_at (the timestamp you queried).
The self-improvement engine reads sustained mood patterns and extracts evolution events that gradually reshape the agent's personality. You can observe this pipeline in action by fetching mood history alongside recent evolution events.
// 1. Read the mood history to see emotional trajectory
const history = await client.agents.getMoodHistory("agent-id", {
userId: "user-123",
});
// 2. Fetch self-improvement events to see what evolved from it
const improvements = await client.agents.getSelfImprovement("agent-id", {
userId: "user-123",
});
for (const evt of improvements.events) {
console.log(evt.trigger, evt.dimension, evt.delta);
// "sustained_low_calmness" "neuroticism" +0.04
}
Agent Insights surfaces what the agent has understood about a user across all sessions. Mood is a key dimension of that picture — pairing a current mood read with insights lets you build a complete emotional-state panel.
In the workbench or integration tests, you can advance the clock to observe how mood decays back toward baseline without waiting for real time to pass. Read mood before and after to see the delta.
// 1. Read mood now
const before = await client.agents.getMood("agent-id", { userId: "user-123" });
console.log("Before:", before.happiness); // 90
// 2. Advance time by 72 hours to trigger decay
await client.workbench.advanceTime({ hours: 72 });
// 3. Read mood again — decayed toward baseline
const after = await client.agents.getMood("agent-id", { userId: "user-123" });
console.log("After:", after.happiness); // 68
Self-Improvement — how sustained mood patterns drive personality evolution
Agent Insights — the full picture of what the agent knows about a user
Personality — the baseline that mood decays toward
Advance Time — fast-forward the clock in tests and the workbench
Events & Multi-Agent Dialogue
Your backend knows things the agent doesn't: a user just levelled up, an order shipped, a milestone was hit. TriggerEvent lets you push those signals to an agent and get a tailored reaction — no user message required. Dialogue lets you orchestrate two agents talking to each other, turn by turn, so you can build NPC conversations, run evaluation simulations, or script automated specialist hand-offs.
Both primitives use the same enriched context pipeline as regular chat — the agent draws on memory, personality, and mood when it responds.
Level-up celebrations — your game backend detects a rank change and fires a level_up event; the agent congratulates the user in its own voice
Daily summaries — a cron job fires a daily_summary event with session stats in metadata; the agent writes a personalised recap
Achievement unlocks — trigger a proactive message the moment a user hits a milestone, so the agent's enthusiasm lands while the moment is fresh
External state changes — order shipped, appointment confirmed, subscription renewed; the agent reacts to your system events rather than waiting for the user to ask
Fire a level_up event with structured metadata. The agent generates a reaction and the platform queues it for delivery through the same channels as other proactive messages.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const result = await client.agents.triggerBackendEvent("agent_abc", {
userId: "user_123",
eventType: "level_up",
eventDescription: "The user just reached level 25 — a major milestone in the game.",
metadata: {
new_level: "25",
previous_level: "24",
xp_total: "12500",
},
});
console.log(result.accepted); // true
console.log(result.event_id); // "evt_01HX..."
Dialogue is a per-agent call. To run a conversation between two agents, you orchestrate turns yourself: call agent A, append its response to the message history, call agent B with that updated history, and so on. Each agent independently draws on its own memory and personality.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
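// Seed the conversation: agent_b opens with the first message.
// The explicit type lets later turns push "assistant" messages.
const messages: { role: "user" | "assistant"; content: string }[] = [
  { role: "user", content: "Tell me something interesting about the ancient ruins." },
];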
// Turn 1 — agent_a responds
const turnA = await client.agents.dialogue("agent_a", {
userId: "user_123",
messages,
sceneGuidance: "Two NPCs are exploring ancient ruins together. Keep responses under 3 sentences.",
});
messages.push({ role: "assistant", content: turnA.response });
// Turn 2 — agent_b responds to what agent_a said
const turnB = await client.agents.dialogue("agent_b", {
userId: "user_123",
messages,
sceneGuidance: "Two NPCs are exploring ancient ruins together. Keep responses under 3 sentences.",
});
console.log("Agent A:", turnA.response);
console.log("Agent B:", turnB.response);
EventType is free-form. There is no fixed enum. Common conventions used by tenants: "achievement", "daily_summary", "level_up", "order_shipped", "appointment_confirmed", "milestone". Pick names that are meaningful in your domain and stay consistent across your backend.
EventDescription is for the LLM. Write it as plain-English narration: "The user just cleared chapter 5 for the first time after 3 failed attempts." The agent's underlying model reads this and uses it to shape the reaction — be specific rather than terse.
Metadata is string-only. The metadata map accepts string → string pairs only. For nested or numeric data, either serialize into the event_description or flatten it with explicit keys ("xp_gained", "xp_total", "level_before", "level_after").
Messages field grounds the event in a prior conversation. If the event is closely tied to a conversation that just ended (for example, a daily_summary fired after a chat session), pass the recent messages. The platform uses them directly for context-sensitive generation — diary entries, summaries — instead of relying on lossy consolidation. Omit this field for cron-driven events that have no associated conversation.
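Since metadata only accepts string values, a small helper can flatten nested or numeric data before the call. This is a sketch: flattenMetadata and MetadataInput are hypothetical names, and joining nested keys with an underscore is just one convention that happens to produce keys like "xp_gained".

```typescript
type MetadataInput = { [key: string]: string | number | boolean | null | MetadataInput };

// Flatten nested objects into string -> string pairs with underscore-joined keys.
// Null values are dropped rather than sent as the string "null".
function flattenMetadata(input: MetadataInput, prefix = ""): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(input)) {
    const flatKey = prefix ? `${prefix}_${key}` : key;
    if (value === null) continue;
    if (typeof value === "object") {
      Object.assign(out, flattenMetadata(value, flatKey));
    } else {
      out[flatKey] = String(value);
    }
  }
  return out;
}

const flat = flattenMetadata({
  xp: { gained: 500, total: 12500 },
  level_after: 25,
  first_clear: true,
});
console.log(flat);
// { xp_gained: "500", xp_total: "12500", level_after: "25", first_clear: "true" }
```

The result can be passed as the metadata field of a trigger call.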
TriggerEventResponse contains two fields:
accepted (bool) — whether the platform accepted the event for processing
event_id (string) — an opaque identifier for the queued event; store it if you want to correlate platform logs
Each call is per-agent. The dialogue method is scoped to a single agent: you pass an agentId and the current message history. To model a conversation between two agents, you manage the turn loop — append each response to the shared messages slice and alternate which agentId you call.
Messages carry the full context. Unlike chat, which manages conversation history server-side per session, dialogue expects you to pass the full message thread with every call. You control the window.
sceneGuidance steers both tone and constraints. Pass a brief instruction describing the scene and any constraints ("keep responses under 3 sentences", "the agents are rivals", "agent_a does not know about the treasure") so both sides stay in character.
requestType signals the call's purpose. An optional free-form tag ("npc_scene", "eval_round", "specialist_consult") that downstream analytics can use for filtering. Has no effect on generation.
DialogueResponse contains:
response (string) — the agent's generated text for this turn
side_effects — optional structured metadata emitted by the agent (tool calls, mood signals, etc.)
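The turn loop described above generalises to any number of rounds. The sketch below keeps the network call abstract: speak is a hypothetical callback that would wrap the per-agent dialogue call and return the response text, so the loop itself can be read (and tested) on its own. As in the two-turn example earlier, every reply is appended to the shared history with the "assistant" role.

```typescript
type Role = "user" | "assistant";
interface Message { role: Role; content: string }
interface Turn { agentId: string; text: string }

// Hypothetical callback: call the dialogue endpoint for agentId with the
// given history and resolve with the response text.
type Speak = (agentId: string, history: Message[]) => Promise<string>;

// Alternate between two agents, appending each reply to the shared history.
async function runDialogue(
  agentIds: [string, string],
  opening: Message,
  turns: number,
  speak: Speak,
): Promise<Turn[]> {
  const history: Message[] = [opening];
  const transcript: Turn[] = [];
  for (let turn = 0; turn < turns; turn++) {
    const agentId = agentIds[turn % 2];
    const text = await speak(agentId, history.slice()); // pass a copy of the thread
    history.push({ role: "assistant", content: text });
    transcript.push({ agentId, text });
  }
  return transcript;
}

// Stand-in for the real call so the sketch runs offline.
const fakeSpeak: Speak = async (agentId, history) =>
  `${agentId} saw ${history.length} message(s)`;

const dialogueDone = runDialogue(
  ["agent_a", "agent_b"],
  { role: "user", content: "Tell me something interesting about the ancient ruins." },
  3,
  fakeSpeak,
);
dialogueDone.then(t => console.log(t.map(x => x.text)));
// ["agent_a saw 1 message(s)", "agent_b saw 2 message(s)", "agent_a saw 3 message(s)"]
```

Because you own the history array, this is also the place to trim old turns if the thread grows too long.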
Proactive Messaging has three sources: Scheduled Reminders (recurring cadence), Wakeups (one-off timed), and TriggerEvent (your backend fires it when something happens). TriggerEvent is the push-based source you control directly — no schedule required, no timer running. When the event is accepted, the platform routes the generated reaction through the same delivery channels as the other two sources: SSE if the user has an active stream, the polling notifications API, or your registered webhook.
// Proactive triangle in code form:
// Source 1: recurring schedule (time-based)
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["09:00"] },
    timezone: "Asia/Tokyo",
  },
  intent: "morning check-in",
  check_type: "reminder",
});
// Source 2: one-off wakeup (time-based)
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "appointment_reminder",
  intent: "remind the user about their dentist appointment",
  delay_hours: 2,
});
// Source 3: TriggerEvent (you push it when something happens)
await client.agents.triggerBackendEvent("agent_abc", {
  userId: "user_123",
  eventType: "appointment_confirmed",
  eventDescription: "The user just confirmed their 3pm dentist appointment for tomorrow.",
});
When a TriggerEvent fires immediately after a chat session — for example, a daily_summary event at session end — pass the recent conversation messages in the messages field. The platform uses them directly as conversation history for context-sensitive generation (diary entries, personality updates) instead of relying on condensed consolidation summaries. The agent's reaction then references what was actually said rather than a lossy reconstruction.
// After a chat session ends, fire a daily_summary event with the full message history
const sessionMessages = [
  { role: "user", content: "I finally finished that project I was stressing about." },
  { role: "assistant", content: "That's huge! You've been working on that for weeks." },
  { role: "user", content: "Yeah. Feels good. Think I'll take the evening off." },
];
await client.agents.triggerBackendEvent("agent_abc", {
  userId: "user_123",
  eventType: "daily_summary",
  eventDescription: "Session ended. User shared a work win and plans to rest.",
  messages: sessionMessages, // grounds the summary in what was actually said
});
Run a judge agent and a subject agent in a dialogue loop to score the subject's responses without a real user. The judge poses questions, the subject answers, and you feed both transcripts to your evaluation rubric. This lets you evaluate agent quality at scale offline.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const JUDGE_AGENT = "agent_judge";
const SUBJECT_AGENT = "agent_subject";
const USER_ID = "eval_run_001";
// Explicit type so the assistant turn can be pushed below
const messages: { role: "user" | "assistant"; content: string }[] = [
  { role: "user", content: "I'm feeling really overwhelmed lately." },
];
// Subject responds to the user prompt
const subjectTurn = await client.agents.dialogue(SUBJECT_AGENT, {
  userId: USER_ID,
  messages,
  requestType: "eval_round",
});
messages.push({ role: "assistant", content: subjectTurn.response });
// Judge scores the subject's response
const judgeTurn = await client.agents.dialogue(JUDGE_AGENT, {
  userId: USER_ID,
  messages,
  sceneGuidance:
    "You are evaluating the previous assistant response for empathy and clarity. " +
    "Return a JSON object with keys: score (0–100), feedback (string).",
  requestType: "eval_judge",
});
console.log("Subject:", subjectTurn.response);
console.log("Judge verdict:", judgeTurn.response);
// Then score the exchange through the evaluation API
const evalResult = await client.agents.evaluate(SUBJECT_AGENT, {
  templateId: "empathy-rubric",
  messages,
});
console.log("Eval score:", evalResult.score);
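The judge's reply is free text that is only asked to contain JSON, so it is worth validating before you trust the score. A minimal sketch, assuming the judge follows the sceneGuidance above and returns an object with score and feedback; parseJudgeVerdict and JudgeVerdict are hypothetical names.

```typescript
interface JudgeVerdict {
  score: number;    // expected range 0 to 100
  feedback: string;
}

// Extract and validate the verdict; returns null when the reply is unusable.
function parseJudgeVerdict(raw: string): JudgeVerdict | null {
  // Models sometimes wrap JSON in prose, so take the outermost {...} span.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const { score, feedback } = parsed as { score?: unknown; feedback?: unknown };
  if (typeof score !== "number" || score < 0 || score > 100) return null;
  if (typeof feedback !== "string") return null;
  return { score, feedback };
}

const verdict = parseJudgeVerdict('Here is my verdict: {"score": 82, "feedback": "Warm and clear."}');
console.log(verdict); // { score: 82, feedback: "Warm and clear." }
```

A null result is a signal to retry the judge turn or flag the sample for manual review rather than record a score.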
Generation covers two distinct capabilities: agent generation (spinning up a complete character with bio, personality, seed memories, and avatar from a text description) and media generation (producing images on demand during a chat turn). They live under client.agents.generation and client.agents.image respectively.
The single fastest path: one call generates a full personality profile and provisions the agent. Safe to call on every deploy — if the agent already exists, the LLM is skipped.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: "sk-..." });
const agent = await client.agents.generation.generateAndCreate({
name: "Luna",
description: "A cheerful and curious assistant who loves helping developers debug code.",
language: "en",
});
console.log(agent.agent_id);
Idempotent
If an agentId is provided and the agent already exists, generateAndCreate updates the existing agent rather than creating a duplicate. Safe to call on every app startup.
Three AgentCapabilities flags gate media generation. Each flag is a boolean on the agent, and each has a paired *UnlockedAt timestamp that records when the capability was granted (typically via a tier upgrade or admin enable). Generation calls fail if the flag is false.
| Field | Type | Description |
| --- | --- | --- |
| imageGeneration | boolean | Whether image generation is enabled for this agent |
| imageUnlockedAt | string (ISO 8601) | When image generation was granted |
| musicGeneration | boolean | Whether music/audio generation is enabled |
| musicUnlockedAt | string (ISO 8601) | When music generation was granted |
| videoGeneration | boolean | Whether video generation is enabled |
| videoUnlockedAt | string (ISO 8601) | When video generation was granted |
imageGeneration is the only media flag you can toggle directly via update_capabilities(). musicGeneration and videoGeneration are platform-managed — they flip when your plan includes those capabilities. Use get_capabilities() to inspect their current state.
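Because generation calls fail when a flag is false, it can be cheaper to check the flags before calling. The sketch below models only the six fields listed above; MediaCapabilities, MediaKind, and isMediaEnabled are hypothetical names, and treating the *UnlockedAt fields as optional is an assumption.

```typescript
// Only the media-related fields from the capabilities table above.
interface MediaCapabilities {
  imageGeneration: boolean;
  imageUnlockedAt?: string; // ISO 8601
  musicGeneration: boolean;
  musicUnlockedAt?: string;
  videoGeneration: boolean;
  videoUnlockedAt?: string;
}

type MediaKind = "image" | "music" | "video";

// True when the agent is allowed to generate the given kind of media.
function isMediaEnabled(caps: MediaCapabilities, kind: MediaKind): boolean {
  switch (kind) {
    case "image": return caps.imageGeneration;
    case "music": return caps.musicGeneration;
    case "video": return caps.videoGeneration;
  }
}

const caps: MediaCapabilities = {
  imageGeneration: true,
  imageUnlockedAt: "2025-01-15T09:30:00Z",
  musicGeneration: false,
  videoGeneration: false,
};

console.log(isMediaEnabled(caps, "image")); // true
console.log(isMediaEnabled(caps, "video")); // false
```

A UI can use the same check to hide or disable media buttons instead of surfacing a failed call.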
Character generation takes a natural-language description and produces a structured agent profile. You can generate the profile and immediately create the agent (GenerateAndCreate), or generate the profile first for preview and commit only on user approval (GenerateCharacter).
The Regenerate flag forces a fresh generation even when a cached profile is found — useful for iteration flows where the user wants a different result without deleting the agent.
// Preview without committing
profile, err := client.Agents.Generation.GenerateCharacter(ctx, sonzai.GenerateCharacterOptions{
	Name:        "Atlas",
	Description: "A stoic, wise mentor who speaks in metaphors and values patience above all.",
	Fields:      []string{"big5", "dimensions", "preferences", "behaviors"},
	Regenerate:  true, // force a fresh pass
})
Bio generation (GenerateBio) and avatar regeneration (RegenerateAvatar) are narrower variants — they update a single attribute of an existing agent without touching the rest of the profile.
Seed memories work in two steps:
Generate — GenerateSeedMemories calls an LLM to produce backstory memories from the agent's personality, interests, and lore context.
Store — SeedMemories bulk-imports a list of memory objects (generated or hand-authored) into the agent's memory store.
You can run both steps separately for fine-grained control, or set storeMemories: true on the generate call to do both in one request.
Image generation is agent-scoped: client.agents.image.generate(agentID, opts). The agent ID is used to apply the agent's visual style and context to the generation request.
Images are generated synchronously — the call blocks until the image is ready and returns a public CDN URL. For high-throughput workflows, fan out parallel calls rather than queuing.
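One way to fan out blocking calls is a small worker pool. The sketch below is generic: mapWithConcurrency is a hypothetical helper, worker stands in for a function that would wrap the image generation call, and the concurrency cap is a design choice added here (to stay clear of rate limits), not something the platform requires.

```typescript
// Run worker over items with at most `limit` calls in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => run()));
  return results;
}

// Stand-in worker so the sketch runs offline.
const fakeGenerate = async (prompt: string): Promise<string> => `image-for:${prompt}`;

const fanOutDone = mapWithConcurrency(["cabin", "forest", "lake"], 2, fakeGenerate);
fanOutDone.then(urls => console.log(urls));
// ["image-for:cabin", "image-for:forest", "image-for:lake"]
```

If any worker rejects, the whole call rejects; wrap the worker in a try/catch when partial results are acceptable.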
generateCharacter (and generateAndCreate) returns a full Big5 profile derived from the description. The platform uses these scores directly as the agent's personality baseline — no manual personality.update call needed.
// Generate the profile — inspect Big5 before committing
const profile = await client.agents.generation.generateCharacter({
name: "Atlas",
description: "A stoic, wise mentor who speaks in metaphors and values patience above all.",
fields: ["big5", "dimensions", "preferences", "behaviors"],
});
console.log(profile.big5);
// { openness: 0.72, conscientiousness: 0.85, extraversion: 0.35,
// agreeableness: 0.63, neuroticism: 0.22 }
// Then create with those exact scores
const agent = await client.agents.create({
name: "Atlas",
big5: profile.big5,
});
Generate memories from the agent's personality context, then seed them into the memory store. They appear immediately in memory.list and are recalled in the agent's first conversations.
// Step 1: generate backstory memories from personality context
const generated = await client.agents.generation.generateSeedMemories("agent-id", {
agentName: "Luna",
trueInterests: ["astronomy", "poetry", "hiking"],
trueDislikes: ["loud noises", "dishonesty"],
generateOriginStory: true,
generatePersonalizedMemories: true,
});
console.log(`Generated ${generated.memories.length} memories`);
// Step 2: store them (or pass storeMemories: true above to do both in one call)
await client.agents.generation.seedMemories("agent-id", {
userId: "user-123",
memories: generated.memories,
});
// Step 3: verify they appear in memory
const stored = await client.agents.memory.list("agent-id", { userId: "user-123" });
console.log(stored);
Call image.generate within a chat turn to let the agent produce images as part of its response. Render the returned URL alongside the text content.
// Inside a chat turn handler
const response = await client.agents.chat({
agent: "agent-id",
userId: "user-123",
messages: [{ role: "user", content: "Draw me a cozy forest cabin at night." }],
language: "en",
});
// If the agent decides to produce an image, generate it
const image = await client.agents.image.generate("agent-id", {
prompt: "A cozy forest cabin at night, warm light through windows, snow falling",
});
// Render both in your UI
renderChatBubble(response.content);
renderImage(image.url);
Use generateAndCreate as your onboarding flow. Let users describe their companion in a text box. Call the API. Show them the generated character — bio, personality summary, avatar. If they don't like it, call again with regenerate: true. This is the fastest path to a first impression.
Preview with generateCharacter before committing. If your UX shows users a profile card before they confirm, generate first, render the profile, and only call create when they approve.
Generate seed memories for a believable backstory. A companion that "remembers" things from before the first conversation feels more real. Pipe generateSeedMemories directly into seedMemories at agent creation time.
Use image.generate for illustrated moments. Let the agent generate scene illustrations, mood cards, or shared memory images during conversation. Attach the image URL to the chat message in your UI.
Personality — understand the Big5 profile that generation produces
Memory — how seed memories integrate with the live memory system
Conversations — wiring image generation into a chat turn
Sonzai Mind Layer
Sonzai is the Mind Layer for AI agents: a hosted platform that gives any
agent persistent memory, evolving personality, mood, relationships, and a
knowledge graph. Integrate via REST, MCP, or native SDKs for TypeScript,
Python, and Go.
Pick the path that matches your stack. All paths talk to the same hosted API — mix and match freely (e.g. backend in Python, plus MCP from Claude Desktop for ops).
pip install sonzai
Python 3.11+. Sync (Sonzai) and async (AsyncSonzai) clients ship in the same package.
TypeScript runs on Node.js >=18, Bun, and Deno. Zero runtime dependencies.
Go 1.25+. Standard library only.
All SDKs read SONZAI_API_KEY from the environment by default.
Pick the track that matches your product. Each quickstart walks through the
features that matter for your use case — and explicitly flags what you can
skip.
Every feature below works for all three audiences, but the emphasis differs.
Use the In practice tabs on each page to jump to examples for your use
case.
Each page also has a raw-markdown URL: append .md to any doc path. For
example, /docs/en/memory.md returns plain markdown ready to paste into an
LLM or pipe into a tool.
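If you build these URLs programmatically (for example, to feed several pages to an LLM), a tiny helper keeps the rule in one place. This is a sketch: toMarkdownPath is a hypothetical name, and dropping trailing slashes and preserving any query string or fragment are assumptions about how you would want it to behave.

```typescript
// Append ".md" to a doc path, leaving any ?query or #fragment in place.
function toMarkdownPath(docPath: string): string {
  const cut = docPath.search(/[?#]/);
  const path = cut === -1 ? docPath : docPath.slice(0, cut);
  const rest = cut === -1 ? "" : docPath.slice(cut);
  const trimmed = path.replace(/\/+$/, ""); // drop trailing slashes
  return (trimmed.endsWith(".md") ? trimmed : `${trimmed}.md`) + rest;
}

console.log(toMarkdownPath("/docs/en/memory"));     // "/docs/en/memory.md"
console.log(toMarkdownPath("/docs/en/memory/"));    // "/docs/en/memory.md"
console.log(toMarkdownPath("/docs/en/memory.md"));  // "/docs/en/memory.md"
```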
Your backend handles business logic and user sessions. The Mind Layer owns
agent intelligence — personality, memory, mood, and relationships. Connect via
REST, MCP, or SDK; pass application context per request; let the platform
manage everything else.
Instances let you run a single agent across many isolated deployment contexts without cloning the agent itself. The shared parts — personality, memory, tools, voice — stay unified, while custom states are scoped per instance so a US-East workspace, an EU-West tenant, and a staging environment never see each other's data. Every agent gets a default instance for free; you only need explicit instances when the same AI Employee runs in parallel contexts that must not share runtime state.
An Instance is a deployment context for an agent. The agent itself (personality, memory, tools) is shared — but custom state is isolated per instance.
Agent "Luna"
├── Instance: default ← used when instanceId is omitted
├── Instance: ws-us-east ← US-East workspace
├── Instance: ws-eu-west ← EU-West workspace
└── Instance: ws-staging ← separate deployment
Each instance has its own:
• Global custom states (environment state, configuration)
• Per-user custom states scoped to this instance
• Isolated from other instances
Default Instance
Every agent has a default instance. If you don't pass instanceId to chat or state operations, the default instance is used. You only need multiple instances if you run the same agent in parallel isolated contexts.
Pass instanceId to chat calls to scope state reads to that instance. The agent will see global custom states for that instance and per-user states scoped to it.
for await (const event of client.agents.chatStream({
agent: "agent-id",
messages: [{ role: "user", content: "What's the current status?" }],
userId: "user-123",
instanceId: "ws-us-east", // scopes state reads to this instance
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
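The default-instance fallback and per-instance isolation described above can be modeled in a few lines. This is an in-memory sketch for intuition only: InstanceStates and its methods are hypothetical names, not SDK types, and the real platform stores state server-side.

```typescript
// In-memory model of instance-scoped custom state (hypothetical, for illustration only).
const DEFAULT_INSTANCE = "default";

class InstanceStates {
  private states = new Map<string, Map<string, unknown>>();

  // Omitting instanceId falls back to the default instance.
  private resolve(instanceId?: string): string {
    return instanceId ?? DEFAULT_INSTANCE;
  }

  set(key: string, value: unknown, instanceId?: string): void {
    const id = this.resolve(instanceId);
    const scoped = this.states.get(id) ?? new Map<string, unknown>();
    scoped.set(key, value);
    this.states.set(id, scoped);
  }

  get(key: string, instanceId?: string): unknown {
    return this.states.get(this.resolve(instanceId))?.get(key);
  }
}

const states = new InstanceStates();
states.set("region_status", "degraded", "ws-us-east");
states.set("region_status", "healthy"); // no instanceId: stored on the default instance

console.log(states.get("region_status", "ws-us-east")); // "degraded"
console.log(states.get("region_status"));               // "healthy"
console.log(states.get("region_status", "ws-eu-west")); // undefined
```

Each instance keeps its own map, so a write scoped to ws-us-east is never visible from ws-eu-west or from the default instance.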
isDefault (boolean): True for the auto-created default instance
createdAt (string): ISO 8601 timestamp
updatedAt (string): ISO 8601 timestamp
KNOWLEDGE
Inventory
Inventory is the place to store structured per-user data the agent should know about. Each item belongs to a single agent × user pair and follows a schema defined in your Knowledge Base, so the agent always has typed, queryable data rather than free-form text. When the agent adds an item it searches the KB by description to resolve and link the right node automatically.
Add a medication to a user's inventory. The response includes an inventory_item_id (and the backward-compatible fact_id alias) you can use for direct updates or deletes later.
When action is "add", the platform performs a natural-language search of the KB using description. If exactly one node matches, the item is linked automatically and the response includes kb_resolution. If there are multiple close matches, the response returns status: "disambiguation_needed" and a candidates list — surface these to the user or pick the best kb_node_id and re-submit.
label vs description
label is an optional short display name shown in dashboards and agent tool calls (e.g. "Metformin"). description is the longer text the platform uses for KB natural-language search (e.g. "Metformin 500mg — biguanide for blood sugar control"). If label is omitted, the platform falls back to the first segment of description for display purposes.
Items belong to users — every item is scoped to agent_id × user_id; no item is shared across users
Schema-driven shape — item_type references a KB schema that defines the valid property fields; the platform validates writes against it
Two write paths for adding items — use inventory.create({...}) (dedicated route, no action field) for cleaner code when you specifically want to add; use inventory.update({action: "add", ...}) (explicit-action route) when you handle add/update/remove through a single call site. Both hit equivalent server logic.
label vs description — label is a short display name for dashboards and agent UI (e.g. "Ibuprofen"); description is the longer text the KB search uses to resolve the right node (e.g. "anti-inflammatory pain reliever, 400mg"). Both are optional but providing both gives the clearest results.
KB resolution — on add, Sonzai searches the KB by description; on ambiguous matches it returns candidates and status: "disambiguation_needed" so you can resolve before committing
Query modes — "list" returns raw items, "value" joins with live KB market data and computes gain_loss, "aggregate" returns totals and grouped sums without listing every item
inventory_item_id is the preferred identifier going forward. fact_id is included for backward compatibility — both refer to the same item and are interchangeable in all subsequent API calls (direct update, direct delete, schedule linkage).
When status is "disambiguation_needed", the response includes a candidates array instead of kb_resolution. Re-submit with the chosen kb_node_id set explicitly to bypass the search.
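One way to handle the disambiguation branch on the caller's side is sketched below. Only status, candidates, kb_resolution, and kb_node_id come from the text above; the label and score fields on each candidate, the minLead threshold, and the nextStep helper are assumptions made for illustration.

```typescript
// Assumed response shapes. The label and score fields are hypothetical.
interface Candidate { kb_node_id: string; label: string; score: number }
interface AddResponse {
  status: string;
  kb_resolution?: { kb_node_id: string };
  candidates?: Candidate[];
}

type NextStep =
  | { kind: "done"; kbNodeId: string | undefined }
  | { kind: "resubmit"; kbNodeId: string }
  | { kind: "ask_user"; candidates: Candidate[] };

// Decide what to do after an "add": accept, auto-pick a clear winner, or ask the user.
function nextStep(res: AddResponse, minLead = 0.2): NextStep {
  if (res.status !== "disambiguation_needed") {
    return { kind: "done", kbNodeId: res.kb_resolution?.kb_node_id };
  }
  const ranked = [...(res.candidates ?? [])].sort((a, b) => b.score - a.score);
  if (ranked.length === 0) return { kind: "ask_user", candidates: [] };
  const best = ranked[0];
  const runnerUp = ranked.length > 1 ? ranked[1] : undefined;
  // Auto-pick only when the top candidate clearly leads the runner-up.
  if (!runnerUp || best.score - runnerUp.score >= minLead) {
    return { kind: "resubmit", kbNodeId: best.kb_node_id };
  }
  return { kind: "ask_user", candidates: ranked };
}

const step = nextStep({
  status: "disambiguation_needed",
  candidates: [
    { kb_node_id: "node_a", label: "Metformin 500mg", score: 0.91 },
    { kb_node_id: "node_b", label: "Metformin 850mg", score: 0.62 },
  ],
});
console.log(step.kind); // "resubmit"
```

A "resubmit" result means the caller would send the add again with kb_node_id set explicitly; "ask_user" means the candidates should be surfaced for a human choice.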
The item_type field points to a KB entity schema that defines which properties are valid for that type. Create the schema once; all inventory writes for that type are validated against it.
// 1. Define the schema in the KB once
await client.knowledge.createSchema("proj_abc123", {
entity_type: "medication",
fields: [
{ name: "dose_mg", type: "number", required: true },
{ name: "frequency", type: "string", required: true },
{ name: "with_food", type: "boolean", required: false },
],
});
// 2. Inventory writes for item_type "medication" are now validated
await client.agents.inventory.update("agent_abc", "user_123", {
action: "add",
item_type: "medication", // <-- resolves to the schema above
description: "Metformin 500mg",
properties: { dose_mg: 500, frequency: "twice daily", with_food: true },
});
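To catch obvious mistakes before a write reaches the platform, a client-side pre-check can mirror the schema fields shown above. This is a minimal sketch, not the platform's validator: validateProperties is a hypothetical helper, and rejecting properties that the schema does not declare is an assumption.

```typescript
// Client-side pre-check mirroring the schema fields shown above.
// The platform validates server-side; this sketch is not its implementation.
type FieldType = "number" | "string" | "boolean";
interface SchemaField { name: string; type: FieldType; required: boolean }

function validateProperties(
  fields: SchemaField[],
  properties: Record<string, unknown>,
): string[] {
  const errors: string[] = [];
  const known = new Set(fields.map((f) => f.name));
  for (const field of fields) {
    const value = properties[field.name];
    if (value === undefined) {
      if (field.required) errors.push(`missing required field: ${field.name}`);
      continue;
    }
    if (typeof value !== field.type) {
      errors.push(`${field.name} should be ${field.type}, got ${typeof value}`);
    }
  }
  // Assumption: properties not declared in the schema are rejected.
  for (const key of Object.keys(properties)) {
    if (!known.has(key)) errors.push(`unknown field: ${key}`);
  }
  return errors;
}

const medicationFields: SchemaField[] = [
  { name: "dose_mg", type: "number", required: true },
  { name: "frequency", type: "string", required: true },
  { name: "with_food", type: "boolean", required: false },
];

console.log(
  validateProperties(medicationFields, { dose_mg: 500, frequency: "twice daily", with_food: true }),
); // []
console.log(validateProperties(medicationFields, { dose_mg: "500" }));
// ["dose_mg should be number, got string", "missing required field: frequency"]
```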
A schedule can reference an inventory_item_id. At each fire, the agent reads the item's current properties rather than a snapshot baked into the schedule definition. Updating the item's dosage automatically flows to the next reminder without touching the schedule itself.
// Add the item first
const { fact_id } = await client.agents.inventory.update("agent_abc", "user_123", {
action: "add",
item_type: "medication",
description: "Metformin 500mg",
properties: { dose_mg: 500, frequency: "twice daily" },
});
// Reference it in a schedule — agent reads live properties at each fire
await client.schedules.create("agent_abc", "user_123", {
cadence: {
simple: { frequency: "daily", times: ["08:00", "20:00"] },
timezone: "America/New_York",
},
intent: "remind the user to take their medication",
inventory_item_id: fact_id,
});
With Memory — inventory state in conversation context
During a conversation the agent can query the user's inventory to answer questions like "what medications am I taking?" directly. Inventory writes also generate memory facts that surface in future sessions, so the agent can reference holdings and items across conversations without a manual query.
// Agent answers from inventory mid-conversation
for await (const event of client.agents.chatStream("agent_abc", {
userId: "user_123",
messages: [{ role: "user", content: "What medications am I on?" }],
})) {
// The agent calls sonzai_inventory internally to fetch the user's items
// and answers from live data — no extra code needed.
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Memory — how inventory writes surface in chat context
KNOWLEDGE
Knowledge Analytics
Knowledge Analytics layers a ranking system on top of the Knowledge Base. Rules define scoring signals — per-user affinity for recommendations, aggregate velocity for trends — and readers fetch ranked results at query time with a single call. The graph backbone supplies the nodes and edges; analytics rules decide how to score and order them. The result is a reusable ranking layer that powers product recommendations, trending dashboards, and conversion tracking without building a separate data pipeline.
Rule types — "recommendation" scores nodes per source (e.g. per user), returning a personalised top-N list. "trend" aggregates signals across all sources, returning global velocity rankings.
Config is rule-specific — the config object is a passthrough shape; its fields depend on the rule type and your scoring model. There is no fixed schema enforced by the SDK — pass whatever your rule implementation expects (e.g. target_entity_type, scoring, decay_factor).
Source and target semantics — recommendations take a source_id (typically a user node ID) and return ranked nodes of the target entity type. The source must exist as a node in the Knowledge Base graph.
Scheduled vs manual — rules can carry an optional cron schedule for batch recomputation (e.g. "0 * * * *" for hourly). Call RunAnalyticsRule at any time to trigger a manual run outside the schedule.
Feedback closes the loop — RecordFeedback writes a signal back against the source, target, and rule. Subsequent recomputation can weight nodes that historically converted higher, sharpening ranking over time. Use the action field to record fine-grained user intent: "converted" (user completed the action), "clicked" (user opened the recommendation), "dismissed" (user explicitly rejected it), or "ignored" (recommendation was shown but user did not interact). action: "converted" sets converted: true automatically so existing aggregate conversion queries continue to work without changes.
Record whether a recommended node was acted on. converted is a boolean — true means the user engaged with the recommendation. action is an optional string enum: "converted", "dismissed", "clicked", "ignored". Passing action: "converted" also sets converted: true for backward-compatible aggregate queries.
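The relationship between action and converted can be expressed as a small normalization step. The sketch below models the documented rule locally; normalizeFeedback is a hypothetical helper, and defaulting converted to false when neither field implies a conversion is an assumption.

```typescript
type FeedbackAction = "converted" | "clicked" | "dismissed" | "ignored";
interface FeedbackInput { converted?: boolean; action?: FeedbackAction }
interface NormalizedFeedback { converted: boolean; action?: FeedbackAction }

// Mirror the documented rule: action "converted" implies converted = true.
function normalizeFeedback(input: FeedbackInput): NormalizedFeedback {
  // Assumption: converted defaults to false when it is omitted.
  const converted = input.action === "converted" ? true : input.converted ?? false;
  return input.action ? { converted, action: input.action } : { converted };
}

console.log(normalizeFeedback({ action: "converted" })); // { converted: true, action: "converted" }
console.log(normalizeFeedback({ action: "dismissed" })); // { converted: false, action: "dismissed" }
console.log(normalizeFeedback({ converted: true }));     // { converted: true }
```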
getStats(projectId) (returns KBStats): General KB statistics (node counts, document counts, extraction tokens).
Python keyword arguments
The Python SDK exposes get_recommendations, get_trends, get_trend_rankings, get_conversions, and record_feedback using keyword-only arguments after project_id. For example: client.knowledge.get_recommendations(project_id, rule_id="...", source_id="...", limit=10).
Analytics rules run over KB nodes and edges. Entity schemas define what types of nodes exist; rules score those nodes. The recommended pattern is to define your entity schema first, then create rules that target it.
With Inventory — per-user holdings drive per-user recommendations
Inventory writes create edges from a user node to the nodes they own. Those ownership edges flow into the recommendation model as affinity signals: items a user already owns inform which related nodes score highest.
// 1. User buys a product — record it in inventory
const { fact_id } = await client.agents.inventory.update("agent_abc", "user_123", {
action: "add",
item_type: "product",
description: "Razer DeathAdder V3",
properties: { purchase_date: "2026-04-01" },
});
// 2. The inventory write creates a user→product edge in the KB graph.
// The recommendation rule can now weight products related to the
// DeathAdder higher for this user.
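// projectId and rule are assumed to come from earlier setup (createAnalyticsRule)
const recs = await client.knowledge.getRecommendations(projectId, {
  rule_id: rule.rule_id,
  source_id: "user_123",
  limit: 5,
});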
// recs.recommendations may now include accessories or similar peripherals
Agent Insights extract what users express interest in during conversations. Those interest signals can be passed into recommendation rule config as additional affinity weights, so a user who talks about budget peripherals gets different rankings than one who discusses high-end setups — without any explicit user input.
No dedicated Knowledge Analytics tutorial exists yet. The Knowledge Base tutorial covers schema setup and fact insertion — the prerequisite steps before creating analytics rules.
Knowledge Base — the graph backbone; define schemas and push nodes before creating rules
Inventory — per-user holdings create user-to-node edges that feed the recommendation model
Organization Knowledge Base — analytics rules can also run over org-scoped KB nodes for shared ranking across all users
MULTIPLAYER MEMORY
Knowledge Base
The Knowledge Base gives your agents a live, searchable store of facts and documents — so they answer from real data instead of guessing. You push data in (via file upload or API), the platform builds a knowledge graph, and agents query it in real time. Schemas are the bridge to the Inventory primitive: the same entity types you define here back every per-user inventory item, letting a single schema serve both global knowledge and user-specific state.
It is also multiplayer. Agents can autonomously write what they learn during conversations back into the project KB, where every other agent on the project reads it on the next session — a closed-loop company brain that compounds the way human institutional memory does. And a single agent serving a team can carry attributed memory across users, so it can inform user A with the context it gathered while talking to user B. See Multiplayer memory below.
There are two ways to populate the knowledge base, plus one optional capability you toggle on top of either of them:
1. Manual upload. Drop in a PDF, DOCX, Markdown, or plain text file via the SDK or the dashboard. The platform extracts entities and relationships automatically and writes them to the graph. Use this for static documents you control — handbooks, policies, product manuals, lore. One-shot, or re-uploaded whenever the source changes. → Upload a document
2. ETL job that pushes on delta changes. Define an entity schema once; have your job call insertFacts or bulkUpdate on a schedule, queue, or change-data-capture stream. Use this for live upstream sources of truth — databases, price feeds, CMSes, scrapers — so the KB stays in sync as the source changes. Upserts are idempotent; pushing the same label twice merges properties and increments the version, so the same job is safe to re-run on any cadence. → Define a schema, then push facts
+ Autonomous agent editing (optional toggle: enable or disable per agent or project-wide). Flip the knowledgeBaseWrite capability on and agents get knowledge_create / knowledge_update / knowledge_delete tools. During conversations they record verified facts themselves, with a full audit trail (each write is stamped source = "agent:<agent-id>") and compare-and-swap update semantics so concurrent admin edits never get clobbered. Use this when the source of truth IS the conversation: support agents recording verified incident details, customer-success agents capturing renewal context, scribe agents writing meeting notes. → Agents writing to the knowledge base
The two ingestion paths are independent — pick either, both, or neither. Autonomous editing is a per-agent toggle (or a project-wide default via default_agent_kb_write) that sits on top of whichever ingestion paths you're already running. You stay in control: every agent write is server-side validated against your schema, capped by quotas, scoped to the agent's own project, and reversible — soft-delete only, hard delete stays admin-only.
Manual upload ETL on delta changes Agent in conversation
(PDF / DOCX / MD) (insertFacts / bulkUpdate) (knowledge_create / update / delete)
| | |
| | | requires
| | | knowledgeBaseWrite: true
v v v
+----------------------------------------------------------------+
| Project Knowledge Graph |
| entities + relationships + version history + audit trail |
+----------------------------------------------------------------+
|
v
Agents read via knowledge_search
during every conversation
Real-time product Q&A — push a live product catalog and let agents answer "what's in stock under $50?" with current prices and availability
Medication or supplement advisor — store drug and dosage facts; the agent surfaces the right information when a user asks about interactions or timing
Collectibles price tracker — scrape market prices hourly, push via bulkUpdate, and let agents answer "what's trending up this week?" with real data
Internal knowledge assistant — upload employee handbooks, policy docs, and product manuals; agents ground answers in authoritative sources instead of hallucinating
Personalized recommender — define recommendation rules on entity fields (set, rarity, budget) and surface the top matches for each user at conversation time
Entities are nodes; relationships are typed edges. Nodes deduplicate by normalized label + type — pushing the same label twice merges properties and increments the version. Every change is recorded in version history with source and timestamp, giving you a full audit trail. The graph is completely domain-agnostic: you define entity types and relationship types; the platform stores and indexes them.
When a schema has a similarity_config, the platform automatically creates similar_to edges between entities whose match_fields values are close enough to exceed the threshold. This turns structured fields into graph topology without any extra work — and powers the recommendation engine.
entity_type is the machine-readable slug used everywhere in the API (e.g. "pokemon_card"). It is how inventory writes and KB lookups reference the schema. display_name is the optional human-friendly label shown in the dashboard and agent tool descriptions (e.g. "Pokémon Card"). If display_name is omitted, the dashboard falls back to a title-cased version of entity_type.
By default every field value is included in the BM25 full-text index so agents can find nodes by searching field contents. Set indexed: false on a field to exclude it from the search index — the value is still stored and returned in reads, but it will not match keyword queries. Use this for fields that should be readable but not searchable, for example:
Internal identifiers (sku, barcode, external_id) that should never surface in agent search results
High-cardinality numeric values like dosage amounts on a medication schema, where token matching produces noise rather than signal
Raw HTML or markdown blobs that you render in UI but do not want polluting search
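The effect of indexed: false can be pictured as a filter applied before a node's text reaches the search index. The sketch below only models which values would be searchable; searchableText is a hypothetical helper, and real BM25 indexing involves tokenization and scoring that are not shown.

```typescript
interface FieldDef { name: string; indexed?: boolean } // indexed defaults to true when omitted

// Collect the text that would be searchable for a node, skipping indexed: false fields.
function searchableText(fields: FieldDef[], properties: Record<string, unknown>): string {
  return fields
    .filter((f) => f.indexed !== false)
    .map((f) => properties[f.name])
    .filter((v) => v !== undefined && v !== null)
    .map((v) => String(v))
    .join(" ");
}

const fields: FieldDef[] = [
  { name: "name" },
  { name: "category" },
  { name: "sku", indexed: false },     // stored and returned, never matched by search
  { name: "dose_mg", indexed: false }, // numeric noise kept out of the index
];

console.log(
  searchableText(fields, { name: "Metformin", category: "biguanide", sku: "SKU-123", dose_mg: 500 }),
); // "Metformin biguanide"
```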
insertFacts and bulkUpdate default to upsert mode (upsert: true): if a node with the same label + type exists, its properties are merged and the version is incremented; if it does not exist, it is created. This makes idempotent syncs safe to run on any schedule.
Set upsert: false for strict update-only semantics: nodes that do not already exist are skipped rather than created, and their IDs appear in the response not_found list. Use this when you want to ensure you are only patching existing data and never accidentally inserting stale or erroneous entries from an upstream feed.
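The difference between the two modes is easy to see in a small in-memory model. This is an illustration under stated assumptions, not the platform's code: "normalized label" is taken to mean trimmed and lower-cased, new nodes are assumed to start at version 1, and not_found here carries labels rather than IDs.

```typescript
// In-memory model of upsert vs. update-only writes (illustration only).
interface KbNode { label: string; type: string; properties: Record<string, unknown>; version: number }
interface Fact { label: string; type: string; properties: Record<string, unknown> }

// Assumption: "normalized label" means trimmed and lower-cased.
const nodeKey = (label: string, type: string): string =>
  `${type}:${label.trim().toLowerCase()}`;

function applyFacts(
  graph: Map<string, KbNode>,
  facts: Fact[],
  upsert = true,
): { written: number; not_found: string[] } {
  const notFound: string[] = [];
  let written = 0;
  for (const fact of facts) {
    const key = nodeKey(fact.label, fact.type);
    const existing = graph.get(key);
    if (existing) {
      // Existing node: merge properties and bump the version.
      existing.properties = { ...existing.properties, ...fact.properties };
      existing.version += 1;
      written += 1;
    } else if (upsert) {
      // Upsert mode: create the missing node.
      graph.set(key, { ...fact, version: 1 });
      written += 1;
    } else {
      // Strict update-only mode: skip and report the missing node.
      notFound.push(fact.label);
    }
  }
  return { written, not_found: notFound };
}

const graph = new Map<string, KbNode>();
applyFacts(graph, [{ label: "Charizard", type: "pokemon_card", properties: { price: 120 } }]);
applyFacts(graph, [{ label: " charizard ", type: "pokemon_card", properties: { rarity: "rare" } }]);
console.log(graph.get(nodeKey("Charizard", "pokemon_card"))?.version); // 2

const strict = applyFacts(graph, [{ label: "Mew", type: "pokemon_card", properties: {} }], false);
console.log(strict.not_found); // ["Mew"]
```

Re-running the same batch in upsert mode never creates duplicates, which is why a sync job can be repeated on any cadence; the strict mode trades that convenience for a guarantee that nothing new is inserted.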
During conversations, agents have access to a knowledge_search tool that queries your graph. Instead of hallucinating facts, the agent calls this tool and returns grounded answers. The search result includes the entity's properties, relevance score, and any related nodes reachable via one-hop traversal.
Patch properties on many nodes at once; only changed fields are written. Pass upsert: false for strict update-only semantics (missing nodes are returned as not_found instead of being created).
listNodes(projectId, opts?) (returns Node[]): List nodes, optionally filtered by entity type
getNode(projectId, nodeId) (returns Node): Fetch a single node with its edges and version history
Inventory items are knowledge graph nodes scoped to a specific user. The same entity_type you define in a KB schema can back both global knowledge entries and per-user inventory items, so the agent reasons across both surfaces with a single mental model. When you call inventory.update with action: "add", the platform creates a node in the graph and returns a fact_id — the same identifier you use in KB lookups.
// Add a per-user inventory item that lives in the knowledge graph
const item = await client.agents.inventory.update("agent_abc", "user_123", {
  action: "add",
  item_type: "medication",
  description: "Ibuprofen 500mg",
  project_id: "proj_abc",
  properties: {
    medication_name: "ibuprofen",
    dosage: "500mg",
    frequency: "twice daily",
  },
});
// item.fact_id is a knowledge graph node ID; use it for KB lookups or schedule linkage
console.log(item.fact_id);
A schedule can reference an inventory_item_id (a fact_id from the graph). At every fire the platform reads the item's current properties from the knowledge graph and injects them into the agent's wakeup block. This means a dosage change or property update flows through to the next reminder with no schedule edit required — the graph is the single source of truth for what the reminder is about.
// 1. Create the inventory item (returns fact_id)
const item = await client.agents.inventory.update("agent_abc", "user_123", {
  action: "add",
  item_type: "medication",
  description: "Ibuprofen",
  project_id: "proj_abc",
  properties: { medication_name: "ibuprofen", dosage: "500mg" },
});
// 2. Link the schedule; at every fire the graph is re-read for live properties
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["08:00", "20:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "remind the user to take their ibuprofen at the correct dose",
  check_type: "reminder",
  inventory_item_id: item.fact_id,
});
Define analytics rules on your entity graph to surface recommendations and trend rankings. Rules match source entities to target entities by field similarity, price range, or other numeric proximity. Conversion feedback flows back into the rule to improve rankings over time. The same graph you use for search becomes a live recommender with no extra data store.
// Create a recommendation rule matching cards by set and rarity
const rule = await client.knowledge.createAnalyticsRule(projectId, {
  rule_type: "recommendation",
  name: "Similar cards",
  config: { match_fields: ["set", "rarity"], limit: 5 },
  enabled: true,
});
// Fetch pre-computed recommendations for a source node
const recs = await client.knowledge.getRecommendations(projectId, {
  rule_id: rule.rule_id,
  source_id: sourceNodeId,
  limit: 5,
});
for (const rec of recs.recommendations) {
  console.log(rec.target_id, rec.score);
}
// Record conversion feedback; this improves future rankings
await client.knowledge.recordFeedback(projectId, {
  rule_id: rule.rule_id,
  source_node_id: sourceNodeId,
  target_node_id: recs.recommendations[0].target_id,
  converted: true,
  score_at_time: recs.recommendations[0].score,
});
Sonzai's knowledge layer is not a static store you hand-curate and agents read from once. It is a closed-loop system your agents read, write to, and learn from collaboratively — the way a real team builds shared institutional memory. Three capabilities stack on top of each other:
Layer | What it does | Default | Where it lives
------|--------------|---------|---------------
Read | Every agent grounds its replies in the project KB and (optionally) the org-scope KB. | On for any agent with knowledgeBase: true. | Per-project + organisation-wide.
Write (autonomous) | Agents create, update, and soft-delete project KB entries themselves during conversations. Audit trail stamps which agent made which change. | Off until knowledgeBaseWrite: true. | Per-project; capability-gated.
Share across users | A single agent serving a team carries attributed memory across users: wisdom (de-attributed, on by default) plus sharedMemory (attributed, opt-in). | wisdom on; sharedMemory off. | Per-agent; capability-gated.
The result is the same compounding effect human teams get from institutional knowledge: an agent doesn't just remember what it did with one user — it picks up what the team did, and a new agent joining the project benefits from everything every previous agent already wrote down.
Project (your tenant)
|
+--------------------------+--------------------------+
| | |
v v v
agent A agent B agent C
| | |
|--- writes verified ------+ |
| incident detail | |
| | |
| reads + grounds reply |
| | |
| +--- updates the entry --->|
| |
| reads enriched fact
v v v
user X user Y user Z
Inter-agent: closed loop. Anything one agent learns is
instantly available to every other agent on the project.
Intra-agent: a single agent can also share memory across
the users it serves -- attributed (sharedMemory) or
de-attributed (wisdom). Same agent, multiple users,
shared context.
Real-world shapes this enables:
Customer-success scribes. Agent A captures verified renewal context with user X; agent B picks it up on a follow-up call with the same account.
Support that learns from itself. Each verified incident detail an agent records is grounded data for every other agent the next time the same product issue surfaces.
Team coordinators. One agent serves the whole project team — "Alice owns the migration, Bob is on incident response" — and informs each teammate with the context it gathered with the others.
Group / party planning. "Carol brings dessert, Dave does setup." Everyone joining the agent already knows who's doing what.
Cross-product company brain. Organization-scope KB sits above projects: tenant-wide policies, lore, brand, and reference catalogs that every project agent reads alongside its own.
By default the KB is admin-curated: you push data in via document upload or the bulkUpdate API, and agents only read. You can opt agents into autonomous editing so they create, update, and soft-delete entries themselves during conversations — useful when the source of truth is the conversation (e.g. a customer-success agent capturing renewal context, or a support agent recording verified incident details).
knowledge_create: Insert a new node into the project KB with typed properties.
knowledge_update: Patch existing properties using compare-and-swap. The agent first reads, then submits the version it saw, so concurrent admin edits never get clobbered silently.
knowledge_delete: Soft-delete a node (is_active = false). Soft only; hard delete stays admin-only.
Every write is stamped with source = "agent:<agent-id>" on each PropertySource, so the KB audit trail shows exactly which agent made which change. Schema validation, write quotas, and the project-tenant scope check all run server-side — capability-on agents can only touch their own project.
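Compare-and-swap is the piece that keeps agent writes from overwriting concurrent admin edits. The sketch below models the idea in memory: a write succeeds only if the version the writer read is still current. The types and the compareAndSwap helper are hypothetical; the platform's actual request and response shapes are not shown in this document.

```typescript
interface VersionedNode { properties: Record<string, unknown>; version: number }
type CasResult =
  | { ok: true; version: number }
  | { ok: false; currentVersion: number };

// Apply the patch only if the caller saw the latest version; otherwise reject.
function compareAndSwap(
  target: VersionedNode,
  expectedVersion: number,
  patch: Record<string, unknown>,
): CasResult {
  if (target.version !== expectedVersion) {
    return { ok: false, currentVersion: target.version };
  }
  target.properties = { ...target.properties, ...patch };
  target.version += 1;
  return { ok: true, version: target.version };
}

const incident: VersionedNode = { properties: { status: "open" }, version: 3 };
const seenByAgent = incident.version;                                  // agent reads version 3
const adminWrite = compareAndSwap(incident, 3, { owner: "admin" });    // admin edit lands first
const agentWrite = compareAndSwap(incident, seenByAgent, { status: "resolved" });

console.log(adminWrite); // { ok: true, version: 4 }
console.log(agentWrite); // { ok: false, currentVersion: 4 }
```

After a rejected write, the writer re-reads the node, re-evaluates its change against the newer properties, and retries with the new version.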
Per-agent — set knowledgeBaseWrite: true on the agent's capabilities. Most useful when only specific agents in a project should be allowed to edit (e.g. a "scribe" agent vs. a customer-facing one).
await client.agents.updateCapabilities("agent_abc", {
  knowledgeBase: true, // required prerequisite: agent must be able to read first
  knowledgeBaseWrite: true,
});
Project default — flip the project's default_agent_kb_write toggle. Every agent in that project with knowledgeBase: true gets the write capability automatically. Available in the dashboard at /dashboard/knowledge (the toggle next to the project selector) and via the API:
The platform resolves both flags with OR semantics — the agent's own flag wins immediately when on; the project default applies only when the agent flag is off. So you can default-on the whole project and not need to touch each agent.
Read first, then write
knowledgeBaseWrite requires knowledgeBase: true to also be on — an agent that can't read the KB can't intelligently edit it. The platform refuses to register the write tools when only write is enabled and logs a warning.
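Taken together, the OR semantics and the read prerequisite reduce to a short predicate. The sketch below is a local model of that resolution, not the platform's implementation; canWriteKb and its parameter names are hypothetical.

```typescript
interface AgentCaps { knowledgeBase?: boolean; knowledgeBaseWrite?: boolean }

// Effective write access: read must be on, then agent flag OR project default.
function canWriteKb(agent: AgentCaps, projectDefaultKbWrite: boolean): boolean {
  // Read is a prerequisite: without it the write tools are not registered.
  if (agent.knowledgeBase !== true) return false;
  return agent.knowledgeBaseWrite === true || projectDefaultKbWrite;
}

console.log(canWriteKb({ knowledgeBase: true, knowledgeBaseWrite: true }, false)); // true
console.log(canWriteKb({ knowledgeBase: true }, true));                            // true
console.log(canWriteKb({ knowledgeBase: true }, false));                           // false
console.log(canWriteKb({ knowledgeBaseWrite: true }, true));                       // false
```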
Shared memory has its own full documentation page — see Shared Memory for when to use it, how to enable and disable it, what tools the agent gets, how to verify it's working with live API probes, and the full privacy-control story. The summary below is here so KB readers see the multiplayer-memory hook in context.
Beyond static documents, agents that talk to many users develop patterns — recurring behaviours, common goals, stable preferences. Sonzai surfaces this cross-user generalization through two complementary tiers: wisdom (de-attributed, on by default) and shared memory (attributed, opt-in).
When the wisdom capability is on — which it is for every new agent — the platform runs a daily promotion job that pulls patterns from per-user fact histories, k-anonymizes them, and rewrites the result through an LLM into de-attributed knowledge. No individual user is identifiable. Every agent benefits from "what tends to work / what tends to come up" without ever leaking who said what.
This is your free generalization layer. There's nothing for agents to call — wisdom shows up alongside facts in the agent's context automatically when the capability is on.
// Wisdom is on by default for every new agent. To opt out for a specific
// agent (e.g. a single-user companion product where cross-user generalization
// isn't appropriate), pass false at create time or via updateCapabilities:
await client.agents.updateCapabilities("agent_abc", { wisdom: false });
Default-on, opt-out
Wisdom is enabled for all agents, including ones created before the default-on cutover. The capability is tri-state: true, false, or unset (treated as on). Pass wisdom: false explicitly only when you want to disable it; passing nothing keeps the agent on the platform default.
Some businesses want the opposite of de-attribution — they want users working with the same agent to see who is doing what. A team-collaboration agent might surface "Alice owns the migration, Bob is on incident response." A party-coordinator agent might track "Carol brings dessert, Dave does setup." That's what the sharedMemory capability gates.
When this capability is on, the agent records person/entity-attributed facts (roles, expertise, business context, relationship edges) and exposes them to other users sharing the agent. Three things change:
Tools. The agent gets wisdom_create, wisdom_update, wisdom_delete, and relation edges, plus admin-side CSV import.
Context. Other users' attributed facts surface in the agent's per-turn context with attribution.
Privacy floor. Every write is validated against a privacy blocklist (compensation, health, politics) using a dedicated semantic validator before persistence — so the agent can't share something that shouldn't cross the user boundary even if a user asks it to.
Shared memory is OFF by default. Enable it explicitly when the agent serves a group, team, or party that benefits from cross-user visibility.
// Wisdom is the precondition (default ON for new agents — only pass it
// explicitly when overriding the default).
await client.agents.updateCapabilities("agent_abc", {
wisdom: true,
sharedMemory: true,
});
wisdom is the generalization layer (safe, de-attributed, on by default). sharedMemory is the attribution layer (sensitive, per-person, off by default). Both can coexist — but turn on shared memory only when the use case genuinely needs cross-user visibility (groups, teams, parties, shared business context). Single-user companion products should leave it off.
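The two defaults can be summarized as a small resolution function. This sketch assumes that an unset wisdom flag counts as on and that sharedMemory only takes effect when wisdom is on, which is how the "precondition" wording above reads; effectiveMemorySharing is a hypothetical helper, not an SDK call.

```typescript
interface SharingCaps { wisdom?: boolean; sharedMemory?: boolean }

// Resolve the effective state of both tiers from possibly-unset flags.
function effectiveMemorySharing(caps: SharingCaps): { wisdom: boolean; sharedMemory: boolean } {
  // Tri-state: only an explicit false turns wisdom off.
  const wisdom = caps.wisdom !== false;
  // Opt-in: only an explicit true turns shared memory on.
  // Assumption: shared memory is inert when wisdom is off.
  const sharedMemory = wisdom && caps.sharedMemory === true;
  return { wisdom, sharedMemory };
}

console.log(effectiveMemorySharing({}));                                    // { wisdom: true, sharedMemory: false }
console.log(effectiveMemorySharing({ sharedMemory: true }));                // { wisdom: true, sharedMemory: true }
console.log(effectiveMemorySharing({ wisdom: false, sharedMemory: true })); // { wisdom: false, sharedMemory: false }
```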
Memory is the persistence layer behind every agent relationship. Each conversation is analyzed to extract facts, events, and commitments — stored in a structured tree and recalled automatically before the next response. Memory also composes directly with Scheduled Reminders: when a reminder fires and the user replies, the reply is captured as a new memory fact. It feeds Agent Insights too — habits, goals, and interests are derived signals aggregated over memory facts.
Search a user's memory by semantic query, then list the top-level tree for context.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// Semantic search — scoped to a specific user
const results = await client.agents.memory.search("agent-id", {
query: "hiking trip",
userId: "user-123", // optional: omit to search across all users for this agent
limit: 10,
});
for (const mem of results.results) {
console.log(mem.content, mem.factType, mem.score);
}
// Browse the tree
const tree = await client.agents.memory.list("agent-id", {
userId: "user-123",
limit: 20,
});
Memory is organized as a hierarchical tree of nodes, each with a NodeID, Title, Summary, and optional child nodes. Nodes act as thematic containers — "Jane's work life," "travel experiences" — and hold atomic facts beneath them. You can navigate the tree by passing parentID to list, or fetch a subtree with includeContents: true to pull a node's facts in one call.
Facts are atomic, source-anchored statements ("User is a senior product manager at Acme Corp"). Every fact traces back to a specific message in a real conversation — the agent cannot hallucinate memories. Summaries are auto-generated consolidations written at session end, giving long conversations a compact digest. Both live in the tree and both appear in search results.
timeline returns a chronological view organized by session — each TimelineSession carries session_id, facts, first_fact_at, last_fact_at, and fact_count. Use it to render episodic history in your UI or to audit what was extracted from a specific time window.
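As a rough illustration of the audit use case, the sketch below filters timeline sessions to a time window and totals the facts extracted in it. The session shape mirrors the fields named above, with `facts` simplified to plain strings. The helpers `sessionsInWindow` and `countFacts` are hypothetical and not part of the SDK.

```typescript
// Field names follow the TimelineSession description above; `facts` is
// simplified to plain strings for this sketch.
interface TimelineSession {
  session_id: string;
  facts: string[];
  first_fact_at: string; // RFC 3339 timestamp
  last_fact_at: string; // RFC 3339 timestamp
  fact_count: number;
}

// Hypothetical helper: keep sessions whose fact range overlaps [from, to].
function sessionsInWindow(
  sessions: TimelineSession[],
  from: Date,
  to: Date,
): TimelineSession[] {
  return sessions.filter((s) => {
    const first = new Date(s.first_fact_at).getTime();
    const last = new Date(s.last_fact_at).getTime();
    return first <= to.getTime() && last >= from.getTime();
  });
}

// Hypothetical helper: total facts extracted across the given sessions.
function countFacts(sessions: TimelineSession[]): number {
  return sessions.reduce((sum, s) => sum + s.fact_count, 0);
}

const sampleTimeline: TimelineSession[] = [
  {
    session_id: "s1",
    facts: ["likes hiking"],
    first_fact_at: "2026-01-01T10:00:00Z",
    last_fact_at: "2026-01-01T10:05:00Z",
    fact_count: 1,
  },
  {
    session_id: "s2",
    facts: ["owns a dog", "lives in Oslo"],
    first_fact_at: "2026-01-02T18:00:00Z",
    last_fact_at: "2026-01-02T18:30:00Z",
    fact_count: 2,
  },
];

const audited = sessionsInWindow(
  sampleTimeline,
  new Date("2026-01-02T00:00:00Z"),
  new Date("2026-01-03T00:00:00Z"),
);
console.log(audited.map((s) => s.session_id), countFacts(audited));
```

In a real integration the sessions would come from the timeline response rather than a literal array.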
reset deletes all memory for an agent–user pair and is irreversible. Use it for testing, right-to-erasure privacy flows, or account handoffs. The seed, createFact, and search operations accept an instanceId to scope memory to a workspace or tenant, preventing cross-boundary leakage in multi-tenant deployments.
Supplementary memory recall — the extra fact lookups that enrich each turn beyond the agent's automatic working set — runs synchronously by default: every fact lands in the current turn before generation starts. Switch to async when first-token latency matters more than completeness; recall races a deadline, and slow hits spill into the next turn.
memory_mode is an agent-wide capability. Set it once via update_capabilities(); every subsequent chat uses that mode until you change it. There is no equivalent at agent-creation time — create the agent first, then flip the mode.
// Read current capabilities
const caps = await client.agents.getCapabilities("agent-id");
console.log(caps.memoryMode); // "sync" or "async"
// Switch to async for lower first-token latency
await client.agents.updateCapabilities("agent-id", { memoryMode: "async" });
// Switch back to sync
await client.agents.updateCapabilities("agent-id", { memoryMode: "sync" });
When to pick async: high-volume voice agents, mobile clients on slow networks, or any setup where missing one or two enrichment facts is preferable to a 200ms latency spike. The agent's automatic working set still lands on every turn — only supplementary recall slips.
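To make the sync/async trade-off concrete, here is a small model of how supplementary lookups could be split between the current turn and the next. It only illustrates the behavior described above. `partitionRecall`, the latency figures, and the 100 ms deadline are assumptions for this sketch; the platform's actual deadline is not documented here.

```typescript
// Hypothetical model of a supplementary recall lookup.
interface RecallLookup {
  factId: string;
  latencyMs: number; // how long this lookup takes to return
}

interface RecallOutcome {
  thisTurn: string[]; // facts available before generation starts
  nextTurn: string[]; // slow hits that spill into the following turn
}

// In "sync" mode every fact lands in the current turn. In "async" mode a
// lookup lands only if it beats the deadline (an assumed value, see above).
function partitionRecall(
  lookups: RecallLookup[],
  mode: "sync" | "async",
  deadlineMs: number,
): RecallOutcome {
  const outcome: RecallOutcome = { thisTurn: [], nextTurn: [] };
  for (const lookup of lookups) {
    const landsNow = mode === "sync" || lookup.latencyMs <= deadlineMs;
    if (landsNow) {
      outcome.thisTurn.push(lookup.factId);
    } else {
      outcome.nextTurn.push(lookup.factId);
    }
  }
  return outcome;
}

const sampleLookups: RecallLookup[] = [
  { factId: "fact-fast", latencyMs: 40 },
  { factId: "fact-slow", latencyMs: 250 },
];

const asyncOutcome = partitionRecall(sampleLookups, "async", 100);
const syncOutcome = partitionRecall(sampleLookups, "sync", 100);
console.log(asyncOutcome, syncOutcome);
```

The cost of sync mode in this model is that the turn waits for the slowest lookup, which is the latency spike that async mode avoids.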
AgentCapabilities.pendingCapabilities is a list of capability changes that have been queued by the platform but not yet applied — for example, a tier upgrade that will unlock music or video generation. Each entry carries a capability name (string) and an optional context string with human-readable detail. Read it via get_capabilities() to surface upgrade status in your UI.
const caps = await client.agents.getCapabilities("agent-id");
for (const pending of caps.pendingCapabilities ?? []) {
console.log(pending.capability, pending.context);
// e.g. "musicGeneration" "Scheduled for activation on plan upgrade"
}
All methods are on client.agents.memory.* (TS/Python) or client.Agents.Memory (Go). Full request/response shapes live in the API reference.
| Method | Returns | Description |
| --- | --- | --- |
| list(agentID, opts) | MemoryTreeResponse | Browse the memory tree, optionally rooted at a parentID. Pass memory_type to filter results to a specific memory category: "factual", "episodic", "semantic", "procedural", "identity", "temporal", or "relational". This is a post-fetch filter applied on the result set — it does not reduce server-side I/O, so the limit applies before filtering. |
| search(agentID, opts) | MemorySearchResponse | Semantic/keyword search; returns Results[] with content, factType, score. Pass userId (user_id in Python/JSON) to scope results to a single user; omit to search across all users for the agent. |
| timeline(agentID, opts) | MemoryTimelineResponse | Chronological sessions with first_fact_at, last_fact_at, fact_count |
| listFacts(agentID, opts) | FactListResponse | Paginated flat list of atomic facts; response has Facts, TotalCount, HasMore |
| reset(agentID, opts) | MemoryResetResponse | Delete all memory for an agent–user pair |
| createFact(agentID, opts) | AtomicFact | Manually insert a fact tagged source_type="manual" |
| updateFact(agentID, factID, opts) | AtomicFact | Patch content, type, importance, or confidence of an existing fact |
| deleteFact(agentID, factID) | void | Remove a single fact by ID |
| seed(agentID, opts) | SeedMemoriesResponse | Bulk-import initial memories without an AI generation step |
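The note that memory_type is a post-fetch filter has a practical consequence: a filtered list call can return fewer items than limit even when more matching nodes exist. The sketch below models that ordering with an in-memory array. `listThenFilter` and the node shape are hypothetical stand-ins, not the SDK's implementation.

```typescript
type MemoryType =
  | "factual"
  | "episodic"
  | "semantic"
  | "procedural"
  | "identity"
  | "temporal"
  | "relational";

// Simplified node shape for illustration only.
interface MemoryNode {
  nodeId: string;
  title: string;
  memoryType: MemoryType;
}

// Models the documented order of operations: apply the limit first,
// then filter the fetched page by memory type.
function listThenFilter(
  stored: MemoryNode[],
  limit: number,
  memoryType?: MemoryType,
): MemoryNode[] {
  const page = stored.slice(0, limit);
  return memoryType === undefined
    ? page
    : page.filter((n) => n.memoryType === memoryType);
}

const storedNodes: MemoryNode[] = [
  { nodeId: "n1", title: "Work life", memoryType: "factual" },
  { nodeId: "n2", title: "Lisbon trip", memoryType: "episodic" },
  { nodeId: "n3", title: "Half marathon", memoryType: "episodic" },
];

// Two episodic nodes exist, but a limit of 2 fetches n1 and n2 before the
// filter runs, so only one episodic node survives.
const filteredPage = listThenFilter(storedNodes, 2, "episodic");
console.log(filteredPage.map((n) => n.nodeId));
```

If you need a fixed number of nodes of one type, request a larger limit and trim client-side.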
When a scheduled reminder fires and the user replies, the memory layer auto-captures the reply as a fact. Query those facts later to build a compliance view or adherence dashboard without an extra database.
// After a week of daily medication reminders, query the captured replies
const memories = await client.agents.memory.search("agent-id", {
  query: "medication taken ibuprofen",
  limit: 10,
});
for (const result of memories.results) {
  console.log(result.content, result.score);
  // "User confirmed taking 500mg ibuprofen at 08:14" 0.89
}
Memory is fully automatic during chat — you do not call any write endpoint yourself. The platform analyzes each conversation turn, extracts facts, events, and commitments, and stores them in the tree. The next time you call chat for that agent–user pair, the most relevant memories are assembled into context automatically.
// Just call chat — memory extraction and retrieval happen on every turn
const stream = client.agents.chat.stream("agent-id", {
  userId: "user-123",
  messages: [{ role: "user", content: "I've been training for a half marathon." }],
});
// After the conversation, the fact "user is training for a half marathon"
// is stored automatically — no extra call needed.
const results = await client.agents.memory.search("agent-id", {
  query: "running training marathon",
  limit: 5,
});
console.log(results.results[0].content);
// "User is training for a half marathon"
Habits, goals, interests, and mood trends are derived signals the context engine aggregates over memory facts. Memory is what the engine reads; Agent Insights is what the engine produces. Search memory for raw facts, then call Agent Insights to see what those facts have been distilled into.
// 1. Fetch raw memory facts about fitness
const facts = await client.agents.memory.search("agent-id", {
  query: "exercise fitness running",
  limit: 10,
});
// 2. Fetch the derived habit signal the engine built from those facts
const habits = await client.agents.listHabits("agent-id", {
  userId: "user-123",
});
console.log(habits.habits);
// [{ label: "Daily runner", frequency: "daily", confidence: 0.91 }]
Agent Insights — derived signals (habits, goals, interests) built on top of memory facts.
Scheduled Reminders — proactive messages whose user replies flow back into memory.
Conversations — every chat turn is the primary source of memory writes.
Multiplayer Memory
The default agent memory model is per-pair — every conversation builds a fact profile scoped to one (agent, user) pair. That isolation is the right default for privacy, but the moment your product has more than one agent or more than one user per agent, you want memory to cross the boundary in controlled, observable ways.
Multiplayer memory is the umbrella for those crossing capabilities. It splits cleanly along two axes:
| Axis | What crosses | Real-world shape | Capabilities |
| --- | --- | --- | --- |
| Inter-agent | Knowledge between agents on the same project (or tenant). | A closed-loop company brain — agent A learns; agent B picks it up. | |
Both axes can run simultaneously. The full picture: agents on the same project share the world they've learned about (inter-agent) and a single agent shares context about the people it serves (intra-agent). Same compounding curve, two dimensions.
INTER-AGENT (across agents)
shared knowledge base, autonomous updates,
org-wide scope, closed-loop company brain
|
v
+-------------------------+ +-------------------------+
| Agent A | | Agent B |
| reads + writes KB |<---->| reads + writes KB |
+-------------------------+ +-------------------------+
^ ^ ^ ^ ^ ^
| | | INTRA-AGENT (across users) | | |
| | | wisdom (de-attributed), | | |
| | | shared memory (attributed) | | |
+--------+ | +---------+ +-------+ | +---------+
| | | | | |
user X1 user X2 user X3 user Y1 user Y2 user Y3
Inter-agent: anything any agent learns is grounded data
for every other agent on the project.
Intra-agent: a single agent carries memory across the
users it serves -- with privacy guardrails.
Inter-agent memory turns the project knowledge base into a closed-loop company brain: anything one agent learns or verifies during a conversation becomes grounded data every other agent on the project retrieves on the next session. Three layers stack from baseline to organization-wide.
Any agent with knowledgeBase: true reads the project knowledge graph during conversations via the knowledge_search tool. The graph is hand-curated, ETL-loaded, or both — see How knowledge gets into the KB for the two ingestion paths.
Flip knowledgeBaseWrite: true and the agent gets knowledge_create / knowledge_update / knowledge_delete tools. During conversations the agent records verified facts itself, with a full audit trail (source = "agent:<agent-id>") and compare-and-swap update semantics so admin edits don't get clobbered. The next agent that runs knowledge_search on the same topic retrieves what the previous agent wrote down.
Use this when the source of truth IS the conversation — support agents recording verified incident details, customer-success agents capturing renewal context, scribe agents writing meeting notes. Detail: Agents writing to the knowledge base.
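The compare-and-swap semantics mentioned above can be pictured with a versioned in-memory store: an update succeeds only if the writer saw the latest version, so a stale agent write cannot overwrite an admin edit. This is a sketch under assumptions. The `version` field, the `InMemoryKnowledgeBase` class, and its method signatures are hypothetical and do not describe the platform's actual storage or tool interface.

```typescript
// Hypothetical fact record with a version counter used for compare-and-swap.
interface KbFact {
  id: string;
  content: string;
  version: number;
  source: string; // e.g. "agent:<agent-id>" or "admin"
}

type CasResult =
  | { ok: true; fact: KbFact }
  | { ok: false; current: KbFact | undefined };

class InMemoryKnowledgeBase {
  private facts: Record<string, KbFact> = {};

  create(id: string, content: string, source: string): KbFact {
    const fact: KbFact = { id, content, version: 1, source };
    this.facts[id] = fact;
    return fact;
  }

  // The write is applied only when expectedVersion matches the stored version.
  update(id: string, expectedVersion: number, content: string, source: string): CasResult {
    const current = this.facts[id];
    if (current === undefined || current.version !== expectedVersion) {
      return { ok: false, current };
    }
    const next: KbFact = { id, content, version: current.version + 1, source };
    this.facts[id] = next;
    return { ok: true, fact: next };
  }
}

const sketchKb = new InMemoryKnowledgeBase();
const seenByAgent = sketchKb.create("incident-42", "Outage began at 09:00", "agent:agent-a");

// An admin edits the fact after the agent read it.
const adminEdit = sketchKb.update("incident-42", seenByAgent.version, "Outage began at 09:10", "admin");

// The agent's later write still carries the old version, so it is rejected.
const staleWrite = sketchKb.update("incident-42", seenByAgent.version, "Outage began at 08:55", "agent:agent-a");
console.log(adminEdit.ok, staleWrite.ok);
```

On a rejected write, a caller would typically re-read the current fact and decide whether its change still applies.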
Set knowledgeBaseScopeMode: "cascade" on an agent and it reads from both the project KB and the org-scope KB on every search. The org scope is for tenant-wide artefacts: policies, lore, brand, reference catalogs. Project wins on collisions; org fills in defaults.
Intra-agent memory turns a single agent into a team brain: one agent serving multiple users carries memory that crosses the user boundary, so it can inform user A with the context it gathered while talking to user B. Two complementary tiers.
wisdom is on for every new agent. A daily promotion job pulls patterns from per-user fact histories, k-anonymises them, and rewrites the result through an LLM into agent-wide knowledge. No individual user is identifiable. Every agent benefits from "what tends to work / what tends to come up" without ever leaking who said what.
// Wisdom is on by default. Pass false only to opt out
// (rare — usually only for strict single-user products).
await client.agents.updateCapabilities("agent_abc", { wisdom: false });
This is the safe intra-agent layer — privacy-protected by construction, no opt-in required.
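One way to picture the k-anonymity step is as a distinct-user threshold: a pattern is eligible for promotion only if at least k different users exhibit it. The sketch below shows that simplified idea only. The real promotion job, its value of k, and the LLM rewrite step are not specified here, and `promotablePatterns` is a hypothetical helper.

```typescript
// One observation: a pattern seen in a single user's fact history.
interface UserPattern {
  userId: string;
  pattern: string;
}

// Keep only patterns observed across at least k distinct users, so no
// promoted pattern can be traced to a group smaller than k.
function promotablePatterns(observations: UserPattern[], k: number): string[] {
  const usersByPattern: Record<string, string[]> = {};
  for (const o of observations) {
    const users = usersByPattern[o.pattern] ?? [];
    if (users.indexOf(o.userId) === -1) {
      users.push(o.userId);
    }
    usersByPattern[o.pattern] = users;
  }
  return Object.keys(usersByPattern).filter(
    (pattern) => (usersByPattern[pattern] ?? []).length >= k,
  );
}

const observedPatterns: UserPattern[] = [
  { userId: "u1", pattern: "asks about refund timelines" },
  { userId: "u2", pattern: "asks about refund timelines" },
  { userId: "u3", pattern: "asks about refund timelines" },
  { userId: "u1", pattern: "mentions a dog named Rex" },
  { userId: "u1", pattern: "mentions a dog named Rex" },
];

const promoted = promotablePatterns(observedPatterns, 3);
console.log(promoted);
```

Repeated observations from the same user count once, which is why the second pattern stays private in this example.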
sharedMemory: true is the powerful intra-agent layer. The agent records person/entity-attributed facts (roles, expertise, business context, relationships) and surfaces them to other users sharing the agent — with names visible. "Alice owns the migration; Bob is on incident response." "Carol brings dessert; Dave does setup."
Three things flip when you turn it on: the agent gets sonzai_wisdom_set/update/delete/relate tools; the prompt grows a "Shared facts" section with a discretion clause; every write is server-side validated against a privacy floor (compensation, health, politics blocked). Every disclosure is logged to the audit table. Full detail: Shared Memory.
Each capability has a live read endpoint you can hit to confirm the loop closes. Replace $AGENT_ID, $PROJECT_ID, $API_KEY with your own.
Inter-agent — KB writes
# Search the project KB — does an agent write show up?
curl "https://api.sonz.ai/api/v1/projects/$PROJECT_ID/knowledge/search?q=YourQuery" \
  -H "Authorization: Bearer $API_KEY"
Intra-agent — attributed shared memory
# List attributed facts on the agent
curl "https://api.sonz.ai/api/v1/agents/$AGENT_ID/wisdom/attributed?limit=20" \
  -H "Authorization: Bearer $API_KEY"
# Read the disclosure audit — every fact disclosed in a turn is logged
curl "https://api.sonz.ai/api/v1/agents/$AGENT_ID/wisdom/audit?limit=50" \
  -H "Authorization: Bearer $API_KEY"
Intra-agent — wisdom (default-on)
The de-attributed wisdom layer surfaces inline in every prompt the agent runs once the daily promotion job has scanned per-user fact histories — no separate read endpoint. To verify it's running, watch agent context size over a 48-hour window after multi-user traffic; you should see the wisdom block populate.
Shared Memory — the intra-agent surface: enable/disable, four wisdom tools, privacy floor, full verification probes
Self-Improvement — how multiplayer memory layers on top of per-pair online learning
Wisdom API — full endpoint reference for shared memory CRUD + audit
PROACTIVE BEHAVIOR
Notifications (Polling)
Proactive messages — generated by recurring schedules, one-off wakeups, or tenant-triggered events — land in a per-user notifications queue the moment they fire. Your frontend or backend polls that queue to fetch pending messages, display them to the user, and mark each one consumed. No push infrastructure, no webhook endpoint, no server-side listener to maintain — just an HTTP GET on your schedule.
This is the recommended delivery pattern for web clients and mobile apps that can't accept inbound HTTP requests, and it doubles as a handy catch-up mechanism for users who were offline when messages were generated.
When a proactive message fires — whether from a schedule, a wakeup, or a trigger event — the platform enqueues it for the relevant user. The queue is per-user, per-agent. Calling list returns only messages in pending state; calling consume transitions a specific message to consumed. Consumed messages are excluded from future list responses but remain visible in history. The queue does not auto-expire: messages stay pending indefinitely until your code marks them consumed.
If the user has an active SSE chat stream open, proactive messages appear inline in the conversation automatically — no polling needed. Polling is the catch-up mechanism for users who do not have a live stream. The two patterns are complementary: SSE for foreground delivery, polling for background or offline users.
notifications.history is separate from notifications.list. It returns all historical notifications for an agent (including already-consumed ones) and is useful for audit trails, moderation dashboards, and debugging. It does not filter by user_id — it returns across all users up to the requested limit.
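The queue semantics described above (list returns pending only, consume hides a message from list, history keeps everything across users) can be captured in a small in-memory model. This is an illustration of the documented behavior rather than the server implementation. `NotificationQueueModel` and its internals are hypothetical.

```typescript
type QueueState = "pending" | "consumed";

interface QueuedMessage {
  message_id: string;
  user_id: string;
  generated_message: string;
  state: QueueState;
}

class NotificationQueueModel {
  private messages: QueuedMessage[] = [];

  enqueue(message_id: string, user_id: string, generated_message: string): void {
    this.messages.push({ message_id, user_id, generated_message, state: "pending" });
  }

  // Pending messages only, optionally scoped to one user.
  list(user_id?: string, limit: number = 20): QueuedMessage[] {
    return this.messages
      .filter((m) => m.state === "pending")
      .filter((m) => user_id === undefined || m.user_id === user_id)
      .slice(0, limit);
  }

  // Messages never expire on their own; they leave `list` only via consume.
  consume(message_id: string): void {
    for (const m of this.messages) {
      if (m.message_id === message_id) {
        m.state = "consumed";
      }
    }
  }

  // Consumed and pending messages, across all users.
  history(limit: number): QueuedMessage[] {
    return this.messages.slice(0, limit);
  }
}

const queueModel = new NotificationQueueModel();
queueModel.enqueue("m1", "user_123", "Good morning! How did you sleep?");
queueModel.enqueue("m2", "user_456", "Time for your afternoon walk.");

const pendingBefore = queueModel.list("user_123").length;
queueModel.consume("m1");
const pendingAfter = queueModel.list("user_123").length;
console.log(pendingBefore, pendingAfter, queueModel.history(10).length);
```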
All methods are on client.agents.notifications.* (TS/Python) or client.Agents.Notifications (Go). Full request and response shapes live in the API reference.
| Method | Signature | Returns | Description |
| --- | --- | --- | --- |
| list | list(agentId, { user_id?, limit? }) | { notifications: Notification[] } | Fetch pending messages for a user |
| consume | consume(agentId, messageId) | void | Mark a single message consumed |
| history | history(agentId, limit) | { notifications: Notification[] } | Fetch all historical notifications (consumed + pending) |
| Field | JSON key | Description |
| --- | --- | --- |
| MessageID | message_id | Pass this to consume to mark the message delivered |
| UserID | user_id | The user this notification was generated for |
| CheckType | check_type | The check type (e.g. "reminder", "interest_check", "birthday") |
| GeneratedMessage | generated_message | The actual text the agent produced — display this to the user |
| CreatedAt | created_at | When the message was enqueued (RFC 3339 UTC) |
| ScheduleID | schedule_id | Set if the message originated from a schedule; otherwise absent |
| WakeupID | wakeup_id | Set if the message originated from a wakeup; otherwise absent |
Use the correct field names
Older code may use id, notificationId, type, or content. These are incorrect. The canonical fields are message_id, check_type, and generated_message. Using the wrong field names will result in silent failures when calling consume.
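Because wrong field names fail silently, a runtime guard can catch legacy payload shapes before they reach consume. The sketch below checks for the three canonical fields named above. `isCanonicalNotification` is a hypothetical helper you would add to your own code, not an SDK export.

```typescript
interface CanonicalNotification {
  message_id: string;
  check_type: string;
  generated_message: string;
}

// Type guard: true only when all three canonical fields are present as strings.
function isCanonicalNotification(value: unknown): value is CanonicalNotification {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  return (
    typeof record["message_id"] === "string" &&
    typeof record["check_type"] === "string" &&
    typeof record["generated_message"] === "string"
  );
}

// A payload shaped the way older code expected it.
const legacyShape: unknown = { id: "m1", type: "reminder", content: "Time for your walk" };

// A payload using the canonical field names.
const canonicalShape: unknown = {
  message_id: "m1",
  check_type: "reminder",
  generated_message: "Time for your walk",
};

console.log(isCanonicalNotification(legacyShape), isCanonicalNotification(canonicalShape));
```

Failing loudly on a rejected payload (logging or throwing) is usually preferable to skipping it quietly, since the message would otherwise stay pending.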
A schedule defines when the agent fires; polling is one way to receive what it produced. When a schedule's cadence fires, the platform generates the agent's message and enqueues it. Your client polls, displays generated_message, then calls consume to clear it from the queue. The schedule and delivery are fully decoupled — you can swap in webhooks or SSE without touching the schedule definition.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// 1. Create a daily 09:00 check-in schedule (done once, e.g. at onboarding)
await client.schedules.create("agent_abc", "user_123", {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Asia/Singapore",
},
intent: "morning check-in on mood and sleep",
check_type: "reminder",
});
// 2. On each app foreground, poll for what the schedule produced
const pending = await client.agents.notifications.list("agent_abc", {
user_id: "user_123",
limit: 5,
});
for (const n of pending.notifications) {
showInAppBanner(n.generated_message);
await client.agents.notifications.consume("agent_abc", n.message_id);
}
A wakeup fires once at a specific moment; polling retrieves the message it generated. This is the natural delivery pattern for one-off agent outreach in mobile clients where webhooks are unavailable. Schedule the wakeup when the event is known (e.g. "follow up 24 hours after purchase"), then poll periodically — the message lands in the queue the moment the delay elapses.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// 1. Schedule a one-off wakeup (e.g. after a user completes onboarding)
await client.agents.scheduleWakeup("agent_abc", {
user_id: "user_123",
check_type: "interest_check",
intent: "check in about how onboarding went",
delay_hours: 24,
});
// 2. Poll for the message when it fires (24 h later)
const pending = await client.agents.notifications.list("agent_abc", {
user_id: "user_123",
limit: 5,
});
for (const n of pending.notifications) {
console.log(n.check_type, n.generated_message);
await client.agents.notifications.consume("agent_abc", n.message_id);
}
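Since a wakeup is expressed as a delay from now, an event known as an absolute timestamp has to be converted first. The helper below is a minimal sketch of that conversion. `delayHoursUntil` is hypothetical, and whether delay_hours accepts fractional values is an assumption you should verify against the API reference.

```typescript
const MS_PER_HOUR = 60 * 60 * 1000;

// Convert an absolute target time into a delay in hours from `now`.
// Throws if the target is not in the future, since a wakeup cannot fire in the past.
function delayHoursUntil(target: Date, now: Date): number {
  const diffMs = target.getTime() - now.getTime();
  if (diffMs <= 0) {
    throw new RangeError("target must be later than now");
  }
  return diffMs / MS_PER_HOUR;
}

const purchaseTime = new Date("2026-02-28T09:00:00Z");
const followUpTime = new Date("2026-03-01T09:00:00Z");
const followUpDelay = delayHoursUntil(followUpTime, purchaseTime);
console.log(followUpDelay);
```

The result would be passed as delay_hours when scheduling the wakeup; round it if the API turns out to require whole hours.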
Polling and webhooks are two delivery patterns for the same underlying notifications queue. Choose based on your infrastructure:
Polling — your client asks the server for new messages on a schedule. Simple to implement, works in browsers and mobile apps, no inbound connectivity required. Latency is bounded by your polling interval.
Webhooks — the server pushes each message to a URL you register the moment it fires. Lower latency, better for server-to-server integration and multi-channel fanout (email, SMS, push notifications). Requires a public HTTPS endpoint to receive callbacks.
You can use both simultaneously: poll from mobile clients for in-app delivery and register a webhook on your backend for email/SMS fanout. The queue tracks consumed state per message, so a message consumed via polling will not appear in webhook delivery (and vice versa).
Medication Reminders — full-stack example combining Schedule + Inventory + Memory; shows the end-to-end flow from schedule creation to polling the generated reminder.
The organization-global Knowledge Base is an opt-in second scope that sits above every project's own Knowledge Base, letting agents across all projects under a tenant read shared facts — HR policies, brand standards, product catalogs, multi-game lore — without duplicating data per project. Each agent picks a scope mode (project_only, org_only, cascade, or union) to control how org and project graphs combine. Cascade is the recommended mode when you enable the org scope: project facts win on ID collisions, so local overrides remain authoritative. Agents that never set a mode stay on project_only.
By default, the Knowledge Base is project-scoped. Every project has its own isolated graph. That is the right model for most tenants — a project's data should not leak into other projects' agents.
The organization scope is an opt-in second scope that sits above every project. Knowledge written here is readable by every project agent under the tenant that opts into a cross-scope reading mode. Typical uses:
Tenant (organization)
|
|-- Organization-global KB (scope_id = "")
| - policies, shared lore, brand, reference catalogs
| - written by tenant admins via the org endpoints
|
|-- Project A KB (scope_id = project_a_id)
| - A's own uploaded docs + API-pushed facts
|
|-- Project B KB (scope_id = project_b_id)
| - B's own uploaded docs + API-pushed facts
|
Agents under any project choose how to read across the two scopes:
- project_only legacy: just the agent's project KB
- org_only only the organization-global KB
- cascade both, project wins on ID collisions (recommended)
- union both, first occurrence wins
Every agent has a knowledgeBaseScopeMode capability. Leaving it unset preserves the legacy project-only behavior. To enable the cascade, set it via the capabilities endpoint or the dashboard.
Enable the knowledge base capability and set the scope mode via the SDK:
// Enable the knowledge base + org cascade for the agent
await client.agents.updateCapabilities(agentId, {
knowledgeBase: true,
knowledgeBaseScopeMode: "cascade",
});
If a fact already lives in a project KB and you want to share it organization-wide, promote it. The project copy is preserved — promotion is additive. If an org node with the same (node_type, norm_label) already exists, the server returns that one instead of writing a duplicate.
When an agent in cascade or union mode calls knowledge_search during a conversation, the platform runs the search against both scopes in parallel and fuses the results using Reciprocal Rank Fusion (RRF). Each returned result carries a scope field so your prompt can show the LLM where a fact came from.
Scope modes differ in how they merge on a collision:
cascade (recommended): project wins on duplicate node IDs. Agents keep their own overrides, but inherit the org defaults when a project doesn't define something.
union: first occurrence wins; both scopes contribute equally to ranking. Useful when you want broad coverage without a strong preference.
org_only: skip project KB entirely. Useful for reference-only agents (FAQ bots on company policy, e.g.).
project_only (default): legacy behavior, org-scope facts are invisible to this agent.
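The collision rules for the four modes can be sketched as a single function over a ranked result list. This models only the behavior listed above. The `KbNode` shape and `resolveCollisions` are hypothetical, and the sketch assumes a collision means two results sharing the same node ID.

```typescript
type ScopeMode = "project_only" | "org_only" | "cascade" | "union";

interface KbNode {
  nodeId: string;
  label: string;
  scope: "project" | "org";
}

// `ranked` holds results from both scopes in rank order (best first).
function resolveCollisions(mode: ScopeMode, ranked: KbNode[]): KbNode[] {
  // Single-scope modes simply hide the other scope.
  const visible = ranked.filter((n) => {
    if (mode === "project_only") return n.scope === "project";
    if (mode === "org_only") return n.scope === "org";
    return true;
  });

  const result: KbNode[] = [];
  const position: Record<string, number> = {};
  for (const node of visible) {
    const at = position[node.nodeId];
    if (at === undefined) {
      // First occurrence of this ID keeps its rank position.
      position[node.nodeId] = result.length;
      result.push(node);
    } else if (mode === "cascade" && node.scope === "project") {
      // cascade: the project copy replaces an org copy seen earlier.
      result[at] = node;
    }
    // union: the first occurrence wins, so later duplicates are dropped.
  }
  return result;
}

const rankedResults: KbNode[] = [
  { nodeId: "refund-policy", label: "Refunds within 30 days", scope: "org" },
  { nodeId: "refund-policy", label: "Refunds within 14 days", scope: "project" },
  { nodeId: "brand-voice", label: "Friendly and concise", scope: "org" },
];

const cascadeView = resolveCollisions("cascade", rankedResults);
const unionView = resolveCollisions("union", rankedResults);
console.log(cascadeView.map((n) => n.label), unionView.map((n) => n.label));
```

In this example cascade surfaces the project's 14-day override while still inheriting the org-level brand voice; union keeps whichever copy ranked first.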
Access control: the two org-scope write endpoints are gated by the same tenant-admin middleware used by the existing project-scoped KB endpoints. Standard project members see no new surface.
Backward compatibility: zero change for any existing agent. Agents stay on project_only mode unless you set a scope mode explicitly.
Idempotency: dedup is at (node_type, norm_label). Promotion returns the existing org node if one is already there; direct createOrgNode will create a second node with a different NodeID — check before calling if that matters.
Per-scope BM25: each scope maintains its own BM25 index and document-frequency corpus. This is why the cascade uses RRF instead of score-adding — the raw scores from two separate indexes are not directly comparable.
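Reciprocal Rank Fusion sidesteps the score-comparability problem by using only each result's rank within its own list. A minimal sketch follows: every list contributes 1 / (k + rank) per item, and the contributions are summed. The constant k = 60 is the value commonly used with RRF and is an assumption here; the platform's actual constant and tie-breaking are not documented in this section.

```typescript
interface FusedResult {
  id: string;
  score: number;
}

// rankings: one array of node IDs per scope, best match first.
function reciprocalRankFusion(rankings: string[][], k: number = 60): FusedResult[] {
  const scores: Record<string, number> = {};
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[id] = (scores[id] ?? 0) + 1 / (k + rank);
    });
  }
  return Object.keys(scores)
    .map((id) => ({ id, score: scores[id] ?? 0 }))
    .sort((a, b) => b.score - a.score);
}

// Raw BM25 scores from the two indexes are never compared; only ranks matter.
const projectRanking = ["a", "b", "c"];
const orgRanking = ["b", "d"];

const fusedResults = reciprocalRankFusion([projectRanking, orgRanking]);
console.log(fusedResults.map((r) => r.id));
```

A node that appears in both scopes (here "b") accumulates two contributions and rises to the top, even though it was not first in the project ranking.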
IDENTITY
Personality System
Personality in Sonzai is a Big Five (OCEAN) profile attached to every agent, mapped internally to ten BFAS facets and a set of behavioral traits — response length, question frequency, empathy style, conflict approach — that shape how the agent actually talks. You set five 0.0-1.0 scores at creation; the Mind Layer derives the prompt, speech patterns, and mood baselines from there. The most load-bearing detail: personality drifts slowly through interaction, with safety caps to prevent runaway change, and you can inspect the full evolution history at any time.
Every agent has Big Five (OCEAN) personality scores. Behavioral traits, mood baselines, speech patterns, and interaction preferences all derive from these scores.
Openness (0.0 - 1.0): Curiosity, creativity, openness to experience. High = imaginative, adventurous. Low = practical, conventional.
Internally, the platform maps Big5 scores to 10 BFAS (Big Five Aspect Scales) facets. These facets provide finer-grained control over personality and are exposed in the personality profile response:
| Big5 Domain | Facet 1 | Facet 2 |
| --- | --- | --- |
| Openness | intellect | aesthetic |
| Conscientiousness | industriousness | orderliness |
| Extraversion | enthusiasm | assertiveness |
| Agreeableness | compassion | politeness |
| Neuroticism | withdrawal | volatility |
Each facet is a 0.0-1.0 score derived from the parent Big5 dimension. You can read them from the personality profile but do not need to set them manually — they are computed from your Big5 scores.
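Since the five Big5 scores are the only personality inputs you set, validating their range client-side before creating or updating an agent is a cheap safeguard. The sketch below assumes the field names shown in this guide's examples. `validateBig5` is a hypothetical helper, and the server may apply its own validation.

```typescript
interface Big5 {
  openness: number;
  conscientiousness: number;
  extraversion: number;
  agreeableness: number;
  neuroticism: number;
}

const BIG5_KEYS: Array<keyof Big5> = [
  "openness",
  "conscientiousness",
  "extraversion",
  "agreeableness",
  "neuroticism",
];

// Returns one message per out-of-range score; an empty array means valid.
function validateBig5(scores: Big5): string[] {
  const errors: string[] = [];
  for (const key of BIG5_KEYS) {
    const value = scores[key];
    if (!isFinite(value) || value < 0 || value > 1) {
      errors.push(`${key} must be between 0.0 and 1.0`);
    }
  }
  return errors;
}

const validProfile: Big5 = {
  openness: 0.85,
  conscientiousness: 0.55,
  extraversion: 0.7,
  agreeableness: 0.78,
  neuroticism: 0.28,
};
const invalidProfile: Big5 = { ...validProfile, neuroticism: 1.4 };

console.log(validateBig5(validProfile), validateBig5(invalidProfile));
```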
The platform automatically derives a per-user personality overlay — how the
agent subtly adapts to a specific user based on their conversation history,
preferences, and relationship state. You don't set overlays manually; they're
populated by the same pipeline that runs after every chat turn.
Read the current overlay for UI (show how the agent's tone shifts per user)
or analytics:
// List all users who have a personality overlay for this agent
const overlays = await client.agents.personality.listUserOverlays("agent-id");
// Read one user's overlay
const overlay = await client.agents.personality.getUserOverlay("agent-id", "user-123");
console.log(overlay.big5Delta, overlay.interactionPreferences);
Create an independent copy of an agent with its own personality, memory, and state. The forked agent starts with the same configuration as the original but evolves independently from that point forward.
const forked = await client.agents.fork("agent-id");
console.log(forked.agentId); // new independent agent
All three audiences use personality, but what you tune and why differs
sharply.
Personality is the character. Big Five + speech patterns + interests
are what make Luna feel like Luna. Tune high openness (0.8+) and moderate
agreeableness for warmth; low conscientiousness for whimsy; moderate
neuroticism for emotional range.
Let it evolve. Trait drift is a feature — long-term users want to
feel their companion grew with them. Don't suppress evolution; read
history to surface "How Luna has changed" moments in your UI.
const shifts = await client.agents.personality.history("agent-id", {
  userId: "user-123",
  since: "2026-01-01",
});
// Render major shifts as narrative beats in your UI
Speech patterns matter more than scores. Define 3-5 distinctive
turns of phrase in the bio — these carry the voice even more than the
Big5 profile.
The post-processing pipeline runs after every session and can push Big5 updates back into the personality profile. Use Personality.Get before and after a session to observe evolution events and surface growth moments to users.
// Before the session — baseline snapshot
const before = await client.agents.personality.get("agent-id");
console.log(before.big5.openness); // e.g. 0.72
// … session runs, self-improvement pipeline fires …
// After the session — check for evolution
const after = await client.agents.personality.get("agent-id");
console.log(after.big5.openness); // e.g. 0.74 after a curiosity-rich session
// Inspect what changed
const history = await client.agents.personality.history("agent-id");
for (const shift of history.shifts) {
console.log(shift.trait, shift.delta, shift.triggeredBy, shift.createdAt);
// trait: "openness" delta: 0.02 triggeredBy: "session:xyz"
}
The triggeredBy field ties each shift back to the session or event that caused it, giving you an audit trail for every personality change.
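For a "how the agent has changed" view, individual shifts are often less useful than the net movement per trait. The sketch below aggregates a list of shifts using the fields shown in the example above. `netDriftByTrait` is a hypothetical helper and the sample data is illustrative, not real output.

```typescript
// Shape based on the fields logged in the history example above.
interface PersonalityShift {
  trait: string;
  delta: number;
  triggeredBy: string;
  createdAt: string;
}

// Sum the deltas per trait to get the net drift over the period.
function netDriftByTrait(shifts: PersonalityShift[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const shift of shifts) {
    totals[shift.trait] = (totals[shift.trait] ?? 0) + shift.delta;
  }
  return totals;
}

const sampleShifts: PersonalityShift[] = [
  { trait: "openness", delta: 0.02, triggeredBy: "session:xyz", createdAt: "2026-01-05T12:00:00Z" },
  { trait: "openness", delta: 0.01, triggeredBy: "session:abc", createdAt: "2026-01-09T08:30:00Z" },
  { trait: "neuroticism", delta: -0.03, triggeredBy: "session:abc", createdAt: "2026-01-09T08:30:00Z" },
];

const drift = netDriftByTrait(sampleShifts);
console.log(drift);
```

Because the deltas are floating-point values, compare or display the totals with rounding rather than exact equality.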
With Generation — initial personality from character generation
GenerateCharacter produces a fully-formed Big5 profile as part of its output. You can use that as the starting point for an agent and then refine scores with Personality.Update once you know how you want the character to feel in practice.
// 1. Generate a character — returns initial Big5 scores
const character = await client.generation.generateCharacter({
concept: "A witty, empathetic travel companion with a love of history",
});
// character.big5 already has plausible OCEAN values
console.log(character.big5);
// { openness: 0.85, conscientiousness: 0.55, extraversion: 0.70,
// agreeableness: 0.78, neuroticism: 0.28 }
// 2. Create the agent with those scores
const agent = await client.agents.create({
name: character.name,
bio: character.bio,
big5: character.big5,
});
// 3. Refine after reviewing the generated profile
await client.agents.personality.update(agent.agent_id, {
big5: { conscientiousness: 0.65 }, // a bit more organized than generated
confidence: 0.7,
});
With User Personas — agent personality × user persona = interaction style
The agent's Big5 profile is one half of every conversation; the user's persona is the other. The platform combines both at context-build time: a high-agreeableness agent talking to an introverted user will naturally soften its tone and ask fewer questions, while the same agent talking to an assertive user will match energy and be more direct.
You don't wire this up manually — pass the userId on each chat turn and the platform resolves the right overlay automatically:
// The platform blends agent Big5 + user persona under the hood.
// Just pass userId on each turn.
const response = await client.agents.chat("agent-id", {
userId: "user-123",
message: "What should I visit in Kyoto?",
});
// Inspect the combined interaction preferences if you want to render UI hints
const overlay = await client.agents.personality.getUserOverlay("agent-id", "user-123");
console.log(overlay.interactionPreferences.conversationPace); // "moderate"
console.log(overlay.interactionPreferences.formality); // "casual"
The per-user overlay is updated automatically by the pipeline — you read it; you don't write it.
KNOWLEDGE
Priming
Priming is how you tell a new agent what it already knows about a user. Instead of waiting for the agent to learn through conversation, you deliver the relevant facts up front: who the user is, where they came from, and what they've said before — all before the first message is exchanged.
Migrations from other LLM frameworks — import chat history from Zep, Mem0, Letta, OpenAI Assistants, LangChain, Character.AI, or any custom transcript store
CRM / CSV bulk imports — prime thousands of users in one call with structured contact data
Chat-transcript seeding — let the agent "remember" previous conversations from another system
Display-name + timezone bootstrap — ensure the agent addresses users correctly from turn 1
Onboarding enrichment — load journal entries, support tickets, or prior interactions so the agent sounds familiar on the user's very first chat
Prime a single user with their display name, timezone, and a short narrative block:
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const job = await client.agents.priming.primeUser("agent_abc", "user_123", {
display_name: "Mia Tanaka",
metadata: {
timezone: "Asia/Tokyo",
company: "Acme Corp",
title: "Platform Lead",
email: "[email protected]",
},
content: [
{
type: "text",
body: "Mia joined Acme in 2023 and leads the platform team. She prefers async communication and is an avid coffee enthusiast.",
},
],
source: "crm_onboarding",
});
console.log(job.job_id, job.status, job.facts_created);
The call returns immediately with a job_id. LLM fact-extraction runs asynchronously in the background — the primed facts appear in memory within seconds.
These are two distinct channels for different kinds of information:
Metadata is structured and first-class: display_name, company, title, email, phone, timezone, and a custom map for anything else. Sonzai generates facts from metadata fields synchronously — no LLM extraction required — so facts_created is non-zero even with no content blocks.
Content is narrative. Content blocks go through the full LLM extraction pipeline and end up as facts in the agent's memory constellation, exactly as if the user had said those things in a conversation.
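To show how the two channels map onto a CRM export, the sketch below turns one row into a payload shaped like the primeUser example earlier in this guide: structured columns go to metadata, free-text notes become a "text" content block. The `CrmRow` columns and `toPrimePayload` are hypothetical; adjust them to your own export format.

```typescript
// Hypothetical CRM export row.
interface CrmRow {
  fullName: string;
  timezone: string;
  company: string;
  title: string;
  email: string;
  notes: string;
}

// Payload shape mirroring the fields used in the primeUser example.
interface PrimePayload {
  display_name: string;
  metadata: { timezone: string; company: string; title: string; email: string };
  content: Array<{ type: "text"; body: string }>;
  source: string;
}

function toPrimePayload(row: CrmRow, source: string): PrimePayload {
  const notes = row.notes.trim();
  return {
    display_name: row.fullName,
    // Structured columns travel as metadata.
    metadata: {
      timezone: row.timezone,
      company: row.company,
      title: row.title,
      email: row.email,
    },
    // Free-text notes travel as narrative content; omit the block when empty.
    content: notes.length > 0 ? [{ type: "text", body: notes }] : [],
    source,
  };
}

const crmRow: CrmRow = {
  fullName: "Mia Tanaka",
  timezone: "Asia/Tokyo",
  company: "Acme Corp",
  title: "Platform Lead",
  email: "mia@example.com",
  notes: "  Prefers async communication.  ",
};

const primePayload = toPrimePayload(crmRow, "crm_onboarding");
console.log(primePayload);
```

A row with no notes still yields a usable payload, since metadata alone produces facts.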
| type | Use for |
| --- | --- |
| "text" | Narrative facts, bullet-point summaries, freeform notes about the user |
| "chat_transcript" | A prior conversation from another system. Format as User: …\nAgent: … lines, one session per block |
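Transcripts exported from other systems usually arrive as structured message arrays, so they need flattening into the User: / Agent: line format before priming. The sketch below shows one way to do that. The `TranscriptTurn` shape and both helpers are hypothetical, and collapsing embedded newlines is a design choice made here to keep one turn per line.

```typescript
// Hypothetical shape of one exported message.
interface TranscriptTurn {
  role: "user" | "agent";
  text: string;
}

// Render one session as "User: ..." / "Agent: ..." lines.
function toTranscriptBody(turns: TranscriptTurn[]): string {
  return turns
    .map((turn) => {
      const speaker = turn.role === "user" ? "User" : "Agent";
      // Collapse embedded newlines so each turn stays on a single line.
      const text = turn.text.replace(/\s*\n\s*/g, " ").trim();
      return `${speaker}: ${text}`;
    })
    .join("\n");
}

// Wrap one session in a content block, one session per block.
function toTranscriptBlock(turns: TranscriptTurn[]): { type: "chat_transcript"; body: string } {
  return { type: "chat_transcript", body: toTranscriptBody(turns) };
}

const exportedSession: TranscriptTurn[] = [
  { role: "user", text: "Hi there" },
  { role: "agent", text: "Hello!\nHow can I help?" },
];

const transcriptBlock = toTranscriptBlock(exportedSession);
console.log(transcriptBlock.body);
```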
The extraction pipeline deduplicates across all blocks — you can safely send both raw transcripts and pre-extracted facts from the same source without producing duplicate memories.
Calling primeUser more than once for the same user is safe. Content blocks are processed through the same deduplication pipeline as live chat — repeated or overlapping facts are merged, not doubled.
Content blocks flow through the exact same extraction pipeline as conversational messages. After priming, you can search for primed facts via memory.search:
// After primeUser completes, primed content is searchable
const results = await client.agents.memory.search("agent_abc", {
query: "platform team",
userId: "user_123", // the same user that was primed above
limit: 5,
});
for (const mem of results.results) {
console.log(mem.content, mem.factType, mem.score);
}
Primed facts carry a source_type matching the source string you passed to primeUser or batchImport, so you can distinguish migrated history from organically-learned facts when querying.
Use structured_import inside primeUser to seed per-user inventory items alongside narrative facts. This is how you import ownership tables, subscription rosters, or product holdings from a CRM export.
The Migrations overview lists per-source recipes with full export + import code for every common origin system. Priming is the underlying mechanism each guide uses — the migration guides show you exactly how to shape your existing data into content blocks.
Proactive messaging is when the agent initiates contact rather than responding to user input. Messages can originate from three sources — a recurring schedule, a one-off wakeup, or an event your backend triggers — and are delivered through three channels: the live SSE chat stream, a polling notifications API, or a webhook your server receives.
Scheduled Reminders — recurring cadence (daily / weekly / hourly). Developer-configured. Use when a message must repeat on a predictable rhythm — medication reminders, habit nudges, daily check-ins.
Wakeups — a single one-off message at a specific moment, expressed as a delay from now. Agent- or developer-initiated. Use for birthdays, post-purchase follow-ups, or any event that fires exactly once.
Trigger Event — your backend calls TriggerEvent when something non-conversational happens (level-up, milestone, external state change). Use when the message is reactive to your own system events rather than time.
SSE (live chat stream) — if the user has an active chat stream open, the proactive message appears inline in their conversation automatically.
Polling (client.agents.notifications.*) — your frontend or backend polls the notifications API on a schedule. Works well for web dashboards and mobile apps that check for new content when they foreground.
Webhooks — register a URL once; Sonzai POSTs every proactive message to it. Use for push notifications, email/SMS fanout, or any server-to-server integration.
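A webhook receiver usually validates the POST body before fanning it out to push, email, or SMS. The sketch below shows only that validation step. The payload fields (agent_id, user_id, message) and the helper name parseProactiveMessage are assumptions for illustration; this section does not specify the actual webhook schema.

```typescript
// Hypothetical payload shape: these field names are assumptions,
// not the documented webhook schema.
interface ProactiveMessage {
  agent_id: string;
  user_id: string;
  message: string;
}

// Parse and validate a raw webhook body.
// Returns null when the body is not valid JSON or lacks the expected fields.
function parseProactiveMessage(rawBody: string): ProactiveMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(rawBody);
  } catch {
    return null; // body was not JSON
  }
  if (typeof data !== "object" || data === null) return null;

  const { agent_id, user_id, message } = data as Record<string, unknown>;
  if (
    typeof agent_id !== "string" ||
    typeof user_id !== "string" ||
    typeof message !== "string"
  ) {
    return null;
  }
  return { agent_id, user_id, message };
}
```

Rejecting malformed bodies early keeps the fanout code free of defensive checks; a real handler would also authenticate the request before trusting it.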
A schedule or wakeup can reference an inventory_item_id. At fire time the platform reads the item's current properties, so the agent always has up-to-date information — even if the item changed since the schedule was created.
// Schedule that reads live inventory data at every fire
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["08:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "remind the user about their medication",
  check_type: "reminder",
  inventory_item_id: "inv_01HX...",
});
When a proactive message triggers a user reply, the memory layer captures the exchange automatically. Query those memories later to build engagement or adherence dashboards.
// After firing reminders, search memory for user responses
const memories = await client.agents.memory.search("agent_abc", {
  query: "medication taken",
  limit: 10,
});
Scheduled Reminders let your agent message users on a schedule — daily, weekly, or every few hours. The platform handles timezones, DST, and quiet-hours automatically, and reads live structured data at fire time so messages always reflect current information. Use it for medication reminders, habit nudges, daily check-ins, or any time-based message you want the agent to initiate.
Create a daily 09:00 Asia/Singapore check-in. The response contains schedule_id, next_fire_at (UTC), and next_fire_at_local (in the schedule's timezone).
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const schedule = await client.schedules.create("agent_abc", "user_123", {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Asia/Singapore",
},
intent: "check in on how the user is feeling",
check_type: "reminder",
});
console.log(schedule.schedule_id); // "sched_01HX..."
console.log(schedule.next_fire_at); // "2026-04-22T01:00:00Z"
console.log(schedule.next_fire_at_local); // "2026-04-22T09:00:00+08:00"
A cadence tells the platform when to fire. Two mutually exclusive shapes are supported: simple and cron. The simple shape covers most use cases through a frequency field with three options: "daily" fires at each listed times entry every calendar day; "weekly" fires on specified days_of_week at each listed time; "interval_hours" fires repeatedly at a fixed interval starting from starts_at (or schedule creation if omitted). All wall-clock times are evaluated in the schedule's timezone.
For advanced recurrence patterns, use the cron shape with a standard 5-field cron expression (e.g. "0 9 * * 1-5" for 09:00 on weekdays). The timezone field is required in both shapes — IANA names only (e.g. "America/New_York"), not UTC offsets.
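The cadence rules above (exactly one of simple or cron, a 5-field cron expression, an IANA timezone rather than an offset) can be checked client-side before calling the API. This is a minimal sketch: CadenceInput, isIanaTimezone, and validateCadence are hypothetical local names, not SDK exports, and the platform's own validation may be stricter.

```typescript
// Loose local type mirroring the fields named in the text (not the SDK's type).
interface CadenceInput {
  simple?: { frequency: string; times?: string[] };
  cron?: string;
  timezone?: string;
}

// True for IANA names such as "America/New_York"; false for offsets like "+08:00".
function isIanaTimezone(tz: string): boolean {
  // Reject offset-style values up front, since some runtimes accept them in Intl.
  if (/^(UTC|GMT)?[+-]\d/.test(tz)) return false;
  try {
    new Intl.DateTimeFormat("en-US", { timeZone: tz }); // throws RangeError if unknown
    return true;
  } catch {
    return false;
  }
}

// Returns a list of problems; an empty list means the cadence looks well-formed.
function validateCadence(c: CadenceInput): string[] {
  const errors: string[] = [];
  const hasSimple = c.simple !== undefined;
  const hasCron = c.cron !== undefined;
  if (hasSimple === hasCron) {
    errors.push("exactly one of simple or cron must be set");
  }
  if (c.cron !== undefined && c.cron.trim().split(/\s+/).length !== 5) {
    errors.push("cron must have 5 fields");
  }
  if (!c.timezone || !isIanaTimezone(c.timezone)) {
    errors.push("timezone must be an IANA name");
  }
  return errors;
}
```

Returning a list rather than throwing on the first problem lets a form or CLI report every issue at once.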
The active_window field is a belt-and-braces filter layered on top of the cadence. The cadence computes when a fire would occur; the active window decides whether that fire actually produces a proactive message. Fires outside the window are skipped, not deferred — the cadence grid stays perfectly predictable and no backlog accumulates.
Both sub-fields are optional. When start is greater than end, the window wraps midnight: for example, {"start": "22:00", "end": "06:00"} allows fires from 22:00 to 05:59 the next morning. This is useful for night-shift users or schedules targeting early-morning timezones where local midnight matters. Day membership is always evaluated in the schedule's own timezone, so a fire at 07:30 Saturday Singapore time counts as Saturday even though it is stored as 23:30 UTC on Friday.
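The wrap-around rule can be sketched as a small predicate over local "HH:MM" times. The function names are hypothetical, the end bound is treated as exclusive to match the "22:00 to 05:59" example, and treating start equal to end as an empty window is an assumption.

```typescript
// Convert "HH:MM" to minutes since local midnight.
function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// True when a fire at localTime falls inside [start, end).
// All three values are wall-clock times in the schedule's own timezone.
function inActiveWindow(localTime: string, start: string, end: string): boolean {
  const t = toMinutes(localTime);
  const s = toMinutes(start);
  const e = toMinutes(end);
  if (s <= e) {
    return t >= s && t < e; // ordinary same-day window
  }
  return t >= s || t < e; // start > end: the window wraps midnight
}
```

A fire for which this returns false would be skipped rather than deferred, so the next candidate is simply the next point on the cadence grid.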
Pass inventory_item_id on the create (or update) body to link a schedule to a structured item in the user's inventory — a medication, a goal, a plant, anything with named properties. The key property of this linkage is that the platform reads the item's live properties at every fire, not at schedule creation time. This means updating a medication's dosage, a goal's target, or any other property is automatically reflected in the next reminder without any schedule edit. The schedule is the source of truth for when; the inventory item is the source of truth for what.
Use starts_at and ends_at (both RFC 3339 UTC) to constrain a schedule to a specific window of time. No fire is produced before starts_at; once ends_at passes, the schedule is automatically disabled — enabled flips to false. The schedule row is not deleted: the audit trail, historical fire log, and linked inventory reference remain accessible. This is a soft-disable, not a hard delete. To permanently remove a schedule and all associated fire history, use the delete method explicitly.
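The starts_at / ends_at behaviour amounts to a three-state check against the current time. The sketch below is illustrative only: ScheduleState and scheduleStateAt are hypothetical names, and the exact boundary handling (strictly after ends_at counts as passed) is an assumption.

```typescript
type ScheduleState = "not_started" | "active" | "disabled";

// Classify a schedule at a given instant from its optional RFC 3339 bounds.
function scheduleStateAt(now: Date, startsAt?: string, endsAt?: string): ScheduleState {
  const t = now.getTime();
  if (endsAt !== undefined && t > Date.parse(endsAt)) {
    return "disabled"; // ends_at has passed: soft-disabled, row still exists
  }
  if (startsAt !== undefined && t < Date.parse(startsAt)) {
    return "not_started"; // no fire is produced before starts_at
  }
  return "active";
}
```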
Every schedule can reference an inventory_item_id pointing to a structured per-user item (e.g. a medication, a goal, a plant). At each fire, the platform reads the item's live properties and injects them into the agent's wakeup block — no schedule edit needed when the data changes. This is how a "reduce ibuprofen from 500mg to 250mg" change flows through to the next reminder automatically.
// 1. Add an inventory item (e.g. a medication)
const item = await client.agents.inventory.update("agent_abc", "user_123", {
  action: "add",
  item_type: "medication",
  description: "Ibuprofen",
  project_id: "proj_abc",
  properties: { medication_name: "ibuprofen", dosage: "500mg" },
});

// 2. Link the schedule to it, with no duplicated data
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["08:00", "20:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "remind the user to take their ibuprofen at the correct dose",
  check_type: "reminder",
  inventory_item_id: item.fact_id,
});

// 3. Later, the dose changes; the next fire automatically sees "250mg"
await client.agents.inventory.directUpdate("agent_abc", "user_123", item.fact_id, {
  properties: { dosage: "250mg" },
});
With Wakeups — recurring vs one-off proactive messages
Schedules and Wakeups are both proactive primitives but serve different cases. Use a schedule when the agent should reach out on a repeating cadence (daily, weekly, every 4 hours). Use a wakeup when the agent should reach out once at a specific moment — a birthday, a known one-off event, or an agent-initiated interest check. Both feed into the same downstream delivery channels (SSE, polling, webhooks — see Proactive messaging).
// Recurring: Schedule
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["09:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "morning check-in on mood and sleep",
  check_type: "reminder",
});

// One-off: Wakeup
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "birthday",
  intent: "wish user happy birthday on their 30th",
  delay_hours: 24,
});
When the agent fires a scheduled reminder and the user responds ("took it, thanks"), the memory layer auto-captures the adherence fact. You can query these facts later to build a compliance view without adding a separate database — useful for tenant-side dashboards or escalation logic.
// After a week of firing daily medication reminders, query memory for responses
const memories = await client.agents.memory.search("agent_abc", {
  query: "medication taken ibuprofen",
  limit: 10,
});
for (const result of memories.results) {
  console.log(result.content, result.score);
  // "User confirmed taking 500mg ibuprofen" 0.87
}
Inventory — structured per-user items that schedules can reference.
Memory — how user responses to reminders flow into long-term memory.
Self-Improvement (Post-Processing)
When a session ends, Sonzai kicks off a multi-stage async pipeline against everything that was said. It extracts and verifies new facts, consolidates duplicates, updates personality scores and mood baselines, writes a reflective diary entry, scores retrieval quality, and feeds that score back into per-pair retrieval weights. By the time a user returns, the agent already knows what happened last time, and its retrieval has been re-tuned for that specific (agent, user) pair.
Underneath the pipeline, the Sonzai mind layer runs continuous machine-learning model training against live traffic: per-pair stochastic gradient descent, multi-armed bandits over memory clusters, a shadow-mode policy-gradient learner with automatic regression rollback, per-pair hyperparameter auto-tuning, and an OPRO-style prompt optimiser. All of it ships behind a stable SDK. You don't run the training loop — you keep ending sessions, and the per-pair memory layer keeps getting sharper.
Fully automatic
Self-improvement is triggered by sessions.End(). Everything on this page happens as a result of that one call. The next time you read memory, personality, or insights, the new state is already there.
Roll your own memory + learning stack With Sonzai
------------------------------------- --------------------
vector store + retrieval |
dedup + conflict resolution |
personality + mood engine | sessions.End()
reward signal + eval harness | |
training + evaluation pipeline | v
shadow rollout + auto-revert |
drift monitoring | all of it,
per-user tuning loops | automatic
prompt sweeps + regression tests |
on-call for runaway behaviour |
------------------------------------- --------------------
~ 12 months of platform work one afternoon
You wire up sessions.End() once. Sonzai does the rest:
No training infrastructure. No fine-tuning runs, no eval harness to maintain, no per-user model artefacts to ship. The online-learning, RL, and auto-tuning loops are operated by Sonzai's applied-research team and ride behind a stable SDK.
Per-user personalisation, automatic. Every (agent_id, user_id) pair gets its own retrieval predictor weights, cluster-sampling posterior, traversal graph, learning-rate schedule, and value function. Two users on the same agent see different memory layers within a handful of sessions — no per-user code, no profile training, no embeddings pipeline to operate.
It actually compounds. Each session's reward is observed from fact reuse, re-retrieval, engagement, and explicit feedback, then fed back into the weights, the bandit posteriors, the critic, and the prompt optimiser. The next session is measurably better than the last, and the gap widens as the relationship deepens.
Safe by default. New policies run in shadow until a per-pair promoter confirms a sustained advantage over the baseline; regressions auto-revert. Production memory never gets dragged off a good optimum by a noisy day.
Predictable cost. Post-processing runs on a cheaper model than chat, and the tuning loop trains on signals you're already producing — not extra LLM calls per turn. The smarter your agent gets, the more efficient retrieval becomes.
For most teams this is the difference between "we'll get to memory next quarter" and "our agents already remember every user, and the memory layer keeps getting smarter every week." Rolling your own (vector store + dedup + per-user fine-tuning + RL eval harness + prompt sweeps + safe-rollout machinery) is a 12-month detour. With Sonzai it's one SDK call.
Personality drift over time — the agent evolves character and relationship stance through repeated use, with no manual tuning
Diary generation per session — the agent writes reflective summaries in its own voice, available as future context
Automatic fact consolidation — duplicate and contradictory facts are merged or superseded; memory stays compact
Breakthrough detection — milestone moments fire on completed sessions and land in the evolution history for narrative use
Relationship tracking updates — stance, love score, and per-user personality overlays all update after each session
Per-(agent, user) retrieval that sharpens with use — online and RL loops adapt the predictor's dimension weights, cluster sampling, and traversal edges per pair, so a returning user gets retrieval that fits their pattern, not the cohort average
There is no direct API for the self-improvement pipeline. It is triggered exclusively by ending a session. Set Wait: true during development if you need to query memory or personality immediately after the call; in production, leave it false and let the pipeline run async.
// End the session; this triggers the post-processing pipeline.
_, err := client.Sessions.End(ctx, agentID, sonzai.SessionEndOptions{
    UserID:          "user-123",
    SessionID:       "sess-abc",
    TotalMessages:   12,
    DurationSeconds: 340,
    Messages:        messages,
    // Wait: true, // dev/test only: blocks until the pipeline completes
})
if err != nil {
    return err
}

// On the next turn (or after Wait returns), the updated state is readable.
personality, err := client.Personality.Get(ctx, agentID, nil)
memory, err := client.Memory.List(ctx, agentID, &sonzai.MemoryListOptions{UserID: "user-123"})
Triggered by SessionEnd — automatically. Every call to sessions.End() enqueues the pipeline. You do not need to call anything else.
Async by default. In production the call returns immediately and the pipeline runs in the background. Results are visible on the next read of memory, personality, or insights. Use Wait: true in tests or benchmarks when you need to assert on the new state in the same process.
Pipeline components. A single session end runs: fact extraction with source-anchoring verification, deduplication and conflict resolution, cluster reconciliation, personality drift application, mood baseline update, diary generation, next-session prediction, and session quality scoring.
Daily and weekly jobs layer on top. Immediate post-processing handles per-session work. Longer-horizon jobs (memory tree pruning, narrative arc compression, association decay, learning-pace checks) run on daily and weekly cadences. The workbench's Advance Time triggers these same jobs against simulated time.
Post-processing model. The pipeline uses a cheaper model than the chat model to keep costs low. The resolver cascade checks agent → project → account → system default. You can inspect or override the resolved model without running any inference.
// Check which model will run post-processing for this agent.
effective, err := client.Agents.EffectivePostProcessingModel(ctx, agentID, "gemini-2.0-pro")

// Pin a specific model at the agent level.
err = client.Agents.UpdatePostProcessingModel(ctx, agentID, "gemini", "gemini-2.0-flash-lite")

// Remove the agent-level pin (falls back to project/account/system).
err = client.Agents.ClearPostProcessingModel(ctx, agentID)
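The resolver cascade described above (agent, then project, then account, then system default) behaves like a first-match lookup. The sketch below only illustrates that ordering; ModelPins and resolvePostProcessingModel are hypothetical names, and the platform performs this resolution server-side.

```typescript
// Optional model pins at each level; an unset level is skipped.
interface ModelPins {
  agent?: string;
  project?: string;
  account?: string;
}

interface ResolvedModel {
  model: string;
  level: "agent" | "project" | "account" | "system";
}

// Walk the cascade from most specific to least specific and stop at the first pin.
function resolvePostProcessingModel(pins: ModelPins, systemDefault: string): ResolvedModel {
  if (pins.agent) return { model: pins.agent, level: "agent" };
  if (pins.project) return { model: pins.project, level: "project" };
  if (pins.account) return { model: pins.account, level: "account" };
  return { model: systemDefault, level: "system" };
}
```

Clearing the agent-level pin corresponds to dropping the agent field, at which point the next level down takes over.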
The post-session pipeline runs every session. Underneath it, the runtime is continuously training how memory is processed for each (agent_id, user_id) pair — and Sonzai's applied-research team operates the online-learning, reinforcement-learning, bandit, and auto-tuning loops that govern it. Two pairs running the same agent end up with different predictor weights, different clusters surfaced, different traversal edges, and different schedules.
Day 1 | ###........................... ready out of the box
| verified extraction, dedup, clustering, and behavioural
| updates running from the first turn
Week 1 | #######......................... responsive, adapting
| confidence has moved on the facts the user really cares
| about; mood is responding; patterns forming
Month 1 | ##############................... personalised
| per-user retrieval converged; personality overlay has
| diverged; story arcs forming; this user is visibly
| remembered differently to the one before
Year 1 | #########################......... long-term partner
| compact, navigable memory; milestones earned; reflective
| diary; recurring-event awareness; retrieval sharper than
| day one
|
| Zero training code. Zero per-user logic. You called
| sessions.End() and went home.
Reward signal, compiled per session. A reward compiler turns each session's observable signals — what the LLM actually used, how the user engaged, and explicit feedback when present — into a single bounded scalar. Every loop below trains against this reward; nothing on your side has to be instrumented or labelled.
Per-pair retrieval predictor, tuned by stochastic gradient descent. Every session, an SGD update with momentum adjusts the dimensions the predictor weighs, using the LLM's actual fact reuse as the gradient signal. Asymmetric learn / forget rates (aggressive on confirmed positives, slow to discard) prevent weight collapse on a single noisy session.
Hyperparameter auto-tuning per pair. Learning rates aren't a fixed constant — a per-pair scheduler watches divergence and plateau signals across recent sessions and adapts each pair's learning rate independently. Healthy pairs get nudged up to keep adapting; unstable pairs are damped down so a bad day can't drag a good optimum off course. No knobs to tune on your side.
TD(0) critic + A2C policy gradient, in shadow with auto-revert. A per-pair linear value function estimates V(state) from observable features (sessions to date, recent F1, learning rate, relationship stage). An A2C actor consumes V(s) as its baseline with an entropy bonus to keep exploration broad. The A2C trajectory runs in shadow alongside production; a per-pair promoter compares it to the SGD baseline over a rolling window, and only confirmed sustained improvements graduate to production. On regression, the prior weights are restored automatically. Production never sees a half-trained policy.
Cluster bandit (Thompson sampling, Beta posterior). Every retrieved fact carries a cluster identity. Each session's reward is attributed back across the contributing clusters and used to update a Beta-distributed posterior per cluster — a multi-armed bandit. Useful clusters get sampled more often next session; cold ones get probed less. Posteriors are lineage-aware: when the self-organiser splits, merges, or retires a cluster, its evidence flows to its successors instead of being thrown away.
Hebbian edges across partitions. Co-accessed memory nodes grow associative edges between them, weighted by repeated co-occurrence. Edges cross the per-user and per-agent-wisdom partitions, so user-specific traversal patterns can pull in the agent's broader world knowledge — and the more the pair runs, the denser and more selective the personal traversal graph becomes.
Memory tree self-organisation. A self-organiser rebalances the per-pair memory tree from access statistics: hot nodes get promoted, oversized branches split, sparse siblings merge, and stale parent descriptions are regenerated by a bounded LLM pass so summarisation tracks what's actually being read.
Ebbinghaus-style retention. Long-horizon retention follows a spaced-repetition decay curve. Frequently-recalled facts strengthen and outlive their original importance score; cold facts decay and eventually drop out of hot retrieval — but high-importance facts floor at a retention threshold so the agent never forgets the things that matter.
OPRO-style prompt optimisation. Sonzai's team runs an OPRO-style optimiser over the post-processing prompts: claim-level F1 scoring against curated fixture sets, a stronger meta-LLM proposing targeted revisions for the worst failure modes, and the strongest variant surviving. The pipeline picks up the new prompt — no deployment on your end.
Grounding verification. Every extracted fact must cite a source message index and a verbatim source quote from the user's turn. A mechanical verifier rejects facts that fail substring or attribution checks, and rejected facts feed back as a self-correcting hint on retry. Hallucinated memory doesn't reach the store — and this layer costs no extra inference per turn.
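The mechanical check in the grounding step can be pictured as an index lookup plus a substring test. This is a minimal sketch under assumed shapes: ChatMessage, ExtractedFact, and verifyGrounding are hypothetical names, and the real verifier's attribution checks are likely more involved.

```typescript
interface ChatMessage {
  role: "user" | "agent";
  content: string;
}

// Assumed shape of an extracted fact with its citation.
interface ExtractedFact {
  text: string;
  source_index: number; // index into the session's message list
  source_quote: string; // verbatim quote the fact is anchored to
}

interface GroundingResult {
  ok: boolean;
  reason?: string; // could be fed back as a hint on retry
}

// Accept a fact only if it cites an existing user turn and quotes it verbatim.
function verifyGrounding(fact: ExtractedFact, messages: ChatMessage[]): GroundingResult {
  const msg = messages[fact.source_index];
  if (!Number.isInteger(fact.source_index) || msg === undefined) {
    return { ok: false, reason: "source index out of range" };
  }
  if (msg.role !== "user") {
    return { ok: false, reason: "source is not a user turn" };
  }
  if (fact.source_quote.length === 0 || !msg.content.includes(fact.source_quote)) {
    return { ok: false, reason: "quote not found verbatim in source message" };
  }
  return { ok: true };
}
```

Because the check is plain string matching, it adds no model calls, which is consistent with the claim that this layer costs no extra inference per turn.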
The longer an (agent, user) pair runs, the more its memory layer reflects how that user actually thinks — which transitions matter, which clusters carry signal, which dimensions to trust, which schedule it learns on. The agent doesn't just remember more for a returning user; it remembers differently per user, with no tuning required on your side.
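To make the per-pair predictor update more concrete, here is an illustrative SGD-with-momentum step using asymmetric learn and forget rates. It is a sketch of the general technique, not Sonzai's implementation: the logistic model, the feature vector, and all rate values are assumptions.

```typescript
interface PredictorState {
  weights: number[];
  velocity: number[]; // momentum buffer, same length as weights
}

const sigmoid = (z: number): number => 1 / (1 + Math.exp(-z));

// One SGD-with-momentum step on a logistic relevance predictor.
// `reused` is the training label: did the LLM actually use this fact?
function sgdStep(
  state: PredictorState,
  features: number[],
  reused: boolean,
  learnRate = 0.1, // assumed: applied to confirmed positives
  forgetRate = 0.02, // assumed: smaller, so negatives move weights slowly
  momentum = 0.9, // assumed
): PredictorState {
  const z = state.weights.reduce((acc, w, i) => acc + w * features[i], 0);
  const error = sigmoid(z) - (reused ? 1 : 0); // d(log-loss)/dz
  const rate = reused ? learnRate : forgetRate;
  const velocity = state.velocity.map(
    (v, i) => momentum * v - rate * error * features[i],
  );
  const weights = state.weights.map((w, i) => w + velocity[i]);
  return { weights, velocity };
}
```

Starting from zero weights, a positive example moves each weight five times further than a negative one at these rates, which is the kind of asymmetry that makes a single noisy session unlikely to erase what earlier sessions confirmed.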
Same agent. Same prompt. Two different users.
=============================================
+--- user_A pair ------------+ +--- user_B pair ------------+
| | | |
| Remembers what matters | | Remembers what matters |
| to user_A | | to user_B |
| | | |
| > the work narrative | | > the music narrative |
| > formal tone | | > playful banter |
| > morning rhythm | | > late-night rhythm |
| > returns on Mondays | | > returns on Fridays |
| | | |
| Mood baseline: calm | | Mood baseline: bright |
| Relationship: familiar | | Relationship: close |
| | | |
+----------------------------+ +----------------------------+
Two memory layers, diverged purely from each user's own patterns.
No per-user code. No per-user prompt. No tuning required.
Per-pair learning is one layer. On top of it, agents read, write to, and learn from a shared knowledge base — and a single agent can carry attributed memory across the users it serves. The same compounding curve you saw above happens at the team level too.
Inter-agent — closed-loop company brain. Agents on the same project autonomously write verified facts back into the Knowledge Base (with knowledgeBaseWrite on). Anything agent A learns with user X is grounded data agent B retrieves the next time the same topic comes up — even with a different user. The whole project gets sharper every session, not just one pair.
Intra-agent — shared memory across users. A single agent serving a team carries memory across users via Wisdom & shared memory. wisdom (de-attributed cross-user generalisation) is on by default; sharedMemory (attributed cross-user context, for groups and teams) is one capability flip away — the agent informs user A with the context it gathered while talking to user B.
Organisation scope. Org-wide KB sits above projects: tenant-wide policies, lore, brand, and reference catalogs every project agent reads alongside its own. The cascade mode is recommended: project wins on collisions, org fills in defaults.
Just like a new hire benefits from every senior employee's notes, every new agent and every new conversation benefits from everything the team has already learned. The per-pair tuning loops keep getting sharper for that user; the multiplayer layer keeps getting smarter for the whole company.
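The cascade rule for organisation scope (project wins on collisions, org fills in defaults) has the same shape as a layered merge. The sketch below models knowledge entries as a simple key-to-text map, which is an assumption made for illustration; the real Knowledge Base is not a flat map, and cascadeKnowledge is a hypothetical name.

```typescript
// Assumed simplification: each KB entry is addressed by a string key.
type KbEntries = Record<string, string>;

// Org entries act as defaults; project entries override them on key collisions.
function cascadeKnowledge(org: KbEntries, project: KbEntries): KbEntries {
  return { ...org, ...project };
}
```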
There is no SelfImprovement resource. The pipeline is an internal implementation detail of SessionEnd. The table below shows the SDK methods that are either inputs to or outputs of the pipeline.
Method | Returns | Description
------------------------------------- | ---------------------- | --------------------
sessions.End(ctx, agentID, opts) | *SessionResponse | Ends a session and triggers the post-processing pipeline
personality.Get(ctx, agentID, opts) | *PersonalityResponse | Reads current Big Five scores and evolution history; updated after each pipeline run
personality.GetRecentShifts(ctx, agentID) | *RecentShiftsResponse | Lists recent personality drift events with timestamps and magnitudes
Ending a session is the only way to trigger post-processing. The Messages field carries the full conversation; the pipeline reads it to extract facts and compute session quality.
// End the session with the full message history.
_, err := client.Sessions.End(ctx, agentID, sonzai.SessionEndOptions{
    UserID:          "user-123",
    SessionID:       "sess-abc",
    TotalMessages:   8,
    DurationSeconds: 210,
    Messages:        conversationMessages,
    Wait:            true, // block until pipeline finishes (dev only)
})

// Pipeline has run. New facts, updated personality, and diary entry are ready.
facts, _ := client.Memory.ListFacts(ctx, agentID, &sonzai.FactListOptions{UserID: "user-123"})
fmt.Printf("facts after session: %d\n", len(facts.Facts))
Every session end applies Big Five drift, updates the mood baseline, and can fire milestone events. Fetch personality before and after to see the delta.
before, _ := client.Personality.Get(ctx, agentID, nil)

// ... run a session and end it (Wait: true for this demo) ...

after, _ := client.Personality.Get(ctx, agentID, nil)
shifts, _ := client.Personality.GetRecentShifts(ctx, agentID)
moments, _ := client.Personality.GetSignificantMoments(ctx, agentID, 5)

fmt.Printf("openness before: %.3f, after: %.3f\n",
    before.Personality.Openness,
    after.Personality.Openness,
)
fmt.Printf("recent shifts: %d, milestones: %d\n",
    len(shifts.Shifts),
    len(moments.Moments),
)
The pipeline extracts new facts, deduplicates against existing memory, resolves conflicts, and updates importance and confidence scores. List memory after session end to see the new state.
// Before session end.
before, _ := client.Memory.ListFacts(ctx, agentID, &sonzai.FactListOptions{
    UserID: "user-123",
})

// ... run a session with substantive content, then end it (Wait: true) ...

// After session end: new facts extracted, duplicates merged.
after, _ := client.Memory.ListFacts(ctx, agentID, &sonzai.FactListOptions{
    UserID: "user-123",
})
fmt.Printf("facts before: %d, after: %d\n", len(before.Facts), len(after.Facts))

// Browse the full memory tree for cluster-level changes.
tree, _ := client.Memory.List(ctx, agentID, &sonzai.MemoryListOptions{
    UserID:          "user-123",
    IncludeContents: true,
})
In the workbench, advancing the clock by 24 hours runs the same daily jobs that production runs overnight: memory decay, tree pruning, diary generation, cluster reconciliation, and mood drift back to baseline. This is the fastest way to verify that long-horizon evolution is working correctly before shipping.
// Advance 24 simulated hours; this triggers the daily pipeline jobs.
result, err := client.Workbench.AdvanceTime(ctx, map[string]any{
    "agent_id": agentID,
    "user_id":  "user-123",
    "hours":    24,
})

// If the advance takes longer than your HTTP timeout, run it async.
asyncResult, err := client.Workbench.AdvanceTime(ctx, map[string]any{
    "agent_id": agentID,
    "user_id":  "user-123",
    "hours":    168, // 1 week
    "async":    true,
})
jobID := asyncResult["job_id"].(string)

// Poll until done.
for {
    job, _ := client.Workbench.GetAdvanceTimeJob(ctx, jobID)
    if job["status"] == "succeeded" || job["status"] == "failed" {
        break
    }
    time.Sleep(2 * time.Second)
}

// Read memory and personality to see the result of 1 week of background jobs.
personality, _ := client.Personality.Get(ctx, agentID, nil)
memory, _ := client.Memory.List(ctx, agentID, &sonzai.MemoryListOptions{UserID: "user-123"})
Personality — read and configure the Big Five profile the pipeline evolves into
Memory — explore the fact store and memory tree the pipeline writes to
Sessions — the triggering surface for everything on this page
Advance Time — simulate days and weeks of pipeline runs in seconds
Sessions
Sessions are Sonzai's unit of consolidation: one continuous conversation between an agent and a user, identified by a session_id you control. When a session ends, the platform extracts facts from the transcript, tags each one with the originating session, and runs the memory pipeline — dedup, cluster, decay — before the next session begins. You can let the platform auto-manage sessions on every chat call, or call sessions.start and sessions.end explicitly when you need to register custom tools, replay historical transcripts, or pin boundary timing to a real-world event.
A session is one continuous conversation between an agent and a user, identified by a session_id you control. Sessions are Sonzai's unit of consolidation: when a session ends, the platform extracts facts from the transcript, tags every fact with its source session_id, and runs the memory pipeline (dedup, cluster, decay) before the next session begins.
Sessions are not a wrapper around individual messages — they're how Sonzai knows which messages belong together for extraction. A session can last seconds or days.
You always have a session
Every /chat call belongs to a session. If you don't start one explicitly, the platform creates one for you. Session IDs flow through to extracted facts either way — you never lose attribution.
Just call agents.chat without touching the sessions API. The platform creates a session on the first message, keeps it open while the conversation is active, and closes it automatically when the conversation goes idle. This is the right default for most apps.
Call sessions.start before the first message and sessions.end when the conversation is definitively over. Use this when you need to:
Register custom tools for a specific conversation (tool_definitions on sessions.start).
Control boundary timing — e.g. end a coaching call exactly when the user hangs up, not when the idle timer fires.
Replay historical transcripts — pass the full message list to sessions.end(messages=...) to ingest a canned conversation verbatim, which is how data migration and benchmarks work.
Scope memory extraction around a meaningful unit (a support case, a daily stand-up, a D&D game night).
1. sessions.start — Register session_id (+ optional tools); get ready to accept messages
2. agents.chat (× N) — Stream turns through the session; facts extracted inline
3. sessions.end — Close the session; triggers consolidation, dedup, diary, clustering
→ every extracted fact carries this session_id
If you skip step 1, the first agents.chat call will auto-register a session. If you skip step 3, the session closes on idle timeout (configurable per tenant).
Every fact Sonzai extracts carries its source session_id and source_id. You can use these to:
Reconstruct a conversation's memory footprint — "what did the agent learn from session X?" via GET /memory/timeline (grouped by session) or GET /memory/facts (filter client-side by session_id).
Score retrieval at session granularity — benchmarks like LongMemEval evaluate whether retrieved facts come from the correct source session.
Surface recency context — "conversations from last Tuesday" resolves via the session's created_at plus its attributed facts.
Facts that exist outside a specific conversation — agent-global wisdom, manually inserted facts, migrated priming content — carry empty session_id and are attributed through source_type instead (e.g. "manual", "agent_global").
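Client-side filtering by session_id, as mentioned above for GET /memory/facts, can be sketched as a grouping pass over the returned facts. The Fact shape below is an assumption limited to the fields named in this section, and groupBySession is a hypothetical helper, not an SDK method.

```typescript
// Assumed minimal fact shape: only the attribution fields discussed in the text.
interface Fact {
  content: string;
  session_id: string; // empty for facts that exist outside a conversation
  source_type: string; // e.g. "manual", "agent_global"
}

// Group facts by originating session; session-less facts are grouped by source_type.
function groupBySession(facts: Fact[]): Map<string, Fact[]> {
  const groups = new Map<string, Fact[]>();
  for (const fact of facts) {
    // Prefix the fallback key so it cannot collide with a real session id.
    const key = fact.session_id !== "" ? fact.session_id : `source:${fact.source_type}`;
    const bucket = groups.get(key) ?? [];
    bucket.push(fact);
    groups.set(key, bucket);
  }
  return groups;
}
```

Looking up a single key in the result answers "what did the agent learn from session X?" without another API call.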
Custom tool definitions can be scoped to a single session. Pass them on sessions.start, or update them mid-session via sessions.set_tools. Character-level (agent-wide) tools are always merged in — session tools layer on top for the duration of the session.
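The layering of session tools over agent-wide tools can be pictured as a merge keyed by tool name. This is only a sketch: ToolDefinition is reduced to two fields, effectiveTools is a hypothetical name, and letting the session definition win when names collide is an assumption that this section does not confirm.

```typescript
// Reduced tool shape for illustration; real definitions carry a parameter schema too.
interface ToolDefinition {
  name: string;
  description: string;
}

// Agent-wide tools are always present; session tools layer on top for that session.
function effectiveTools(
  agentTools: ToolDefinition[],
  sessionTools: ToolDefinition[],
): ToolDefinition[] {
  const byName = new Map<string, ToolDefinition>();
  for (const tool of agentTools) byName.set(tool.name, tool);
  // Assumption: on a name collision the session-scoped definition replaces the agent one.
  for (const tool of sessionTools) byName.set(tool.name, tool);
  return Array.from(byName.values());
}
```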
The default agent memory model is per-user — every conversation builds a fact profile scoped to one (agent, user) pair. That's right for companion products and 1:1 assistants. But teams need the opposite: they want one agent serving a whole group to know what's going on across users.
Shared memory is the capability that turns a single agent into a team brain — informing user A with the context it gathered while talking to user B, with attribution, server-enforced privacy floors, and a full disclosure audit. Combined with the default-on wisdom layer (de-attributed cross-user generalisation), it gives you two complementary tiers of cross-user knowledge.
Where this fits
Shared memory layers on top of the standard per-user memory. Per-user facts still exist; shared memory adds an agent-wide partition for facts that should cross the user boundary. The two coexist; nothing about per-user memory changes when you turn shared memory on.
wisdom
De-attributed cross-user generalisation. A daily promotion job pulls patterns from per-user fact histories, k-anonymises them, and rewrites them into agent-wide knowledge. No individual user is identifiable.
Default: on for every new agent.
Use for: every agent that talks to more than one user (it's a free generalisation layer). Disable only for strict single-user products.
sharedMemory
Attributed cross-user context. Person/entity-attributed facts (roles, expertise, business context, relationships) recorded by the agent and surfaced to other users sharing it. Names and identities are visible.
Default: off (opt-in).
Use for: group, team, party, or shared-business-context products where users explicitly expect to see who is doing what.
Both can run on the same agent simultaneously. wisdom is the safe layer (always behind k-anonymity); sharedMemory is the powerful one (attribution preserved) and demands deliberate opt-in.
wisdom is a precondition (default-on, so usually nothing to set explicitly). Flip sharedMemory: true to opt the agent in.
// Wisdom is on by default for new agents — only set it
// explicitly if you want to override the default.
await client.agents.updateCapabilities("agent_abc", {
wisdom: true,
sharedMemory: true,
});
Pass sharedMemory: false. Existing attributed facts stay in storage (you can re-enable later) but the agent stops surfacing them in context and stops getting the write tools.
Every system prompt the agent runs from now on includes a Shared facts about people and entities section listing the attributed facts on file plus a discretion clause that tells the LLM how to handle disclosure ("exercise discretion; privacy over transparency"). The agent doesn't dump everything to every user — it weighs disclosure decisions per turn.
Before an attributed fact is persisted, the platform runs a semantic validator that rejects writes about compensation, health, politics, and other privacy-sensitive categories. This is enforced server-side, not in the prompt — even if a user explicitly asks the agent to record a salary, the write is blocked. Rejected writes appear in the disclosure audit with decision = "redacted" so you can see what was attempted and why.
Expected: a 200 with an array of facts (entity_type, entity_id, category, value, confidence). Empty array if nothing has been written yet — that's still a healthy response.
Then re-run the list endpoint above. Alice's role should appear. Now any user talking to this agent will see this fact in the agent's context (subject to discretion).
Every time a fact is loaded into the context for a turn, an audit row is written with decision = "disclosed" and decision_why. If the privacy floor blocked something, the row will show decision = "redacted". This is your live observability — if production traffic is running with shared memory on, you'll see entries here, and you can audit any disclosure decision in retrospect.
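As a rough illustration of reviewing audit rows, the sketch below tallies decisions and collects redaction reasons. Only `decision` and `decision_why` come from the text above; the `AuditRow` interface and `summariseAudit` helper are hypothetical.

```typescript
// Assumed shape of one disclosure audit row.
interface AuditRow {
  decision: "disclosed" | "redacted";
  decision_why: string;
}

// Counts rows per decision and collects the reasons given for redactions.
function summariseAudit(rows: AuditRow[]) {
  const summary = { disclosed: 0, redacted: 0, redactionReasons: [] as string[] };
  for (const row of rows) {
    summary[row.decision] += 1;
    if (row.decision === "redacted") {
      summary.redactionReasons.push(row.decision_why);
    }
  }
  return summary;
}
```

A sudden rise in the redacted count is a hint that users are asking the agent to record categories the privacy floor blocks.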
Shared memory is sensitive by design. Four layers of control sit between an LLM call and a persisted disclosure:
Capability gate. sharedMemory: false (the default) means none of this happens: no tools registered, no context injection, no audit rows.
Privacy floor. The semantic validator rejects writes in compensation, health, politics, and other configured-sensitive categories before they hit storage. Configurable per tenant.
Discretion clause in the prompt. Even with facts present, the agent is instructed to weigh disclosure per turn rather than dumping everything.
Disclosure audit. Every disclosure decision is logged with reason. You can review what the agent shared, what it withheld, and why at any time via the audit endpoint.
Hard delete stays admin-only. Agents only soft-delete (tombstone), so a misattributed fact is reversible until an admin clears it permanently.
knowledgeBaseWrite and sharedMemory are independent capabilities — flip them in any combination:
KB write only: agents record facts about the world (products, policies, prices, incidents) in the project knowledge graph.
Shared memory only: agents record facts about people in this team (roles, expertise, ownership, relationships).
Both: full closed-loop institutional memory plus team brain. The agent learns what's true about the world and who's doing what, and every other agent on the project picks both up.
The per-pair learning loops in Self-Improvement keep getting sharper for that user; shared memory keeps getting smarter for the whole team. Both run automatically on every sessions.end call.
wisdom is the de-attributed generalisation layer; sharedMemory is the attributed cross-user layer. Both can run together. The privacy floor protects the attributed side; wisdom doesn't need it because it's k-anonymised before promotion.
Every shared-memory endpoint — list, upsert, replace, delete, bulk import, relations CRUD, disclosure audit — is documented with request/response shapes in the Wisdom API reference.
User Personas are templates your tenant defines for the kinds of users the agent will meet. When a persona is attached to a user — during priming or via conversation metadata — the agent reads it alongside its own personality and adjusts tone, vocabulary, and pace accordingly. A "skeptical beginner" gets gentler explanations and more confirmations; a "power user" gets concise, direct answers without hand-holding.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// Create a persona
const persona = await client.userPersonas.create({
name: "Skeptical Beginner",
description: "First-time user who questions recommendations and needs reassurance.",
style: "Use plain language. Confirm before any irreversible action. Offer brief rationale for each suggestion.",
});
console.log(persona.persona_id);
// List all tenant personas
const { personas } = await client.userPersonas.list();
personas.forEach(p => console.log(p.name, p.is_default));
Tenant-scoped — personas belong to your tenant, not to a specific agent or user. Every agent in your tenant can reference the same persona library.
Template, not assignment — creating a persona does not apply it to anyone. You attach it during priming or pass it as metadata when starting a conversation.
Default persona — one persona per tenant can be marked is_default. The agent falls back to it when no persona is explicitly attached to a user.
Style field — an optional free-form directive layered on top of the agent's base personality prompt. Write it as a concise instruction set: tone, vocabulary level, confirmation habits, pacing.
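The fallback rule in the list above is applied by the platform, but it can be mirrored client-side, for example to preview which persona a user will get. This is a sketch only: `resolvePersona` is a hypothetical helper, and falling back to the default when the attached ID is unknown is an assumption.

```typescript
// Subset of the persona fields used in the examples above.
interface Persona {
  persona_id: string;
  name: string;
  is_default: boolean;
}

// Returns the explicitly attached persona if it exists, otherwise the
// tenant default, otherwise undefined.
function resolvePersona(personas: Persona[], attachedId?: string): Persona | undefined {
  if (attachedId !== undefined) {
    const attached = personas.find((p) => p.persona_id === attachedId);
    if (attached) return attached;
  }
  return personas.find((p) => p.is_default);
}
```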
Pass a persona reference when priming a new user so the agent adapts from the very first turn, before any conversation history exists.
const job = await client.agents.priming.primeUser("agent_abc", "user_123", {
display_name: "Jordan Lee",
metadata: {
persona_id: persona.persona_id, // attach persona at priming time
timezone: "America/New_York",
},
content: [
{ type: "text", body: "Jordan is a first-time user migrating from a competitor product." },
],
source: "onboarding",
});
With Personality — agent personality × user persona = interaction style
These two concepts are complementary and operate at different levels:
Personality is the agent's traits — Big Five scores, speech patterns, emotional range. It is fixed per agent (and evolves slowly through interactions).
User Persona is the user's type — a template describing what kind of person the agent is talking to. It shapes how the agent expresses its personality in this specific conversation.
Think of it as a matrix: a high-agreeableness agent talking to a "power user" persona stays warm but drops the hand-holding; talking to a "skeptical beginner" persona it adds more reassurance and simpler vocabulary — without the underlying personality changing.
Define a persona for each user archetype you care about, then run eval scenarios scoped to that persona. This gives you repeatable, deterministic test conditions.
// Define an eval scenario for the "Skeptical Beginner" persona
const result = await client.agents.evaluate("agent-id", {
templateId: "onboarding-rubric",
messages: [
{ role: "user", content: "I'm not sure I trust this — what happens to my data?" },
{ role: "assistant", content: "That's a fair question. Your data stays on our servers..." },
],
// Pass persona context so scoring reflects expected beginner-friendly tone
metadata: { persona_id: persona.persona_id },
});
console.log(result.score, result.feedback);
Voice gives every agent three modes of audio interaction: one-shot text-to-speech for spoken replies, speech-to-text for transcribing user audio, and a live duplex stream for full real-time conversations over a token-authenticated WebSocket. The same agent identity drives all three — same personality, same memory, same tools — so spoken turns are consolidated into the same session as text turns. Pick a voice name, choose an output format, and the Mind Layer handles synthesis, transcription, and turn-taking server-side.
const audio = await client.agents.voice.tts("agent-id", {
text: "Hello! How can I help you today?",
voiceName: "aria",
language: "en",
outputFormat: "mp3",
});
// audio.data contains the audio bytes
Real-time duplex voice conversation. Get a token, then open a bidirectional stream.
// 1. Get a streaming token
const token = await client.agents.voice.getToken("agent-id", {
voiceName: "aria",
userId: "user-123",
});
// 2. Connect to live stream
const stream = await client.agents.voice.stream(token);
// Send audio chunks
stream.sendAudio(audioChunk);
// Or send text for the agent to speak
stream.sendText("Tell me about your day");
// Receive events
for await (const event of stream) {
if (event.type === "audio") {
playAudio(event.data);
} else if (event.type === "transcript") {
console.log(event.text);
}
}
// End session
stream.endSession();
WebSocket Transport
Live streaming is powered by WebSocket and supports real-time duplex audio. The client sends microphone audio chunks upstream while simultaneously receiving synthesized speech and transcripts downstream, enabling natural conversational flow.
Four AgentCapabilities fields describe an agent's voice configuration:
Field | Type | Description
voiceGeneration | boolean | Whether voice (TTS) generation is enabled for this agent
voiceUnlockedAt | string (ISO 8601) | When voice generation was granted
voiceId | string | The voice identifier used by default for this agent's TTS calls
voiceTier | number | Numeric tier level for voice quality (higher = higher quality/cost)
voiceId and voiceTier are read from getCapabilities(). To persist a preferred voice for an agent, store the voiceId from voices.list() and pass it to TTS calls. voiceGeneration is platform-managed and flips when your plan includes voice capabilities.
// Read voice capability fields
const caps = await client.agents.getCapabilities("agent-id");
console.log(caps.voiceGeneration); // true | false
console.log(caps.voiceId); // "aria" or null
console.log(caps.voiceTier); // 1, 2, etc. or null
console.log(caps.voiceUnlockedAt); // "2024-11-01T00:00:00Z" or null
// Pick a voice and use it for TTS
const voices = await client.voices.list({ language: "en" });
const chosen = voices.voices[0];
const audio = await client.agents.voice.tts("agent-id", {
text: "Hello!",
voiceName: chosen.name,
language: "en",
outputFormat: "mp3",
});
Voice is primarily relevant to companions and enterprise. For task
agents, it's usually not needed — but if you're building a phone/IVR
flow, the enterprise patterns apply.
Pick a voice that matches the character. Browse voices.list(),
shortlist 3-5, and A/B test with real users before committing. The
wrong voice kills immersion faster than any other mistake.
Use duplex for live conversations. WebSocket duplex streams both
STT (user input) and TTS (agent reply) in parallel — the natural shape
for a live phone-call-style experience. Don't use polling TTS for
companions; the latency kills presence.
Tune prosody. Set stability: 0.4-0.6 and clarity: 0.7-0.9 for a
warm, expressive read. Pure stability sounds robotic.
PROACTIVE BEHAVIOR
Wakeups
Wakeups let your agent reach out to a user exactly once at a known future moment. Give the agent an intent, a check_type that it sees as context, and a delay in hours — the platform handles delivery. Unlike Scheduled Reminders, which fire on a repeating cadence, a wakeup fires once and is done.
Typical use cases: birthday greetings, appointment reminders, post-event check-ins, interest follow-ups, and time-delayed nudges. If you need the agent to repeat the same outreach, use a schedule instead.
Schedule a birthday greeting for a specific date using scheduled_at. For a "N hours from now" wakeup, use delay_hours instead. If both are provided, scheduled_at takes precedence.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// Use scheduled_at for birthdays/appointments with a known date
const wakeup = await client.agents.scheduleWakeup("agent_abc", {
user_id: "user_123",
check_type: "birthday",
intent: "wish the user a happy birthday",
scheduled_at: "2026-06-15T09:00:00Z", // RFC3339 absolute timestamp
occasion: "Sarah's 30th birthday",
interest_topic: "celebration and birthday traditions",
});
console.log(wakeup.wakeup_id); // "wake_01HX..."
console.log(wakeup.scheduled_at); // "2026-06-15T09:00:00Z"
delay_hours — a relative offset from the current moment (e.g. delay_hours: 24 fires tomorrow at roughly this time). The platform computes the absolute fire time at the moment the request is accepted. Use this for "N hours from now" semantics where no specific date matters.
scheduled_at — an RFC3339 absolute timestamp (e.g. "2026-06-15T09:00:00Z"). Use this for birthdays, appointments, or any event tied to a specific calendar date. The platform fires the wakeup as close to this time as possible.
If both are provided, scheduled_at takes precedence. scheduled_at in the response is always present and is the authoritative UTC time the wakeup will fire — store it if you want to show the user "your agent will reach out at X".
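The precedence rule can be mirrored locally to predict the fire time before the request is sent. The platform's own computation is authoritative; `resolveFireTime` and `WakeupTiming` are hypothetical names, and throwing when neither field is set is an assumption.

```typescript
interface WakeupTiming {
  delay_hours?: number;
  scheduled_at?: string; // RFC3339 absolute timestamp
}

const MS_PER_HOUR = 3600000;

// scheduled_at wins when both fields are present; otherwise delay_hours
// is applied as an offset from `now`.
function resolveFireTime(timing: WakeupTiming, now: Date): Date {
  if (timing.scheduled_at !== undefined) {
    const at = new Date(timing.scheduled_at);
    if (Number.isNaN(at.getTime())) {
      throw new Error("scheduled_at is not a valid timestamp");
    }
    return at;
  }
  if (timing.delay_hours !== undefined) {
    return new Date(now.getTime() + timing.delay_hours * MS_PER_HOUR);
  }
  throw new Error("provide delay_hours or scheduled_at");
}
```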
These optional context fields are included in the agent's wakeup block at fire time, giving it richer material for personalised message composition:
occasion — a short human-readable label for the event (e.g. "Sarah's 30th birthday", "dentist appointment"). The agent may reference this directly in the message.
interest_topic — a topic or theme the agent should lean on when composing the message (e.g. "celebration and birthday traditions", "dental health tips").
event_description — a longer free-form description with any additional context the agent should know (e.g. "User is turning 30 and has mentioned wanting to celebrate with a surprise party").
All three are optional and additive — provide as many or as few as are useful. The agent's underlying model uses them as soft context, not as a rigid template.
check_type and intent are both free-form strings. The agent receives them as part of its wakeup context at fire time:
check_type is a short label that tells the agent the nature of the outreach ("birthday", "appointment_reminder", "interest_followup", etc.). Keep it lowercase and underscore-separated — it is machine-readable context, not a display string.
intent is a natural-language instruction to the agent describing what the message should accomplish. Write it as you would write a system instruction: "ask how the job interview went and whether they got an offer".
Neither field has a fixed enum — any string is valid. The agent's underlying model interprets them in context.
executed: fired; message delivered to the notification queue
cancelled: cancelled before it fired
Once a wakeup reaches executed or cancelled it is immutable. To cancel a pending wakeup, call getWakeups to retrieve the wakeup_id, then cancel it via the API before scheduled_at passes.
Each call to scheduleWakeup creates exactly one future fire. If you need to re-schedule after a wakeup executes (for example, to send a birthday greeting every year), schedule a new wakeup the next time you learn the date. For repeating outreach on a fixed cadence, use Scheduled Reminders instead.
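For the yearly birthday case, the next scheduled_at value can be computed right after a wakeup executes. This is a minimal sketch in UTC; `nextAnnualOccurrence` is a hypothetical helper, not part of the SDK.

```typescript
// Returns the next UTC occurrence of month/day at the given hour,
// strictly after `now`, as an ISO 8601 string. `month` is 1-12.
// Note: February 29 rolls over to March 1 in non-leap years because
// Date.UTC normalises out-of-range days.
function nextAnnualOccurrence(
  month: number,
  day: number,
  hourUtc: number,
  now: Date,
): string {
  const year = now.getUTCFullYear();
  let next = new Date(Date.UTC(year, month - 1, day, hourUtc));
  if (next.getTime() <= now.getTime()) {
    next = new Date(Date.UTC(year + 1, month - 1, day, hourUtc));
  }
  return next.toISOString();
}
```

The returned string can be passed as scheduled_at on the next scheduleWakeup call.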
Schedules and Wakeups are complementary proactive primitives. The rule is simple: if the agent should reach out more than once on a predictable cadence, use a schedule. If the agent should reach out exactly once at a known moment, use a wakeup. Both feed into the same downstream delivery channels.
// Recurring: a daily morning check-in schedule
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["09:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "morning mood and sleep check-in",
  check_type: "reminder",
});
// One-off: a wakeup on the day of the user's birthday
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "birthday",
  intent: "wish the user a happy birthday on their 30th",
  delay_hours: 48,
});
A common pattern is to use both together: a recurring schedule for everyday outreach, and a wakeup for a special moment that doesn't fit the cadence.
The agent can read memory facts to decide when and what to schedule. For example, if a user mentions their anniversary date, the agent can search memory to retrieve that date and schedule a wakeup for the right moment. The wakeup then fires with the agent already knowing why it is reaching out.
// 1. User mentioned an upcoming anniversary: find it in memory
const memories = await client.agents.memory.search("agent_abc", {
  query: "anniversary date",
  limit: 5,
});
// 2. Take the top result as context for the wakeup
const anniversaryFact = memories.results[0].content;
// e.g. "User's wedding anniversary is April 30"
// 3. Schedule a wakeup for that exact moment
// Use scheduled_at for a known date, or delay_hours for "N hours from now"
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "anniversary",
  intent: "wish the user a happy anniversary and ask how they are celebrating",
  scheduled_at: "2026-04-30T09:00:00Z", // the anniversary date
  occasion: "User's wedding anniversary",
  event_description: anniversaryFact,
});
Because the agent has memory of the conversation in which the user shared the anniversary date, the wakeup message will feel naturally aware of the context — not generic.
When a wakeup fires, the generated message lands in the agent's notification queue. Your backend can consume it via SSE polling or a registered webhook. The event type is the same as any other proactive message; you don't need special handling for wakeup-originated messages vs schedule-originated ones.
// Poll for any pending proactive messages (wakeups or schedules)
const notifications = await client.agents.notifications.poll("agent_abc", {
  user_id: "user_123",
});
for (const n of notifications) {
  console.log(n.content); // the agent's message text
  console.log(n.source_type); // "wakeup" | "schedule"
}
See Webhooks & Notifications for webhook registration, signature verification, and SSE consumption patterns.
Register a webhook URL per tenant (or per project) and Sonzai will HTTP POST every proactive agent message to that URL with a signed payload. Each request includes a Sonzai-Signature header you verify with your signing secret before acting on the payload. Use webhooks for server-to-server delivery where you own the downstream routing — forwarding to FCM/APNs, sending via SendGrid or Twilio, writing to a case-management system, or fanning out to multiple channels at once.
Register a webhook URL to start receiving on_wakeup_ready events. Save the signing_secret from the response — it is only returned once.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const result = await client.webhooks.register("on_wakeup_ready", {
webhookUrl: "https://your-server.com/webhooks/sonzai",
authHeader: "Bearer your-webhook-secret",
});
// Store this securely — shown only once
console.log(result.signingSecret);
Webhooks are registered per event type. One URL per event type per tenant, or per project when using project-scoped registration. The same URL can handle multiple event types — inspect the event_type field on the payload to route accordingly.
Every POST Sonzai sends includes a Sonzai-Signature header in the format:
Sonzai-Signature: t=1714000000,v1=abc123def456...
t is the Unix timestamp of the request; v1 is the HMAC-SHA256 of {timestamp}.{raw_body} using your signing secret (with the whsec_ prefix stripped). Always verify the signature on the raw, unmodified request body before parsing JSON — do not use the parsed object for verification.
When your endpoint returns a non-2xx status or times out, Sonzai retries with exponential backoff. Make your handler idempotent — deduplicate on event_id (or a stable field in the payload body) so retried deliveries do not double-process.
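One way to make a handler idempotent is to remember which event_id values have already been processed. The sketch below keeps them in memory for illustration; `EventDeduper` is a hypothetical class, and a production handler would more likely use a shared store with an expiry (an assumption, since the retry window is not stated here).

```typescript
// In-memory dedupe keyed on event_id.
class EventDeduper {
  private seen = new Set<string>();

  // Returns true the first time an event_id is seen, false on retries.
  shouldProcess(eventId: string): boolean {
    if (this.seen.has(eventId)) return false;
    this.seen.add(eventId);
    return true;
  }
}

// Runs `handler` at most once per event_id; returns whether it ran.
function handleOnce<E extends { event_id: string }>(
  deduper: EventDeduper,
  event: E,
  handler: (event: E) => void,
): boolean {
  if (!deduper.shouldProcess(event.event_id)) return false;
  handler(event);
  return true;
}
```

A retried delivery should still be acknowledged with a 2xx response, otherwise it will keep being retried.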
Verify the Sonzai-Signature header before acting on any payload. The Go SDK ships a helper; TypeScript and Python use standard crypto primitives.
import crypto from "node:crypto";
/**
* Verify a Sonzai webhook signature.
* Call this on the raw request body string before parsing JSON.
*/
function verifyWebhookSignature(
rawBody: string,
signatureHeader: string,
secret: string,
): boolean {
// Strip whsec_ prefix if present
const key = secret.startsWith("whsec_") ? secret.slice(6) : secret;
// Parse header: t={timestamp},v1={sig}
const parts = Object.fromEntries(
signatureHeader.split(",").map((p) => p.split("=")),
);
const timestamp = parts["t"];
const receivedSig = parts["v1"];
if (!timestamp || !receivedSig) return false;
const expectedSig = crypto
.createHmac("sha256", key)
.update(`${timestamp}.${rawBody}`)
.digest("hex");
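const received = Buffer.from(receivedSig);
const expected = Buffer.from(expectedSig);
// timingSafeEqual throws when lengths differ, so reject those up front
if (received.length !== expected.length) return false;
return crypto.timingSafeEqual(received, expected);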
}
// In your webhook handler (e.g. Express):
app.post("/webhooks/sonzai", express.raw({ type: "*/*" }), (req, res) => {
const sig = req.headers["sonzai-signature"] as string;
const rawBody = req.body.toString("utf-8");
if (!verifyWebhookSignature(rawBody, sig, process.env.SONZAI_WEBHOOK_SECRET!)) {
return res.status(401).send("Invalid signature");
}
const event = JSON.parse(rawBody);
// Forward to your channel...
res.status(200).send("ok");
});
Timestamp tolerance
The Go SDK rejects signatures older than 5 minutes by default. In TypeScript and Python implementations, add a timestamp check if you need to guard against replay attacks: compare parseInt(parts["t"]) * 1000 against Date.now() and reject if the difference exceeds 300 000 ms.
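The timestamp check described above can be sketched as a standalone function that reads t from the Sonzai-Signature header. `isTimestampFresh` is a hypothetical helper; the 5-minute default mirrors the Go SDK behaviour described here.

```typescript
const DEFAULT_TOLERANCE_MS = 300000; // 5 minutes

// Returns true when the header's t= value is within `toleranceMs` of
// `nowMs`. Returns false when the header has no numeric t= field.
function isTimestampFresh(
  signatureHeader: string,
  nowMs: number,
  toleranceMs: number = DEFAULT_TOLERANCE_MS,
): boolean {
  const match = /(?:^|,)t=(\d+)(?:,|$)/.exec(signatureHeader);
  if (!match) return false;
  const sentMs = Number(match[1]) * 1000; // header carries Unix seconds
  return Math.abs(nowMs - sentMs) <= toleranceMs;
}
```

Call it with `Date.now()` alongside the signature check and reject the request if either fails.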
Webhooks and polling are two consumption models for the same proactive message queue. Webhooks push to your server in real time; polling lets your client or server fetch on demand. Use webhooks when you have a stable server endpoint and need instant delivery. Use polling when your client cannot accept inbound HTTP connections (mobile apps, browser clients) or when you want to batch-process notifications on your own schedule. Both see the same payload shape.
// Polling alternative: same messages, pulled instead of pushed
const pending = await client.agents.notifications.list("agent_abc", {
  userId: "user_123",
  status: "pending",
});
for (const notif of pending.notifications) {
  console.log(notif.generated_message);
  await client.agents.notifications.consume("agent_abc", notif.message_id);
}
When a scheduled reminder fires, an on_recurring_event_due webhook delivers the generated message to your endpoint. Your handler can then forward to FCM, send an email, or post to Slack — all without polling. This separates the scheduling concern (when to fire) from the delivery concern (how to reach the user).
// Register once; every scheduled reminder fires this endpoint
const result = await client.webhooks.register("on_recurring_event_due", {
  webhookUrl: "https://api.yourapp.com/webhooks/sonzai",
});
// In your handler, forward to the appropriate channel:
// event.generated_message → FCM, email, SMS, Slack...
When a wakeup fires, the on_wakeup_ready event is POSTed to your registered endpoint. This is the primary webhook event for companion-style agents that reach out proactively. Register the webhook once and every future wakeup — automatic or manually scheduled — will arrive at your URL.
// Register to receive all future wakeup messages
await client.webhooks.register("on_wakeup_ready", {
  webhookUrl: "https://api.yourapp.com/webhooks/sonzai",
});
// Your handler receives the wakeup message and forwards it:
// event.generated_message → push notification
// event.user_id → lookup device token in your DB
// event.agent_id → identify which agent sent it
No dedicated webhook tutorial yet. The Scheduled Reminders tutorial covers the full proactive delivery pipeline and includes webhook-based consumption patterns.
// List runs
const runs = await client.evalRuns.list({ agentId: "agent-id" });
// Get a specific run
const run = await client.evalRuns.get("run-id");
// Reconnect to a streaming run
for await (const event of client.evalRuns.streamEvents("run-id")) {
console.log(event.type, event.message);
}
Async Simulations
Simulations support async mode via simulateAsync() which returns a RunRef immediately, allowing you to poll or reconnect later.
Pick the path that matches your stack. All paths talk to the same hosted API — you can mix and match (e.g. backend in Python, plus an MCP connection from Claude Desktop for ops).
pip install sonzai
Python 3.11+. Sync (Sonzai) and async (AsyncSonzai) clients ship in the same package.
TypeScript runs on Node.js >=18, Bun, and Deno. Zero runtime dependencies.
The TypeScript, Python, and Go SDKs all read SONZAI_API_KEY from the environment by default — pass it explicitly (e.g. new Sonzai({ apiKey: "sk-..." })) only if you'd rather manage it yourself. The OpenClaw plugin stores its key in openclaw.json. The MCP server takes it via the SONZAI_API_KEY env var passed by the client config.
For precise control, create an agent with explicit Big5 scores. The platform derives a full personality profile, speech patterns, and emotional tendencies from your scores.
import { Sonzai } from "@sonzai-labs/agents";
import { v5 as uuidv5 } from "uuid";
const client = new Sonzai({ apiKey: "sk-..." });
// Derive a stable UUID from your own entity ID.
// MY_NAMESPACE must itself be a valid UUID; replace this placeholder.
const MY_NAMESPACE = "your-uuid-namespace-here";
const agentId = uuidv5("support-agent-001", MY_NAMESPACE);
const agent = await client.agents.create({
agentId, // pass your own UUID — safe to repeat
name: "Luna",
gender: "female",
big5: {
openness: 0.75,
conscientiousness: 0.60,
extraversion: 0.80,
agreeableness: 0.70,
neuroticism: 0.30,
},
language: "en",
});
console.log(agent.agent_id); // same UUID every time
Idempotent by Design
Agent creation is always a create-or-update. Calling it twice with the same ID updates the existing agent — it never errors or creates a duplicate. This means your startup code, CI pipelines, and provisioning scripts can call agents.create() unconditionally.
With agentId: Server uses your UUID directly. Recommended — link agents to your own entity IDs (agents, assistants, employees) for a deterministic mapping you control.
Without agentId: Server derives a UUID from your project ID + agent name. The same name always maps to the same agent within your project.
Use streaming chat to get real-time AI responses. The platform automatically handles context, memory, and state updates.
for await (const event of client.agents.chatStream({
agent: "agent-id",
messages: [{ role: "user", content: "I had a great day hiking!" }],
userId: "user-123",
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Server-Side Only
The SDK is for server-side use only. Never expose API keys in client-side code. For web apps, proxy through your backend. See the Integration Guide for examples.
Your backend manages business logic and user sessions. Call the Mind Layer for agent intelligence — it owns memory, personality, mood, relationships, and context assembly.
Integrate via the REST API using official SDKs for Go, TypeScript, and Python.
Official SDKs for Go, TypeScript, and Python, plus an OpenClaw plugin. Each SDK wraps the full REST API with typed methods, SSE streaming, automatic retries, and error handling.
All REST requests use Bearer authentication with your project API key:
# All REST requests use Bearer auth with your project API key
curl -H "Authorization: Bearer sk_your_api_key" \
  https://api.sonz.ai/api/v1/agents/{agentId}/chat
All three SDKs wrap the same REST API with typed methods, SSE streaming, automatic retries, and error handling. Pick whichever matches your stack — they're all first-class.
The Sonzai API does not accept browser (client-side) requests. API keys must never be exposed in frontend code. This is the same pattern used by OpenAI, Anthropic, and other AI API providers.
For web apps (React, Next.js, Vue, etc.), create a backend API route that proxies to Sonzai. Your frontend calls your server; your server calls Sonzai with the API key.
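On the backend side of that proxy, the outbound request might be assembled like this. The URL and Bearer header follow the curl example in this guide; the POST method and JSON body field names are assumptions, and `buildChatRequest` is a hypothetical helper.

```typescript
interface OutboundRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the server-to-server request your API route would send.
// The API key stays on the server and is never returned to the browser.
function buildChatRequest(
  agentId: string,
  userId: string,
  content: string,
  apiKey: string,
): OutboundRequest {
  return {
    url: `https://api.sonz.ai/api/v1/agents/${encodeURIComponent(agentId)}/chat`,
    init: {
      method: "POST", // assumed method
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      // Assumed body shape; check the API reference for the real fields.
      body: JSON.stringify({
        user_id: userId,
        messages: [{ role: "user", content }],
      }),
    },
  };
}
```

Your route handler would pass `url` and `init` to its HTTP client and relay the response to the frontend.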
When a user creates a new agent in your app, call agents.create with their personality configuration. Creation is idempotent: repeated calls with the same ID update the existing agent instead of creating a duplicate.
const agent = await client.agents.create({
name: "Luna",
gender: "female",
big5: {
openness: 0.75,
conscientiousness: 0.60,
extraversion: 0.80,
agreeableness: 0.70,
neuroticism: 0.30,
},
language: "en",
});
// agent.agent_id is the platform-generated UUID — store it in your user record
console.log(agent.agent_id);
// Fetch later
const profile = await client.agents.get(agent.agent_id);
The chat endpoint handles context assembly, AI streaming, and state updates in a single call.
for await (const event of client.agents.chatStream({
agent: "agent-id",
userId: "user-123",
messages: [{ role: "user", content: "I had a great day hiking!" }],
language: "en",
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Mood Labels
Labels: Blissful (80-100), Content (60-79), Neutral (40-59), Melancholy (20-39), Troubled (0-19). Mood naturally drifts back toward the agent's personality baseline over time.
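If you want to show these labels in your own UI, the bands map onto a simple threshold function. `moodLabel` is a hypothetical helper; treating fractional scores as belonging to the band below the next threshold is an assumption, since the bands above are given as integers.

```typescript
type MoodLabel = "Blissful" | "Content" | "Neutral" | "Melancholy" | "Troubled";

// Maps a 0-100 mood score onto the label bands listed above.
function moodLabel(score: number): MoodLabel {
  if (Number.isNaN(score) || score < 0 || score > 100) {
    throw new RangeError("mood score must be between 0 and 100");
  }
  if (score >= 80) return "Blissful";
  if (score >= 60) return "Content";
  if (score >= 40) return "Neutral";
  if (score >= 20) return "Melancholy";
  return "Troubled";
}
```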
Agents can reach out to users between conversations. When triggered, the platform generates a contextual message using the agent's full state and stores it as "pending". Your app polls and marks notifications consumed after delivery.
# Poll for pending proactive messages
GET /api/v1/agents/{agentId}/notifications?status=pending&user_id=user-123
# Response
{
  "notifications": [{
    "message_id": "msg-uuid",
    "user_id": "user-123",
    "check_type": "check_in",
    "intent": "Ask about yesterday's hiking trip",
    "generated_message": "Hey! How was the hike at Mount Rainier?",
    "status": "pending",
    "created_at": "2026-03-07T10:00:00Z"
  }]
}
# After delivering to user, mark consumed
POST /api/v1/agents/{agentId}/notifications/{messageId}/consume
Delivery Best Practice
Poll every 30-60 seconds. Always mark consumed after delivery to prevent re-delivery.
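The deliver-then-consume ordering can be sketched with the transport injected as functions, so the same loop works over REST or an SDK. `deliverPending` is a hypothetical helper; the notification fields match the response example in this guide.

```typescript
interface PendingNotification {
  message_id: string;
  user_id: string;
  generated_message: string;
}

// Delivers each pending notification, then marks it consumed.
// A failed delivery is left pending so the next poll retries it.
async function deliverPending(
  pending: PendingNotification[],
  deliver: (n: PendingNotification) => Promise<void>,
  consume: (messageId: string) => Promise<void>,
): Promise<string[]> {
  const consumed: string[] = [];
  for (const n of pending) {
    try {
      await deliver(n);
    } catch {
      continue; // not delivered, so do not consume
    }
    await consume(n.message_id);
    consumed.push(n.message_id);
  }
  return consumed;
}
```

Calling this from a timer every 30 to 60 seconds gives the polling cadence suggested above.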
Your backend translates application events into Mind Layer API calls. You can swap the backend without changing agent behavior, or reuse agents across applications.
Push structured data to build a project-scoped knowledge graph. Agents search this graph during conversations. See the Knowledge Base guide for full details.
Pre-load user metadata and content so agents already know a user before their first conversation. Metadata becomes instant facts; content blocks are extracted asynchronously via LLM.
Metadata facts (name, company, title) are created synchronously. Content blocks (text, chat transcripts) are processed in the background via LLM extraction. Poll the job status to track progress.
Feeding these docs to an AI assistant or coding agent? Every page has a Copy for LLM button, and the bundles below are pre-formatted for ingestion. Append .md to any doc URL (e.g. /docs/en/guides/integration.md) for the raw markdown.
Use the streaming chat endpoint — it handles context assembly, AI streaming, and state updates in one call.
Pass per-request application state via compiledSystemPrompt. The platform doesn't cache it across requests.
Register webhooks for wakeup events so agents can initiate contact.
Don't duplicate personality, memory, or relationship logic — let the engine own agent data.
Poll notifications every 30–60 seconds. Consume after delivery to prevent re-delivery.
All SDKs wrap the same REST API. Pick whichever matches your stack — they're all first-class.
Browser apps must proxy through your backend — never expose API keys in client-side code. See the Browser / Frontend Apps section above.
MCP Integration
The Mind Layer ships a hosted Streamable HTTP MCP endpoint at
https://api.sonz.ai/mcp/memory/{agent_id}. Point any MCP-compatible
client at it with your Sonzai API key — no local binary, no SSE port to
expose, no Go toolchain.
# Single command — registers the hosted MCP server with Claude Code:
claude mcp add --transport http sonzai \
https://api.sonz.ai/mcp/memory/AGENT_ID \
--header "Authorization: Bearer $SONZAI_API_KEY"
# Pick scope with --scope:
# local (default) — only this project, private to you
# project — writes .mcp.json (commit to share with team)
# user — across every project (~/.claude.json)
# Confirm the registration:
claude mcp list
Streamable HTTP, not SSE
The 2026 MCP spec marks Streamable HTTP as the canonical remote
transport. SSE is on a deprecation path across major clients — prefer
HTTP for any new integration. The legacy SSE transport is still served
by the local binary for backwards compatibility.
The Bearer-key route is what every example above uses — it's pinned to a
specific agent and your project API key is the only secret. The
OAuth-mode route lets clients discover available agents via a picker UI;
it's currently in beta and exposed at the
/.well-known/oauth-authorization-server discovery endpoint.
Treat the API key like a password
The Bearer token is a project API key — it grants full access to every
agent in that project. Don't paste it into shared MCP configs that get
committed to public repos. Prefer per-developer local-scope
configurations when collaborating.
OpenClaw is an open-source framework for building conversational AI agents. It uses a modular plugin system with named slots — each slot controls a specific part of the agent pipeline.
The most important slot is contextEngine. This is the plugin responsible for deciding what context gets injected into the system prompt before every LLM call. It controls what your agent remembers, knows, and feels.
OpenClaw's plugin system works like middleware. Each plugin implements lifecycle hooks that fire at specific points during a conversation turn:
bootstrap(sessionId): Called when a new chat session starts. The plugin initializes any connections or state it needs.
assemble(messages, tokenBudget): Called before every LLM call. The plugin returns a systemPromptAddition — extra context injected into the system prompt.
afterTurn(sessionId): Called after the LLM responds. The plugin processes the conversation (e.g., extract facts, update state).
compact(sessionId): Called when context needs to be consolidated (e.g., merging short-term memory into long-term).
dispose(): Called when the session ends. Clean up connections and state.
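To make the hook list concrete, here is a rough TypeScript shape for a context engine together with a toy implementation. The hook names and arguments follow the list above; the interface name, the `ChatMessage` type, and the synchronous signatures are assumptions for illustration, not OpenClaw's published types (real hooks are likely asynchronous).

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Hypothetical shape mirroring the five lifecycle hooks described above.
interface ContextEngineSketch {
  bootstrap(sessionId: string): void;
  assemble(messages: ChatMessage[], tokenBudget: number): { systemPromptAddition: string };
  afterTurn(sessionId: string): void;
  compact(sessionId: string): void;
  dispose(): void;
}

// Toy engine: echoes the last user message as context and counts turns.
class InMemoryEngine implements ContextEngineSketch {
  private turns = new Map<string, number>();

  bootstrap(sessionId: string): void {
    this.turns.set(sessionId, 0);
  }

  assemble(messages: ChatMessage[], tokenBudget: number): { systemPromptAddition: string } {
    let lastUser = "";
    for (let i = messages.length - 1; i >= 0; i--) {
      if (messages[i].role === "user") {
        lastUser = messages[i].content;
        break;
      }
    }
    const note = lastUser ? `Last user message: ${lastUser}` : "";
    // Rough cap at about 4 characters per token.
    return { systemPromptAddition: note.slice(0, tokenBudget * 4) };
  }

  afterTurn(sessionId: string): void {
    this.turns.set(sessionId, (this.turns.get(sessionId) ?? 0) + 1);
  }

  compact(sessionId: string): void {
    this.turns.set(sessionId, 0); // stand-in for real consolidation
  }

  dispose(): void {
    this.turns.clear();
  }

  turnCount(sessionId: string): number {
    return this.turns.get(sessionId) ?? 0;
  }
}
```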
By default, OpenClaw ships with a basic context engine that stores memories as local Markdown files. The Sonzai plugin replaces this with the Mind Layer — giving your agent persistent memory, personality evolution, mood tracking, and relationship modeling with zero additional code.
When you install @sonzai-labs/openclaw-context, the package exports a register() function as its default export. On startup, OpenClaw loads all installed plugins and calls their register functions. Ours registers a context engine factory under the name "sonzai":
// Inside @sonzai-labs/openclaw-context (you don't write this)
export default function register(api) {
  api.registerContextEngine("sonzai", () => {
    return new SonzaiContextEngine(client, config);
  });
}
Then in openclaw.json, you tell OpenClaw which registered engine to use for the contextEngine slot. The name "sonzai" must match what the plugin registered:
So the flow is: install the npm package → OpenClaw discovers and calls register() → the plugin registers under "sonzai" → your config assigns it to the contextEngine slot.
Why Sonzai as a Context Layer?
Sonzai serves as a pure context engine for OpenClaw. Instead of the framework managing its own memory files, every conversation flows through the Mind Layer — which handles fact extraction, semantic search, mood updates, and personality evolution automatically. Your OpenClaw agent gets rich, structured context without writing any memory logic.
# Install via OpenClaw CLI
openclaw plugins install @sonzai-labs/openclaw-context

# Or install directly with your package manager
npm install @sonzai-labs/openclaw-context
# bun add @sonzai-labs/openclaw-context
Your API key is stored in openclaw.json alongside your plugin config — no environment variables needed. Make sure openclaw.json is in your .gitignore to avoid committing secrets.
You can selectively disable specific context sources via the disable map. This is useful when you want the Mind Layer for memory but don't need mood tracking, or when you want to reduce token usage:
On each turn, the plugin injects a structured <sonzai-context> block into the system prompt. Sections are ordered by priority and dropped lowest-first if the token budget is exceeded:
Relevant Memories (priority 2): Semantically searched facts matching the latest user message
Current Mood (priority 3): 4D emotional state (valence, arousal, tension, affiliation)
Relationship (priority 4): Relationship narrative, love scores, chemistry with the current user
Goals (priority 5): Active goals (growth, mastery, relationship, discovery)
Interests (priority 6): Detected interests with confidence levels
Habits (priority 7, lowest): Behavioral patterns with strength scores
Token Budget
The default budget is 2000 tokens (~8000 characters). The plugin estimates token count at ~4 characters per token and drops the lowest-priority sections first when the budget is exceeded. Adjust with contextTokenBudget in your config.
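The trimming rule can be sketched as follows. This is an illustration of the behavior described above, not the plugin's actual code: `ContextSection`, `estimateTokens`, and `fitToBudget` are hypothetical names, and the sketch assumes whole sections are dropped rather than truncated.

```typescript
interface ContextSection {
  name: string;
  priority: number; // lower number = more important (2 outranks 7)
  text: string;
}

const CHARS_PER_TOKEN = 4; // the ~4 characters per token estimate from the text

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

function fitToBudget(sections: ContextSection[], tokenBudget: number): ContextSection[] {
  // Most important first, so the least important section sits at the end.
  const kept = sections.slice().sort((a, b) => a.priority - b.priority);
  let total = kept.reduce((sum, s) => sum + estimateTokens(s.text), 0);

  // Drop whole sections from the low-priority end until the estimate fits.
  while (total > tokenBudget) {
    const dropped = kept.pop();
    if (dropped === undefined) break;
    total -= estimateTokens(dropped.text);
  }
  return kept;
}
```

With a 2000-token budget, sections estimated at 1000, 500, and 1000 tokens (priorities 2, 3, and 7) total 2500 tokens, so the priority-7 section is dropped and the remaining 1500 tokens fit.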
The plugin automatically extracts user identity from OpenClaw's session key format. This enables per-user memory and relationships without any configuration:
For multi-tenant deployments where you provision agents programmatically,
the @sonzai-labs/openclaw-context plugin ships a setup() helper.
OpenClaw itself is a JavaScript framework, so the plugin is
TypeScript-only — but the underlying provisioning is just two REST
calls (idempotent agent create + write a config file) that you can
drive from any language. Python and Go branches below show the
equivalent using the canonical Sonzai SDK.
Agent IDs are generated deterministically from SHA1(tenantID + agentName). Calling setup multiple times with the same name returns the same agent — safe for restarts and redeployments.
The context engine handles all communication with the Mind Layer. During assemble, it fetches context sources (memory, personality, mood, relationships, goals, interests, habits), ranks them by priority, and trims to the token budget. During afterTurn, it sends the conversation back for fact extraction and state updates. The engine never runs LLM calls locally — all intelligence lives on the Sonzai side.
Graceful Degradation
All API calls are wrapped in error handlers. If the Mind Layer is unreachable, the engine returns empty context and never blocks OpenClaw — your agent continues working without enriched context.
There are two complementary ways your agent can access Sonzai knowledge and memory:
Automatic (Recommended)
Call GET /context with a query param. The endpoint automatically searches the knowledge base and injects recalled memories. The deferred learning loop primes the next context call with KB results that the agent missed. No tool calling needed.
Explicit Tool Calling
Register Sonzai tools with your LLM so it can search on demand mid-conversation. This is for agent frameworks (LangChain, Vercel AI SDK, CrewAI) where the LLM decides when to search. You fetch tool schemas from Sonzai and wire them into your framework.
When to use which?
Start with automatic enrichment — it covers most cases with zero configuration. Add explicit tool calling when your agent needs to search mid-conversation (e.g., the user asks a question not covered by the initial context fetch) or when your framework expects tool definitions.
Fetch the tool catalog for an agent. This returns JSON schemas in OpenAI function-calling format that you can pass directly to your LLM's tool configuration.
Search the agent's knowledge base for relevant documents and facts. Uses hybrid search (BM25 + semantic) when embeddings are available, falling back to BM25 full-text search.
Search the agent's memory for previously extracted facts about a user. This is a synchronous BM25 full-text search that returns immediately — no deferred processing.
Unlike KB enrichment (which has a deferred path), memory search returns immediately from BM25 indexes. There is no async component. The /context endpoint already includes the most relevant memories automatically — this tool is for cases where the LLM needs to search for additional facts mid-conversation.
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent
from sonzai import Sonzai

sonzai_client = Sonzai(api_key="sk_your_api_key")
agent_id = "agent-id"
user_id = "user-123"

@tool
def knowledge_search(query: str, limit: int = 5) -> list[dict]:
    """Search the agent's knowledge base for relevant documents and facts.
    Use when the user asks about topics that may be in uploaded documents."""
    results = sonzai_client.agents.knowledge_search(agent_id, query=query, limit=limit)
    return [{"content": r.content, "label": r.label, "score": r.score} for r in results.results]

@tool
def memory_search(query: str) -> list[dict]:
    """Search agent memory for previously learned facts about the user.
    Use when the conversation references past interactions or personal details."""
    results = sonzai_client.agents.memory.search(agent_id, query=query, user_id=user_id)
    return [{"content": f.content, "type": f.fact_type} for f in results.results]

# Get enriched context
ctx = sonzai_client.agents.get_context(
    agent_id, user_id=user_id, session_id="session-abc", query=user_message
)

llm = ChatGoogleGenerativeAI(model="gemini-3.1-flash-lite-preview")
agent = create_react_agent(llm, [knowledge_search, memory_search])

result = agent.invoke({
    "messages": [
        {"role": "system", "content": build_system_prompt(ctx)},
        {"role": "user", "content": user_message},
    ]
})
The most powerful aspect of standalone mode is the self-improving learning loop. Even without explicit tool calls, the agent gets smarter each turn because /process detects knowledge gaps and primes the next /context call.
One-shot signals: Deferred KB results are consumed when /context reads them. They appear exactly once, preventing stale or repeated information.
TTL-based expiry: Deferred signals expire after 1 hour. If the user doesn't continue the conversation, stale signals are automatically cleaned up.
Deduplication: If the direct /context query matches the same KB document as a deferred signal, the duplicate is removed. You never get the same result twice.
Capped searches: /process runs at most 5 KB queries per call and stores at most 10 deferred results, preventing resource explosion on topic-heavy conversations.
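The four properties above can be modeled with a small in-memory store. This is a sketch of the described behavior, not the platform's implementation: the class name, the per-session keying, and the explicit `now` parameter (used here to keep the example deterministic) are all assumptions.

```typescript
interface DeferredResult {
  docId: string;
  content: string;
  storedAt: number; // epoch milliseconds
}

const DEFERRED_TTL_MS = 60 * 60 * 1000; // signals expire after 1 hour
const MAX_DEFERRED_RESULTS = 10;        // cap on stored deferred results

class DeferredSignalStore {
  // Assumption: signals are keyed by session.
  private pending = new Map<string, DeferredResult[]>();

  // Called from the /process side: queue KB results the agent missed.
  store(sessionId: string, results: { docId: string; content: string }[], now: number): void {
    const queue = this.pending.get(sessionId) ?? [];
    for (const r of results) {
      if (queue.length >= MAX_DEFERRED_RESULTS) break;       // capped
      if (queue.some((q) => q.docId === r.docId)) continue;  // no duplicate docs
      queue.push({ docId: r.docId, content: r.content, storedAt: now });
    }
    this.pending.set(sessionId, queue);
  }

  // Called from the /context side: results are returned once, then gone.
  consume(sessionId: string, directDocIds: string[], now: number): DeferredResult[] {
    const queue = this.pending.get(sessionId) ?? [];
    this.pending.delete(sessionId); // one-shot
    const direct = new Set(directDocIds);
    return queue.filter(
      (r) => now - r.storedAt < DEFERRED_TTL_MS && !direct.has(r.docId), // TTL + dedup
    );
  }
}
```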
Unlike KB enrichment, memory search has no deferred/async path. When /context is called, it recalls the most relevant memories immediately using the hierarchical memory tree and BM25 indexes. When you call GET /memory/search explicitly, results return immediately.
The deferred behavior only applies to knowledge base content, where /process proactively discovers KB documents the agent should have known about. Memory facts are always available synchronously because they are indexed at write time (during /process).
Not necessarily. /context automatically includes KB results and recalled memories. Tool calling is useful when the LLM needs to search for something specific mid-conversation that wasn't covered by the initial context fetch, or when your framework expects tool definitions.
No. Memory search is always synchronous. When you call GET /memory/search, results return immediately from BM25 indexes. The deferred/async flow only applies to knowledge base enrichment via the /process learning loop.
The deferred signals expire after 1 hour (TTL-based cleanup). No stale data persists. If the user resumes the conversation later, they get fresh results from the next /context call.
Absolutely. The Sonzai tool schemas are standard OpenAI function definitions. Mix them with your own tools in whatever framework you use. The LLM decides which tool to call based on the conversation.
Custom tools (created via POST /agents/{agentId}/tools or the dashboard) are for agent-side tool calling in Sonzai's managed chat mode. The tool schemas described here (/tools/schemas) are for BYO-LLM mode where your LLM calls Sonzai endpoints.
Learn
If the Docs explain what each feature does, Learn
explains why it works the way it does — the model, the loop, and the
trade-offs you can tune.
Once the model is clear, the Tutorials in Guides
walk you end-to-end through a concrete project.
BYOK — Bring Your Own Key
BYOK lets you keep using Sonzai's chat / sessions / extraction stack
while routing the underlying provider call through your API key. Token
charges land on your provider invoice; everything else (memory,
personality, post-processing models, billing for Sonzai's platform)
behaves the same.
This is different from Custom LLM (BYOM).
BYOK uses Sonzai's first-party provider integrations with your billing
key; BYOM swaps the entire chat-completion call to an endpoint you
host.
BYOK keys are stored per (project, provider). There is one key per
provider per project; there are no per-agent, per-session, or per-call BYOK keys.
That makes the mental model simple: any chat turn that lands on a given
provider in a given project routes through that project's BYOK key for
that provider, regardless of which agent or session triggered it.
| What | Scope |
| --- | --- |
| Storage | per (project_id, provider), the primary key in the table |
| Resolution at request time | by project_id of the chat call → look up key for the resolved provider |
| Encryption | AES-256 at rest, decrypted only inside the request path |
| API access | tenant-level API keys may manage any project they own; project-scoped keys must match the requested project_id exactly |
If you want different keys for different agents in the same project, use
separate projects — that's the only knob.
Each of the four supported providers shows a card. Pick the one you
want to configure.
Paste the API key from the provider (OpenAI, Google AI Studio for
Gemini, xAI console, OpenRouter dashboard).
Hit Save. The dashboard runs the same synchronous probe the API
does — if the upstream rejects the key, the save fails and you get the
provider's error message inline. No bad key gets persisted.
Once saved the card switches to the configured state — redacted
prefix, health badge (healthy / unhealthy / unknown), and
buttons to Replace, Test, Disable, or Delete.
The dashboard calls the same endpoint the SDK does, so anything you do
there can also be scripted.
Keys are encrypted at rest and decrypted only inside the
platform's request path. They never round-trip back through any API —
list / get responses return only an api_key_prefix (the first few
characters) so you can identify which key is which.
A synchronous probe runs at write time. Sonzai does a no-op call
to the upstream provider with the key, so bad keys fail PUT with a
400. Misconfiguration surfaces at setup, not on the first user chat.
Per-key health is tracked. Every read returns health_status,
last_health_error, and last_health_check_at so you can detect a
rotated-out or revoked key before users do — pipe these into a
monitor and alert before chat traffic starts failing.
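A monitor built on those three fields might look like the sketch below. The field names come from the text; the record shape (including a `provider` field and an ISO 8601 timestamp string), the function name, and the staleness check are assumptions added for illustration.

```typescript
type ByokHealthStatus = "healthy" | "unhealthy" | "unknown";

// Assumed record shape; only the three health fields are named in the docs.
interface ByokKeyHealth {
  provider: string;
  health_status: ByokHealthStatus;
  last_health_error: string | null;
  last_health_check_at: string; // assumed ISO 8601
}

interface KeyAlert {
  provider: string;
  reason: string;
}

function keysNeedingAttention(
  records: ByokKeyHealth[],
  now: Date,
  maxCheckAgeMs: number,
): KeyAlert[] {
  const alerts: KeyAlert[] = [];
  for (const r of records) {
    if (r.health_status === "unhealthy") {
      alerts.push({ provider: r.provider, reason: r.last_health_error ?? "unhealthy (no error recorded)" });
      continue;
    }
    // Extra safeguard (an assumption, not documented behavior): flag keys
    // whose last check is missing or older than the allowed age.
    const checkedAt = Date.parse(r.last_health_check_at);
    const stale = Number.isNaN(checkedAt) || now.getTime() - checkedAt > maxCheckAgeMs;
    if (r.health_status === "unknown" || stale) {
      alerts.push({ provider: r.provider, reason: "health check missing or stale" });
    }
  }
  return alerts;
}
```

Feed the returned alerts into whatever paging or logging system you already use; the function itself makes no network calls.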
Project API keys created via the dashboard or POST /api/v1/projects/{project_id}/api-keys
carry a scopes array. To use BYOK programmatically via the SDK, the key needs:
read:byok — to list providers and check health status.
write:byok — to put / disable / delete / re-test keys.
Tenant-level credentials (Clerk dashboard sessions, default API keys with ["*"])
automatically have access to all BYOK operations.
Scope strings are case-sensitive, verb-first, and lower-case (read:byok, not BYOK:Read).
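A client-side pre-flight check for these scopes could be sketched as below. The helper name `hasScope` is hypothetical, and the sketch deliberately does not assume that `write:byok` implies `read:byok`, since the text does not say so.

```typescript
// Returns true when the key's scopes array grants the required scope.
// Matching is exact and case-sensitive; "*" grants everything.
function hasScope(granted: readonly string[], required: string): boolean {
  return granted.includes("*") || granted.includes(required);
}

console.log(hasScope(["read:byok"], "write:byok")); // prints false
```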
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const key = await client.byok.set("project_xyz", "openai", process.env.MY_OPENAI_KEY!);
console.log(key.api_key_prefix); // e.g. "sk-..." — never the full key
console.log(key.health_status); // "healthy" after the synchronous probe
PATCH toggles is_active without rotating the key — handy for
disabling temporarily without losing the key material. DELETE removes
the key and its history.
// Pause this BYOK key (subsequent calls fall back to platform-managed billing)
await client.byok.setActive("project_xyz", "openai", false);
// Re-test
const fresh = await client.byok.test("project_xyz", "openai");
// Permanently remove
await client.byok.delete("project_xyz", "openai");
The platform caches the resolved BYOK key per (project_id, provider)
in-process for performance. Every Set / Patch / Delete fires an
invalidator so a rotated key takes effect on the next call without a
restart.
If a project has no BYOK key for the provider that the chat call ends up
using, Sonzai bills that provider call to its own platform key as
normal — same UX, same SLA. BYOK is purely additive: set a key and it
takes over for that provider; remove it and the platform key kicks back
in.
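The resolution rule, including the fallback and the paused-key case, can be summarized in a few lines. This is a simplified model of the behavior described above, not Sonzai's implementation; the types, the `byokKeyId` helper, and the map-based storage are hypothetical.

```typescript
interface StoredByokKey {
  apiKey: string;
  isActive: boolean; // toggled by PATCH without rotating the key
}

// One key per (project_id, provider).
function byokKeyId(projectId: string, provider: string): string {
  return `${projectId}:${provider}`;
}

function resolveProviderKey(
  byokKeys: ReadonlyMap<string, StoredByokKey>,
  platformKeys: ReadonlyMap<string, string>,
  projectId: string,
  provider: string,
): { source: "byok" | "platform"; apiKey: string } {
  const byok = byokKeys.get(byokKeyId(projectId, provider));
  if (byok !== undefined && byok.isActive) {
    return { source: "byok", apiKey: byok.apiKey };
  }
  // No BYOK key, or the key is paused: fall back to the platform key.
  const platformKey = platformKeys.get(provider);
  if (platformKey === undefined) {
    throw new Error(`no platform key configured for provider "${provider}"`);
  }
  return { source: "platform", apiKey: platformKey };
}
```

Note that the agent and session never appear in the lookup, which is why separate projects are the only way to use different keys for different agents.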
Custom LLM (BYOM) — entirely your own endpoint, not provider passthrough.
Providers — the four IDs you can attach BYOK keys to.
Model scope — how the chat / post-processing model is resolved across project / agent / per-call layers (BYOK applies after that resolution lands on a provider).
Configure an OpenAI-compatible API endpoint for your project. Sonzai routes all chat generation through your endpoint while handling everything else: context assembly, tool execution, side-effect extraction, memory storage, personality tracking, and consolidation.
Full Managed Experience
Built-in tools (web search, memory recall, image generation, inventory), streaming SSE, per-message side effects — everything works exactly as with our default providers.
Your Model, Your Control
Use fine-tuned models, self-hosted endpoints, or any OpenAI-compatible provider (vLLM, Ollama, Together, Groq, Azure OpenAI, etc.).
Encrypted at Rest
Your API key is encrypted with AES-256 before storage. Only the first 8 characters are visible in the dashboard for identification.
Per-Project Configuration
Each project can have its own custom LLM endpoint. Toggle it on/off without deleting the config.
Custom LLM is the right choice when you want to use your own model but still want the full Sonzai experience (tools, streaming, per-message extraction). Standalone Memory is for when you need to control the entire chat loop yourself — e.g., for privacy preprocessing, data anonymization, or deep integration with an agent framework. See the Standalone Memory docs for the tradeoffs.
Once configured, here is what happens when a chat request is made:
Context assembly — Sonzai builds the 7-layer enriched context (personality, memory, mood, habits, goals, relationships, application state) exactly as with default providers.
Tool injection — Built-in tools (sonzai_memory_recall, sonzai_web_search, etc.) and any custom tools are added to the request.
Your endpoint called — The request is sent to your configured endpoint with your model name, API key, and the full message history including system prompt.
Streaming proxy — SSE chunks from your endpoint are streamed back to the client in real time.
Post-stream processing — After the stream completes, Sonzai extracts side effects (memory facts, mood changes, personality shifts, habits, tool calls) and stores them — same as with default providers.
Background tasks like fact extraction, memory consolidation, diary generation, and summarization automatically use the same model family you configured. Sonzai tracks the last-used provider/model for each agent and routes background LLM calls accordingly.
Custom LLM usage is billed at a flat per-token rate under the custom_llm billing model, regardless of which actual model your endpoint serves. Sonzai tracks input/output tokens from your endpoint's usage response. Your own endpoint costs (API fees, compute) are entirely yours.
Models
Every Sonzai chat turn fans out into two model calls:
Chat completion — what the user sees, streamed back live. Pick this
for personality and quality.
Post-processing — the latency-insensitive batch work that runs after
the reply ships: fact extraction, deduplication, mood updates,
personality drift, summarisation, diary, constellation. Pick this for
cost and throughput.
The two are configured independently. A frontier chat model can pair with
a cheap flash-lite extractor, and Sonzai resolves both per call through a
five-layer cascade that lets you override at agent, project, account
(tenant), and session scope.
Platform default — also the fallback wildcard for post-processing
openai: OpenAI. Default model gpt-5.5. 5.4 / 5 / mini / nano in the same family for fallback.
xai: xAI (Grok). Default model grok-4-1-fast-non-reasoning. Reasoning + non-reasoning Grok 4 / 4.20 variants.
custom: Bring-your-own LLM. No default model. Point Sonzai at any OpenAI-compatible endpoint; see Custom LLM.
The sonzai.providers module exports these IDs as constants — import
them rather than hand-typing strings, so the IDs stay in sync as the
catalog evolves. client.list_models() returns the live set enabled on
your tenant for runtime model-picker UIs.
Internal fallback
The platform also speaks openrouter for its own internal failover
paths. Customers don't pick openrouter directly today; Sonzai handles
failover on its side when the primary provider quota is exhausted.
Use Sonzai's hosted infrastructure but bill provider tokens to your own
account. Drop a key per provider against your project; subsequent
requests on that project route through your key for the matching
provider. Keys are encrypted at rest and never echoed back through the
API.
These run after the user-facing reply is streamed, on the
post-processing model map — a per-project config that maps the
chat-completion model to the smaller model the extractor should use.
When extraction needs to run for a chat that used claude-3-5-sonnet,
the extractor uses Gemini Flash Lite. When it sees a chat model not in
the map, the * wildcard kicks in.
The wildcard key is exported as sonzai.PostProcessingWildcardKey (Go)
and the equivalent constant in the other SDKs so you don't have to
hard-code "*" in your provisioning scripts.
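The lookup itself is simple enough to sketch. The code below assumes the map is keyed by chat model ID with `provider/model` strings as values; the function name and the example entries are illustrative (the wildcard value is the one this page suggests elsewhere, and the explicit `gpt-5.5` pairing is made up for the example).

```typescript
const POST_PROCESSING_WILDCARD = "*";

// Explicit entry first, wildcard second; undefined if neither exists.
function resolveExtractor(
  map: ReadonlyMap<string, string>,
  chatModel: string,
): string | undefined {
  return map.get(chatModel) ?? map.get(POST_PROCESSING_WILDCARD);
}

// Illustrative map: one explicit entry plus the wildcard.
const exampleMap = new Map<string, string>([
  ["gpt-5.5", "openai/gpt-5.4-mini"],
  [POST_PROCESSING_WILDCARD, "gemini/gemini-3.1-flash-lite-preview"],
]);

console.log(resolveExtractor(exampleMap, "gpt-5.5"));
console.log(resolveExtractor(exampleMap, "grok-4-1-fast-reasoning"));
```

The first call hits the explicit entry; the second has no entry of its own and resolves through the wildcard.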
The wildcard is enough for most projects. Reach for an explicit entry
when:
A particular chat model produces output the default extractor mishandles
(e.g. tool-call traces from a verbose model that need a stronger
extractor to keep facts atomic).
You're A/B-ing two extractors and want one chat model to route through
each for comparison.
Cost: cheaper chat models can run a cheaper extractor; flagship chat
models may warrant a stronger extractor on the same trace.
Provider availability
An entry's provider/model must match a real provider Sonzai has
configured for your project — see Providers.
Setting a non-existent provider here makes extraction fail
asynchronously after the user-facing reply has already streamed; you'll
see it in the agent's extraction_status on the next turn.
Providers — the chat-completion provider list (independent of post-processing).
Self-improvement — the full picture of what the extractor does on each turn.
Reference → API — REST endpoint shapes for the project-config get/set/delete calls.
Providers
Sonzai routes chat completions through one of four providers. The IDs
are exported as constants from the sonzai.providers module in the
SDKs — import those rather than hand-typing strings, so they stay in
sync as the catalog evolves. Use client.list_models() for the live
set enabled on your tenant at runtime.
Default gpt-5.5; the 5.4 family is the cheaper workhorse and 5 / 5-mini /
5-nano cover even cheaper or smaller-context tiers. The fallback chain on
quota exhaustion is gpt-5.5 → gpt-5.4 → gpt-5.4-mini → gpt-5.
| Model | Context window | Use it when |
| --- | --- | --- |
| gpt-5.5 | 1.05M | Default. The current OpenAI frontier: vision + tools + streaming + JSON mode. |
| gpt-5.4 | 1.05M | Cheaper than 5.5, same context window. |
| gpt-5.4-mini | 1.05M | The cheap workhorse. Recommended for high-throughput tenants. |
| gpt-5 | 400k | Frozen Aug-2025 snapshot. Kept for tenants pinned to it; new agents should default to 5.5. |
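The platform applies the fallback chain on its side, so you do not need to implement it. If you want to reason about it in tests or capacity planning, the order can be modeled as below; the constant and function names are hypothetical.

```typescript
// Order taken from the fallback chain stated above.
const OPENAI_FALLBACK_CHAIN: readonly string[] = [
  "gpt-5.5",
  "gpt-5.4",
  "gpt-5.4-mini",
  "gpt-5",
];

// Returns the next model to try after `current`, or null at the end of the
// chain or for a model that is not part of it.
function nextFallback(current: string): string | null {
  const i = OPENAI_FALLBACK_CHAIN.indexOf(current);
  if (i === -1) return null;
  return OPENAI_FALLBACK_CHAIN[i + 1] ?? null;
}

console.log(nextFallback("gpt-5.5")); // prints "gpt-5.4"
```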
Reasoning and non-reasoning variants in the Grok 4 family.
grok-4-1-fast-non-reasoning is the default; reasoning models are
opt-in for tasks that benefit from deeper chain-of-thought.
| Model | Context window | Reasoning |
| --- | --- | --- |
| grok-4-1-fast-non-reasoning | 2M | No |
| grok-4-1-fast-reasoning | 2M | Yes |
| grok-4.20-0309-non-reasoning | 2M | No |
| grok-4.20-0309-reasoning | 2M | Yes |
All Grok 4 entries support streaming, tools, and JSON mode. None support
vision today.
Point Sonzai at any OpenAI-compatible chat-completions endpoint. The
Mind Layer keeps owning memory, personality, mood, and post-processing —
only the chat-completion call gets routed through your endpoint.
See Custom LLM for the full setup. This is
distinct from BYOK — BYOK uses Sonzai's
provider integrations but with your billing key; BYOM uses your own
inference stack entirely.
client.list_models() (Python / TS / Go expose the same shape) returns
the live set of providers and models enabled on your tenant — useful for
building a model-picker UI or for asserting that a provider you depend on
is wired up before a deploy.
const result = await client.listModels();
for (const p of result.providers) {
console.log(p.provider, p.models.map((m) => m.id));
}
Custom LLM — point Sonzai at your own endpoint entirely.
Model scope — how provider / model is resolved per call.
Post-processing — what runs in the background, on what model.
Model scope
A Sonzai chat turn picks two models: the chat-completion model the
user sees, and the post-processing model that runs the background work
afterwards. Each goes through its own resolver cascade. The cascades
share the same scope hierarchy:
1. per-call (highest precedence; passed to agents.chat / sessions.start / agents.process)
2. per-agent (AgentProfile fields)
3. per-project (project_config rows in CockroachDB)
4. per-account/tenant (account_config rows in CockroachDB)
5. system default (Go constant compiled into the binary)
First non-empty layer wins. Layer 5 always exists, so resolution always
produces a concrete answer.
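A minimal model of that cascade is sketched below. The layer names follow the scope hierarchy; the types and the function are hypothetical, and "non-empty" is simplified here to "not undefined".

```typescript
interface ModelChoice {
  provider: string;
  model: string;
}

interface ScopeLayers {
  perCall?: ModelChoice;
  perAgent?: ModelChoice;
  perProject?: ModelChoice;
  perAccount?: ModelChoice;
  systemDefault: ModelChoice; // always present, so resolution never fails
}

type LayerName = keyof ScopeLayers;

function resolveModel(layers: ScopeLayers): { layer: LayerName; choice: ModelChoice } {
  const order: LayerName[] = ["perCall", "perAgent", "perProject", "perAccount"];
  for (const layer of order) {
    const choice = layers[layer];
    if (choice !== undefined) return { layer, choice }; // first non-empty layer wins
  }
  return { layer: "systemDefault", choice: layers.systemDefault };
}
```

Returning the winning layer alongside the choice is a convenience for debugging: it tells you which scope an unexpected model came from.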
The cheaper-model fleet that runs the batch work behind every turn:
fact extraction, dedup, mood updates, personality drift, summarisation,
diary, constellation. Resolved per task, per turn, independently of the
chat model.
One frontier model per agent, one cheap extractor per project. Set
agent ModelConfig to your premium model; set the project
post-processing map's * wildcard to gemini/gemini-3.1-flash-lite-preview.
A/B test extractors. Two projects, same agents, different
post_processing_model_map entries in each project's config; compare
quality on the same traffic.
Per-tenant pricing tiers. Free tier defaults the post-processing
map to flash-lite at the tenant level; paid tier overrides per-project
to a stronger extractor.
One-off override. Pass provider/model on a single
agents.chat call without persisting anything.
Browse the full endpoint reference — schemas, request/response examples, and an interactive try-it panel — at /docs/en/api. Every operation gets its own page generated from the live OpenAPI spec.
Public HTTP endpoints for agent lifecycle, real-time agent interaction, and proactive delivery. Memory, mood, relationship, and context-management internals are handled by the platform.
Server-side only. The API does not accept browser requests. For web apps, proxy through your backend. See the Integration Guide.
Generates or regenerates an AI-created avatar for the agent. Uses an LLM to create an image prompt from personality data, then generates and uploads the image. Costs 1 credit. Avatars are auto-generated on agent creation unless disabled.
Request:
agent_id (string): Agent UUID (URL param)
style (string): Optional style hint (e.g. 'watercolor anime', 'realistic portrait')
Primary public conversation RPC. Send the agent, user, application context, and message history; the platform handles context assembly and state updates automatically.
Bidirectional streaming voice chat with server-side VAD (voice activity detection). Client streams audio chunks continuously; server handles speech detection, transcription, AI response, and TTS.
Notify the platform about significant application events. The platform may generate diary entries, update goals, or take other AI actions. Fires OnDiaryGenerated webhook when diary is created.
Project-scoped knowledge graph. Upload documents or push structured data via the API — the platform extracts entities, builds a graph, and gives agents a knowledge_search tool to query it during conversations.
Pre-load user metadata and content so AI agents already "know" users from their first conversation. Metadata (name, company, title) becomes instant facts; content blocks (text, chat transcripts) are processed asynchronously via LLM extraction.
agent_id (string): Agent that generated the message
user_id (string): Target user
wakeup_id (string): Associated wakeup event
check_type (string): Type of check (check_in, follow_up, mood_driven)
intent (string): Why the agent wants to reach out
generated_message (string): The actual message text
status (string): pending, consumed, expired, failed_generation
created_at (Timestamp): When generated
AI Companions — Quickstart
This quickstart is for building an AI companion — a character with a
real personality, a rich inner life, and a relationship with the user that
evolves over time. Think: AI characters, VTubers, personal companions, story
NPCs.
What you'll build: Luna, a warm and curious companion who remembers your
conversations, develops a real relationship, and reaches out proactively when
it makes sense.
What you'll use: Big Five personality, 4D mood, hierarchical memory,
relationship tracking, proactive wakeups, and (optionally) voice.
The fastest path: describe the character in plain language and let the
platform infer personality, speech patterns, and seed memories.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const agent = await client.agents.generation.generateAndCreate({
name: "Luna",
description: "Luna is a warm, creative dreamer who speaks poetically. She loves stargazing, coffee shops at 2am, and asking the question beneath the question.",
language: "en",
});
console.log(agent.agent_id);
console.log(agent.personality); // full Big5 profile derived from the description
You can also define the character explicitly — set Big5 scores, speech
patterns, and a detailed bio. See Agent Generation.
Tell the agent who this user is before their first chat. Priming creates the
initial memory tree — the agent will reference these facts naturally.
await client.agents.priming.primeUser("agent-id", "user-123", {
display_name: "Sam",
content: [
{ type: "text", body: "Sam loves astronomy, lo-fi music, and photography." },
{ type: "text", body: "Sam is a night-owl grad student who tends to overthink. They came to Luna after a tough week." },
],
});
Proactive wakeups are what separate companions from chatbots. The platform
schedules them automatically based on relationship context — or you can
trigger them explicitly.
// Poll periodically (or register a webhook).
const pending = await client.agents.notifications.list("agent-id", {
userId: "user-123",
status: "pending",
});
for (const n of pending.notifications) {
// Render n.generated_message in your UI; mark consumed when shown.
await client.agents.notifications.consume("agent-id", n.message_id);
}
Current SDK versions: TypeScript 1.1.3 · Python 1.1.4 · Go 1.2.0 (as of 2026-04-17)
AI Employees & Personal AI — Quickstart
This quickstart is for building an AI employee or personal AI — a
task-oriented agent that helps a user get work done. Think: a support
engineer, a sales-development rep, an inbox assistant, an onboarding guide.
What you'll build: a customer-support agent that (1) remembers each user
across sessions, (2) can create tickets and look up order status via custom
tools, and (3) answers product questions from a knowledge base.
What you can skip: the Emotions system. Mood still
runs in the background but won't shape replies unless you opt in. Personality
stays minimal — a professional tone profile is enough.
Give the agent a minimal professional personality — high conscientiousness,
moderate agreeableness, low neuroticism. That's all you need for a task agent.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const agent = await client.agents.create({
name: "Atlas",
bio: "Atlas is a calm, precise support engineer who answers product questions and handles tickets.",
big5: {
openness: 0.55,
conscientiousness: 0.85,
extraversion: 0.5,
agreeableness: 0.7,
neuroticism: 0.2,
},
});
console.log(agent.agent_id);
Tools let the LLM call your backend during inference. Sonzai doesn't execute
them — it returns the tool call, your backend executes, and you pass the
result back on the next turn.
await client.agents.sessions.setTools("agent-id", "session-id", [
{
name: "create_ticket",
description: "Create a support ticket for the user.",
parameters: {
type: "object",
properties: {
subject: { type: "string" },
priority: { type: "string", enum: ["low", "normal", "high"] },
},
required: ["subject"],
},
},
{
name: "lookup_order",
description: "Fetch the latest order status by order ID.",
parameters: {
type: "object",
properties: { orderId: { type: "string" } },
required: ["orderId"],
},
},
]);
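Because your backend executes the tools, you need a dispatcher on your side. The sketch below assumes tool calls arrive in the OpenAI-style shape used in the tool-message examples in this guide; the handlers are hypothetical in-memory stubs standing in for your real ticketing and order systems, and executeToolCall is an illustrative name, not an SDK function.

```typescript
// OpenAI-style tool call (assumed shape; arguments is a JSON string).
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ToolResultMessage {
  role: "tool";
  tool_call_id: string;
  content: string;
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

// Hypothetical stubs; replace with calls into your own backend.
const handlers: Record<string, ToolHandler> = {
  create_ticket: (args) => ({
    ticket_id: "T-1",
    subject: args.subject,
    priority: args.priority ?? "normal",
  }),
  lookup_order: (args) => ({ order_id: args.orderId, status: "shipped" }),
};

// Run one tool call and wrap the outcome as a role: "tool" message.
// Failures are returned as content so the LLM can react on the next turn.
function executeToolCall(
  call: ToolCall,
  registry: Record<string, ToolHandler>,
): ToolResultMessage {
  const name = call.function.name;
  let content: string;
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    content = JSON.stringify({ error: `unknown tool: ${name}` });
  } else {
    try {
      const args = JSON.parse(call.function.arguments || "{}") as Record<string, unknown>;
      content = JSON.stringify(registry[name](args));
    } catch (err) {
      content = JSON.stringify({ error: err instanceof Error ? err.message : String(err) });
    }
  }
  return { role: "tool", tool_call_id: call.id, content };
}
```

Returning errors as tool content, rather than throwing, keeps the conversation going when a tool fails or the model emits malformed arguments.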
Enterprise Agents — Quickstart
This quickstart is for building enterprise AI agents — agents embedded
into business workflows. Think: CRM copilots, tier-1 support, internal
knowledge assistants, sales-qualification bots, compliance reviewers.
What you'll build: a sales-qualification agent that runs per-workspace,
receives deal events from your CRM via webhook, pulls from your product
docs, tracks workflow stage as custom state, and runs against eval rubrics
before each release.
What you'll use: multi-instance isolation, project-scoped knowledge base,
custom states, webhooks, tools, and evaluation runs.
In platform.sonz.ai, create a
project and generate both an API key and a webhook signing secret. Enterprise
deployments usually scope API keys per environment (dev, staging, prod).
Each customer workspace gets its own instance. Memory, custom state, and
notifications scoped to instance_id = workspace-id stay isolated — critical
for multi-tenant SaaS and compliance.
Push platform events into your stack. Each webhook subscribes to one event
type — register the events you care about. The agent sees these as "workflow
events" and reacts naturally on the next turn.
Every webhook request is signed with HMAC-SHA256 — verify before acting.
See Webhooks & Notifications for the full event
catalog, retry policy, and verification example.
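As a rough illustration of the verification step, the sketch below checks an HMAC-SHA256 signature with Node's built-in crypto module. It assumes the signature is the hex-encoded HMAC of the raw request body keyed with the signing secret; the actual header name and encoding are not stated here, so confirm them in the Webhooks & Notifications reference before relying on this.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumption: signatureHex is the hex-encoded HMAC-SHA256 of the raw body,
// keyed with the webhook signing secret.
function verifyWebhookSignature(
  rawBody: string,
  signatureHex: string,
  secret: string,
): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Verify against the raw request bytes, before any JSON parsing, since re-serializing the body can change whitespace and break the signature.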
Before shipping a prompt change or a new agent version, run it against an
eval rubric. Grade personality drift, factual accuracy, and tool-call
correctness.
// Kick off a simulation + grading run, return immediately.
const ref = await client.agents.runEvalAsync("agent-id", {
templateId: "template_lead_qualification_v3",
simulationConfig: { turnsPerScenario: 6 },
});
// Poll the run record once the eval finishes (or stream live via streamEvents).
const result = await client.evalRuns.get(ref.runId);
console.log(result.scoreOverall, result.scoresByCategory);
See Evaluation for building rubrics and simulation
users.
Fetches the 7-layer enriched context: personality, mood, relevant memories, active goals, habits, relationship state, and proactive signals. Pass a query matching the current topic for best memory recall.
const ctx = await session.context({ query: "What should we talk about?" });
// ctx is a flat object — no nested envelope. Useful fields:
// personality_prompt — agent identity / system instructions
// bio, speech_patterns — agent identity bits
// true_interests, true_dislikes
// big5, dimensions, preferences, behaviors
// recent_personality_shifts, significant_moments, active_goals, habits
// current_mood, emotional_state
// loaded_facts — recalled facts (each has atomic_text, fact_type, importance)
// long_term_summaries — multi-session digests
// proactive_memories — pending proactive signals
// constellation_patterns — deeper behavioral patterns
// relationship_narrative, chemistry_score, love_from_agent, love_from_user
// knowledge.results — KB hits for the query (only nested key)
// recent_turns — buffered messages from this session
// backend_context — custom application state (if set)
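If you prefer typed access over any, a partial interface covering just the fields you read is often enough. The sketch below types a handful of the fields listed above; it is not the SDK's own type, field types beyond those visible in this guide's examples are assumptions, and topFacts is a hypothetical helper.

```typescript
// Partial typing of the flat context payload, limited to fields named above.
// Assumption: importance is a number where higher means more important.
interface LoadedFact {
  atomic_text: string;
  fact_type: string;
  importance: number;
}

interface Mood4D {
  valence: number;
  arousal: number;
  tension: number;
  affiliation: number;
}

interface EnrichedContext {
  personality_prompt?: string;
  current_mood?: Mood4D;
  loaded_facts?: LoadedFact[];
  knowledge?: { results?: { label: string; content: string }[] };
  recent_turns?: { role: string; content: string }[];
}

// Hypothetical helper: the n most important recalled facts, as plain text,
// ready to drop into a system prompt. Does not mutate the context.
function topFacts(ctx: EnrichedContext, n: number): string[] {
  return (ctx.loaded_facts ?? [])
    .slice()
    .sort((a, b) => b.importance - a.importance)
    .slice(0, n)
    .map((f) => f.atomic_text);
}
```

Every field is optional in this sketch, so prompt-building code is forced to handle a sparse context, for example right after a session starts.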
POST /agents/{agentId}/sessions/{sessionId}/turn — sync mood update inline (~300–500ms), deeper extraction continues in the background (5–15 seconds). Accepts role: "tool" and tool_calls on assistant messages.
const { mood, extraction_id, extraction_status } = await session.turn({
messages: [
{ role: "user", content: userMessage },
// intermediate tool calls/results here
{ role: "assistant", content: assistantMessage },
],
// provider/model fall back to the session-level defaults; both are optional.
});
If you can predict the next user query (or just want to pre-warm with a generic query), pass fetchNextContext on .turn() and the server returns an enriched context inside the same response under next_context. This eliminates one roundtrip on the next render.
const { mood, next_context } = await session.turn({
messages: [...],
fetchNextContext: { query: "any query you'd run on the next turn" },
});
// next_context has the same shape as session.context() — use it directly
// to render the system prompt for the next turn without calling /context.
Send a full transcript and run extraction immediately. Auto-creates a session if sessionId is omitted; the response surfaces the auto-generated session_id.
const result = await client.agents.process("agent-id", {
userId: "user-123",
// sessionId omitted — auto-created
messages: [
{ role: "user", content: userMessage },
{ role: "assistant", content: assistantMessage },
// tool messages allowed too
],
provider: "gemini", // optional
model: "gemini-3.1-flash-lite-preview", // optional
});
console.log(result.session_id); // auto-generated when not passed
console.log(result.facts_extracted); // count of facts extracted this call
console.log(result.side_effects); // { mood_updated: true, ... summary counts }
// Then read the extracted state back via the dedicated endpoints:
const memory = await client.agents.memory.list("agent-id", { userId: "user-123" });
const mood = await client.agents.getMood("agent-id", { userId: "user-123" });
The response is intentionally a small summary — { success, facts_extracted, side_effects, session_id }. To inspect the extracted facts/personality/mood/habits themselves, call the dedicated read endpoints (see Reading Behavioral Data below).
Closes the session. If you call this without messages (after using /turn or /process), it's a finalize-only call. If you call it with messages and skipped /process, this becomes your extraction trigger: functionally equivalent to /process, but lifecycle-scoped and async-capable on tenants where enabled.
// Just close — no extraction needed if you used /turn or /process already.
await session.end({ totalMessages: 12, durationSeconds: 600 });
// OR — pass messages here as the extraction trigger (Option B).
await session.end({
messages: transcript,
totalMessages: transcript.length,
durationSeconds: 600,
});
Both /turn and /process accept OpenAI/Anthropic-style tool messages. Sonzai's extractor reads tool results and can capture facts that only appeared in tool output.
{
  "messages": [
    { "role": "user", "content": "Where did my last order ship from?" },
    {
      "role": "assistant",
      "tool_calls": [
        { "id": "call_1", "type": "function", "function": { "name": "order-lookup", "arguments": "{\"limit\":1}" } }
      ]
    },
    { "role": "tool", "tool_call_id": "call_1", "content": "{\"order_id\":\"42\",\"origin\":\"Tokyo\",\"carrier\":\"DHL\"}" },
    { "role": "assistant", "content": "Your last order shipped from Tokyo via DHL." }
  ]
}
The extractor will surface a fact like "User's last order (#42) shipped from Tokyo via DHL" — a fact that never appeared in the user's or assistant's own text.
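When you assemble such a message array by hand, it is easy to drop a tool result or mis-copy an id. The sketch below is a local pre-submit check under the message shape shown above; whether the API itself rejects unpaired messages is not stated here, and findUnpairedToolIds is a hypothetical helper.

```typescript
interface ChatMessage {
  role: "user" | "assistant" | "tool" | "system";
  content?: string;
  tool_calls?: {
    id: string;
    type: "function";
    function: { name: string; arguments: string };
  }[];
  tool_call_id?: string;
}

// Reports ids that break pairing: tool results with no earlier assistant
// tool_call, and tool_calls that never received a result.
function findUnpairedToolIds(
  messages: ChatMessage[],
): { orphanResults: string[]; unanswered: string[] } {
  const open = new Set<string>();
  const orphanResults: string[] = [];
  for (const m of messages) {
    if (m.role === "assistant") {
      for (const call of m.tool_calls ?? []) open.add(call.id);
    } else if (m.role === "tool") {
      const id = m.tool_call_id ?? "";
      if (open.has(id)) open.delete(id);
      else orphanResults.push(id);
    }
  }
  return { orphanResults, unanswered: Array.from(open) };
}
```

Running this before submission gives a cheap signal that the extractor will see each tool output next to the call that produced it.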
The Context Engine schedules proactive outreach (check-ins, follow-ups) based on conversation patterns. Poll for pending notifications and consume them when delivered.
const pending = await client.agents.notifications.list("agent-id");
// The list response wraps the items in a notifications array.
for (const notif of pending.notifications) {
await deliverToUser(notif.user_id, notif.generated_message);
await client.agents.notifications.consume("agent-id", notif.message_id);
}
Atomic facts (preferences, events, commitments) with importance scoring, deduplication, and topic tagging. Sourced from user, assistant, AND tool messages.
Personality Deltas
Big5 trait shifts (openness, conscientiousness, extraversion, agreeableness, neuroticism) with reasoning.
Mood Changes
4D mood delta (valence, arousal, tension, affiliation). Sync mood lands inline on /turn; richer extraction is deferred.
Habit Detection
New and reinforced behavioral patterns — exercise routines, reading habits, social patterns.
Interest Tracking
Topics the user engages with, categorized by domain with confidence and engagement scores.
Relationship Dynamics
Love score changes with reasoning — tracks rapport, trust, and emotional connection.
Proactive Outreach
Scheduled check-ins and follow-ups based on conversation context (e.g., 'ask about the hike tomorrow').
When calling /turn or /process, you can specify which LLM provider and model to use for extraction. Omitting provider/model falls back to the platform default, gemini-3.1-flash-lite-preview.
There are three ways to feed conversations into Sonzai. The first two are batch (you send a transcript after the conversation); the third is real-time (you submit each turn as it happens). Pick exactly one per conversation — chaining them runs extraction twice on the same messages.
A. /process — one-shot batch
Single call. Auto-creates a session if you don't pass one. Best for external LLM transcripts, benchmarks, and any flow without a long-lived session lifecycle.
B. sessions.start → end({ messages }) — lifecycle batch
Open a session, do your full conversation off-platform, then close with the transcript on .end(). Use when you want explicit session boundaries, async polling, or session-scoped tools — but still ingest in one shot.
C. sessions.start → turn() × N → end() — real-time
Open a session and submit each exchange via .turn() as the conversation happens. Sync mood lands inline (~300–500ms); deeper extraction runs asynchronously 5–15s later. Best for chat companions, voice AI, and agent frameworks.
| | A. /process | B. sessions.end({ messages }) | C. sessions.turn() × N |
| --- | --- | --- | --- |
| Calls per conversation | 1 | 2 (start + end) | 2 + N (start + N × turn + end) |
| Sonzai in the hot path? | No | No | Yes: .context() and .turn() flank each turn |
| Context per turn | Pre-session only (optional getContext call) | Pre-session only (optional getContext call) | Fresh, query-specific via .context() |
| Extraction timing | Whole transcript, inline | Whole transcript, inline (or async on tenants where enabled) | Per-turn: sync mood inline, deeper extraction 5–15s later |
A and B are functionally equivalent for fact extraction — both extract facts and side-effects from the full transcript inline. The only differences are lifecycle ergonomics (B gives you an explicit session and supports async polling) and call count.
C is a different shape: Sonzai is part of every turn instead of seeing the conversation only at the end.
Don't mix shapes within one conversation
Calling .turn() per turn (C) and .end({ messages }) with the same transcript (B) extracts the same messages twice. Pick one shape per conversation. The pattern docs below show C and B/A separately.
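One way to make the rule hard to violate is to build the session.end() argument through a small guard. The sketch below is an illustration only: buildEndPayload is a hypothetical helper, and it uses just the messages, totalMessages, and durationSeconds fields that appear in this guide's session.end() examples.

```typescript
interface Message {
  role: string;
  content: string;
}

interface EndPayload {
  messages?: Message[];
  totalMessages: number;
  durationSeconds: number;
}

// Build the argument for session.end(). If any turn was already submitted
// through .turn(), omit messages so the transcript is not extracted twice.
function buildEndPayload(
  transcript: Message[],
  turnsSubmitted: number,
  durationSeconds: number,
): EndPayload {
  const base = { totalMessages: transcript.length, durationSeconds };
  return turnsSubmitted > 0 ? base : { ...base, messages: transcript };
}
```

With shape C you would count each successful .turn() call and pass that count in; with shape B the count stays at zero and the transcript is attached.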
/turn, /process, and sessions.end are intentionally lightweight. They extract facts and a session summary from the transcript and persist them — that's it. The expensive work (cross-session dedup, clustering, diary deepening, decay) is scheduled automatically by the platform and is rate-limited so it doesn't run on every call.
Deep consolidation (wakeup/habit dedup, decay, cluster reconcile, weekly summaries) | Daily / weekly | Automatic schedule | Heavy
This means you can call /turn per turn (Pattern 1), or /process once at the end (Pattern 2), without paying for heavy consolidation each time. The platform de-duplicates and consolidates in the background.
Practical implication
Don't try to "save calls" by skipping /turn between turns. Each call only does sync mood + queues deferred extraction (cheap). Skipping it means losing per-turn behavioral signal. The expensive consolidation runs on its own schedule no matter how many times you call.
When you call session.context({ query }) (or GET /context), the endpoint searches the agent's knowledge base and includes matching results in a knowledge field automatically.
{
  "personality_prompt": "You are a helpful AI companion...",
  "big5": { "openness": 0.7, "conscientiousness": 0.6, "extraversion": 0.5, "agreeableness": 0.8, "neuroticism": 0.3 },
  "current_mood": { "valence": 0.4, "arousal": 0.2, "tension": -0.1, "affiliation": 0.3 },
  "loaded_facts": [{ "atomic_text": "User prefers morning workouts", "fact_type": "behavioral", "importance": 0.8 }],
  "active_goals": [{ "description": "Run a 5K by June" }],
  "habits": [{ "label": "Daily exercise" }],
  "knowledge": {
    "results": [
      {
        "content": "Refund policy: customers can request a full refund within 30 days...",
        "label": "Refund Policy",
        "type": "policy",
        "source": "policies.pdf",
        "score": 0.92
      }
    ]
  }
}
After /turn or /process extracts side effects, it also searches the KB with topics found in the conversation. If relevant KB content exists that the agent missed, it stores these as proactive signals, and the next session.context() call includes them automatically.
Turn 1: session.context() → (no KB results yet)
↓
chat with your LLM
↓
session.turn() → extracts "hiking gear" as topic
→ searches KB, finds "Hiking Equipment Guide"
→ stores as proactive signal
Turn 2: session.context() → includes "Hiking Equipment Guide" from KB
+ any direct search results for the new query
↓
chat with your LLM (now knows about hiking gear!)
Want to use your own model without managing the chat loop? Consider Custom LLM instead. It lets you point Sonzai at any OpenAI-compatible endpoint while keeping streaming, built-in tools, and per-message extraction fully automatic.
Managed mode calls built-in tools (web search, memory recall, image generation) automatically. In standalone mode you must implement tool calling yourself — the tool-calling loop is yours, but the resulting tool messages flow into /turn or /process for extraction. See the Tool Integration guide.
session.context(), /turn, and /process are synchronous request-response calls. Streaming is handled by your own LLM. Background extraction is asynchronous but you poll for state, not stream.
You must pick one of the three integration shapes per conversation: /process (one-shot batch), sessions.start → sessions.end({ messages }) (lifecycle batch), or sessions.start → session.turn() per turn → session.end() (real-time). Picking none means the transcript is never seen by the Context Engine and no behavioral data is captured. Picking two — for example calling .turn() per turn and passing messages on .end() — runs extraction twice on the same content. (Heavy consolidation runs on its own schedule and doesn't need to be triggered manually.)
Sonzai's extraction reads messages as text. Multimodal content (images, audio) must be bridged to text before submission — see Working with Images & Multimodal Input in Pattern 1.
What's the same in both modes
Extraction quality is identical — both modes use the same LLM pipeline for fact extraction, personality shifts, mood, habits, and consolidation. The 7-layer enriched context from session.context() is the same data the managed chat builds internally.
Pattern 1: Memory Middleware (Real-Time)
You control the LLM. Sonzai handles what that LLM knows about the user.
Open a Session once. For every turn: call session.context({ query }) to pull the enriched user profile, build your system prompt, call your own LLM (with your own tools), then call session.turn({ messages }) to submit just the new exchange. Sync mood updates inline (~300–500ms); deeper extraction (facts, personality, habits) lands asynchronously 5–15 seconds later in the background.
This is the same data model mem0 provides (relevant memories injected before generation), extended with personality evolution, mood tracking, habit detection, goal tracking, proactive outreach scheduling, and relationship dynamics.
session.context() and sessions.start use no Sonzai LLM credits — they are pure reads. session.turn(), /process, and sessions.end({ messages }) use Sonzai's LLM for fact extraction + session summary (light, per-call, billed). Heavy background work — cross-session dedup, clustering, diary, decay — runs on auto-scheduled jobs (8h post-session, daily, weekly) and is billed against the same tenant but not per-call. Your chat LLM is entirely your cost.
Open the session once with your provider/model defaults. Then for every turn: get context → call your LLM (running tool calls in your own loop) → submit the turn. End the session when done.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function runConversation(agentId: string, userId: string) {
const sessionId = `session-${Date.now()}`;
const history: { role: string; content: string }[] = [];
// Open a Session handle. agentId/userId/sessionId and provider/model
// defaults live on the handle so you don't repeat them on every call.
const session = await sonzai.agents.sessions.start(agentId, {
userId,
sessionId,
toolDefinitions: yourTools, // optional — register session-scoped tool schemas
provider: "gemini", // optional — default for .turn()
model: "gemini-3.1-flash-lite-preview", // optional — default for .turn()
});
async function turn(userMessage: string): Promise<string> {
// Fresh enriched context for this specific message
const ctx = await session.context({ query: userMessage });
// Your LLM — swap in any provider you like
let reply = await yourLLM.chat({
system: buildSystemPrompt(ctx),
messages: [...history, { role: "user", content: userMessage }],
tools: yourTools,
});
// Tool-calling loop is entirely yours — Sonzai is OUT of the loop here.
const toolMessages: any[] = [];
while (reply.tool_calls?.length) {
for (const call of reply.tool_calls) {
const result = await runYourTool(call);
toolMessages.push(
{ role: "assistant", tool_calls: [call] },
{ role: "tool", tool_call_id: call.id, content: result },
);
}
reply = await yourLLM.chat({
system: buildSystemPrompt(ctx),
messages: [...history, { role: "user", content: userMessage }, ...toolMessages],
tools: yourTools,
});
}
sendToUser(reply.content); // send first; don't block on Sonzai
// Submit just the new turn. Sync mood ~300ms, deferred extraction
// (facts, personality, habits) runs asynchronously 5–15s later.
// Pass the FULL exchange — including tool calls and tool results —
// so Sonzai can extract facts from tool outputs too.
const { mood, extraction_id } = await session.turn({
messages: [
{ role: "user", content: userMessage },
...toolMessages, // assistant tool_calls + tool results
{ role: "assistant", content: reply.content },
],
});
history.push({ role: "user", content: userMessage });
history.push({ role: "assistant", content: reply.content });
return reply.content;
}
return { turn, end: () => session.end() };
}
// The /context response is a flat object — there is no nested
// `profile` / `behavioral` / `memory` envelope.
function buildSystemPrompt(ctx: any): string {
const facts = (ctx.loaded_facts ?? []).map((f: any) => `- ${f.atomic_text}`).join("\n");
const goals = (ctx.active_goals ?? []).map((g: any) => g.description).join(", ");
return `${ctx.personality_prompt ?? "You are a helpful AI companion."}
Personality (Big5): ${JSON.stringify(ctx.big5 ?? {})}
Current mood: ${JSON.stringify(ctx.current_mood ?? {})}
Active goals: ${goals || "none"}
Relevant memories:
${facts || "none yet"}`;
}
The single most important habit in Pattern 1 is calling session.context({ query: userMsg }) before every LLM call. This is the load-bearing piece that closes the loop: without it, the LLM doesn't get the fresh mood (which lands inline on .turn()) or the freshly extracted facts (which land 5–15 seconds after .turn()).
while (conversationActive) {
const userMsg = await getUserInput();
// 1. PULL FRESH CONTEXT — happens every turn, before the LLM call.
// ctx is a flat object — no `profile` / `behavioral` / `memory` envelope.
// Fields you'll usually read:
// ctx.personality_prompt — agent identity / instructions
// ctx.bio, ctx.speech_patterns — agent identity bits
// ctx.big5 — Big5 trait object
// ctx.current_mood — fresh inline (~300ms after .turn())
// ctx.habits, ctx.active_goals — behavioral state
// ctx.loaded_facts — recalled facts (5-15s lag from extraction)
// ctx.proactive_memories — pending proactive signals
// ctx.knowledge.results — KB hits (only nested key)
// ctx.recent_turns — buffered messages from this session
const ctx = await session.context({ query: userMsg });
// 2. Build system prompt from the context layers
const systemPrompt = renderPromptFromContext(ctx);
// 3. Run YOUR LLM — Sonzai is OUT of the loop here
const reply = await yourLLM.chat({
system: systemPrompt,
messages: [...history, { role: "user", content: userMsg }],
});
// 4. Submit the just-completed turn — sync mood + async deferred extraction
await session.turn({
messages: [
{ role: "user", content: userMsg },
{ role: "assistant", content: reply.content },
],
});
}
function renderPromptFromContext(ctx: any): string {
const parts: string[] = [];
if (ctx.personality_prompt) parts.push(ctx.personality_prompt);
if (ctx.big5) parts.push(`Personality (Big5): ${JSON.stringify(ctx.big5)}`);
if (ctx.speech_patterns?.length) parts.push(`Speech patterns: ${ctx.speech_patterns.join(", ")}`);
if (ctx.current_mood) parts.push(`Current mood: ${JSON.stringify(ctx.current_mood)}`);
const facts = (ctx.loaded_facts ?? []).slice(0, 5).map((f: any) => `- ${f.atomic_text ?? ""}`).join("\n");
if (facts) parts.push(`Relevant memories:\n${facts}`);
const kb = (ctx.knowledge?.results ?? []).slice(0, 3).map((r: any) => `- ${r.label}: ${(r.content ?? "").slice(0, 120)}`).join("\n");
if (kb) parts.push(`Knowledge base:\n${kb}`);
return parts.join("\n\n");
}
Save a roundtrip with fetchNextContext
session.turn() accepts a fetchNextContext: { query: nextUserMessage } option (fetch_next_context in Python). When set, the server triggers the deferred extraction and also fetches the next /context payload, returning it under next_context in the same response. This eliminates the second roundtrip on the next turn: your client already has the context for turn N+1 by the time turn N has finished. Use this when you can predict the next user query, or when a generic pre-warm query is good enough.
Context freshness. Mood updates inline on each .turn() call (~300ms), so the very next .context() reflects the new mood. Personality / facts / inventory land 5–15 seconds after .turn() in the background, so they appear within a turn or two of being mentioned.
Why per-turn. State changes between turns. A user mentioning a new pet on turn 3 means turn 4's context should carry that fact. Skipping .context() between turns means the LLM works from stale state — and the value of a memory layer collapses.
Pass the actual user message as query. session.context() uses the query for memory recall, KB search, and proactive signal selection. Passing the raw user message gives the most relevant pull; passing a static placeholder gives generic context regardless of what the user asked.
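A pre-fetched next_context and a query-specific pull can be reconciled with a small single-slot cache: reuse the pre-fetched context only when the real message matches the predicted query, otherwise call session.context() as usual. NextContextCache below is a hypothetical helper, and the exact-match policy is a deliberately conservative assumption, not platform behavior.

```typescript
// Holds the next_context returned by the previous .turn() call, together
// with the query it was fetched for.
class NextContextCache<C> {
  private entry: { query: string; ctx: C } | null = null;

  // Call after each .turn(); pass undefined when no next_context came back.
  store(query: string, ctx: C | undefined): void {
    this.entry = ctx === undefined ? null : { query, ctx };
  }

  // Returns the cached context only if it was fetched for this exact query.
  // The slot is cleared on every call (hit or miss), so a context fetched
  // for an earlier turn is never served later.
  take(query: string): C | null {
    const hit = this.entry;
    this.entry = null;
    return hit !== null && hit.query === query ? hit.ctx : null;
  }
}
```

In the per-turn loop this would read roughly as: const ctx = cache.take(userMsg) ?? (await session.context({ query: userMsg })). If you pre-warm with a generic query on purpose, relax the match condition accordingly.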
Most agent harnesses (OpenAI Agents SDK, LangChain, LiveKit) own the
message log themselves — let them. But if you're rolling a thin LLM
loop and would rather not maintain a parallel history array on your
side, every /context response carries recent_turns: the raw
messages buffered by /turn for the current session, in
chronological order. Read them straight off the context payload.
const ctx = await session.context({ query: userMessage });
// Sonzai is the source of truth — no local history list needed.
const history = (ctx.recent_turns ?? []).map((t) => ({
role: t.role,
content: t.content,
}));
const reply = await yourLLM.chat({
system: buildSystemPrompt(ctx),
messages: [...history, { role: "user", content: userMessage }],
});
What's in the buffer. Last ~20 messages from the current session
only — text content, role, and a server-side timestamp. Capped at 20
turns and scoped to (agent_id, user_id, session_id); cross-session
history isn't there (use agents.memory.list_facts for that — facts
are the durable form).
What's not in the buffer. No system prompts, no tool_calls
arrays, no role: "tool" payloads, no image attachments. The buffer
mirrors the narrative you submitted to /turn, not the rich message
structure your LLM saw. If your conversation has tool calls or
multimodal content the LLM needs to re-read on the next turn, keep
your own history.
When the buffer is empty. Right after sessions.start (no turns
yet), or in degraded mode if Redis is down — the field is omitted, not
zero-length-with-error. Treat ctx.recent_turns ?? [] as a no-op.
The /turn schema accepts OpenAI/Anthropic-style tool messages: role: "tool" for tool results and tool_calls arrays on assistant messages. Pass the entire intermediate exchange — Sonzai's extractor reads tool results and can capture facts that only appeared in tool output (e.g. "user's last order shipped from Tokyo" from an order-lookup tool).
await session.turn({
messages: [
{ role: "user", content: "Where did my last order ship from?" },
{
role: "assistant",
tool_calls: [{ id: "call_1", type: "function", function: { name: "order-lookup", arguments: "{}" } }],
},
{
role: "tool",
tool_call_id: "call_1",
content: '{"order_id":"42","origin":"Tokyo","carrier":"DHL"}',
},
{ role: "assistant", content: "Your last order shipped from Tokyo via DHL." },
],
});
/turn returns immediately after the sync mood pass. The deeper extraction runs asynchronously and reaches done in 5–15s. You can poll the status if you need to gate something on it:
const { extraction_id } = await session.turn({ messages });
// Optional — only poll if you need to wait for facts/personality before doing something
let status = await session.status(extraction_id);
while (status.state !== "done" && status.state !== "failed") {
await new Promise((r) => setTimeout(r, 1000));
status = await session.status(extraction_id);
}
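The loop above polls until a terminal state and would spin forever if the extraction never reaches one. A bounded variant is sketched below; waitForExtraction is a hypothetical helper that takes the status lookup as a callback, and the 30-second default timeout is an arbitrary assumption chosen to sit above the 5–15s typical latency.

```typescript
// Polls getState until it returns "done" or "failed" (the terminal states
// checked in the loop above), or gives up when the timeout would be exceeded.
async function waitForExtraction(
  getState: () => Promise<string>,
  opts: { intervalMs?: number; timeoutMs?: number } = {},
): Promise<string> {
  const intervalMs = opts.intervalMs ?? 1000;
  const deadline = Date.now() + (opts.timeoutMs ?? 30000);
  for (;;) {
    const state = await getState();
    if (state === "done" || state === "failed") return state;
    // Stop before sleeping if the next poll would land past the deadline.
    if (Date.now() + intervalMs > deadline) return "timeout";
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Usage would look like: const outcome = await waitForExtraction(async () => (await session.status(extraction_id)).state). A "timeout" outcome means the caller should proceed without the fresh facts rather than block the conversation.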
Pattern 1 hands the tool-calling loop entirely to you. Sonzai never executes a tool — but it does read tool calls and tool results out of the messages you submit on /turn, so the extractor can capture facts that surfaced inside a tool output. There are two flavors of tools you'll typically wire up.
Use whatever your agent framework provides — @function_tool in the OpenAI Agents SDK, tools= on Anthropic, function declarations on Gemini, @tool in LangChain. The pattern is the same: register the tool with your LLM, run the tool-calling loop on your side, and forward the full exchange (including the assistant's tool_calls message and the role: "tool" result message) to session.turn().
When the assistant says "It's 7:30 AM" and the user replies "Set my morning standup for 8", Sonzai's extractor sees the tool's actual output, not just the assistant's paraphrase — and can capture "user prefers 8 AM standups" with the right grounding.
You can also wrap Sonzai's own REST endpoints as tools your LLM can call mid-turn. The two most useful are knowledge base search and memory search — both let the LLM pull additional context on demand without you having to inject everything up-front through session.context().
// TypeScript — agents.memory.search is available directly
import { Sonzai } from "@sonzai-labs/agents";
import { tool } from "ai";
import { z } from "zod";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const kbSearch = tool({
description: "Search the agent's knowledge base.",
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => {
const res = await sonzai.agents.knowledgeSearch("agent-id", { query, limit: 5 });
return res.results.map((r) => `- ${r.label}: ${r.content}`).join("\n") || "No matching knowledge.";
},
});
const memorySearch = tool({
description: "Search the user's long-term memory.",
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => {
const res = await sonzai.agents.memory.search("agent-id", {
query,
user_id: "user-123",
limit: 5,
});
return res.results.map((r) => `- ${r.text}`).join("\n") || "No matching memories.";
},
});
Why expose Sonzai endpoints as tools?
session.context() returns the most relevant facts for the current query — a strong default. Exposing kb_search and memory_search as tools lets the LLM decide for itself when to dig deeper (e.g., when the user asks "what did I tell you last week about X?"). It's especially useful for agent frameworks that already think in terms of tools.
When the LLM calls these tools, the result lands in your tool-calling loop just like any other tool. Forward the full exchange to session.turn() and Sonzai's extractor will see the search results too — but be aware that re-extracting facts from a memory_search tool result can create echoes (the user's own past fact resurfaces as if it were new). Either skip extraction for those tool messages on your side, or trust the dedup pass.
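Skipping extraction for those tool messages can be done by filtering the exchange before it is submitted. The sketch below removes calls to the named tools along with their results and leaves everything else untouched; stripToolExchanges is a hypothetical helper, and the tool name memory_search is whatever name you registered with your LLM.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content?: string;
  tool_calls?: ToolCall[];
  tool_call_id?: string;
}

// Remove calls to the named tools, and their role: "tool" results, from a
// message list. Other messages pass through unchanged.
function stripToolExchanges(messages: ChatMessage[], toolNames: string[]): ChatMessage[] {
  const skip = new Set<string>(toolNames);
  const removedIds = new Set<string>();
  const out: ChatMessage[] = [];
  for (const m of messages) {
    const calls = m.tool_calls ?? [];
    if (m.role === "assistant" && calls.length > 0) {
      const kept = calls.filter((c) => {
        if (!skip.has(c.function.name)) return true;
        removedIds.add(c.id);
        return false;
      });
      if (kept.length > 0) out.push({ ...m, tool_calls: kept });
      else if (m.content) out.push({ role: "assistant", content: m.content });
      // An assistant message left with no calls and no text is dropped.
      continue;
    }
    if (m.role === "tool" && m.tool_call_id !== undefined && removedIds.has(m.tool_call_id)) {
      continue; // result of a removed call
    }
    out.push(m);
  }
  return out;
}
```

Apply it only to the copy you pass to session.turn(); your own LLM history should keep the full exchange so the model can still see what it retrieved.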
Sonzai's memory pipeline is text-based today. The /turn and /process endpoints accept string content only — DialogueMessage.content is string. Your LLM can be fully multimodal (Gemini, Claude, GPT-4o all accept image URLs and audio natively) but to get image-related facts into Sonzai you need to bridge the multimodal content into text in the messages you send to /turn.
The recommended pattern is dual-output: have your vision-capable LLM produce both (a) the warm reply you show the user and (b) a hidden [MEMORY: ...] line with a detailed factual description. Strip the [MEMORY: ...] line out before showing the user, and embed it in the bridged text you submit to Sonzai.
import OpenAI from "openai";
import { Sonzai } from "@sonzai-labs/agents";
const gemini = new OpenAI({
baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
apiKey: process.env.GEMINI_API_KEY!,
});
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const SYSTEM_PROMPT_IMAGE_AWARE = `You are a friendly companion. When the user shares an image, respond warmly
to what's emotionally important to THEM.
After your reply, ALWAYS include a single line:
[MEMORY: <detailed factual description of the image — setting, objects,
people, mood, time of day, what the user appears to be doing>]
The user does NOT see the [MEMORY: ...] line.`;
async function processImageTurn(session: any, userMsg: string, imageUrl: string): Promise<string> {
const result = await gemini.chat.completions.create({
model: "gemini-3.1-flash-lite-preview",
messages: [
{ role: "system", content: SYSTEM_PROMPT_IMAGE_AWARE },
{
role: "user",
content: [
{ type: "text", text: userMsg },
{ type: "image_url", image_url: { url: imageUrl } },
],
},
],
});
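const raw = result.choices[0]?.message?.content ?? "";
// Split the dual output: hidden [MEMORY: ...] note vs. user-facing reply.
// The non-greedy match stops at the first "]", so keep brackets out of the description.
const m = raw.match(/\[MEMORY:\s*([\s\S]+?)\]/);
const memoryNote = m ? m[1].trim() : "";
const reply = raw.replace(/\[MEMORY:[\s\S]+?\]/, "").trim();
sendToUser(reply); // your app's own delivery function
// Fall back to a generic note if the model omitted the [MEMORY: ...] line.
const imageNote = memoryNote || "no description available";
await session.turn({
messages: [
{ role: "user", content: `${userMsg}\n\n[Image attached: ${imageNote}, URL: ${imageUrl}]` },
{ role: "assistant", content: reply },
],
});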
return reply;
}
Why this pattern:
No backend multimodal yet. /turn accepts string content. Text-bridging through the same vision-capable LLM you already call is the cleanest workaround.
Why dual-output (vs. a separate vision call). The same LLM call serves both purposes — no extra cost, no extra latency, no second roundtrip. You're already paying for vision on the assistant turn; let it produce the description too.
Why a hidden line. Keeps user-facing replies emotionally warm — "Oh you have such nice shoulders!" — while still capturing the factual detail (gym, tank top, mirror, time of day) that memory extraction needs.
It's a developer pattern, not a Sonzai field. The [MEMORY: ...] convention is yours to define. Sonzai just sees text. You can use any sentinel — <<MEM>>...<</MEM>>, JSON, whatever your prompt and parser agree on.
Including the URL. Embedding the URL in the bridged text isn't required, but it lets Sonzai later surface the image as a memory artifact ("the photo you shared last week") without re-running vision on the image. Your app keeps using its own image storage; Sonzai just remembers the link as text.
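As an illustration of the "any sentinel" point, here is a minimal sketch of a parser for the <<MEM>>...<</MEM>> variant. The helper names splitDualOutput and bridgeImageTurn are hypothetical, and the fallback text used when the model omits the note is an assumption.

```typescript
// Split a dual-output completion into the user-facing reply and the hidden note.
// Uses the <<MEM>>...<</MEM>> sentinel, one of the alternatives listed above.
function splitDualOutput(raw: string): { reply: string; memoryNote: string } {
  const pattern = /<<MEM>>([\s\S]*?)<<\/MEM>>/;
  const match = pattern.exec(raw);
  const memoryNote = match ? (match[1] ?? "").trim() : "";
  const reply = raw.replace(pattern, "").trim();
  return { reply, memoryNote };
}

// Build the text-bridged user message that gets submitted as string content.
// The URL is optional, matching the "Including the URL" note.
function bridgeImageTurn(userMsg: string, memoryNote: string, imageUrl?: string): string {
  const note = memoryNote || "no description available"; // assumed fallback wording
  const url = imageUrl ? `, URL: ${imageUrl}` : "";
  return `${userMsg}\n\n[Image attached: ${note}${url}]`;
}
```

A paired open/close sentinel tolerates square brackets inside the description, which a single-bracket [MEMORY: ...] pattern would cut short at the first "]".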
Audio & voice follow the same pattern
Speech-to-text (STT) on your side, send the transcript in messages. Text-to-speech (TTS) is rendered after the assistant text exists, so you forward the assistant text to session.turn() exactly as you would for a text-only chat. See the Voice AI use case below.
Why text-only /turn is the design, not a placeholder
Memory is a layer of semantic understanding. The question Sonzai needs to answer next week is "what does this agent know about this user?" — not "what bytes did the LLM see?". Your vision-capable LLM has already understood the image; text-bridging passes that understanding through to extraction in the form the memory pipeline actually consumes (atomic facts, habits, inventory). Storing raw image bytes server-side would inflate cost without improving recall, and would re-couple your LLM choice to ours. The dual-output pattern keeps your harness fully in charge of perception.
The canonical Pattern 1 example. You bring your own agent harness (here the OpenAI Agents SDK) and point it at Gemini via the OpenAI-compat endpoint, so no OPENAI_API_KEY is ever used. Sonzai sits outside the LLM/tool-calling loop entirely: it supplies context for the system prompt via session.context() and ingests the finished transcript via session.turn(). The Agents SDK does all multi-step reasoning and tool dispatch on your side; Sonzai does memory.
import os

from openai import AsyncOpenAI
from agents import (
    Agent,
    Runner,
    OpenAIChatCompletionsModel,
    function_tool,
    set_tracing_disabled,
)
from sonzai import Sonzai

# The Agents SDK ships traces to OpenAI by default; disable that, since we
# have no OpenAI key and aren't talking to OpenAI's servers at all.
set_tracing_disabled(True)

# Point the Agents SDK's AsyncOpenAI client at Gemini's OpenAI-compat URL.
gemini = AsyncOpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
model = OpenAIChatCompletionsModel(
    model="gemini-3.1-flash-lite-preview",
    openai_client=gemini,
)

# Sonzai = memory layer only. It never sees the LLM client.
sonzai = Sonzai(api_key=os.environ["SONZAI_API_KEY"])
session = sonzai.agents.sessions.start(
    "agent-id",
    user_id="user-123",
    session_id="session-abc",
)

@function_tool
def get_current_time() -> str:
    """Return the current time."""
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

while True:
    user_msg = input("You: ")
    if not user_msg:
        break

    # 1) Pull enriched context (mood, personality, relevant facts, …) from Sonzai.
    ctx = session.context(query=user_msg)
    mood = ctx.get("current_mood") or "neutral"
    instructions = f"You are a friendly companion. Current mood: {mood}."

    # 2) Run the Agents SDK loop; it handles tool-calling and multi-step reasoning.
    agent = Agent(
        name="Companion",
        instructions=instructions,
        model=model,
        tools=[get_current_time],
    )
    result = Runner.run_sync(agent, user_msg)
    print(f"Assistant: {result.final_output}")

    # 3) Convert the run's items (assistant text + ToolCallItem + ToolCallOutputItem)
    #    into Sonzai's tool-aware messages format. See the demo for the implementation.
    sonzai_messages = run_result_to_sonzai_messages(user_msg, result)

    # 4) Submit the turn. `mood` comes back inline (~300ms); facts / personality /
    #    inventory are extracted asynchronously and land 5-15s later.
    turn_result = session.turn(messages=sonzai_messages)
    print(f" -> mood updated: {turn_result.mood}")

session.end()
What's happening on each turn:
Sonzai is out of the LLM loop. The OpenAI Agents SDK runs the model, dispatches tools, and produces result.final_output. Sonzai never sees the LLM client and has no opinion on which model answered.
Mood is real-time. session.turn() returns fresh mood inline in ~300ms, so you can render it the moment the response arrives.
Facts, personality drift, and inventory are deferred (5-15s). They run async under the returned extraction_id. Re-poll agents.memory.list_facts, agents.personality.get, etc. on the next turn; whatever didn't land yet will be there shortly.
Tool calls flow through to extraction. Sonzai's tool-aware message format accepts assistant messages with tool_calls plus a tool message carrying the result. The conversion helper packages the Agents SDK's ToolCallItem + ToolCallOutputItem into that shape so extraction can pick up facts from tool outputs too.
Want a working version? See the OpenAI Agents companion demo — a two-pane Streamlit app showing live mood, Big5, recent facts, inventory, and the constellation graph as you chat.
STT → enrich → LLM → TTS. Sonzai holds the memory; you own the audio pipeline. Submit the turn while TTS is synthesizing — sync mood is fast enough not to block, and deferred extraction never blocks.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processVoiceTurn(
session: any, // Session handle from sonzai.agents.sessions.start
audioBuffer: Buffer
): Promise<Buffer> {
// Your STT
const transcript = await yourSTT.transcribe(audioBuffer);
// Inject memory into a concise voice-friendly system prompt
const ctx = await session.context({ query: transcript });
const systemPrompt = `${ctx.personality_prompt ?? "You are a voice companion."} Keep replies under 2 sentences for voice.
Mood: ${JSON.stringify(ctx.current_mood)}.
Key memory: ${ctx.loaded_facts?.[0]?.atomic_text ?? "none"}.`;
const reply = await yourLLM.chat({ system: systemPrompt, message: transcript });
// Submit the turn while TTS synthesizes (run in parallel)
const [audioResponse] = await Promise.all([
yourTTS.synthesize(reply),
session.turn({
messages: [
{ role: "user", content: transcript },
{ role: "assistant", content: reply },
],
}),
]);
return audioResponse;
}
Sonzai injects user context into the agent's system prompt. The framework handles tool calling, multi-step reasoning, and memory of the current conversation; Sonzai handles what the agent knows about the user across sessions. Send the full transcript including any tool messages to session.turn() so extraction can pick up facts from tool results.
import { ChatOpenAI } from "@langchain/openai";
import { SystemMessage, HumanMessage, AIMessage } from "@langchain/core/messages";
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
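// Tools are attached with bindTools(); the ChatOpenAI constructor does not take a tools option.
const llm = new ChatOpenAI({ model: "gpt-4o" }).bindTools(yourToolSchemas);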
async function agentTurn(
session: any,
userInput: string,
messageHistory: (HumanMessage | AIMessage)[]
): Promise<string> {
const ctx = await session.context({ query: userInput });
const messages = [
new SystemMessage(buildSystemPrompt(ctx)),
...messageHistory,
new HumanMessage(userInput),
];
// Run the agent's full tool-calling loop on your side, then surface
// every intermediate message (assistant tool_calls + tool results)
// to Sonzai so it can extract from them.
const { reply, intermediate } = await runLangchainAgent(llm, messages);
await session.turn({
messages: [
{ role: "user", content: userInput },
...intermediate,
{ role: "assistant", content: reply },
],
});
return reply;
}
Route to different models based on task type while Sonzai stitches user memory across all of them. The Session-level provider/model default is just a default — every .turn() can override.
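A rough sketch of what per-turn routing could look like. Everything here is illustrative: the task categories, the keyword classifier, the "openai" provider string, and the assumption that .turn() takes the same provider and model fields as /process are not confirmed by this guide.

```typescript
type TaskType = "chat" | "reasoning";
type ModelChoice = { provider: string; model: string };

// Hypothetical routing table; swap in the providers and models you actually run.
const ROUTES: Record<TaskType, ModelChoice> = {
  chat: { provider: "gemini", model: "gemini-3.1-flash-lite-preview" },
  reasoning: { provider: "openai", model: "gpt-4o" },
};

// Toy classifier: long prompts or "why/compare/plan"-style questions go to the
// stronger model. Replace with your own routing logic.
function classifyTask(userMsg: string): TaskType {
  const looksHard = userMsg.length > 400 || /\b(why|prove|compare|plan|debug)\b/i.test(userMsg);
  return looksHard ? "reasoning" : "chat";
}

function pickModel(userMsg: string): ModelChoice {
  return ROUTES[classifyTask(userMsg)];
}

// Merge the per-turn override into whatever payload you pass to session.turn().
function withModelOverride<T extends object>(payload: T, choice: ModelChoice): T & ModelChoice {
  return { ...payload, ...choice };
}
```

Usage would be along the lines of session.turn(withModelOverride({ messages }, pickModel(userInput))), with the session-level default applying whenever no override is passed.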
Endpoint walkthrough — full reference for sessions.start, context, turn, process, end, and read endpoints
KB & limitations — knowledge base behavior in standalone mode and what's not supported
Pattern 2: Post-Session Batch Processing
You own the entire conversation. Sonzai never sees it in real time. When the conversation ends, you send the full transcript to either /process or sessions.end({ messages }). Sonzai extracts facts, updates the user's behavioral profile, and makes the insights available via the API — ready for personalization, analytics, push notifications, or next-session context.
This pattern is ideal when Sonzai being in the hot path is undesirable (or impossible) — latency-sensitive real-time interactions, apps with their own LLM loop already in production, or cases where you want to process transcripts in bulk after the fact.
/process and sessions.end({ messages }) are functionally equivalent for batch ingest — both extract facts and side-effects from the full transcript inline. Don't do both for the same transcript or extraction runs twice. Use /process if you want a single call (it auto-creates the session and surfaces the generated session_id in the response). Use sessions.start + sessions.end({ messages }) if you want explicit lifecycle, async polling, or session-scoped tools.
Option A — /process only. One call. Auto-creates a session if you don't pass one. Returns the auto-generated session_id so you can correlate later.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processTranscript(
agentId: string,
userId: string,
transcript: { role: "user" | "assistant" | "tool"; content: string; tool_calls?: any[] }[]
) {
const result = await sonzai.agents.process(agentId, {
userId,
messages: transcript, // tool messages allowed
provider: "gemini", // optional override
model: "gemini-3.1-flash-lite-preview", // optional override
});
// result.session_id is the auto-created session id when none was passed.
// Read the extracted facts/mood/etc. via the dedicated endpoints below.
return result;
}
Option B — Explicit sessions.start + sessions.end({ messages }). Use this when you want async processing, session-scoped tools, or explicit lifecycle ownership.
async function processTranscript(
agentId: string,
userId: string,
transcript: { role: "user" | "assistant" | "tool"; content: string }[]
) {
const sessionId = `session-${Date.now()}`;
const session = await sonzai.agents.sessions.start(agentId, { userId, sessionId });
// Pass the full transcript on end — extraction happens here, not via /process.
// sessions.end({ messages }) is functionally equivalent to /process({ messages }).
const result = await session.end({
messages: transcript,
totalMessages: transcript.length,
});
return result;
}
Pick one. The two options are equivalent for fact extraction — chaining them just runs extraction twice on the same messages.
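If several code paths in your backend can submit the same transcript, a small guard helps you avoid running extraction twice. This is a minimal in-process sketch; IngestGuard and transcriptKey are hypothetical names, and a production system would more likely keep the keys in a database or cache.

```typescript
type Msg = { role: string; content: string };

// Stable key for a (user, transcript) pair. Serializing the content directly
// keeps the sketch simple and collision-free; hash it if you need short keys.
function transcriptKey(userId: string, messages: Msg[]): string {
  return JSON.stringify([userId, messages.map((m) => [m.role, m.content])]);
}

class IngestGuard {
  private seen = new Set<string>();

  // Returns true the first time a transcript is claimed, false on repeats.
  claim(userId: string, messages: Msg[]): boolean {
    const key = transcriptKey(userId, messages);
    if (this.seen.has(key)) return false;
    this.seen.add(key);
    return true;
  }
}
```

Call claim() before either /process or sessions.end({ messages }) and skip the submission when it returns false.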
Before the session, pull the student's profile to personalize the curriculum. After the session, extract what was learned and generate targeted practice exercises. One call to /process is enough.
Pull the user's fitness context before the workout for a personalized greeting. After the workout, send the session log to Sonzai to track habits, mood, and progress — without Sonzai ever being in the real-time exercise loop.
Your sales team runs calls through their existing tooling (Gong, Zoom, your own recorder). After each call, send the transcript to Sonzai to build a persistent customer profile.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processSalesCall(
agentId: string,
customerId: string,
callId: string,
callTranscript: { role: "user" | "assistant"; content: string }[],
durationSeconds: number
) {
// Use the explicit lifecycle so we can pass durationSeconds.
const session = await sonzai.agents.sessions.start(agentId, {
userId: customerId,
sessionId: `call-${callId}`,
});
const result = await session.end({
messages: callTranscript,
totalMessages: callTranscript.length,
durationSeconds,
});
// Read extractions back from the analytics endpoints.
const personality = await sonzai.agents.personality.get(agentId);
// ...build CRM update from result + dedicated read endpoints
return result;
}
Your app handles the journaling conversation. After each session, send to Sonzai to track mood trends, detect emotional breakthroughs, and surface proactive insights.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function afterJournalingSession(
agentId: string,
userId: string,
journalTranscript: { role: "user" | "assistant"; content: string }[]
) {
await sonzai.agents.process(agentId, { userId, messages: journalTranscript });
// After /process, extracted state is available on the read endpoints.
// Proactive outreach (check-ins, reminders) is exposed via the
// notifications resource — not on the /process response.
const [mood, notifications] = await Promise.all([
sonzai.agents.getMood(agentId, { userId }),
sonzai.agents.notifications.list(agentId),
]);
if ((mood?.valence ?? 0) < -0.4) {
await sendWellnessAlert(userId, {
message: "It sounds like you're going through a tough time. We're here for you.",
});
}
for (const notif of notifications) {
if (notif.user_id === userId) {
await scheduleReminder(userId, notif.generated_message, notif.scheduled_for);
}
}
await updateMoodDashboard(userId, { valence: mood?.valence, energy: mood?.arousal });
}
A custom state is a key-value record scoped to an agent + user (or just an agent). Values can be any JSON-serializable type: strings, numbers, booleans, arrays, or nested objects.
Unlike memory (which is unstructured text extracted from conversations), custom states are structured data you write explicitly from your backend. The agent can read them via the get_custom_state tool during conversation, so it always knows the user's current tier, streak, balance, etc.
When the agent has access to the get_custom_state tool (enabled automatically when custom states exist), it fetches current state at the start of a conversation. You can also read it from your backend at any time.
// Read by key from your backend
const state = await client.agents.customStates.getByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
const progress = state.value as {
tier: string; score: number; score_to_next: number; streak_days: number;
};
console.log(`${progress.tier} tier · ${progress.score}/${progress.score_to_next} pts · ${progress.streak_days}-day streak`);
During conversation, the agent calls get_custom_state("user_progress") and incorporates the progress data into its responses naturally, so you don't have to paste the state into the prompt yourself.
Use upsert from your backend whenever the user's state changes — after a session ends, after a purchase, or on a schedule. upsert creates the state if it doesn't exist, or replaces it if it does.
// Called after each work session ends
async function onSessionEnd(userId: string, sessionScore: number) {
const current = await client.agents.customStates.getByKey(AGENT_ID, {
userId,
key: "user_progress",
}).catch(() => null);
const tiers = ["bronze", "silver", "gold", "platinum"];
// Default for first-time users; milestones starts empty so the upsert never writes undefined.
const prev = (current?.value ?? { tier: "bronze", score: 0, score_to_next: 1000, streak_days: 0, milestones: [] }) as {
tier: string; score: number; score_to_next: number; streak_days: number; milestones: string[];
};
const newScore = prev.score + sessionScore;
const nextTier = tiers[tiers.indexOf(prev.tier) + 1];
// Only promote when a higher tier exists; top-tier users keep accumulating score.
const promoted = nextTier !== undefined && newScore >= prev.score_to_next;
const newTier = promoted ? nextTier : prev.tier;
await client.agents.customStates.upsert(AGENT_ID, {
userId,
key: "user_progress",
value: {
tier: newTier,
score: promoted ? newScore - prev.score_to_next : newScore,
// Round so the threshold stays a whole number as it grows by 50% per tier.
score_to_next: promoted ? Math.round(prev.score_to_next * 1.5) : prev.score_to_next,
streak_days: prev.streak_days + 1,
milestones: prev.milestones ?? [],
},
});
if (promoted) {
// Notify the agent so it can congratulate the user next session
await client.agents.triggerBackendEvent(AGENT_ID, {
userId,
eventType: "tier_promotion",
eventDescription: `User promoted from ${prev.tier} to ${newTier}`,
metadata: { new_tier: newTier, previous_tier: prev.tier },
});
}
}
Workflow events let your backend tell the agent about something that happened outside the conversation. The next time the user chats, the agent sees the pending event and reacts naturally.
// Trigger from your backend when something notable happens
await client.agents.triggerBackendEvent(AGENT_ID, {
userId: USER_ID,
eventType: "task_complete",
eventDescription: "Q1 Revenue Analysis completed — deliverable: Revenue Report, category: Analytics, time: 3h 42m",
metadata: {
task_name: "Q1 Revenue Analysis",
deliverable: "Revenue Report",
category: "Analytics",
time_taken: "3h 42m",
},
});
// Next time the user opens a conversation:
// Agent: "I see you finished the Q1 Revenue Analysis! That report is a key
// deliverable. Want to discuss the findings or start the next task?"
Event delivery
Workflow events are queued and delivered on the next conversation turn. They don't interrupt an active session. The agent consumes pending events at the start of the next chat or chatStream call and incorporates them into its opening message or first response.
Use update when you want to change a state by its state_id. Unlike upsert, update does a partial merge — you only need to pass the fields you want to change.
// Add a milestone without overwriting the whole state
const state = await client.agents.customStates.getByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
const progress = state.value as { milestones: string[]; [k: string]: unknown };
await client.agents.customStates.update(AGENT_ID, state.state_id, {
value: {
...progress,
milestones: [...progress.milestones, "100_tasks"],
},
});
Delete a state by its ID or by key. On next conversation, the agent won't have access to it.
// Delete by key (finds and removes the state)
await client.agents.customStates.deleteByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
// Or delete by state_id if you already have it
await client.agents.customStates.delete(AGENT_ID, stateId);
Create an onboarding state on sign-up with { step: 0, completed: false }. The agent checks it at the start of early conversations and guides the user through setup naturally.
Subscription context
Store { plan: 'pro', expires_at: '...' } so the agent knows which features to offer or upsell without you having to pass it in every chat request.
Daily summary cache
Write a daily_summary state at the end of each day with key metrics. The agent opens the next-day conversation referencing the user's activity — "Yesterday you completed 3 tasks and hit a 12-day streak. Ready to keep going?"
Schemas tell the KB what fields to store for each entity type. Create one for software_license so the platform knows how to validate and store your license data.
Insert your first set of entities using insertFacts. This is also how you load historical data before going live. Include relationships so the KB can surface alternative or complementary tool recommendations.
Run a cost-sync job on a schedule (e.g. daily cron) that fetches current pricing from your vendor data source and pushes it into the KB. bulkUpdate merges properties into existing nodes matched by label — no need to delete and re-insert.
// cost-sync.ts — run daily
import { Sonzai } from "@sonzai-labs/agents";
import { fetchLatestPricing } from "./vendor-api"; // your data source
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const PROJECT_ID = "proj_abc123";
async function syncPricing() {
const pricing = await fetchLatestPricing(); // [{ name, price, trend }]
await client.knowledge.bulkUpdate(PROJECT_ID, {
updates: pricing.map((license) => ({
entity_type: "software_license",
label: license.name,
properties: {
market_price: license.price,
trend_30d: license.trend,
last_synced: new Date().toISOString(),
},
})),
});
console.log(`Synced ${pricing.length} license prices`);
}
syncPricing();
Batch size
Batches of ≤100 items are processed synchronously (immediate response). Larger batches are queued and processed asynchronously — the response includes a job ID you can poll for completion.
Enable the inventory and knowledge capabilities on your agent. This gives the agent the sonzai_inventory_update and sonzai_inventory tools automatically — no prompt engineering required.
const AGENT_ID = "agent_xyz";
await client.agents.updateCapabilities(AGENT_ID, {
inventory: true, // enables sonzai_inventory_update + sonzai_inventory tools
knowledge: true, // enables knowledge_search tool
project_id: PROJECT_ID, // which KB to join against
});
You can also set this from the dashboard: go to Agents → your agent → Capabilities and toggle Inventory on.
Once inventory is enabled, the agent calls sonzai_inventory_update on its own whenever a user mentions a tool or subscription they use. You just chat normally — the platform does the KB resolution and storage.
// Your backend chat endpoint
for await (const event of client.agents.chatStream(AGENT_ID, {
userId: "user_123",
messages: [
{
role: "user",
content: "We just provisioned 10 Figma Enterprise seats at $75/seat.",
},
],
})) {
// The agent streams its reply — and internally calls
// sonzai_inventory_update({ action: "add", item_type: "software_license",
// label: "Figma Enterprise", description: "Figma Enterprise design tool subscription",
// properties: { plan: "Enterprise", purchase_price: 75, quantity: 10 } })
// The platform resolves the KB node, stores the link, and the agent
// continues the conversation without interruption.
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
How KB resolution works
The platform searches the KB for the item description. If exactly one node matches, it links automatically. If there are multiple candidates, the response returns status: "disambiguation_needed" with a list of candidates so the agent can ask the user to clarify.
Use mode="value" to get each user resource joined with the latest KB pricing data. The platform computes gain_loss automatically: (market_price - purchase_price) × quantity.
You can also use mode="aggregate" with the aggregations parameter to get portfolio-level totals without listing every resource — useful for organizations with many subscriptions.
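To make the arithmetic concrete, here is a small local sketch of the same calculation. The platform computes these values for you; the ValuedResource type and the total_* field names are assumptions for illustration, not the API's response schema.

```typescript
type ValuedResource = {
  label: string;
  purchase_price: number;
  market_price: number;
  quantity: number;
};

// gain_loss = (market_price - purchase_price) × quantity
function gainLoss(r: ValuedResource): number {
  return (r.market_price - r.purchase_price) * r.quantity;
}

// Portfolio-level totals, similar in spirit to an aggregate view.
function portfolioTotals(resources: ValuedResource[]) {
  return resources.reduce(
    (acc, r) => ({
      total_cost: acc.total_cost + r.purchase_price * r.quantity,
      total_value: acc.total_value + r.market_price * r.quantity,
      total_gain_loss: acc.total_gain_loss + gainLoss(r),
    }),
    { total_cost: 0, total_value: 0, total_gain_loss: 0 },
  );
}
```

For example, 10 seats bought at 75 with a current market price of 80 give a gain_loss of (80 - 75) × 10 = 50. A positive value means the market price is now above what the user paid.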
If a user already has an existing set of subscriptions (from a CSV, a procurement system export, etc.), import them in bulk rather than waiting for the agent to discover each resource in conversation.
The batch endpoint processes up to 1,000 items per call. For larger imports, split into multiple calls or use the CSV priming feature in the dashboard.
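Splitting a large import into calls of at most 1,000 items can be sketched as follows. The chunk and importInBatches helpers are hypothetical, and the send callback stands in for whichever SDK batch call you use.

```typescript
// Split an array into consecutive slices of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Send each slice in order; `send` wraps your actual batch import call.
// Returns the number of calls made.
async function importInBatches<T>(
  items: T[],
  send: (batch: T[]) => Promise<void>,
  batchSize = 1000,
): Promise<number> {
  let calls = 0;
  for (const batch of chunk(items, batchSize)) {
    await send(batch);
    calls++;
  }
  return calls;
}
```

Sending batches sequentially keeps failures easy to attribute to a specific slice; parallelize only if your rate limits allow it.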
Set up a recommendation rule in the KB to surface alternative tools to users
Add trend tracking (7d/30d/90d) to power "biggest cost increases" reports
View per-user inventory live in Dashboard → Agents → your agent → Users → select user → Inventory / Assets
Read the Knowledge Base reference for schemas, analytics rules, and full-text search
Medication Reminders
This tutorial walks through a full medication-reminder implementation: define a medication entity type in your knowledge base, seed medications per user, and create a Scheduled Reminder linking each medication to a cadence. The agent then proactively messages the user at the scheduled time, naming the medication and dosage in its own voice.
Tenant-agnostic primitive. The Sonzai platform has no medication-specific code. This tutorial wires two generic primitives — Inventory and Scheduled Reminders — into a medication use case. The same pattern works for watering plants, exercise reminders, bill payments, or any recurring-with-structured-data use case.
This is not a medical device. Reminders are a user-experience feature, not a clinical safety mechanism. Do not rely on Sonzai scheduled reminders as the sole adherence path for patients where missed doses cause harm.
Create a schema for the medication entity type so the platform knows how to store and index each drug's properties. The name and ndc_code fields are indexed for fast lookup; dosage, instructions, and prescribed_by are stored but not indexed (they are fetched whole at fire time).
You only need to create the schema once per project. All subsequent medication items written for any user will be validated and indexed against this definition.
Insert one medication into the user's inventory using inventory.update with action: "add". Store the returned fact_id — you will pass it to the schedule in the next step.
Create a twice-daily schedule at 08:00 and 20:00 Asia/Singapore, with active_window.hours set as a belt-and-braces quiet-hours guard. Pass the fact_id returned in step 2 as inventory_item_id. The platform fetches the live item properties at every fire, so no re-registration is required when the dosage changes.
const schedule = await client.schedules.create(AGENT_ID, USER_ID, {
cadence: {
simple: { frequency: "daily", times: ["08:00", "20:00"] },
timezone: "Asia/Singapore",
},
active_window: {
hours: { start: "07:00", end: "22:00" },
},
intent: "remind the user to take their ibuprofen at the correct dose",
check_type: "reminder",
inventory_item_id: inventoryItemId,
metadata: { reminder_category: "medication" },
});
const scheduleId = schedule.schedule_id;
console.log(scheduleId); // "sched_01HX..."
console.log(schedule.next_fire_at); // "2026-05-02T00:00:00Z"
console.log(schedule.next_fire_at_local); // "2026-05-02T08:00:00+08:00"
What each field controls:
| Field | Role |
| --- | --- |
| cadence.simple.times | Wall-clock fire times in the schedule's timezone |
| cadence.timezone | Per-user IANA zone; the platform does not auto-detect the user's location |
| active_window.hours | Quiet-hours guard; fires computed outside the window are skipped, not deferred |
| intent | The "why" the agent grounds its message in, written as a short natural-language instruction |
| inventory_item_id | Links to the medication's structured properties, fetched live at every fire |
| metadata | Opaque developer tags surfaced to the agent as "Additional context" in the wakeup block |
When the schedule fires at 08:00 Singapore time, the platform assembles a structured intent block and delivers it to the agent as a proactive wakeup. The agent composes its opening message in its own voice using the intent and the injected inventory properties. A typical output might look like:
"Morning — quick reminder, it's 8 o'clock. Time for your 500mg of ibuprofen, and remember to take it with food."
Exact wording depends on the agent's personality configuration. The agent is not given a fixed template — it receives the intent and inventory data and decides how to phrase it naturally.
Updating the dosage. When a doctor reduces the ibuprofen dose from 500mg to 250mg, update the inventory item:
await client.agents.inventory.directUpdate(AGENT_ID, USER_ID, inventoryItemId, {
properties: {
dosage: "250mg",
},
});
// No schedule edit required.
// The next scheduled fire automatically reads "250mg" from the live item.
This separation is intentional: inventory is the source of truth for the what; the schedule is the source of truth for the when. They change independently. Changing the dose never touches the schedule row; moving a reminder time never touches the medication item.
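For medications with a fixed course length, use starts_at and ends_at to auto-disable the schedule when the course completes. Here is a 3x/day amoxicillin course that fires every 8 hours over 14 days. Because fires outside active_window.hours are skipped rather than deferred, check that every 8-hour tick lands inside the window you set: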
const amoxItem = await client.agents.inventory.update(AGENT_ID, USER_ID, {
action: "add",
item_type: "medication",
label: "Amoxicillin",
description: "broad-spectrum antibiotic, penicillin class",
project_id: PROJECT_ID,
properties: {
medication_name: "amoxicillin",
dosage: "500mg",
instructions: "complete the full course even if you feel better",
prescribed_by: "Dr. Tan",
},
});
const amoxSchedule = await client.schedules.create(AGENT_ID, USER_ID, {
cadence: {
simple: { frequency: "interval_hours", interval_hours: 8 },
timezone: "Asia/Singapore",
},
active_window: {
hours: { start: "07:00", end: "23:00" },
},
intent: "remind the user to take their amoxicillin — emphasise completing the full course",
check_type: "reminder",
inventory_item_id: amoxItem.fact_id,
metadata: { reminder_category: "medication" },
starts_at: "2026-05-01T00:00:00Z",
ends_at: "2026-05-15T00:00:00Z",
});
After ends_at passes, the schedule is automatically disabled (enabled flips to false). The inventory item for amoxicillin remains as a historical record and can be queried via the Memory API. No cleanup is required.
Create one schedule per medication. Three daily medications = three schedules. Fires that land at the same wall-clock time produce separate proactive messages by design — each message is grounded in its own medication's inventory item.
Avoid simultaneous fires. If you want the user to receive distinct messages rather than a burst, stagger the times across schedules:
| Medication | Schedule times |
| --- | --- |
| Metformin | ["08:00", "20:00"] |
| Atorvastatin | ["08:15"] |
| Vitamin D | ["08:30"] |
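The staggered times in the table can be generated rather than hand-written. A minimal sketch, assuming HH:MM strings and a fixed gap between medications; staggerTimes is a hypothetical helper, and the 15-minute step simply mirrors the table.

```typescript
// Produce `count` wall-clock times starting at `start`, spaced `stepMinutes`
// apart, wrapping past midnight if needed. `stepMinutes` must be >= 0.
function staggerTimes(start: string, count: number, stepMinutes: number): string[] {
  const m = /^(\d{2}):(\d{2})$/.exec(start);
  if (!m) throw new Error(`expected HH:MM, got "${start}"`);
  const base = Number(m[1]) * 60 + Number(m[2]);
  const out: string[] = [];
  for (let i = 0; i < count; i++) {
    const total = (base + i * stepMinutes) % (24 * 60);
    const hh = String(Math.floor(total / 60)).padStart(2, "0");
    const mm = String(total % 60).padStart(2, "0");
    out.push(`${hh}:${mm}`);
  }
  return out;
}
```

With a start of "08:00", three medications, and a 15-minute step, this yields 08:00, 08:15, and 08:30, one morning time per schedule.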
Alternative: compose a "morning routine" item. If you prefer a single message covering all morning medications, create one inventory item of type medication_routine (define its own schema) with a medications property that lists all drugs and doses. Attach that single item to a single 08:00 schedule. The agent receives all the structured data in one wakeup block and can address all medications in a single message.
When the user replies "I took it, thanks" or similar, the agent's memory layer auto-captures this as a fact on the user. You can query recent user responses to a medication reminder via the Memory API:
// Query recent memory facts mentioning medication adherence
const memories = await client.agents.memory.search(AGENT_ID, {
query: "medication taken ibuprofen",
limit: 10,
});
for (const result of memories.results) {
console.log(result.content); // "User confirmed taking 500mg ibuprofen on 2026-05-02"
console.log(result.score); // e.g. 0.91
}
For a harder signal, add a POST /adherence/{scheduleId} endpoint in your tenant backend that your mobile or web app calls when the user taps a confirmation button. This gives you a structured event log independent of the conversational memory layer. Sonzai does not provide this endpoint — it lives in your own backend and stores data in your own database.
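As a rough illustration of what that tenant-side endpoint might do, here is a framework-agnostic sketch. Everything in it is an assumption for the sake of the example: the `AdherenceEvent` shape, the in-memory array standing in for your database, and the function names. None of it is a Sonzai API.

```typescript
// Hypothetical tenant-side adherence log. The event shape, store, and
// function names are illustrative only.
interface AdherenceEvent {
  scheduleId: string;
  userId: string;
  confirmedAt: string; // RFC 3339 UTC timestamp
}

// In-memory stand-in for a table in your own database.
const adherenceLog: AdherenceEvent[] = [];

// Core of a POST /adherence/{scheduleId} handler, minus the HTTP framework.
function recordAdherence(
  scheduleId: string,
  userId: string,
  now: Date = new Date(),
): AdherenceEvent {
  if (!scheduleId || !userId) {
    throw new Error("scheduleId and userId are required");
  }
  const entry: AdherenceEvent = { scheduleId, userId, confirmedAt: now.toISOString() };
  adherenceLog.push(entry);
  return entry;
}

// Count confirmations for one schedule, e.g. to chart adherence over a course.
function countConfirmations(scheduleId: string): number {
  return adherenceLog.filter((e) => e.scheduleId === scheduleId).length;
}
```

Wiring `recordAdherence` into an actual route, and adding authentication, is left to whichever HTTP framework your backend already uses.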
Patch the schedule's cadence.timezone whenever the user's preferred timezone changes. Future fires are immediately recomputed in the new zone; past fire history is not modified.
// User travelling from Singapore to Los Angeles
await client.schedules.update(AGENT_ID, USER_ID, scheduleId, {
cadence: {
simple: { frequency: "daily", times: ["08:00", "20:00"] },
timezone: "America/Los_Angeles",
},
});
// Next fire: 08:00 PDT (Los Angeles) — not 08:00 SGT
Any cadence tick after 21:00 or before 07:00 is discarded. A twice-daily schedule with times ["08:00", "20:00"] would still fire at both times; adding a 22:00 dose would be silently skipped.
Night-shift user — active overnight, sleeping during the day.
When start is greater than end, the window wraps midnight. This user receives reminders from 22:00 to 05:59 the next morning, and any cadence ticks during daytime hours are skipped.
Start chatting. Memory extraction happens automatically after the response streams. Nothing special needed on your end.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const AGENT_ID = "agent_abc";
const USER_ID = "user_123";
// First conversation — agent has no memory yet
for await (const event of client.agents.chatStream(AGENT_ID, {
userId: USER_ID,
messages: [
{ role: "user", content: "My name is Mia. I'm allergic to peanuts and I love hiking." },
],
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
// Platform extracts: name="Mia", allergy="peanuts", interest="hiking"
// Second conversation — agent recalls all of the above
for await (const event of client.agents.chatStream(AGENT_ID, {
userId: USER_ID,
messages: [
{ role: "user", content: "What snacks should I bring on my next hike?" },
],
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
// Agent knows Mia loves hiking and is allergic to peanuts — no re-intro needed.
Memory is per-user
Facts extracted from user A's conversation are never surfaced to user B. Always pass userId (or user_id / UserID) in every chat call so the platform scopes memory correctly.
If a user has history in your system — a CRM profile, onboarding answers, past orders — inject it before the first conversation so the agent feels like it already knows them.
// Call once during onboarding or after CRM import
await client.agents.memory.seed(AGENT_ID, {
userId: USER_ID,
memories: [
{
content: "Mia is a 32-year-old UX designer based in Berlin.",
type: "user_fact",
},
{
content: "Mia subscribed to the Pro plan on 2024-11-03.",
type: "shared_experience",
occurred_at: "2024-11-03T00:00:00Z",
},
{
content: "Mia prefers email over SMS for notifications.",
type: "user_preference",
},
{
content: "Mia mentioned she wants to get into trail running.",
type: "user_goal",
},
],
});
Query the memory store directly to find what the agent has extracted about a topic. Useful for building user-facing "what does my agent remember?" features or for debugging.
const results = await client.agents.memory.search(AGENT_ID, {
query: "diet restrictions food allergies",
limit: 10,
});
for (const fact of results.results) {
console.log(`[${fact.factType}] ${fact.content} (score: ${fact.score})`);
}
// [user_fact] Mia is allergic to peanuts (score: 0.97)
// [user_preference] Mia prefers nut-free snacks on hikes (score: 0.85)
The memory tree is a 7-level hierarchy that organises facts by category (/identity/traits, /preferences/interests, /episodes/sessions, etc.). You can walk it node by node.
// Get top-level nodes
const tree = await client.agents.memory.list(AGENT_ID, {
userId: USER_ID,
includeContents: false, // just node metadata, no fact text
});
for (const node of tree.nodes) {
console.log(`${node.nodeId} — ${node.title} (${node.summary})`);
}
// Drill into a node
const identityNode = await client.agents.memory.list(AGENT_ID, {
userId: USER_ID,
parentId: "node_identity_traits_id",
includeContents: true, // include fact text
});
You can explore the memory tree interactively in the dashboard under Agents → your agent → Users → select user → Memory → Tree Explorer.
The timeline shows every fact in chronological order — when it was created, updated, or superseded. Use it to audit memory growth or build a "conversation history" view.
const timeline = await client.agents.memory.timeline(AGENT_ID, {
userId: USER_ID,
// Optional: narrow to a date range
start: "2025-01-01T00:00:00Z",
end: "2025-12-31T23:59:59Z",
});
for (const session of timeline.sessions) {
console.log(
`Session ${session.sessionId}: ${session.factCount} facts (${session.firstFactAt})`
);
for (const fact of session.facts) {
console.log(` ${fact.atomicText}`);
}
}
For admin UIs or compliance exports, list all raw facts for a user without going through the tree hierarchy. Supports filtering by factType (TS) / fact_type (Python/Go).
// All facts for this user (paginated)
const facts = await client.agents.memory.listFacts(AGENT_ID, {
userId: USER_ID,
limit: 50,
offset: 0,
factType: "user_preference", // optional filter
});
console.log(`Total facts: ${facts.totalCount}`);
for (const f of facts.facts) {
console.log(` ${f.content}`);
}
GDPR / right to erasure
To delete all memory for a user, call client.agents.memory.reset(agentId, { userId }). This creates tombstone records that prevent deleted facts from being re-surfaced; the data is removed from retrieval immediately.
The time machine lets you see what the agent knew about a user at any specific point in the past — useful for debugging why the agent said something, or for auditing how its understanding evolved.
const snapshot = await client.agents.getTimeMachine(AGENT_ID, {
userId: USER_ID,
at: "2025-03-01T00:00:00Z", // what did the agent know at this moment?
});
console.log("Personality at 2025-03-01:", snapshot.personalityAt);
console.log("Mood at 2025-03-01:", snapshot.moodAt);
for (const event of snapshot.evolutionEvents) {
console.log(` ${event.traitName}: ${event.oldValue} → ${event.newValue}`);
}
How supersession works
When a fact is updated, the old record is retired (not deleted) and a new one is created with a SupersedesID pointer. The time machine replays this chain to reconstruct the state at any timestamp.
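To make the replay idea concrete, here is a small self-contained sketch of reconstructing live facts at a timestamp from a supersession chain. The record shape (`id`, `createdAt`, `supersedesId`) and the sample facts are assumptions made for illustration; the platform's internal representation may differ.

```typescript
// Illustrative record shape. Only the idea of a "supersedes" pointer comes
// from the text above; the field names here are assumptions.
interface FactRecord {
  id: string;
  content: string;
  createdAt: string;     // RFC 3339 UTC
  supersedesId?: string; // id of the record this one retired
}

// Facts live at `at`: created on or before `at`, and not yet superseded
// by another record that was also created on or before `at`.
function factsAsOf(records: FactRecord[], at: string): FactRecord[] {
  const cutoff = Date.parse(at);
  const visible = records.filter((r) => Date.parse(r.createdAt) <= cutoff);
  const retired = new Set(
    visible
      .map((r) => r.supersedesId)
      .filter((id): id is string => id !== undefined),
  );
  return visible.filter((r) => !retired.has(r.id));
}

// Made-up sample data for the sketch.
const factChain: FactRecord[] = [
  { id: "f1", content: "Mia lives in Berlin", createdAt: "2025-01-10T00:00:00Z" },
  { id: "f2", content: "Mia lives in Munich", createdAt: "2025-04-02T00:00:00Z", supersedesId: "f1" },
];

console.log(factsAsOf(factChain, "2025-03-01T00:00:00Z").map((f) => f.content));
// ["Mia lives in Berlin"]
console.log(factsAsOf(factChain, "2025-05-01T00:00:00Z").map((f) => f.content));
// ["Mia lives in Munich"]
```

Because retired records are kept rather than deleted, the same record list answers both queries.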
A daily 09:00 Asia/Singapore check-in schedule that fires a proactive agent message every morning
An every-4-hours schedule with a quiet-hours active window that skips fires outside allowed hours
A bounded interval_hours course constrained by starts_at and ends_at — useful for multi-week programs
An understanding of how the same primitive powers the full Medication Reminders worked example
Scheduled Reminders are a first-class primitive: the platform recomputes next_fire_at after every fire, respects DST transitions automatically, and injects inventory context live at fire time so your agent always has current data.
Register a schedule by calling POST /api/v1/agents/{agentId}/users/{userId}/schedules. The body describes when to fire (cadence), what the agent should do (intent), and optional scoping fields (active_window, inventory_item_id, starts_at, ends_at).
Here is a minimal daily 09:00 SGT check-in:
{
  "cadence": {
    "simple": { "frequency": "daily", "times": ["09:00"] },
    "timezone": "Asia/Singapore"
  },
  "intent": "check in on how the user is feeling",
  "check_type": "reminder"
}
And a full example with all optional fields:
{
  "cadence": {
    "simple": { "frequency": "daily", "times": ["09:00"] },
    "timezone": "Asia/Singapore"
  },
  "active_window": {
    "hours": { "start": "08:00", "end": "22:00" },
    "days_of_week": ["mon", "tue", "wed", "thu", "fri"]
  },
  "intent": "check in on how the user is feeling",
  "check_type": "reminder",
  "inventory_item_id": "01HX8F...",
  "metadata": { "campaign": "daily_checkin_v2" },
  "starts_at": "2026-05-01T00:00:00Z",
  "ends_at": "2026-05-14T23:59:59Z"
}
The response includes schedule_id, next_fire_at (UTC), and next_fire_at_local (the same instant expressed in the schedule's timezone — useful for displaying to the user).
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const AGENT_ID = "agent_abc";
const USER_ID = "user_123";
const schedule = await client.schedules.create(AGENT_ID, USER_ID, {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Asia/Singapore",
},
intent: "check in on how the user is feeling",
check_type: "reminder",
});
console.log(schedule.schedule_id); // "sched_01HX..."
console.log(schedule.next_fire_at); // "2026-05-02T01:00:00Z"
console.log(schedule.next_fire_at_local); // "2026-05-02T09:00:00+08:00"
Wall-clock times in HH:MM (24-hour), evaluated in the schedule's timezone
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| days_of_week | string[] | Yes for weekly | "mon", "tue", "wed", "thu", "fri", "sat", "sun" |
| interval_hours | number | Yes for interval_hours | Minimum 1, maximum 24 |
| timezone | IANA string | Yes | Applied to times and days_of_week evaluation |
A weekly schedule fires on the specified days at each listed time. A daily schedule fires every day at each listed time. An interval_hours schedule fires repeatedly at that interval starting from starts_at (or schedule creation if starts_at is omitted), bounded by the active window.
Standard 5-field cron — no seconds field. Example: "0 9 * * 1-5" fires at 09:00 on weekdays.
Rate limits. Cadences that resolve to more than one fire per minute are rejected with CADENCE_TOO_FREQUENT. Cadences that produce more than 96 raw ticks per 24-hour rolling window (before active-window filtering) are rejected with CADENCE_TOO_DENSE. For most use cases interval_hours: 1 (24 raw ticks/day) is the densest practical setting.
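A client-side pre-check for the density limit might look like the sketch below. It only approximates the rule for simple cadences: the `SimpleCadence` type and both function names are assumptions for this example, and the platform's own validator remains the source of truth.

```typescript
// Assumed client-side model of a simple cadence; not an SDK type.
type SimpleCadence =
  | { frequency: "daily" | "weekly"; times: string[] }
  | { frequency: "interval_hours"; interval_hours: number };

const MAX_RAW_TICKS_PER_DAY = 96; // limit quoted in the text above

// Upper bound on raw ticks in a 24-hour window, before active-window filtering.
function rawTicksPerDay(cadence: SimpleCadence): number {
  if (cadence.frequency === "interval_hours") {
    return Math.ceil(24 / cadence.interval_hours);
  }
  // daily: every distinct time fires each day;
  // weekly: at most that many on any day the schedule fires.
  return new Set(cadence.times).size;
}

function isTooDense(cadence: SimpleCadence): boolean {
  return rawTicksPerDay(cadence) > MAX_RAW_TICKS_PER_DAY;
}

console.log(rawTicksPerDay({ frequency: "interval_hours", interval_hours: 1 })); // 24
console.log(rawTicksPerDay({ frequency: "interval_hours", interval_hours: 4 })); // 6
```

Cron cadences are not covered here; estimating their density would need a cron parser.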
Every schedule requires a timezone field containing a valid IANA timezone name (e.g. "Asia/Singapore", "America/New_York", "Europe/London"). Offsets like "+08:00" are not accepted.
All cadence math — wall-clock time evaluation, days_of_week membership, DST skip logic — runs in the schedule's own timezone. The result is stored and returned as next_fire_at in UTC. next_fire_at_local is a convenience field that expresses the same instant with the zone offset applied.
When a user travels or changes their preferred timezone, patch the schedule timezone directly:
// User moved from Singapore to London
await client.schedules.update(AGENT_ID, USER_ID, scheduleId, {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Europe/London",
},
});
DST handling. On spring-forward transitions, a wall time that falls into the clocks-forward gap (e.g. 02:30 in a zone that jumps 02:00 → 03:00) is non-existent. The platform skips that occurrence and fires at the next valid occurrence. On fall-back transitions, a wall time that exists twice is never double-fired — the platform fires once and advances.
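The timezone and DST rules in this section can be approximated client-side with the standard `Intl.DateTimeFormat` API, for instance to display a fire time locally or to warn a user that a chosen wall time will be skipped. The sketch below is one possible approach under stated assumptions: `offsetMinutes`, `toLocalIso`, and `wallTimeToUtc` are hypothetical helper names, and the platform's own scheduler may compute these values differently.

```typescript
// Offset of `timeZone` from UTC, in minutes, at the given instant.
function offsetMinutes(instant: Date, timeZone: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    hour12: false,
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).formatToParts(instant);
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value);
  // Some runtimes render midnight as hour 24 when hour12 is false, hence % 24.
  const asUtc = Date.UTC(
    get("year"), get("month") - 1, get("day"),
    get("hour") % 24, get("minute"), get("second"),
  );
  return Math.round((asUtc - instant.getTime()) / 60000);
}

// Express a UTC instant with the zone offset applied (like next_fire_at_local).
function toLocalIso(utcIso: string, timeZone: string): string {
  const instant = new Date(utcIso);
  const offset = offsetMinutes(instant, timeZone);
  const shifted = new Date(instant.getTime() + offset * 60000);
  const sign = offset >= 0 ? "+" : "-";
  const abs = Math.abs(offset);
  const hh = String(Math.floor(abs / 60)).padStart(2, "0");
  const mm = String(abs % 60).padStart(2, "0");
  return `${shifted.toISOString().slice(0, 19)}${sign}${hh}:${mm}`;
}

// Resolve a wall-clock time to a UTC instant. Returns null when the wall time
// falls in a spring-forward gap; for a fall-back overlap it returns the
// earlier of the two instants, so the time resolves exactly once.
function wallTimeToUtc(date: string, hhmm: string, timeZone: string): Date | null {
  const wallAsUtc = Date.parse(`${date}T${hhmm}:00Z`);
  // Try the offsets in effect a day before and a day after the target.
  for (const probe of [wallAsUtc - 86400000, wallAsUtc + 86400000]) {
    const offset = offsetMinutes(new Date(probe), timeZone);
    const candidate = new Date(wallAsUtc - offset * 60000);
    if (offsetMinutes(candidate, timeZone) === offset) return candidate;
  }
  return null;
}

console.log(toLocalIso("2026-05-02T01:00:00Z", "Asia/Singapore"));
// "2026-05-02T09:00:00+08:00"
console.log(wallTimeToUtc("2026-03-08", "02:30", "America/New_York"));
// null (02:30 does not exist on the US spring-forward date)
```

The results depend on the time zone data bundled with your JavaScript runtime, so very recent rule changes may not be reflected.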
The active_window field restricts which fires actually produce a proactive wakeup. Fires computed by the cadence that land outside the window are skipped, not deferred — the cadence grid stays perfectly predictable and no backlog accumulates.
Both sub-fields are optional within active_window. You may specify hours only, days_of_week only, or both.
Overnight windows. When start is greater than end, the window wraps midnight. For example {"start": "22:00", "end": "06:00"} allows fires from 22:00 to 05:59 the next morning. This is useful for night-shift workers or for any schedule whose allowed hours span local midnight.
Allowed days. Values must be lowercase three-letter abbreviations: "mon", "tue", "wed", "thu", "fri", "sat", "sun". Day membership is evaluated in the schedule's timezone, so a fire at 23:30 Friday Singapore time stays Friday even when stored as 15:30 UTC (Saturday in some zones).
Empty days array. Passing "days_of_week": [] (an explicit empty list) is rejected with INVALID_ACTIVE_WINDOW — it would produce a schedule that can never fire. To allow all days, omit the days_of_week field entirely.
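The window rules above can be sketched as a pair of small functions. This is an illustration, not the platform's implementation: the function names are hypothetical, the end bound is treated as exclusive (inferred from the "22:00 to 05:59" example), and the day check uses the local day of the fire itself, which is an assumption for overnight windows.

```typescript
interface ActiveWindow {
  hours?: { start: string; end: string };
  days_of_week?: string[];
}

function toMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

// Mirrors the documented rejection of an explicit empty days list.
function validateActiveWindow(w: ActiveWindow): void {
  if (w.days_of_week !== undefined && w.days_of_week.length === 0) {
    throw new Error("INVALID_ACTIVE_WINDOW");
  }
}

// `localTime` ("HH:MM") and `localDay` ("mon".."sun") must already be
// expressed in the schedule's timezone.
function isInsideWindow(w: ActiveWindow, localTime: string, localDay: string): boolean {
  if (w.days_of_week && !w.days_of_week.includes(localDay)) return false;
  if (!w.hours) return true;
  const t = toMinutes(localTime);
  const start = toMinutes(w.hours.start);
  const end = toMinutes(w.hours.end);
  return start <= end
    ? t >= start && t < end  // same-day window, end exclusive
    : t >= start || t < end; // overnight window wraps midnight
}

const nightShift: ActiveWindow = { hours: { start: "22:00", end: "06:00" } };
console.log(isInsideWindow(nightShift, "05:59", "fri")); // true
console.log(isInsideWindow(nightShift, "12:00", "fri")); // false
```

A fire that fails this check would be skipped rather than deferred, in line with the behaviour described above.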
Pass inventory_item_id on the create (or patch) body to associate a schedule with a specific item from the user's resource inventory. The item's properties are injected live at fire time — not at schedule creation — so any mid-program updates to the item (e.g. a medication dosage change, a price update) are automatically reflected in the agent's proactive message without requiring any schedule modification.
{
  "cadence": {
    "simple": { "frequency": "daily", "times": ["08:00"] },
    "timezone": "Asia/Singapore"
  },
  "intent": "remind the user to take their morning medication",
  "check_type": "reminder",
  "inventory_item_id": "01HX8FKZQ3..."
}
At fire time the platform fetches the current item properties and appends them to the intent block the agent receives. The Medication Reminders tutorial shows a complete worked example including how to structure medication inventory items for maximum agent context.
Graceful degradation. If the referenced inventory item is deleted before a fire occurs, the schedule continues firing. The intent block is delivered without the Reference item section — the agent receives the intent and metadata fields as normal. No error is surfaced to the user; the schedule itself is not affected.
Use starts_at and ends_at to create a time-bounded program. Both fields are optional and accept RFC 3339 UTC timestamps.
{
  "cadence": {
    "simple": { "frequency": "interval_hours", "interval_hours": 4 },
    "timezone": "Asia/Singapore"
  },
  "active_window": {
    "hours": { "start": "08:00", "end": "22:00" }
  },
  "intent": "prompt the user to log a pain score",
  "check_type": "check_in",
  "starts_at": "2026-05-01T00:00:00Z",
  "ends_at": "2026-05-14T23:59:59Z"
}
starts_at — no fire is produced before this timestamp. Cadence expansion begins from this point. If omitted, the schedule starts immediately.
ends_at — once this timestamp passes, the schedule is automatically disabled (enabled flips to false). The row is not deleted, so the audit trail and historical fire log remain accessible.
Passing an ends_at that is earlier than or equal to starts_at returns INVALID_WINDOW. Passing an ends_at that is already in the past at creation time also returns INVALID_WINDOW, because a schedule that has already expired cannot be created.
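If you want to catch these cases before making the API call, a pre-flight check could look like the following sketch. The function name `checkScheduleBounds` is hypothetical, and the check only mirrors the two rules stated above; the server remains authoritative.

```typescript
// Hypothetical client-side pre-check. Returns the error code the API is
// documented to return for these cases, or null when the bounds look valid.
function checkScheduleBounds(
  startsAt: string | undefined,
  endsAt: string | undefined,
  now: Date = new Date(),
): "INVALID_WINDOW" | null {
  if (endsAt === undefined) return null; // open-ended schedule
  const end = Date.parse(endsAt);
  if (end <= now.getTime()) return "INVALID_WINDOW"; // already expired
  if (startsAt !== undefined && end <= Date.parse(startsAt)) {
    return "INVALID_WINDOW"; // ends at or before it starts
  }
  return null;
}

const creationTime = new Date("2026-04-30T00:00:00Z");
console.log(checkScheduleBounds("2026-05-01T00:00:00Z", "2026-05-14T23:59:59Z", creationTime)); // null
console.log(checkScheduleBounds("2026-05-01T00:00:00Z", "2026-05-01T00:00:00Z", creationTime)); // "INVALID_WINDOW"
```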
GET /api/v1/agents/{agentId}/users/{userId}/schedules/{id}/upcoming?limit=N returns the next N computed fire times as an array of UTC timestamps. The preview applies the active window, so what you see is exactly what will fire.
For example, a 4-hourly schedule (interval_hours: 4) with an 08:00–22:00 active window produces at most 4 fires per calendar day (08:00, 12:00, 16:00, 20:00 local) — not 6 (which would be the raw cadence count before filtering). The preview array reflects this.
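The arithmetic behind that example can be reproduced with a short sketch. It works purely in local wall-clock minutes and assumes the interval divides 24 evenly and the window does not wrap midnight; `previewDay` and its helpers are hypothetical names, not SDK functions.

```typescript
function hhmmToMinutes(hhmm: string): number {
  const [h, m] = hhmm.split(":").map(Number);
  return h * 60 + m;
}

function minutesToHhmm(total: number): string {
  const hh = String(Math.floor(total / 60)).padStart(2, "0");
  const mm = String(total % 60).padStart(2, "0");
  return `${hh}:${mm}`;
}

// Expand an interval cadence over one local day, then drop ticks that fall
// outside a same-day active window (end bound treated as exclusive).
function previewDay(
  intervalHours: number,
  anchor: string,      // local time of any tick on the grid, e.g. starts_at
  windowStart: string,
  windowEnd: string,
): string[] {
  const step = intervalHours * 60;
  const start = hhmmToMinutes(windowStart);
  const end = hhmmToMinutes(windowEnd);
  const ticks: number[] = [];
  for (let t = hhmmToMinutes(anchor) % step; t < 1440; t += step) {
    ticks.push(t); // raw cadence grid for the day
  }
  return ticks.filter((t) => t >= start && t < end).map(minutesToHhmm);
}

console.log(previewDay(4, "08:00", "08:00", "22:00"));
// ["08:00", "12:00", "16:00", "20:00"]
```

The raw grid here has six ticks (00:00, 04:00, 08:00, 12:00, 16:00, 20:00); the window removes the first two.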
When a schedule fires, the platform constructs a structured intent block and delivers it to the agent as a proactive wakeup. The block looks like this:
[PROACTIVE WAKEUP — SCHEDULED REMINDER]
Why you're reaching out: check in on how the user is feeling
Scheduled fire time (user's local): 2026-05-02T09:00:00+08:00
Reference item (from inventory): Daily Vitamin D
  dosage: 1000 IU
  form: softgel
  timing_notes: take with food
Additional context:
  campaign: daily_checkin_v2
Key points:
[PROACTIVE WAKEUP — SCHEDULED REMINDER] — the stable header the agent detects to know it is initiating a conversation, not responding to one.
Why you're reaching out — verbatim content of the intent field you set on the schedule. Write this as a short natural-language instruction to the agent. The agent composes the actual opening message in its own voice — no prompt template is exposed; you control intent, not wording.
Scheduled fire time (user's local) — the next_fire_at_local value at fire time. Useful for agents that want to acknowledge the time explicitly ("Good morning" vs "Good afternoon").
Reference item (from inventory) — present only if inventory_item_id was set and the item still exists. The item's label and all of its properties are included. Item properties are fetched live at fire time.
Additional context — present only if metadata was set. All metadata key-value pairs are rendered here. Use this for campaign tracking, A/B variant labels, or any additional instruction to the agent that doesn't belong in the core intent.
There is no prompt template field. Clients control agent behavior through intent, inventory_item_id, and metadata. The agent is free to adapt its tone, greeting, and language based on the user's personality and the conversation history it already has.
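To show how the optional sections come and go, the sketch below assembles a block of the same shape from its inputs. It is a local illustration only: the platform builds the real block server-side, and the `WakeupInput` type, the `renderIntentBlock` name, and the exact indentation are assumptions based on the sample above.

```typescript
// Assumed input shape for this sketch; not an SDK type.
interface WakeupInput {
  intent: string;
  fireTimeLocal: string;
  item?: { label: string; properties: Record<string, string> };
  metadata?: Record<string, string>;
}

function renderIntentBlock(input: WakeupInput): string {
  const lines: string[] = [
    "[PROACTIVE WAKEUP — SCHEDULED REMINDER]", // header copied verbatim from the sample
    `Why you're reaching out: ${input.intent}`,
    `Scheduled fire time (user's local): ${input.fireTimeLocal}`,
  ];
  // Present only when an inventory item is linked and still exists.
  if (input.item) {
    lines.push(`Reference item (from inventory): ${input.item.label}`);
    for (const [key, value] of Object.entries(input.item.properties)) {
      lines.push(`  ${key}: ${value}`);
    }
  }
  // Present only when metadata was set on the schedule.
  if (input.metadata) {
    lines.push("Additional context:");
    for (const [key, value] of Object.entries(input.metadata)) {
      lines.push(`  ${key}: ${value}`);
    }
  }
  return lines.join("\n");
}

const sampleBlock = renderIntentBlock({
  intent: "check in on how the user is feeling",
  fireTimeLocal: "2026-05-02T09:00:00+08:00",
  metadata: { campaign: "daily_checkin_v2" },
});
console.log(sampleBlock);
```

With no `item` supplied, the output omits the Reference item section entirely, which is what the agent would see if the linked inventory item had been deleted.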
Medication Reminders — a full worked example using Scheduled Reminders to drive a medication adherence program, including inventory schema design for medication items and multi-dose daily schedules.
Resource Inventory + Knowledge Base — how to design inventory schemas and push live data, powering the inventory_item_id linkage described above.
Memory-Aware Chat — how the agent remembers user responses from previous proactive conversations and incorporates them into future interactions.