Advance Time fast-forwards an agent through simulated time, compressing days of elapsed agent time into seconds of real time. Useful for character AI that needs in-game time to pass faster than real time, game loops that simulate days of agent state in seconds, or anywhere you want to see what the agent would be like after a period of elapsed time — without actually waiting for it.
Character AI / visual novel time skips — the protagonist sleeps for 8 hours; advance agent time by 8 hours and get the diary entry and mood changes that would have happened overnight
Tamagotchi and life-sim game loops — in-game days pass faster than real time; call advanceTime each tick to keep agent state (mood, memory, habits) in sync with the game clock
Tutorial onboarding — show a new user what their companion will "remember" after a week by fast-forwarding through a sample history before they send their first real message
Deterministic replay — reproduce the exact agent state after X hours at any time, for debugging, snapshotting, or building a save/load system
Eval and benchmarking — compress long-running scenarios into fast test runs (see Also useful for evaluation below)
A single advanceTime call runs the full production background worker fleet for each complete 24-hour day in the window, then resolves any proactive wakeups due within it. Concretely:
Diary generation — one diary entry per simulated day, written from the agent's perspective
Mood decay — emotional state drifts toward the agent's baseline at the rate it would in real time
Memory consolidation — facts, events, and commitments are consolidated and deduplicated as they normally would be overnight
Constellation extraction — personality signals extracted from conversation history are processed on schedule
Scheduled wakeups — any wakeup whose scheduled_at falls inside the advance window fires with its intent
Pass simulatedHours: 25 (one day plus a sliver) when you need the weekly consolidation gate to tick over.
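A minimal blocking call looks like this (a sketch; the response field mirrors the wakeups_fired field used in later examples on this page):

import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

// Advance two full simulated days synchronously; for longer windows
// that risk proxy timeouts, see runAsync below
const result = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 48,
});
console.log(result.wakeups_fired); // wakeups that fired inside the window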
Given the same agent state at the start, the same advanceTime call produces the same output. There is no randomness seeded from wall-clock time. This makes Advance Time suitable for save/load, replay, and regression testing.
For advances that would exceed a proxy read timeout (Cloudflare's limit is ~100 s, which corresponds to roughly 4–5 simulated days depending on agent complexity), pass runAsync: true. The API returns immediately with a job descriptor; poll getAdvanceTimeJob until the status is terminal.
// Kick off a long advance asynchronously
const job = await client.workbench.advanceTime({
agentId: "agent_abc",
userId: "user_123",
simulatedHours: 168, // one week
runAsync: true,
}) as { job_id: string; status: string };
console.log(job.job_id, job.status); // "job_01HX...", "running"
// Poll until done (30-minute TTL in Redis)
let state = await client.workbench.getAdvanceTimeJob(job.job_id);
while (state.status === "running") {
await new Promise(r => setTimeout(r, 2000));
state = await client.workbench.getAdvanceTimeJob(job.job_id);
}
console.log(state.status); // "succeeded"
console.log(state.result); // full AdvanceTimeResponse
The smallest meaningful unit is one full 24-hour simulated day. Background jobs (diary, consolidation, constellation) run once per day. Sub-day advances (e.g. simulatedHours: 8) still process wakeups and mood decay but will not generate a diary entry unless a full day boundary is crossed.
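For instance, a sub-day advance (a sketch reusing the call shape above):

// 8 simulated hours: wakeups due in the window fire and mood decays,
// but no diary entry is generated because no day boundary is crossed
const partial = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 8,
});
console.log(partial.wakeups_fired);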
Any schedule whose next_fire_at falls within the advance window fires automatically. Advance 48 hours and two daily reminders will have fired — their intents processed, messages generated, and state updated — exactly as if real time had passed.
// Create a daily 09:00 reminder
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["09:00"] }, timezone: "UTC" },
  intent: "check in on how the user is feeling",
  check_type: "reminder",
});

// Advance 48 hours — both 09:00 fires trigger inside the window
const result = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 48,
});
console.log(result.wakeups_fired); // 2
When time advances, a diary entry is generated for each simulated day. The agent "remembers" what happened during the gap — emotional tone, recurring themes, relationship developments — the same way it would after real days of conversation. Use this to give a new user a companion that already feels lived-in, or to let a character "grow" between chapters of a story.
Any wakeup scheduled with a scheduled_at inside the advance window fires during the advance, including its LLM-generated proactive message. This lets you test wakeup copy and timing without waiting for the real clock to reach the fire time.
Advance Time is a primitive that chains with scheduled reminders, wakeups, and memory. There is no standalone end-to-end tutorial yet. See the linked Mind Layer pages below for how it combines with other features.
If you are running a benchmark suite, advanceTime lets you compress long-running scenarios into fast test runs. Advance a simulated week in seconds, inspect the diary entries and mood state, then score the result. Pair with the evaluation workflow to measure agent behavior quality after arbitrary amounts of simulated elapsed time.
Agent Insights
As the agent talks to a user over time, it builds up a derived view of who they are — what they care about, what they're working toward, who's in their life, and how their mood trends. Agent Insights exposes that derived state as readable (and for some signals, writable) endpoints. These are not things you author; the context engine extracts them automatically from conversations.
Automatic — no setup required
All insight signals are produced by the context engine during and after each conversation. You do not need to call any write endpoint to populate them — they fill in on their own. The read endpoints on this page let you surface what the agent has learned.
Derived, not authored. These signals are extracted from conversation text by the context engine. You do not push them in; the agent surfaces them automatically as it talks.
Per-instance scoping. Pass instanceId (TS/Python) or instanceID (Go) to filter results to a specific agent instance — useful when an agent is deployed in multiple scenarios or chat contexts for the same user.
Write endpoints for some signals. Goals and habits can be explicitly created, updated, or deleted when your application needs to drive a specific state (e.g., seeding a goal when a user starts onboarding, or marking a goal achieved after a purchase event). Interests, relationships, diary, constellation, and breakthroughs are read-only.
Read latency. Derived signals update at conversation turn-end, not in real time during a turn. Reads immediately after a chat call may not yet reflect the latest turn.
Habits are recurring behaviors the context engine detects across conversations — things like "user meditates in the morning" or "user reviews their tasks every Sunday." Each habit has a strength (0-1) that rises with observations and a formed flag that is set once the habit is considered stable.
const habits = await client.agents.listHabits("agent_abc", {
userId: "user_123",
});
for (const h of habits.habits) {
console.log(h.name, h.category, h.strength, h.formed);
}
Goals represent what the user is working toward. They are extracted automatically from conversation intent — "I want to run a 5K by June" becomes a goal with a type, title, and priority. Goals have a status field: active, achieved, or abandoned.
// Read
const goals = await client.agents.listGoals("agent_abc", { userId: "user_123" });
for (const g of goals.goals) {
console.log(g.title, g.status, g.priority);
}
// Seed a goal for a new workflow
const goal = await client.agents.createGoal("agent_abc", {
userId: "user_123",
title: "Complete onboarding",
description: "Finish all onboarding steps",
type: "task",
priority: 1,
});
// Mark achieved after a business event
await client.agents.updateGoal("agent_abc", goal.goal_id, {
userId: "user_123",
status: "achieved",
});
Interests are topics and themes the context engine identifies as meaningful to the user — things like "machine learning", "hiking", or "Italian cooking." Unlike goals, interests have no lifecycle status; they accumulate over time.
const interests = await client.agents.getInterests("agent_abc", {
userId: "user_123",
});
for (const i of interests.interests) {
console.log(i.topic, i.category);
}
Relationships are the people the user mentions across conversations — friends, family, colleagues, and others the agent has learned about. Each entry includes the person's name, their relationship to the user, and any context the agent has collected.
const rel = await client.agents.getRelationships("agent_abc", {
userId: "user_123",
});
for (const r of rel.relationships) {
console.log(r.name, r.relationship_type, r.context);
}
The diary contains agent-authored entries written at session end — reflections on what happened, what was learned, and how the relationship is evolving. Each entry is anchored to a session and a timestamp. Diary entries are the richest narrative signal available.
const diary = await client.agents.getDiary("agent_abc", {
userId: "user_123",
});
for (const entry of diary.entries) {
console.log(entry.created_at, entry.content);
}
The constellation is the agent's knowledge graph for a user — a set of nodes (concepts, people, themes) and edges (relationships between them) that the context engine builds from recurring patterns across memory. Nodes have a significance score and a node_type.
const c = await client.agents.getConstellation("agent_abc", {
userId: "user_123",
});
for (const node of c.nodes) {
console.log(node.label, node.node_type, node.significance);
}
Breakthroughs are significant relationship or emotional milestones detected by the platform — moments where the agent's understanding of the user meaningfully deepened, or where a notable shift in the relationship dynamic was recorded.
const bt = await client.agents.listBreakthroughs("agent_abc", {
userId: "user_123",
});
for (const b of bt.items) {
console.log(b.type, b.description, b.timestamp);
}
With Memory — insights are summaries over raw facts
Insight signals are derived summaries; the underlying evidence lives in memory. Fetch habits to learn what patterns exist, then use memory.search to pull the raw conversation facts behind one of them.
const habits = await client.agents.listHabits("agent_abc", { userId: "user_123" });
const topHabit = habits.habits[0];
// Find the raw memories that support this habit
const facts = await client.agents.memory.search("agent_abc", {
userId: "user_123",
query: topHabit.name,
limit: 10,
});
console.log(`Found ${facts.results.length} facts supporting "${topHabit.name}"`);
With Emotions — mood + insights for a full user picture
getMood and these insight endpoints together form the agent's complete understanding of a user at a point in time. Fetch both to power a user-facing "how the agent sees you" view or a support dashboard.
Advance Time fast-forwards the context engine's processing — generating new diary entries, decaying mood, and updating derived signals — without waiting real time. This is useful for simulating what the agent would know after a period of elapsed time, and for testing insight endpoints against a populated state.
// Advance 7 days to populate diary entries and update insights
const result = await client.workbench.advanceTime({
  agentId: "agent_abc",
  userId: "user_123",
  simulatedHours: 168,
});

// Now read the insights that formed during that window
const diary = await client.agents.getDiary("agent_abc", { userId: "user_123" });
console.log("Diary entries after 7d:", diary.entries.length);
Grab a project API key from the project dashboard and have an agent ID ready. Pick your client, paste the snippet, and configuration is done.
# Register the hosted MCP server with a single command:
claude mcp add --transport http sonzai \
https://api.sonz.ai/mcp/memory/AGENT_ID \
--header "Authorization: Bearer $SONZAI_API_KEY"
# Then, in any Claude Code session, just say:
# "Chat with agent 'Luna' and say 'I had a great day hiking today!'"
# "Search Luna's memories about hiking adventures"
# "Use mind-layer-setup with assistant_name 'Aria' …"
for await (const event of client.agents.chatStream({
  agent: "agent-id",
  userId: "user-123",
  messages: [{ role: "user", content: "I'm having a rough day." }],
  maxTurns: 3,
})) {
  if (event.type === "message_boundary") newBubble();
  else renderDelta(event);
}
Don't pass instanceId. Most companions are one-to-one; the default per-user scoping is exactly what you need.
Custom State
Custom State is simple structured per-user data the agent can read and modify during conversations. Use it for counters, flags, or any state your product tracks per user. Unlike memory (which the platform extracts from conversation text), Custom State is data you write explicitly from your backend — and the agent sees it immediately.
Global State
Per Instance — Shared across all users in an instance. Use for environment configuration, agent status, or global event flags.
Per-User State
Per Instance + User — Scoped to one user. Use for energy, currency, progress, preferences, and any per-player data.
Instances
All states are scoped to an instanceId — one deployment context of your agent (e.g. a workspace or game world). Omit instanceId to use the default instance. See Instances for details.
When the agent has access to custom states, it reads current state at the start of each conversation via the get_custom_state tool — no prompt injection required. The agent can also update state during a conversation if you define a Custom Tool that calls your backend.
Use Custom State for primitives and simple objects. Reach for Inventory when items have their own identity, multiple typed fields, and a shared schema across users.
Upsert creates the state if the key doesn't exist, or replaces the value if it does. Idempotent — safe to call on every update cycle from your backend.
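A sketch of the write path, assuming an upsert method on client.agents.customStates that mirrors the list method shown below (the exact signature lives in the API reference):

// Hypothetical call shape: a key plus an arbitrary JSON value, scoped per user
await client.agents.customStates.upsert("agent-id", {
  scope: "user",
  userId: "user-123",
  instanceId: "workspace-1",
  key: "energy",
  value: 80, // replaces any existing value wholesale; safe to repeat
});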
Return all states for an agent, optionally filtered by scope or user.
// All global states for an instance
const globals = await client.agents.customStates.list("agent-id", {
scope: "global",
instanceId: "workspace-1",
});
// All per-user states for a specific user
const userStates = await client.agents.customStates.list("agent-id", {
scope: "user",
userId: "user-123",
});
Define a tool that lets the agent trigger a state change from inside a conversation. Your backend executes the tool call and calls upsert to apply the new value.
await client.agents.sessions.setTools("agent-id", "session-id", [
  {
    name: "spend_energy",
    description: "Deduct energy from the user. Call when the user takes an action that costs energy.",
    parameters: {
      type: "object",
      properties: {
        amount: { type: "number", description: "Energy to deduct (1–50)" },
      },
      required: ["amount"],
    },
  },
]);

// In your tool handler:
// 1. Receive externalToolCall { name: "spend_energy", arguments: { amount: 10 } }
// 2. Read current energy with getByKey
// 3. Upsert the new value
// 4. Return the result in the next chat message
With Inventory — when state is structured, use inventory
Custom State is the right tool for primitive values and simple flat objects: energy: 80, tier: "gold", onboarding_complete: true. When a piece of data has its own identity, multiple typed properties, and a shared schema across users — a medication, a stock holding, a pet — use Inventory instead.
Situation — Use
Single number or string per key — Custom State
A flag that is true/false — Custom State
A flat object with a few fields — Custom State
An item with a schema defined in the Knowledge Base — Inventory
Custom State is persistent by default — it survives across sessions and is visible in every future conversation. If you need state that only exists for the duration of one conversation (a temporary form-fill context, a one-time confirmation token), scope it at the session level instead by passing it in the chat request's context fields rather than writing it as a Custom State.
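A sketch of the session-scoped alternative; the context field name here is hypothetical and stands in for whatever per-request context field your chat call accepts:

// One-conversation data rides on the chat request instead of Custom State
for await (const event of client.agents.chatStream("agent-id", {
  userId: "user-123",
  messages: [{ role: "user", content: "Confirm my order" }],
  context: { confirmation_token: "tok_123" }, // hypothetical field name
})) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}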
Custom Tools let the LLM invoke functions during inference. Sonzai handles sonzai_-prefixed built-in tools automatically. Custom tools are defined by you and executed by your backend — Sonzai surfaces the call as a side effect in the SSE stream.
Using your own LLM?
If you use standalone memory mode (BYO-LLM), Sonzai exposes tool schemas you can wire into your agent framework (LangChain, Vercel AI SDK, Gemini function calling, etc.). See the Tool Integration guide for details.
AgentCapabilities includes a customTools field — a snapshot of the agent-level custom tools currently registered. Use getCapabilities() to read them, or the dedicated listCustomTools() / createCustomTool() methods (shown in the Full API section below) to manage them.
// Read agent capabilities — includes current custom tools
const caps = await client.agents.getCapabilities("agent-id");
console.log(caps.customTools); // CustomToolDefinition[] | null
// Register a new agent-level custom tool
await client.agents.createCustomTool("agent-id", {
name: "lookup_order",
description: "Look up an order by ID and return its status.",
parameters: {
type: "object",
properties: {
order_id: { type: "string" },
},
required: ["order_id"],
},
});
Inject tools dynamically for a specific session. Session tools merge with agent-level tools — same-name session tools take precedence. Discarded when the session ends.
When the LLM decides to call a custom tool, it appears as a side effect in the SSE stream. Your backend executes the tool and returns the result in the next message.
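A sketch of that loop; the externalToolCall field name follows the comments in the examples on this page, and deductEnergy stands in for your backend logic:

for await (const event of client.agents.chatStream("agent-id", {
  userId: "user-123",
  messages: [{ role: "user", content: "I sprint for the gate." }],
})) {
  const call = (event as any).externalToolCall; // shape is illustrative
  if (call?.name === "spend_energy") {
    const { amount } = call.arguments as { amount: number };
    await deductEnergy("user-123", amount); // hypothetical backend helper
  } else {
    process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
  }
}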
What you expose as tools differs sharply by use case — keep descriptions vivid and tightly scoped so the LLM invokes them naturally.
Tools are expressive actions. Things the character can DO in your app — emote, change outfit, move to a different scene, give a gift. Keep descriptions vivid so the LLM invokes them naturally.
await client.agents.sessions.setTools("agent-id", "session-id", [
  {
    name: "change_scene",
    description: "Move to a new location in the story. Use when the scene has run its course or a new chapter begins.",
    parameters: {
      type: "object",
      properties: { location: { type: "string" } },
      required: ["location"],
    },
  },
]);
Don't include a handoff tool. Companions should never punt to a human — the relationship IS the product.
Define a tool that lets the agent trigger a state change from inside a conversation. Your backend executes the tool call and calls upsert to apply the new value.
await client.agents.sessions.setTools("agent-id", "session-id", [
  {
    name: "spend_energy",
    description: "Deduct energy from the user. Call when the user takes an action that costs energy.",
    parameters: {
      type: "object",
      properties: {
        amount: { type: "number", description: "Energy to deduct (1–50)" },
      },
      required: ["amount"],
    },
  },
]);

// In your tool handler:
// 1. Receive externalToolCall { name: "spend_energy", arguments: { amount: 10 } }
// 2. Read current energy with getByKey
// 3. Upsert the new value
// 4. Return the result in the next chat message
Agent-level tools persist across all sessions. Session-level tools are injected at runtime and discarded when the session ends — use them when the available tool set depends on the current screen, user role, or conversation context.
Your backend knows things the agent doesn't: a user just levelled up, an order shipped, a milestone was hit. TriggerEvent lets you push those signals to an agent and get a tailored reaction — no user message required. Dialogue lets you orchestrate two agents talking to each other, turn by turn, so you can build NPC conversations, run evaluation simulations, or script automated specialist hand-offs.
Both primitives use the same enriched context pipeline as regular chat — the agent draws on memory, personality, and mood when it responds.
Level-up celebrations — your game backend detects a rank change and fires a level_up event; the agent congratulates the user in its own voice
Daily summaries — a cron job fires a daily_summary event with session stats in metadata; the agent writes a personalised recap
Achievement unlocks — trigger a proactive message the moment a user hits a milestone, so the agent's enthusiasm lands while the moment is fresh
External state changes — order shipped, appointment confirmed, subscription renewed; the agent reacts to your system events rather than waiting for the user to ask
Fire a level_up event with structured metadata. The agent generates a reaction and the platform queues it for delivery through the same channels as other proactive messages.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const result = await client.agents.triggerBackendEvent("agent_abc", {
userId: "user_123",
eventType: "level_up",
eventDescription: "The user just reached level 25 — a major milestone in the game.",
metadata: {
new_level: "25",
previous_level: "24",
xp_total: "12500",
},
});
console.log(result.accepted); // true
console.log(result.event_id); // "evt_01HX..."
Dialogue is a per-agent call. To run a conversation between two agents, you orchestrate turns yourself: call agent A, append its response to the message history, call agent B with that updated history, and so on. Each agent independently draws on its own memory and personality.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// Seed the conversation with an opening user prompt
const messages: { role: "user" | "assistant"; content: string }[] = [
  { role: "user", content: "Tell me something interesting about the ancient ruins." },
];
// Turn 1 — agent_a responds
const turnA = await client.agents.dialogue("agent_a", {
userId: "user_123",
messages,
sceneGuidance: "Two NPCs are exploring ancient ruins together. Keep responses under 3 sentences.",
});
messages.push({ role: "assistant", content: turnA.response });
// Turn 2 — agent_b responds to what agent_a said
const turnB = await client.agents.dialogue("agent_b", {
userId: "user_123",
messages,
sceneGuidance: "Two NPCs are exploring ancient ruins together. Keep responses under 3 sentences.",
});
console.log("Agent A:", turnA.response);
console.log("Agent B:", turnB.response);
EventType is free-form. There is no fixed enum. Common conventions used by tenants: "achievement", "daily_summary", "level_up", "order_shipped", "appointment_confirmed", "milestone". Pick names that are meaningful in your domain and stay consistent across your backend.
EventDescription is for the LLM. Write it as plain-English narration: "The user just cleared chapter 5 for the first time after 3 failed attempts." The agent's underlying model reads this and uses it to shape the reaction — be specific rather than terse.
Metadata is string-only. The metadata map accepts string → string pairs only. For nested or numeric data, either serialize into the event_description or flatten it with explicit keys ("xp_gained", "xp_total", "level_before", "level_after").
Messages field grounds the event in a prior conversation. If the event is closely tied to a conversation that just ended (for example, a daily_summary fired after a chat session), pass the recent messages. The platform uses them directly for context-sensitive generation — diary entries, summaries — instead of relying on lossy consolidation. Omit this field for cron-driven events that have no associated conversation.
TriggerEventResponse contains two fields:
accepted (bool) — whether the platform accepted the event for processing
event_id (string) — an opaque identifier for the queued event; store it if you want to correlate platform logs
Each call is per-agent. The dialogue method is scoped to a single agent: you pass an agentId and the current message history. To model a conversation between two agents, you manage the turn loop — append each response to the shared messages slice and alternate which agentId you call.
Messages carry the full context. Unlike chat, which manages conversation history server-side per session, dialogue expects you to pass the full message thread with every call. You control the window.
sceneGuidance steers both tone and constraints. Pass a brief instruction describing the scene and any constraints ("keep responses under 3 sentences", "the agents are rivals", "agent_a does not know about the treasure") so both sides stay in character.
requestType signals the call's purpose. An optional free-form tag ("npc_scene", "eval_round", "specialist_consult") that downstream analytics can use for filtering. Has no effect on generation.
DialogueResponse contains:
response (string) — the agent's generated text for this turn
side_effects — optional structured metadata emitted by the agent (tool calls, mood signals, etc.)
Proactive Messaging has three sources: Scheduled Reminders (recurring cadence), Wakeups (one-off timed), and TriggerEvent (your backend fires it when something happens). TriggerEvent is the push-based source you control directly — no schedule required, no timer running. When the event is accepted, the platform routes the generated reaction through the same delivery channels as the other two sources: SSE if the user has an active stream, the polling notifications API, or your registered webhook.
// Proactive triangle in code form:

// Source 1 — recurring schedule (time-based)
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["09:00"] }, timezone: "Asia/Tokyo" },
  intent: "morning check-in",
  check_type: "reminder",
});

// Source 2 — one-off wakeup (time-based)
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "appointment_reminder",
  intent: "remind the user about their dentist appointment",
  delay_hours: 2,
});

// Source 3 — TriggerEvent (you push it when something happens)
await client.agents.triggerBackendEvent("agent_abc", {
  userId: "user_123",
  eventType: "appointment_confirmed",
  eventDescription: "The user just confirmed their 3pm dentist appointment for tomorrow.",
});
When a TriggerEvent fires immediately after a chat session — for example, a daily_summary event at session end — pass the recent conversation messages in the messages field. The platform uses them directly as conversation history for context-sensitive generation (diary entries, personality updates) instead of relying on condensed consolidation summaries. The agent's reaction then references what was actually said rather than a lossy reconstruction.
// After a chat session ends, fire a daily_summary event with the full message history
const sessionMessages = [
  { role: "user", content: "I finally finished that project I was stressing about." },
  { role: "assistant", content: "That's huge! You've been working on that for weeks." },
  { role: "user", content: "Yeah. Feels good. Think I'll take the evening off." },
];

await client.agents.triggerBackendEvent("agent_abc", {
  userId: "user_123",
  eventType: "daily_summary",
  eventDescription: "Session ended. User shared a work win and plans to rest.",
  messages: sessionMessages, // grounds the summary in what was actually said
});
Run a judge agent and a subject agent in a dialogue loop to score the subject's responses without a real user. The judge poses questions, the subject answers, and you feed both transcripts to your evaluation rubric. This lets you evaluate agent quality at scale offline.
import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

const JUDGE_AGENT = "agent_judge";
const SUBJECT_AGENT = "agent_subject";
const USER_ID = "eval_run_001";

const messages: { role: "user" | "assistant"; content: string }[] = [
  { role: "user", content: "I'm feeling really overwhelmed lately." },
];

// Subject responds to the user prompt
const subjectTurn = await client.agents.dialogue(SUBJECT_AGENT, {
  userId: USER_ID,
  messages,
  requestType: "eval_round",
});
messages.push({ role: "assistant", content: subjectTurn.response });

// Judge scores the subject's response
const judgeTurn = await client.agents.dialogue(JUDGE_AGENT, {
  userId: USER_ID,
  messages,
  sceneGuidance:
    "You are evaluating the previous assistant response for empathy and clarity. " +
    "Return a JSON object with keys: score (0–100), feedback (string).",
  requestType: "eval_judge",
});

console.log("Subject:", subjectTurn.response);
console.log("Judge verdict:", judgeTurn.response);

// Then score the exchange through the evaluation API
const evalResult = await client.agents.evaluate(SUBJECT_AGENT, {
  templateId: "empathy-rubric",
  messages,
});
console.log("Eval score:", evalResult.score);
// Generate a character bio from a short description (Generation API)
const bio = await client.agents.generation.generateBio("agent-id", {
  description: "A friendly barista who remembers every customer's order",
  style: "warm and conversational",
});
console.log(bio.bio);
Inventory is the place to store structured per-user data the agent should know about. Each item belongs to a single agent × user pair and follows a schema defined in your Knowledge Base, so the agent always has typed, queryable data rather than free-form text. When the agent adds an item it searches the KB by description to resolve and link the right node automatically.
Add a medication to a user's inventory. The response includes an inventory_item_id (and the backward-compatible fact_id alias) you can use for direct updates or deletes later.
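For example, via the explicit-action route (the dedicated create route noted below works the same way, minus the action field):

const added = await client.agents.inventory.update("agent_abc", "user_123", {
  action: "add",
  item_type: "medication",
  label: "Metformin",
  description: "Metformin 500mg — biguanide for blood sugar control",
  properties: { dose_mg: 500, frequency: "twice daily" },
});
console.log(added.inventory_item_id); // preferred identifier
console.log(added.fact_id);           // backward-compatible alias, same item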
When action is "add", the platform performs a natural-language search of the KB using description. If exactly one node matches, the item is linked automatically and the response includes kb_resolution. If there are multiple close matches, the response returns status: "disambiguation_needed" and a candidates list — surface these to the user or pick the best kb_node_id and re-submit.
label vs description
label is an optional short display name shown in dashboards and agent tool calls (e.g. "Metformin"). description is the longer text the platform uses for KB natural-language search (e.g. "Metformin 500mg — biguanide for blood sugar control"). If label is omitted, the platform falls back to the first segment of description for display purposes.
Items belong to users — every item is scoped to agent_id × user_id; no item is shared across users
Schema-driven shape — item_type references a KB schema that defines the valid property fields; the platform validates writes against it
Two write paths for adding items — use inventory.create({...}) (dedicated route, no action field) for cleaner code when you specifically want to add; use inventory.update({action: "add", ...}) (explicit-action route) when you handle add/update/remove through a single call site. Both hit equivalent server logic.
label vs description — label is a short display name for dashboards and agent UI (e.g. "Ibuprofen"); description is the longer text the KB search uses to resolve the right node (e.g. "anti-inflammatory pain reliever, 400mg"). Both are optional but providing both gives the clearest results.
KB resolution — on add, Sonzai searches the KB by description; on ambiguous matches it returns candidates and status: "disambiguation_needed" so you can resolve before committing
Query modes — "list" returns raw items, "value" joins with live KB market data and computes gain_loss, "aggregate" returns totals and grouped sums without listing every item
inventory_item_id is the preferred identifier going forward. fact_id is included for backward compatibility — both refer to the same item and are interchangeable in all subsequent API calls (direct update, direct delete, schedule linkage).
When status is "disambiguation_needed", the response includes a candidates array instead of kb_resolution. Re-submit with the chosen kb_node_id set explicitly to bypass the search.
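A sketch of that disambiguation loop; the shape of each candidates entry is an assumption:

const res = await client.agents.inventory.update("agent_abc", "user_123", {
  action: "add",
  item_type: "medication",
  description: "ibuprofen",
});

if (res.status === "disambiguation_needed") {
  const chosen = res.candidates[0]; // or surface the list to the user
  await client.agents.inventory.update("agent_abc", "user_123", {
    action: "add",
    item_type: "medication",
    description: "ibuprofen",
    kb_node_id: chosen.kb_node_id, // explicit node ID bypasses the KB search
  });
}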
The item_type field points to a KB entity schema that defines which properties are valid for that type. Create the schema once; all inventory writes for that type are validated against it.
// 1. Define the schema in the KB once
await client.knowledge.createSchema("proj_abc123", {
entity_type: "medication",
fields: [
{ name: "dose_mg", type: "number", required: true },
{ name: "frequency", type: "string", required: true },
{ name: "with_food", type: "boolean", required: false },
],
});
// 2. Inventory writes for item_type "medication" are now validated
await client.agents.inventory.update("agent_abc", "user_123", {
action: "add",
item_type: "medication", // <-- resolves to the schema above
description: "Metformin 500mg",
properties: { dose_mg: 500, frequency: "twice daily", with_food: true },
});
A schedule can reference an inventory_item_id. At each fire, the agent reads the item's current properties rather than a snapshot baked into the schedule definition. Updating the item's dosage automatically flows to the next reminder without touching the schedule itself.
// Add the item first
const { fact_id } = await client.agents.inventory.update("agent_abc", "user_123", {
action: "add",
item_type: "medication",
description: "Metformin 500mg",
properties: { dose_mg: 500, frequency: "twice daily" },
});
// Reference it in a schedule — agent reads live properties at each fire
await client.schedules.create("agent_abc", "user_123", {
cadence: {
simple: { frequency: "daily", times: ["08:00", "20:00"] },
timezone: "America/New_York",
},
intent: "remind the user to take their medication",
inventory_item_id: fact_id,
});
With Memory — inventory state in conversation context
During a conversation the agent can query the user's inventory to answer questions like "what medications am I taking?" directly. Inventory writes also generate memory facts that surface in future sessions, so the agent can reference holdings and items across conversations without a manual query.
// Agent answers from inventory mid-conversation
for await (const event of client.agents.chatStream("agent_abc", {
userId: "user_123",
messages: [{ role: "user", content: "What medications am I on?" }],
})) {
// The agent calls sonzai_inventory internally to fetch the user's items
// and answers from live data — no extra code needed.
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Memory — how inventory writes surface in chat context
Knowledge Analytics
Knowledge Analytics layers a ranking system on top of the Knowledge Base. Rules define scoring signals — per-user affinity for recommendations, aggregate velocity for trends — and readers fetch ranked results at query time with a single call. The graph backbone supplies the nodes and edges; analytics rules decide how to score and order them. The result is a reusable ranking layer that powers product recommendations, trending dashboards, and conversion tracking without building a separate data pipeline.
Rule types — "recommendation" scores nodes per source (e.g. per user), returning a personalised top-N list. "trend" aggregates signals across all sources, returning global velocity rankings.
Config is rule-specific — the config object is a passthrough shape; its fields depend on the rule type and your scoring model. There is no fixed schema enforced by the SDK — pass whatever your rule implementation expects (e.g. target_entity_type, scoring, decay_factor).
Source and target semantics — recommendations take a source_id (typically a user node ID) and return ranked nodes of the target entity type. The source must exist as a node in the Knowledge Base graph.
Scheduled vs manual — rules can carry an optional cron schedule for batch recomputation (e.g. "0 * * * *" for hourly). Call RunAnalyticsRule at any time to trigger a manual run outside the schedule.
Feedback closes the loop — RecordFeedback writes a signal back against the source, target, and rule. Subsequent recomputation can weight nodes that historically converted higher, sharpening ranking over time. Use the action field to record fine-grained user intent: "converted" (user completed the action), "clicked" (user opened the recommendation), "dismissed" (user explicitly rejected it), or "ignored" (recommendation was shown but user did not interact). action: "converted" sets converted: true automatically so existing aggregate conversion queries continue to work without changes.
Record whether a recommended node was acted on. converted is a boolean — true means the user engaged with the recommendation. action is an optional string enum: "converted", "dismissed", "clicked", "ignored". Passing action: "converted" also sets converted: true for backward-compatible aggregate queries.
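A sketch in TypeScript, assuming a recordFeedback method shaped like the Python record_feedback noted below; the field names are illustrative:

// rule and recommendedNodeId come from an earlier getRecommendations call
await client.knowledge.recordFeedback(projectId, {
  ruleId: rule.rule_id,
  sourceId: "user_123",        // the source the recommendation was made for
  targetId: recommendedNodeId, // the node the user acted on
  action: "converted",         // also sets converted: true automatically
});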
getStats(projectId) — returns KBStats. General KB statistics (node counts, document counts, extraction tokens).
Python keyword arguments
The Python SDK exposes get_recommendations, get_trends, get_trend_rankings, get_conversions, and record_feedback using keyword-only arguments after project_id. For example: client.knowledge.get_recommendations(project_id, rule_id="...", source_id="...", limit=10).
Analytics rules run over KB nodes and edges. Entity schemas define what types of nodes exist; rules score those nodes. The recommended pattern is to define your entity schema first, then create rules that target it.
With Inventory — per-user holdings drive per-user recommendations
Inventory writes create edges from a user node to the nodes they own. Those ownership edges flow into the recommendation model as affinity signals: items a user already owns inform which related nodes score highest.
// 1. User buys a product — record it in inventory
const { fact_id } = await client.agents.inventory.update("agent_abc", "user_123", {
action: "add",
item_type: "product",
description: "Razer DeathAdder V3",
properties: { purchase_date: "2026-04-01" },
});
// 2. The inventory write creates a user→product edge in the KB graph.
// The recommendation rule can now weight products related to the
// DeathAdder higher for this user.
const recs = await client.knowledge.getRecommendations(
projectId,
rule.rule_id,
"user_123",
5,
);
// recs.recommendations may now include accessories or similar peripherals
Agent Insights extract what users express interest in during conversations. Those interest signals can be passed into recommendation rule config as additional affinity weights, so a user who talks about budget peripherals gets different rankings than one who discusses high-end setups — without any explicit user input.
No dedicated Knowledge Analytics tutorial exists yet. The Knowledge Base tutorial covers schema setup and fact insertion — the prerequisite steps before creating analytics rules.
User: "What's the best card under $500?"
        |
        v
Agent calls knowledge_search("cards under 500")
        |
        v
KB returns: Charizard ($450, +12%), Blastoise ($380, +8%)
        |
        v
Agent: "Charizard Base Set sells for $450, up 12% this month —
        a great investment pick under $500."
// Seed initial facts about a user into memory before the first conversation
import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: "sk-..." });

await client.agents.memory.seed("agent-id", {
  userId: "user-123",
  memories: [
    { text: "User's name is Jane Smith", factType: "fact" },
    { text: "Jane is a senior product manager at Acme Corp", factType: "fact" },
    { text: "Jane lives in San Francisco and enjoys hiking", factType: "fact" },
  ],
});
Proactive messages — generated by recurring schedules, one-off wakeups, or tenant-triggered events — land in a per-user notifications queue the moment they fire. Your frontend or backend polls that queue to fetch pending messages, display them to the user, and mark each one consumed. No push infrastructure, no webhook endpoint, no server-side listener to maintain — just an HTTP GET on your schedule.
This is the recommended delivery pattern for web clients and mobile apps that can't accept inbound HTTP requests, and it doubles as a handy catch-up mechanism for users who were offline when messages were generated.
When a proactive message fires — whether from a schedule, a wakeup, or a trigger event — the platform enqueues it for the relevant user. The queue is per-user, per-agent. Calling list returns only messages in pending state; calling consume transitions a specific message to consumed. Consumed messages are excluded from future list responses but remain visible in history. The queue does not auto-expire: messages stay pending indefinitely until your code marks them consumed.
If the user has an active SSE chat stream open, proactive messages appear inline in the conversation automatically — no polling needed. Polling is the catch-up mechanism for users who do not have a live stream. The two patterns are complementary: SSE for foreground delivery, polling for background or offline users.
notifications.history is separate from notifications.list. It returns all historical notifications for an agent (including already-consumed ones) and is useful for audit trails, moderation dashboards, and debugging. It does not filter by user_id — it returns across all users up to the requested limit.
All methods are on client.agents.notifications.* (TS/Python) or client.Agents.Notifications (Go). Full request and response shapes live in the API reference.
list — list(agentId, { user_id?, limit? }) returns { notifications: Notification[] }. Fetch pending messages for a user.
consume — consume(agentId, messageId) returns void. Mark a single message consumed.
history — history(agentId, limit) returns { notifications: Notification[] }. Fetch all historical notifications (consumed + pending).
Each Notification carries these fields (Go name, JSON name):
MessageID (message_id) — Pass this to consume to mark the message delivered
UserID (user_id) — The user this notification was generated for
CheckType (check_type) — The check type (e.g. "reminder", "interest_check", "birthday")
GeneratedMessage (generated_message) — The actual text the agent produced — display this to the user
CreatedAt (created_at) — When the message was enqueued (RFC 3339 UTC)
ScheduleID (schedule_id) — Set if the message originated from a schedule; otherwise absent
WakeupID (wakeup_id) — Set if the message originated from a wakeup; otherwise absent
Use the correct field names
Older code may use id, notificationId, type, or content. These are incorrect. The canonical fields are message_id, check_type, and generated_message. Using the wrong field names will result in silent failures when calling consume.
A schedule defines when the agent fires; polling is one way to receive what it produced. When a schedule's cadence fires, the platform generates the agent's message and enqueues it. Your client polls, displays generated_message, then calls consume to clear it from the queue. The schedule and delivery are fully decoupled — you can swap in webhooks or SSE without touching the schedule definition.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// 1. Create a daily 09:00 check-in schedule (done once, e.g. at onboarding)
await client.schedules.create("agent_abc", "user_123", {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Asia/Singapore",
},
intent: "morning check-in on mood and sleep",
check_type: "reminder",
});
// 2. On each app foreground, poll for what the schedule produced
const pending = await client.agents.notifications.list("agent_abc", {
user_id: "user_123",
limit: 5,
});
for (const n of pending.notifications) {
showInAppBanner(n.generated_message);
await client.agents.notifications.consume("agent_abc", n.message_id);
}
A wakeup fires once at a specific moment; polling retrieves the message it generated. This is the natural delivery pattern for one-off agent outreach in mobile clients where webhooks are unavailable. Schedule the wakeup when the event is known (e.g. "follow up 24 hours after purchase"), then poll periodically — the message lands in the queue the moment the delay elapses.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// 1. Schedule a one-off wakeup (e.g. after a user completes onboarding)
await client.agents.scheduleWakeup("agent_abc", {
user_id: "user_123",
check_type: "interest_check",
intent: "check in about how onboarding went",
delay_hours: 24,
});
// 2. Poll for the message when it fires (24 h later)
const pending = await client.agents.notifications.list("agent_abc", {
user_id: "user_123",
limit: 5,
});
for (const n of pending.notifications) {
console.log(n.check_type, n.generated_message);
await client.agents.notifications.consume("agent_abc", n.message_id);
}
Polling and webhooks are two delivery patterns for the same underlying notifications queue. Choose based on your infrastructure:
Polling — your client asks the server for new messages on a schedule. Simple to implement, works in browsers and mobile apps, no inbound connectivity required. Latency is bounded by your polling interval.
Webhooks — the server pushes each message to a URL you register the moment it fires. Lower latency, better for server-to-server integration and multi-channel fanout (email, SMS, push notifications). Requires a public HTTPS endpoint to receive callbacks.
You can use both simultaneously: poll from mobile clients for in-app delivery and register a webhook on your backend for email/SMS fanout. The queue tracks consumed state per message, so a message consumed via polling will not appear in webhook delivery (and vice versa).
Medication Reminders — full-stack example combining Schedule + Inventory + Memory; shows the end-to-end flow from schedule creation to polling the generated reminder.
The organization-global Knowledge Base is an opt-in second scope that sits above every project's own Knowledge Base, letting agents across all projects under a tenant read shared facts — HR policies, brand standards, product catalogs, multi-game lore — without duplicating data per project. Each agent picks a scope mode (project_only, org_only, cascade, or union) to control how org and project graphs combine. Cascade is the recommended default: project facts win on ID collisions, so local overrides remain authoritative.
By default, the Knowledge Base is project-scoped. Every project has its own isolated graph. That is the right model for most tenants — a project's data should not leak into other projects' agents.
The organization scope is an opt-in second scope that sits above every project. Knowledge written here is readable by every project agent under the tenant that opts into a cross-scope reading mode. Typical uses:
Tenant (organization)
|
|-- Organization-global KB (scope_id = "")
| - policies, shared lore, brand, reference catalogs
| - written by tenant admins via the org endpoints
|
|-- Project A KB (scope_id = project_a_id)
| - A's own uploaded docs + API-pushed facts
|
|-- Project B KB (scope_id = project_b_id)
| - B's own uploaded docs + API-pushed facts
|
Agents under any project choose how to read across the two scopes:
- project_only legacy: just the agent's project KB
- org_only only the organization-global KB
- cascade both, project wins on ID collisions (recommended)
- union both, first occurrence wins
Every agent has a knowledgeBaseScopeMode capability. Leaving it unset preserves the legacy project-only behavior. To enable the cascade, set it via the capabilities endpoint or the dashboard.
Enable the knowledge base capability and set the project ID via the SDK:
// Enable the knowledge base + org cascade for the agent
await client.agents.updateCapabilities(agentId, {
knowledgeBase: true,
knowledgeBaseScopeMode: "cascade",
});
If a fact already lives in a project KB and you want to share it organisation-wide, promote it. The project copy is preserved — promotion is additive. If an org node with the same (node_type, norm_label) already exists, the server returns that one instead of writing a duplicate.
When an agent with a non-default scope mode calls knowledge_search during a conversation, the platform runs the search against both scopes in parallel and fuses the results using Reciprocal Rank Fusion (RRF). Each returned result carries a scope field so your prompt can show the LLM where a fact came from.
Scope modes differ in how they merge on a collision:
cascade (recommended): project wins on duplicate node IDs. Agents keep their own overrides, but inherit the org defaults when a project doesn't define something.
union: first occurrence wins; both scopes contribute equally to ranking. Useful when you want broad coverage without a strong preference.
org_only: skip project KB entirely. Useful for reference-only agents (e.g. FAQ bots on company policy).
project_only (default): legacy behavior, org-scope facts are invisible to this agent.
Access control: the two org-scope write endpoints are gated by the same tenant-admin middleware used by the existing project-scoped KB endpoints. Standard project members see no new surface.
Backward compatibility: zero change for any existing agent. Agents stay on project_only mode unless you set a scope mode explicitly.
Idempotency: dedup is at (node_type, norm_label). Promotion returns the existing org node if one is already there; direct createOrgNode will create a second node with a different NodeID — check before calling if that matters.
Per-scope BM25: each scope maintains its own BM25 index and document-frequency corpus. This is why the cascade uses RRF instead of score-adding — the raw scores from two separate indexes are not directly comparable.
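A sketch of a direct org-scope write; createOrgNode is named in the idempotency note above, but its parameter shape here is an assumption:

// Tenant-admin only. Check for an existing (node_type, norm_label) first;
// unlike promotion, createOrgNode does not dedupe for you.
await client.knowledge.createOrgNode({
  node_type: "policy",
  label: "PTO carry-over policy",
  properties: { max_days: "5", effective: "2026-01-01" },
});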
Priming is how you tell a new agent what it already knows about a user. Instead of waiting for the agent to learn through conversation, you deliver the relevant facts up front: who the user is, where they came from, and what they've said before — all before the first message is exchanged.
Migrations from other LLM frameworks — import chat history from Zep, Mem0, Letta, OpenAI Assistants, LangChain, Character.AI, or any custom transcript store
CRM / CSV bulk imports — prime thousands of users in one call with structured contact data
Chat-transcript seeding — let the agent "remember" previous conversations from another system
Display-name + timezone bootstrap — ensure the agent addresses users correctly from turn 1
Onboarding enrichment — load journal entries, support tickets, or prior interactions so the agent sounds familiar on the user's very first chat
Prime a single user with their display name, timezone, and a short narrative block:
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const job = await client.agents.priming.primeUser("agent_abc", "user_123", {
display_name: "Mia Tanaka",
metadata: {
timezone: "Asia/Tokyo",
company: "Acme Corp",
title: "Platform Lead",
email: "[email protected]",
},
content: [
{
type: "text",
body: "Mia joined Acme in 2023 and leads the platform team. She prefers async communication and is an avid coffee enthusiast.",
},
],
source: "crm_onboarding",
});
console.log(job.job_id, job.status, job.facts_created);
The call returns immediately with a job_id. LLM fact-extraction runs asynchronously in the background — the primed facts appear in memory within seconds.
These are two distinct channels for different kinds of information:
Metadata is structured and first-class: display_name, company, title, email, phone, timezone, and a custom map for anything else. Sonzai generates facts from metadata fields synchronously — no LLM extraction required — so facts_created is non-zero even with no content blocks.
Content is narrative. Content blocks go through the full LLM extraction pipeline and end up as facts in the agent's memory constellation, exactly as if the user had said those things in a conversation.
Supported content block types:
"text" — Narrative facts, bullet-point summaries, freeform notes about the user
"chat_transcript" — A prior conversation from another system. Format as User: …\nAgent: … lines, one session per block
The extraction pipeline deduplicates across all blocks — you can safely send both raw transcripts and pre-extracted facts from the same source without producing duplicate memories.
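For example, seeding a prior conversation as a chat_transcript block (the transcript text is illustrative):

await client.agents.priming.primeUser("agent_abc", "user_123", {
  content: [
    {
      type: "chat_transcript",
      // One session per block, formatted as User:/Agent: lines
      body:
        "User: I've been trying to get back into running.\n" +
        "Agent: That's great! How far are you going these days?\n" +
        "User: About 3km, aiming for a 5K by June.",
    },
  ],
  source: "legacy_chat_export",
});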
Calling primeUser more than once for the same user is safe. Content blocks are processed through the same deduplication pipeline as live chat — repeated or overlapping facts are merged, not doubled.
Content blocks flow through the exact same extraction pipeline as conversational messages. After priming, you can search for primed facts via memory.search:
// After primeUser completes, primed content is searchable
const results = await client.agents.memory.search("agent_abc", {
query: "platform team",
userId: "user_001",
limit: 5,
});
for (const mem of results.results) {
console.log(mem.content, mem.factType, mem.score);
}
Primed facts carry a source_type matching the source string you passed to primeUser or batchImport, so you can distinguish migrated history from organically-learned facts when querying.
Use structured_import inside primeUser to seed per-user inventory items alongside narrative facts. This is how you import ownership tables, subscription rosters, or product holdings from a CRM export:
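A sketch, assuming structured_import entries carry the same fields as the inventory write API (check the API reference for the exact shape):

await client.agents.priming.primeUser("agent_abc", "user_123", {
  // Illustrative shape: mirrors the inventory item fields used elsewhere
  structured_import: [
    {
      item_type: "product",
      description: "Razer DeathAdder V3",
      properties: { purchase_date: "2025-11-02" },
    },
  ],
  source: "crm_import",
});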
The Migrations overview lists per-source recipes with full export + import code for every common origin system. Priming is the underlying mechanism each guide uses — the migration guides show you exactly how to shape your existing data into content blocks.
Proactive messaging is when the agent initiates contact rather than responding to user input. Messages can originate from three sources — a recurring schedule, a one-off wakeup, or an event your backend triggers — and are delivered through three channels: the live SSE chat stream, a polling notifications API, or a webhook your server receives.
Scheduled Reminders — recurring cadence (daily / weekly / hourly). Developer-configured. Use when a message must repeat on a predictable rhythm — medication reminders, habit nudges, daily check-ins.
Wakeups — a single one-off message at a specific moment, expressed as a delay from now. Agent- or developer-initiated. Use for birthdays, post-purchase follow-ups, or any event that fires exactly once.
Trigger Event — your backend calls TriggerEvent when something non-conversational happens (level-up, milestone, external state change). Use when the message is reactive to your own system events rather than time.
SSE (live chat stream) — if the user has an active chat stream open, the proactive message appears inline in their conversation automatically.
Polling (client.agents.notifications.*) — your frontend or backend polls the notifications API on a schedule. Works well for web dashboards and mobile apps that check for new content when they foreground.
Webhooks — register a URL once; Sonzai POSTs every proactive message to it. Use for push notifications, email/SMS fanout, or any server-to-server integration.
A schedule or wakeup can reference an inventory_item_id. At fire time the platform reads the item's current properties, so the agent always has up-to-date information — even if the item changed since the schedule was created.
// Schedule that reads live inventory data at every fire
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["08:00"] }, timezone: "Asia/Singapore" },
  intent: "remind the user about their medication",
  check_type: "reminder",
  inventory_item_id: "inv_01HX...",
});
When a proactive message triggers a user reply, the memory layer captures the exchange automatically. Query those memories later to build engagement or adherence dashboards.
// After firing reminders, search memory for user responses
const memories = await client.agents.memory.search("agent_abc", {
  query: "medication taken",
  limit: 10,
});
Scheduled Reminders let your agent message users on a schedule — daily, weekly, or every few hours. The platform handles timezones, DST, and quiet-hours automatically, and reads live structured data at fire time so messages always reflect current information. Use it for medication reminders, habit nudges, daily check-ins, or any time-based message you want the agent to initiate.
Create a daily 09:00 Asia/Singapore check-in. The response contains schedule_id, next_fire_at (UTC), and next_fire_at_local (in the schedule's timezone).
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const schedule = await client.schedules.create("agent_abc", "user_123", {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Asia/Singapore",
},
intent: "check in on how the user is feeling",
check_type: "reminder",
});
console.log(schedule.schedule_id); // "sched_01HX..."
console.log(schedule.next_fire_at); // "2026-04-22T01:00:00Z"
console.log(schedule.next_fire_at_local); // "2026-04-23T09:00:00+08:00"
A cadence tells the platform when to fire. Two mutually exclusive shapes are supported: simple and cron. The simple shape covers most use cases through a frequency field with three options: "daily" fires at each listed times entry every calendar day; "weekly" fires on specified days_of_week at each listed time; "interval_hours" fires repeatedly at a fixed interval starting from starts_at (or schedule creation if omitted). All wall-clock times are evaluated in the schedule's timezone.
For advanced recurrence patterns, use the cron shape with a standard 5-field cron expression (e.g. "0 9 * * 1-5" for 09:00 on weekdays). The timezone field is required in both shapes — IANA names only (e.g. "America/New_York"), not UTC offsets.
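For example (a sketch; the cron sub-field name is an assumption, while the expression and timezone semantics are as described above):

// 09:00 on weekdays, evaluated in New York local time
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    cron: { expression: "0 9 * * 1-5" }, // field name illustrative
    timezone: "America/New_York",
  },
  intent: "weekday morning check-in",
  check_type: "reminder",
});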
The active_window field is a belt-and-braces filter layered on top of the cadence. The cadence computes when a fire would occur; the active window decides whether that fire actually produces a proactive message. Fires outside the window are skipped, not deferred — the cadence grid stays perfectly predictable and no backlog accumulates.
Both sub-fields are optional. When start is greater than end, the window wraps midnight — for example {"start": "22:00", "end": "06:00"} allows fires from 22:00 to 05:59 the next morning. This is useful for night-shift users or schedules targeting early-morning timezones where local midnight matters. Day membership is always evaluated in the schedule's own timezone, so a fire at 23:30 Friday Singapore time stays Friday even when stored as 15:30 UTC.
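A hedged sketch of a wrapping window on an interval schedule; the frequency value "interval_hours" is documented, but the companion interval field name is an assumption:
// Wrapping window: only fires between 22:00 and 05:59 local produce a message; others are skipped
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    // the interval_hours field name inside simple is an assumption
    simple: { frequency: "interval_hours", interval_hours: 4 },
    timezone: "Asia/Singapore",
  },
  active_window: { start: "22:00", end: "06:00" },
  intent: "hydration reminder during the night shift",
  check_type: "reminder",
});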
Pass inventory_item_id on the create (or update) body to link a schedule to a structured item in the user's inventory — a medication, a goal, a plant, anything with named properties. The key property of this linkage is that the platform reads the item's live properties at every fire, not at schedule creation time. This means updating a medication's dosage, a goal's target, or any other property is automatically reflected in the next reminder without any schedule edit. The schedule is the source of truth for when; the inventory item is the source of truth for what.
Use starts_at and ends_at (both RFC 3339 UTC) to constrain a schedule to a specific window of time. No fire is produced before starts_at; once ends_at passes, the schedule is automatically disabled — enabled flips to false. The schedule row is not deleted: the audit trail, historical fire log, and linked inventory reference remain accessible. This is a soft-disable, not a hard delete. To permanently remove a schedule and all associated fire history, use the delete method explicitly.
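As a sketch, assuming starts_at and ends_at sit at the top level of the create body:
// A two-week medication course: no fires before starts_at; auto-disabled (enabled: false) after ends_at
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["08:00"] }, timezone: "Asia/Singapore" },
  starts_at: "2026-05-01T00:00:00Z",
  ends_at: "2026-05-15T00:00:00Z",
  intent: "remind the user to finish the antibiotic course",
  check_type: "reminder",
});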
Every schedule can reference an inventory_item_id pointing to a structured per-user item (e.g. a medication, a goal, a plant). At each fire, the platform reads the item's live properties and injects them into the agent's wakeup block — no schedule edit needed when the data changes. This is how a "reduce ibuprofen from 500mg to 250mg" change flows through to the next reminder automatically.
// 1. Add an inventory item (e.g. a medication)
const item = await client.agents.inventory.update("agent_abc", "user_123", {
  action: "add",
  item_type: "medication",
  description: "Ibuprofen",
  project_id: "proj_abc",
  properties: { medication_name: "ibuprofen", dosage: "500mg" },
});
// 2. Link the schedule to it — no duplicated data
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["08:00", "20:00"] }, timezone: "Asia/Singapore" },
  intent: "remind the user to take their ibuprofen at the correct dose",
  check_type: "reminder",
  inventory_item_id: item.fact_id,
});
// 3. Later, the dose changes — the next fire automatically sees "250mg"
await client.agents.inventory.directUpdate("agent_abc", "user_123", item.fact_id, {
  properties: { dosage: "250mg" },
});
With Wakeups — recurring vs one-off proactive messages
Schedules and Wakeups are both proactive primitives but serve different cases. Use a schedule when the agent should reach out on a repeating cadence (daily, weekly, every 4 hours). Use a wakeup when the agent should reach out once at a specific moment — a birthday, a known one-off event, or an agent-initiated interest check. Both feed into the same downstream delivery channels (SSE, polling, webhooks — see Proactive messaging).
// Recurring: Schedule
await client.schedules.create("agent_abc", "user_123", {
  cadence: { simple: { frequency: "daily", times: ["09:00"] }, timezone: "Asia/Singapore" },
  intent: "morning check-in on mood and sleep",
  check_type: "reminder",
});
// One-off: Wakeup
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "birthday",
  intent: "wish user happy birthday on their 30th",
  delay_hours: 24,
});
When the agent fires a scheduled reminder and the user responds ("took it, thanks"), the memory layer auto-captures the adherence fact. You can query these facts later to build a compliance view without adding a separate database — useful for tenant-side dashboards or escalation logic.
// After a week of firing daily medication reminders, query memory for responses
const memories = await client.agents.memory.search("agent_abc", {
  query: "medication taken ibuprofen",
  limit: 10,
});
for (const result of memories.results) {
  console.log(result.content, result.score);
  // "User confirmed taking 500mg ibuprofen" 0.87
}
Big Five trait scores update from observed interactions, with a daily cap to prevent runaway drift. Significant moments — ones the agent flags as "this matters" — carry extra weight. Drift accumulated over time is tracked: pairs with noisy drift get gentler updates, while stable pairs can move faster. One change you never see directly: the system effectively learns how aggressively to learn for each user, damping updates when the signal is unstable.
Between agents — a closed-loop company brain. Agents within the same project autonomously write verified facts back to the knowledge base (with knowledgeBaseWrite enabled). What agent A learns with user X becomes grounded data that agent B retrieves the next time the same topic comes up, even in a conversation with a different user. The whole project gets sharper with every session, not just a single agent-user pair.
Sessions are Sonzai's unit of consolidation: one continuous conversation between an agent and a user, identified by a session_id you control. When a session ends, the platform extracts facts from the transcript, tags each one with the originating session, and runs the memory pipeline — dedup, cluster, decay — before the next session begins. You can let the platform auto-manage sessions on every chat call, or call sessions.start and sessions.end explicitly when you need to register custom tools, replay historical transcripts, or pin boundary timing to a real-world event.
Sessions are not a wrapper around individual messages — they're how Sonzai knows which messages belong together for extraction. A session can last seconds or days.
You always have a session
Every /chat call belongs to a session. If you don't start one explicitly, the platform creates one for you. Session IDs flow through to extracted facts either way — you never lose attribution.
Just call agents.chat without touching the sessions API. The platform creates a session on the first message, keeps it open while the conversation is active, and closes it automatically when the conversation goes idle. This is the right default for most apps.
Call sessions.start before the first message and sessions.end when the conversation is definitively over. Use this when you need to:
Register custom tools for a specific conversation (tool_definitions on sessions.start).
Control boundary timing — e.g. end a coaching call exactly when the user hangs up, not when the idle timer fires.
Replay historical transcripts — pass the full message list to sessions.end(messages=...) to ingest a canned conversation verbatim, which is how data migration and benchmarks work.
Scope memory extraction around a meaningful unit (a support case, a daily stand-up, a D&D game night).
1. sessions.start — Register session_id (+ optional tools); get ready to accept messages
2. agents.chat (× N) — Stream turns through the session; facts extracted inline
3. sessions.end — Close the session; triggers consolidation, dedup, diary, clustering
→ every extracted fact carries this session_id
If you skip step 1, the first agents.chat call will auto-register a session. If you skip step 3, the session closes on idle timeout (configurable per tenant).
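A minimal sketch of the explicit lifecycle, using the session-handle shape from the standalone-memory examples later in this document (treat the exact method names as illustrative for managed chat):
const session = await client.agents.sessions.start("agent_abc", {
  userId: "user_123",
  sessionId: "support-case-881", // your ID; every extracted fact will carry it
});
// ...run the conversation (managed chat or your own LLM loop)...
await session.end(); // finalize only: extraction already happened per turn
// OR: replay a historical transcript verbatim as the extraction trigger
// await session.end({ messages: transcript });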
Every fact Sonzai extracts carries its source session_id and source_id. You can use these to:
Reconstruct a conversation's memory footprint — "what did the agent learn from session X?" via GET /memory/timeline (grouped by session) or GET /memory/facts (filter client-side by session_id).
Score retrieval at session granularity — benchmarks like LongMemEval evaluate whether retrieved facts come from the correct source session.
Surface recency context — "conversations from last Tuesday" resolves via the session's created_at plus its attributed facts.
Facts that exist outside a specific conversation — agent-global wisdom, manually inserted facts, migrated priming content — carry empty session_id and are attributed through source_type instead (e.g. "manual", "agent_global").
Custom tool definitions can be scoped to a single session. Pass them on sessions.start, or update them mid-session via sessions.set_tools. Character-level (agent-wide) tools are always merged in — session tools layer on top for the duration of the session.
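A sketch of session-scoped tools; the tool objects are hypothetical and the TypeScript spelling of sessions.set_tools (here setTools) is an assumption:
const session = await client.agents.sessions.start("agent_abc", {
  userId: "user_123",
  sessionId: "sess_42",
  toolDefinitions: [orderLookupTool], // layered on top of agent-wide tools
});
// Mid-session, swap or extend the session's tool set
await session.setTools([orderLookupTool, escalateToHumanTool]);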
User Personas are templates your tenant defines for the kinds of users the agent will meet. When a persona is attached to a user — during priming or via conversation metadata — the agent reads it alongside its own personality and adjusts tone, vocabulary, and pace accordingly. A "skeptical beginner" gets gentler explanations and more confirmations; a "power user" gets concise, direct answers without hand-holding.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// Create a persona
const persona = await client.userPersonas.create({
name: "Skeptical Beginner",
description: "First-time user who questions recommendations and needs reassurance.",
style: "Use plain language. Confirm before any irreversible action. Offer brief rationale for each suggestion.",
});
console.log(persona.persona_id);
// List all tenant personas
const { personas } = await client.userPersonas.list();
personas.forEach(p => console.log(p.name, p.is_default));
Tenant-scoped — personas belong to your tenant, not to a specific agent or user. Every agent in your tenant can reference the same persona library.
Template, not assignment — creating a persona does not apply it to anyone. You attach it during priming or pass it as metadata when starting a conversation.
Default persona — one persona per tenant can be marked is_default. The agent falls back to it when no persona is explicitly attached to a user.
Style field — an optional free-form directive layered on top of the agent's base personality prompt. Write it as a concise instruction set: tone, vocabulary level, confirmation habits, pacing.
Pass a persona reference when priming a new user so the agent adapts from the very first turn, before any conversation history exists.
const job = await client.agents.priming.primeUser("agent_abc", "user_123", {
display_name: "Jordan Lee",
metadata: {
persona_id: persona.persona_id, // attach persona at priming time
timezone: "America/New_York",
},
content: [
{ type: "text", body: "Jordan is a first-time user migrating from a competitor product." },
],
source: "onboarding",
});
With Personality — agent personality × user persona = interaction style
These two concepts are complementary and operate at different levels:
Personality is the agent's traits — Big Five scores, speech patterns, emotional range. It is fixed per agent (and evolves slowly through interactions).
User Persona is the user's type — a template describing what kind of person the agent is talking to. It shapes how the agent expresses its personality in this specific conversation.
Think of it as a matrix: a high-agreeableness agent talking to a "power user" persona stays warm but drops the hand-holding; talking to a "skeptical beginner" persona it adds more reassurance and simpler vocabulary — without the underlying personality changing.
Define a persona for each user archetype you care about, then run eval scenarios scoped to that persona. This gives you repeatable, deterministic test conditions.
// Define an eval scenario for the "Skeptical Beginner" persona
const result = await client.agents.evaluate("agent-id", {
templateId: "onboarding-rubric",
messages: [
{ role: "user", content: "I'm not sure I trust this — what happens to my data?" },
{ role: "assistant", content: "That's a fair question. Your data stays on our servers..." },
],
// Pass persona context so scoring reflects expected beginner-friendly tone
metadata: { persona_id: persona.persona_id },
});
console.log(result.score, result.feedback);
const audio = await client.agents.voice.tts("agent-id", {
text: "Hello! How can I help you today?",
voiceName: "aria",
language: "en",
outputFormat: "mp3",
});
// audio.data contains the audio bytes
Wakeups let your agent reach out to a user exactly once at a known future moment. Give the agent an intent, a check_type that it sees as context, and a delay in hours — the platform handles delivery. Unlike Scheduled Reminders, which fire on a repeating cadence, a wakeup fires once and is done.
Typical use cases: birthday greetings, appointment reminders, post-event check-ins, interest follow-ups, and time-delayed nudges. If you need the agent to repeat the same outreach, use a schedule instead.
Schedule a birthday greeting for a specific date using scheduled_at. For a "N hours from now" wakeup, use delay_hours instead. If both are provided, scheduled_at takes precedence.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// Use scheduled_at for birthdays/appointments with a known date
const wakeup = await client.agents.scheduleWakeup("agent_abc", {
user_id: "user_123",
check_type: "birthday",
intent: "wish the user a happy birthday",
scheduled_at: "2026-06-15T09:00:00Z", // RFC3339 absolute timestamp
occasion: "Sarah's 30th birthday",
interest_topic: "celebration and birthday traditions",
});
console.log(wakeup.wakeup_id); // "wake_01HX..."
console.log(wakeup.scheduled_at); // "2026-06-15T09:00:00Z"
delay_hours — a relative offset from the current moment (e.g. delay_hours: 24 fires tomorrow at roughly this time). The platform computes the absolute fire time at the moment the request is accepted. Use this for "N hours from now" semantics where no specific date matters.
scheduled_at — an RFC3339 absolute timestamp (e.g. "2026-06-15T09:00:00Z"). Use this for birthdays, appointments, or any event tied to a specific calendar date. The platform fires the wakeup as close to this time as possible.
If both are provided, scheduled_at takes precedence. scheduled_at in the response is always present and is the authoritative UTC time the wakeup will fire — store it if you want to show the user "your agent will reach out at X".
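The relative variant, for contrast: the same call with delay_hours instead of scheduled_at.
// "24 hours from now"; the platform computes the absolute fire time at accept time
const followUp = await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "interest_followup",
  intent: "ask how the job interview went and whether they got an offer",
  delay_hours: 24,
});
console.log(followUp.scheduled_at); // authoritative UTC fire time, always present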
These optional context fields are included in the agent's wakeup block at fire time, giving it richer material for personalised message composition:
occasion — a short human-readable label for the event (e.g. "Sarah's 30th birthday", "dentist appointment"). The agent may reference this directly in the message.
interest_topic — a topic or theme the agent should lean on when composing the message (e.g. "celebration and birthday traditions", "dental health tips").
event_description — a longer free-form description with any additional context the agent should know (e.g. "User is turning 30 and has mentioned wanting to celebrate with a surprise party").
All three are optional and additive — provide as many or as few as are useful. The agent's underlying model uses them as soft context, not as a rigid template.
Both fields are free-form strings. The agent receives both as part of its wakeup context at fire time:
check_type is a short label that tells the agent the nature of the outreach ("birthday", "appointment_reminder", "interest_followup", etc.). Keep it lowercase and underscore-separated — it is machine-readable context, not a display string.
intent is a natural-language instruction to the agent describing what the message should accomplish. Write it as you would write a system instruction: "ask how the job interview went and whether they got an offer".
Neither field has a fixed enum — any string is valid. The agent's underlying model interprets them in context.
Wakeup status values:
executed — Fired; message delivered to the notification queue
cancelled — Cancelled before it fired
Once a wakeup reaches executed or cancelled it is immutable. To cancel a pending wakeup, call getWakeups to retrieve the wakeup_id, then cancel it via the API before scheduled_at passes.
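A sketch of the cancel flow; the list-call shape and the cancel method name (cancelWakeup) are assumptions:
const wakeups = await client.agents.getWakeups("agent_abc", { user_id: "user_123" });
const pending = wakeups.find((w) => w.status === "pending" && w.check_type === "birthday");
if (pending) {
  // must happen before scheduled_at passes; executed/cancelled wakeups are immutable
  await client.agents.cancelWakeup("agent_abc", pending.wakeup_id);
}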
Each call to scheduleWakeup creates exactly one future fire. If you need to re-schedule after a wakeup executes (for example, to send a birthday greeting every year), schedule a new wakeup the next time you learn the date. For repeating outreach on a fixed cadence, use Scheduled Reminders instead.
Schedules and Wakeups are complementary proactive primitives. The rule is simple: if the agent should reach out more than once on a predictable cadence, use a schedule. If the agent should reach out exactly once at a known moment, use a wakeup. Both feed into the same downstream delivery channels.
// Recurring: a daily morning check-in schedule
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["09:00"] },
    timezone: "Asia/Singapore",
  },
  intent: "morning mood and sleep check-in",
  check_type: "reminder",
});
// One-off: a wakeup on the day of the user's birthday
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "birthday",
  intent: "wish the user a happy birthday on their 30th",
  delay_hours: 48,
});
A common pattern is to use both together: a recurring schedule for everyday outreach, and a wakeup for a special moment that doesn't fit the cadence.
The agent can read memory facts to decide when and what to schedule. For example, if a user mentions their anniversary date, the agent can search memory to retrieve that date and schedule a wakeup for the right moment. The wakeup then fires with the agent already knowing why it is reaching out.
// 1. User mentioned an upcoming anniversary — find it in memory
const memories = await client.agents.memory.search("agent_abc", {
  query: "anniversary date",
  limit: 5,
});
// 2. Parse the date from the top result and compute delay_hours
const anniversaryFact = memories.results[0].content;
// e.g. "User's wedding anniversary is April 30"
const hoursUntilAnniversary = computeHoursUntil("2026-04-30");
// 3. Schedule a wakeup for that exact moment
// Use scheduled_at for a known date, or delay_hours for "N hours from now"
await client.agents.scheduleWakeup("agent_abc", {
  user_id: "user_123",
  check_type: "anniversary",
  intent: "wish the user a happy anniversary and ask how they are celebrating",
  scheduled_at: "2026-04-30T09:00:00Z", // the anniversary date
  occasion: "User's wedding anniversary",
  event_description: anniversaryFact,
});
Because the agent has memory of the conversation in which the user shared the anniversary date, the wakeup message will feel naturally aware of the context — not generic.
When a wakeup fires, the generated message lands in the agent's notification queue. Your backend can consume it via SSE polling or a registered webhook. The event type is the same as any other proactive message; you don't need special handling for wakeup-originated messages vs schedule-originated ones.
// Poll for any pending proactive messages (wakeups or schedules)
const notifications = await client.agents.notifications.poll("agent_abc", {
  user_id: "user_123",
});
for (const n of notifications) {
  console.log(n.content); // the agent's message text
  console.log(n.source_type); // "wakeup" | "schedule"
}
See Webhooks & Notifications for webhook registration, signature verification, and SSE consumption patterns.
Register a webhook URL per tenant (or per project) and Sonzai will HTTP POST every proactive agent message to that URL with a signed payload. Each request includes a Sonzai-Signature header you verify with your signing secret before acting on the payload. Use webhooks for server-to-server delivery where you own the downstream routing — forwarding to FCM/APNs, sending via SendGrid or Twilio, writing to a case-management system, or fanning out to multiple channels at once.
Register a webhook URL to start receiving on_wakeup_ready events. Save the signing_secret from the response — it is only returned once.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const result = await client.webhooks.register("on_wakeup_ready", {
webhookUrl: "https://your-server.com/webhooks/sonzai",
authHeader: "Bearer your-webhook-secret",
});
// Store this securely — shown only once
console.log(result.signingSecret);
Webhooks are registered per event type. One URL per event type per tenant, or per project when using project-scoped registration. The same URL can handle multiple event types — inspect the event_type field on the payload to route accordingly.
Every POST Sonzai sends includes a Sonzai-Signature header in the format:
Sonzai-Signature: t=1714000000,v1=abc123def456...
t is the Unix timestamp of the request; v1 is the HMAC-SHA256 of {timestamp}.{raw_body} using your signing secret (with the whsec_ prefix stripped). Always verify the signature on the raw, unmodified request body before parsing JSON — do not use the parsed object for verification.
When your endpoint returns a non-2xx status or times out, Sonzai retries with exponential backoff. Make your handler idempotent — deduplicate on event_id (or a stable field in the payload body) so retried deliveries do not double-process.
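A minimal dedup sketch, assuming an event_id field on the payload; swap the in-memory set for Redis or your database in production:
const seen = new Set<string>();

function handleWebhookEvent(event: { event_id: string; event_type: string }): void {
  if (seen.has(event.event_id)) return; // retried delivery, already processed
  seen.add(event.event_id);
  // ...route on event.event_type and forward to your channel...
}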
Verify the Sonzai-Signature header before acting on any payload. The Go SDK ships a helper; TypeScript and Python use standard crypto primitives.
import crypto from "node:crypto";
/**
* Verify a Sonzai webhook signature.
* Call this on the raw request body string before parsing JSON.
*/
function verifyWebhookSignature(
rawBody: string,
signatureHeader: string,
secret: string,
): boolean {
// Strip whsec_ prefix if present
const key = secret.startsWith("whsec_") ? secret.slice(6) : secret;
// Parse header: t={timestamp},v1={sig}
const parts = Object.fromEntries(
signatureHeader.split(",").map((p) => p.split("=")),
);
const timestamp = parts["t"];
const receivedSig = parts["v1"];
if (!timestamp || !receivedSig) return false;
const expectedSig = crypto
.createHmac("sha256", key)
.update(`${timestamp}.${rawBody}`)
.digest("hex");
return crypto.timingSafeEqual(
Buffer.from(receivedSig),
Buffer.from(expectedSig),
);
}
// In your webhook handler (e.g. Express):
app.post("/webhooks/sonzai", express.raw({ type: "*/*" }), (req, res) => {
const sig = req.headers["sonzai-signature"] as string;
const rawBody = req.body.toString("utf-8");
if (!verifyWebhookSignature(rawBody, sig, process.env.SONZAI_WEBHOOK_SECRET!)) {
return res.status(401).send("Invalid signature");
}
const event = JSON.parse(rawBody);
// Forward to your channel...
res.status(200).send("ok");
});
Timestamp tolerance
The Go SDK rejects signatures older than 5 minutes by default. In TypeScript and Python implementations, add a timestamp check if you need to guard against replay attacks: compare parseInt(parts["t"]) * 1000 against Date.now() and reject if the difference exceeds 300,000 ms.
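A sketch of that check, to pair with verifyWebhookSignature above:
const TOLERANCE_MS = 300_000; // 5 minutes, mirroring the Go SDK default

function isTimestampFresh(timestamp: string): boolean {
  const ageMs = Date.now() - parseInt(timestamp, 10) * 1000;
  // Also rejects future-dated timestamps; relax the lower bound if clock skew matters
  return ageMs >= 0 && ageMs <= TOLERANCE_MS;
}
// Call isTimestampFresh(parts["t"]) alongside the signature check before trusting a payload.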
Webhooks and polling are two consumption models for the same proactive message queue. Webhooks push to your server in real time; polling lets your client or server fetch on demand. Use webhooks when you have a stable server endpoint and need instant delivery. Use polling when your client cannot accept inbound HTTP connections (mobile apps, browser clients) or when you want to batch-process notifications on your own schedule. Both see the same payload shape.
// Polling alternative — same messages, pulled instead of pushed
const pending = await client.agents.notifications.list("agent_abc", {
  userId: "user_123",
  status: "pending",
});
for (const notif of pending.notifications) {
  console.log(notif.generated_message);
  await client.agents.notifications.consume("agent_abc", notif.message_id);
}
When a scheduled reminder fires, an on_recurring_event_due webhook delivers the generated message to your endpoint. Your handler can then forward to FCM, send an email, or post to Slack — all without polling. This separates the scheduling concern (when to fire) from the delivery concern (how to reach the user).
// Register once; every scheduled reminder fires this endpoint
const result = await client.webhooks.register("on_recurring_event_due", {
  webhookUrl: "https://api.yourapp.com/webhooks/sonzai",
});
// In your handler, forward to the appropriate channel:
// event.generated_message → FCM, email, SMS, Slack...
When a wakeup fires, the on_wakeup_ready event is POSTed to your registered endpoint. This is the primary webhook event for companion-style agents that reach out proactively. Register the webhook once and every future wakeup — automatic or manually scheduled — will arrive at your URL.
// Register to receive all future wakeup messages
await client.webhooks.register("on_wakeup_ready", {
  webhookUrl: "https://api.yourapp.com/webhooks/sonzai",
});
// Your handler receives the wakeup message and forwards it:
// event.generated_message → push notification
// event.user_id → lookup device token in your DB
// event.agent_id → identify which agent sent it
No dedicated webhook tutorial yet. The Scheduled Reminders tutorial covers the full proactive delivery pipeline and includes webhook-based consumption patterns.
// List runs
const runs = await client.evalRuns.list({ agentId: "agent-id" });
// Get a specific run
const run = await client.evalRuns.get("run-id");
// Reconnect to a streaming run
for await (const event of client.evalRuns.streamEvents("run-id")) {
console.log(event.type, event.message);
}
for await (const event of client.agents.chatStream({
agent: "agent-id",
messages: [{ role: "user", content: "I had a great day hiking!" }],
userId: "user-123",
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Server-side use only
The SDK is for server-side use only. Never expose API keys in client-side code. For web apps, proxy requests through your backend. See the integration guides for examples.
stream, err := client.Agents.ChatStream(ctx, agentId, sonzai.ChatRequest{
	UserID: userId,
	Messages: []sonzai.Message{
		{Role: "user", Content: "I had a great day hiking!"},
	},
	Language: "en",
})
if err != nil {
	// handle the error before reading the stream
}
// Read streaming events
for event := range stream {
	fmt.Print(event.Content)
}
There are two complementary ways your agent can access Sonzai knowledge and memory:
Automatic (Recommended)
Call GET /context with a query param. The endpoint automatically searches the knowledge base and injects recalled memories. The deferred learning loop primes the next context call with KB results that the agent missed. No tool calling needed.
Explicit Tool Calling
Register Sonzai tools with your LLM so it can search on demand mid-conversation. This is for agent frameworks (LangChain, Vercel AI SDK, CrewAI) where the LLM decides when to search. You fetch tool schemas from Sonzai and wire them into your framework.
When to use which?
Start with automatic enrichment — it covers most cases with zero configuration. Add explicit tool calling when your agent needs to search mid-conversation (e.g., the user asks a question not covered by the initial context fetch) or when your framework expects tool definitions.
Fetch the tool catalog for an agent. This returns JSON schemas in OpenAI function-calling format that you can pass directly to your LLM's tool configuration.
Search the agent's knowledge base for relevant documents and facts. Uses hybrid search (BM25 + semantic) when embeddings are available, falling back to BM25 full-text search.
Search the agent's memory for previously extracted facts about a user. This is a synchronous BM25 full-text search that returns immediately — no deferred processing.
Unlike KB enrichment (which has a deferred path), memory search returns immediately from BM25 indexes. There is no async component. The /context endpoint already includes the most relevant memories automatically — this tool is for cases where the LLM needs to search for additional facts mid-conversation.
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent
from sonzai import Sonzai

sonzai_client = Sonzai(api_key="sk_your_api_key")
agent_id = "agent-id"
user_id = "user-123"

@tool
def knowledge_search(query: str, limit: int = 5) -> list[dict]:
    """Search the agent's knowledge base for relevant documents and facts.
    Use when the user asks about topics that may be in uploaded documents."""
    results = sonzai_client.agents.knowledge_search(agent_id, query=query, limit=limit)
    return [{"content": r.content, "label": r.label, "score": r.score} for r in results.results]

@tool
def memory_search(query: str) -> list[dict]:
    """Search agent memory for previously learned facts about the user.
    Use when the conversation references past interactions or personal details."""
    results = sonzai_client.agents.memory.search(agent_id, query=query, user_id=user_id)
    return [{"content": f.content, "type": f.fact_type} for f in results.results]

# Get enriched context
ctx = sonzai_client.agents.get_context(
    agent_id, user_id=user_id, session_id="session-abc", query=user_message
)

llm = ChatGoogleGenerativeAI(model="gemini-3.1-flash-lite-preview")
agent = create_react_agent(llm, [knowledge_search, memory_search])
result = agent.invoke({
    "messages": [
        {"role": "system", "content": build_system_prompt(ctx)},
        {"role": "user", "content": user_message},
    ]
})
The most powerful aspect of standalone mode is the self-improving learning loop. Even without explicit tool calls, the agent gets smarter each turn because /process detects knowledge gaps and primes the next /context call.
One-shot signals: Deferred KB results are consumed when /context reads them. They appear exactly once, preventing stale or repeated information.
TTL-based expiry: Deferred signals expire after 1 hour. If the user doesn't continue the conversation, stale signals are automatically cleaned up.
Deduplication: If the direct /context query matches the same KB document as a deferred signal, the duplicate is removed. You never get the same result twice.
Capped searches: /process runs at most 5 KB queries per call and stores at most 10 deferred results, preventing resource explosion on topic-heavy conversations.
Unlike KB enrichment, memory search has no deferred/async path. When /context is called, it recalls the most relevant memories immediately using the hierarchical memory tree and BM25 indexes. When you call GET /memory/search explicitly, results return immediately.
The deferred behavior only applies to knowledge base content, where /process proactively discovers KB documents the agent should have known about. Memory facts are always available synchronously because they are indexed at write time (during /process).
Not necessarily. /context automatically includes KB results and recalled memories. Tool calling is useful when the LLM needs to search for something specific mid-conversation that wasn't covered by the initial context fetch, or when your framework expects tool definitions.
No. Memory search is always synchronous. When you call GET /memory/search, results return immediately from BM25 indexes. The deferred/async flow only applies to knowledge base enrichment via the /process learning loop.
The deferred signals expire after 1 hour (TTL-based cleanup). No stale data persists. If the user resumes the conversation later, they get fresh results from the next /context call.
Absolutely. The Sonzai tool schemas are standard OpenAI function definitions. Mix them with your own tools in whatever framework you use. The LLM decides which tool to call based on the conversation.
Custom tools (created via POST /agents/{agentId}/tools or the dashboard) are for agent-side tool calling in Sonzai's managed chat mode. The tool schemas described here (/tools/schemas) are for BYO-LLM mode where your LLM calls Sonzai endpoints.
Configure an OpenAI-compatible API endpoint for your project. Sonzai routes all chat generation through your endpoint while handling everything else: context assembly, tool execution, side-effect extraction, memory storage, personality tracking, and consolidation.
Full Managed Experience
Built-in tools (web search, memory recall, image generation, inventory), streaming SSE, per-message side effects — everything works exactly as with our default providers.
Your Model, Your Control
Use fine-tuned models, self-hosted endpoints, or any OpenAI-compatible provider (vLLM, Ollama, Together, Groq, Azure OpenAI, etc.).
Encrypted at Rest
Your API key is encrypted with AES-256 before storage. Only the first 8 characters are visible in the dashboard for identification.
Per-Project Configuration
Each project can have its own custom LLM endpoint. Toggle it on/off without deleting the config.
Custom LLM is the right choice when you want to use your own model but still want the full Sonzai experience (tools, streaming, per-message extraction). Standalone Memory is for when you need to control the entire chat loop yourself — e.g., for privacy preprocessing, data anonymization, or deep integration with an agent framework. See the Standalone Memory docs for the tradeoffs.
Once configured, here is what happens when a chat request is made:
Context assembly — Sonzai builds the 7-layer enriched context (personality, memory, mood, habits, goals, relationships, application state) exactly as with default providers.
Tool injection — Built-in tools (sonzai_memory_recall, sonzai_web_search, etc.) and any custom tools are added to the request.
Your endpoint called — The request is sent to your configured endpoint with your model name, API key, and the full message history including system prompt.
Streaming proxy — SSE chunks from your endpoint are streamed back to the client in real time.
Post-stream processing — After the stream completes, Sonzai extracts side effects (memory facts, mood changes, personality shifts, habits, tool calls) and stores them — same as with default providers.
Background tasks like fact extraction, memory consolidation, diary generation, and summarization automatically use the same model family you configured. Sonzai tracks the last-used provider/model for each agent and routes background LLM calls accordingly.
Custom LLM usage is billed at a flat per-token rate under the custom_llm billing model, regardless of which actual model your endpoint serves. Sonzai tracks input/output tokens from your endpoint's usage response. Your own endpoint costs (API fees, compute) are entirely yours.
These run after the user-facing reply is streamed, on the post-processing model map — a per-project config that maps the chat-completion model to the smaller model the extractor should use. When extraction needs to run for a chat that used claude-3-5-sonnet, the extractor uses Gemini Flash Lite. When it sees a chat model not in the map, the * wildcard kicks in.
The wildcard key is exported as sonzai.PostProcessingWildcardKey (Go) and the equivalent constant in the other SDKs so you don't have to hard-code "*" in your provisioning scripts.
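Illustratively, the map is keyed by chat model with provider/model strings as values; the exact set-call for project config lives in Reference → API:
const postProcessingModelMap = {
  "claude-3-5-sonnet": "gemini/gemini-3.1-flash-lite-preview",
  "*": "gemini/gemini-3.1-flash-lite-preview", // sonzai.PostProcessingWildcardKey in Go
};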
The wildcard is enough for most projects. Reach for an explicit entry when:
A particular chat model produces output the default extractor mishandles (e.g. tool-call traces from a verbose model that need a stronger extractor to keep facts atomic).
You're A/B-ing two extractors and want one chat model to route through each for comparison.
Cost: cheaper chat models can run a cheaper extractor; flagship chat models may warrant a stronger extractor on the same trace.
Provider availability
An entry's provider/model must match a real provider Sonzai has configured for your project — see Providers. Setting a non-existent provider here makes extraction fail asynchronously after the user-facing reply has already streamed; you'll see it in the agent's extraction_status on the next turn.
Providers — the chat-completion provider list (independent of post-processing).
Self-improvement — the full picture of what the extractor does on each turn.
Reference → API — REST endpoint shapes for the project-config get/set/delete calls.
Providers
Sonzai routes chat completions through one of four providers. The IDs are exported as constants from the sonzai.providers module in the SDKs — import those rather than hand-typing strings, so they stay in sync as the catalog evolves. Use client.list_models() for the live set enabled on your tenant at runtime.
The default is gpt-5.5; the 5.4 family is the cheaper workhorse, and 5 / 5-mini / 5-nano cover even cheaper or smaller-context tiers. The fallback chain on quota exhaustion is gpt-5.5 → gpt-5.4 → gpt-5.4-mini → gpt-5.
Model | Context window | Use it when
gpt-5.5 | 1.05M | Default. The current OpenAI frontier — vision + tools + streaming + JSON mode.
gpt-5.4 | 1.05M | Cheaper than 5.5, same context window.
gpt-5.4-mini | 1.05M | The cheap workhorse. Recommended for high-throughput tenants.
gpt-5 | 400k | Frozen Aug-2025 snapshot. Kept for tenants pinned to it; new agents should default to 5.5.
Reasoning and non-reasoning variants in the Grok 4 family. grok-4-1-fast-non-reasoning is the default; reasoning models are opt-in for tasks that benefit from deeper chain-of-thought.
Model | Context window | Reasoning
grok-4-1-fast-non-reasoning | 2M | No
grok-4-1-fast-reasoning | 2M | Yes
grok-4.20-0309-non-reasoning | 2M | No
grok-4.20-0309-reasoning | 2M | Yes
All Grok 4 entries support streaming, tools, and JSON mode. None support vision today.
Point Sonzai at any OpenAI-compatible chat-completions endpoint. The Mind Layer keeps owning memory, personality, mood, and post-processing — only the chat-completion call gets routed through your endpoint.
See Custom LLM for the full setup. This is distinct from BYOK — BYOK uses Sonzai's provider integrations but with your billing key; BYOM uses your own inference stack entirely.
client.list_models() (Python / TS / Go expose the same shape) returns the live set of providers and models enabled on your tenant — useful for building a model-picker UI or for asserting that a provider you depend on is wired up before a deploy.
const result = await client.listModels();
for (const p of result.providers) {
console.log(p.provider, p.models.map((m) => m.id));
}
Custom LLM — point Sonzai at your own endpoint entirely.
Model scope — how provider / model is resolved per call.
Post-processing — what runs in the background, on what model.
Model scope
A Sonzai chat turn picks two models: the chat-completion model the user sees, and the post-processing model that runs the background work afterwards. Each goes through its own resolver cascade. The cascades share the same scope hierarchy:
1. per-call (highest precedence — passed to agents.chat / sessions.start / agents.process)
2. per-agent (AgentProfile fields)
3. per-project (project_config rows in CockroachDB)
4. per-account/tenant (account_config rows in CockroachDB)
5. system default (Go constant compiled into the binary)
First non-empty layer wins. Layer 5 always exists, so resolution always produces a concrete answer.
The cheaper-model fleet that runs the batch work behind every turn: fact extraction, dedup, mood updates, personality drift, summarisation, diary, constellation. Resolved per task, per turn, independently of the chat model.
One frontier model per agent, one cheap extractor per project. Set agent ModelConfig to your premium model; set the project post-processing map's * wildcard to gemini/gemini-3.1-flash-lite-preview.
A/B test extractors. Two projects, same agents, different project_config.post_processing_model_map entries — compare quality on the same traffic.
Per-tenant pricing tiers. Free tier defaults the post-processing map to flash-lite at the tenant level; paid tier overrides per-project to a stronger extractor.
One-off override. Pass provider/model on a single agents.chat call without persisting anything, as shown below.
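A sketch of the per-call override, assuming agents.chat mirrors the chatStream request shape shown elsewhere in these docs:
const reply = await client.agents.chat({
  agent: "agent-id",
  userId: "user-123",
  messages: [{ role: "user", content: "Summarise my week." }],
  provider: "openai", // per-call scope beats agent, project, and tenant config
  model: "gpt-5.4-mini",
});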
Fetches the 7-layer enriched context: personality, mood, relevant memories, active goals, habits, relationship state, and proactive signals. Pass a query matching the current topic for best memory recall.
const ctx = await session.context({ query: "What should we talk about?" });
// ctx is a flat object — no nested envelope. Useful fields:
// personality_prompt — agent identity / system instructions
// bio, speech_patterns — agent identity bits
// true_interests, true_dislikes
// big5, dimensions, preferences, behaviors
// recent_personality_shifts, significant_moments, active_goals, habits
// current_mood, emotional_state
// loaded_facts — recalled facts (each has atomic_text, fact_type, importance)
// long_term_summaries — multi-session digests
// proactive_memories — pending proactive signals
// constellation_patterns — deeper behavioral patterns
// relationship_narrative, chemistry_score, love_from_agent, love_from_user
// knowledge.results — KB hits for the query (only nested key)
// recent_turns — buffered messages from this session
// backend_context — custom application state (if set)
POST /agents/{agentId}/sessions/{sessionId}/turn — sync mood update inline (~300–500ms), deeper extraction continues in the background (5–15 seconds). Accepts role: "tool" and tool_calls on assistant messages.
const { mood, extraction_id, extraction_status } = await session.turn({
messages: [
{ role: "user", content: userMessage },
// intermediate tool calls/results here
{ role: "assistant", content: assistantMessage },
],
// provider/model fall back to the session-level defaults; both are optional.
});
If you can predict the next user query (or just want to pre-warm with a generic query), pass fetchNextContext on .turn() and the server returns an enriched context inside the same response under next_context. This eliminates one roundtrip on the next render.
const { mood, next_context } = await session.turn({
messages: [...],
fetchNextContext: { query: "any query you'd run on the next turn" },
});
// next_context has the same shape as session.context() — use it directly
// to render the system prompt for the next turn without calling /context.
Send a full transcript and run extraction immediately. Auto-creates a session if sessionId is omitted; the response surfaces the auto-generated session_id.
const result = await client.agents.process("agent-id", {
userId: "user-123",
// sessionId omitted — auto-created
messages: [
{ role: "user", content: userMessage },
{ role: "assistant", content: assistantMessage },
// tool messages allowed too
],
provider: "gemini", // optional
model: "gemini-3.1-flash-lite-preview", // optional
});
console.log(result.session_id); // auto-generated when not passed
console.log(result.facts_extracted); // count of facts extracted this call
console.log(result.side_effects); // { mood_updated: true, ... summary counts }
// Then read the extracted state back via the dedicated endpoints:
const memory = await client.agents.memory.list("agent-id", { userId: "user-123" });
const mood = await client.agents.getMood("agent-id", { userId: "user-123" });
The response is intentionally a small summary — { success, facts_extracted, side_effects, session_id }. To inspect the extracted facts/personality/mood/habits themselves, call the dedicated read endpoints (see Reading Behavioral Data below).
Closes the session. If you call this without messages (after using /turn or /process), it's a finalize-only call. If you call it with messages and skipped /process, this becomes your extraction trigger — functionally equivalent to /process, but lifecycle-scoped and async-capable on tenants where enabled.
// Just close — no extraction needed if you used /turn or /process already.
await session.end({ totalMessages: 12, durationSeconds: 600 });
// OR — pass messages here as the extraction trigger (Option B).
await session.end({
messages: transcript,
totalMessages: transcript.length,
durationSeconds: 600,
});
Both /turn and /process accept OpenAI/Anthropic-style tool messages. Sonzai's extractor reads tool results and can capture facts that only appeared in tool output.
{ "messages": [ { "role": "user", "content": "Where did my last order ship from?" }, { "role": "assistant", "tool_calls": [ { "id": "call_1", "type": "function", "function": { "name": "order-lookup", "arguments": "{\"limit\":1}" } } ] }, { "role": "tool", "tool_call_id": "call_1", "content": "{\"order_id\":\"42\",\"origin\":\"Tokyo\",\"carrier\":\"DHL\"}" }, { "role": "assistant", "content": "Your last order shipped from Tokyo via DHL." } ]}
The extractor will surface a fact like "User's last order (#42) shipped from Tokyo via DHL" — a fact that never appeared in the user's or assistant's own text.
The Context Engine schedules proactive outreach (check-ins, follow-ups) based on conversation patterns. Poll for pending notifications and consume them when delivered.
const notifications = await client.agents.notifications.list("agent-id");
for (const notif of notifications) {
await deliverToUser(notif.user_id, notif.message);
await client.agents.notifications.consume("agent-id", notif.message_id);
}
Atomic facts (preferences, events, commitments) with importance scoring, deduplication, and topic tagging. Sourced from user, assistant, AND tool messages.
Personality Deltas
Big5 trait shifts (openness, conscientiousness, extraversion, agreeableness, neuroticism) with reasoning.
Mood Changes
4D mood delta (valence, arousal, tension, affiliation). Sync mood lands inline on /turn; richer extraction is deferred.
Habit Detection
New and reinforced behavioral patterns — exercise routines, reading habits, social patterns.
Interest Tracking
Topics the user engages with, categorized by domain with confidence and engagement scores.
Relationship Dynamics
Love score changes with reasoning — tracks rapport, trust, and emotional connection.
Proactive Outreach
Scheduled check-ins and follow-ups based on conversation context (e.g., 'ask about the hike tomorrow').
When calling /turn or /process, specify which of our LLM providers to use for extraction. Omitting provider/model falls back to the platform default gemini-3.1-flash-lite-preview.
There are three ways to feed conversations into Sonzai. The first two are batch (you send a transcript after the conversation); the third is real-time (you submit each turn as it happens). Pick exactly one per conversation — chaining them runs extraction twice on the same messages.
A. /process — one-shot batch
Single call. Auto-creates a session if you don't pass one. Best for external LLM transcripts, benchmarks, and any flow without a long-lived session lifecycle.
B. sessions.start → end({ messages }) — lifecycle batch
Open a session, do your full conversation off-platform, then close with the transcript on .end(). Use when you want explicit session boundaries, async polling, or session-scoped tools — but still ingest in one shot.
C. sessions.start → turn() × N → end() — real-time
Open a session and submit each exchange via .turn() as the conversation happens. Sync mood lands inline (~300–500ms); deeper extraction runs asynchronously 5–15s later. Best for chat companions, voice AI, and agent frameworks.
 | A. /process | B. sessions.end({ messages }) | C. sessions.turn() × N
Calls per conversation | 1 | 2 (start + end) | 2 + N (start + N × turn + end)
Sonzai in the hot path? | No | No | Yes — .context() and .turn() flank each turn
Context per turn | Pre-session only (optional getContext call) | Pre-session only (optional getContext call) | Fresh, query-specific via .context()
Extraction timing | Whole transcript, inline | Whole transcript, inline (or async on tenants where enabled) | Per-turn — sync mood inline, deeper extraction 5–15s later
A and B are functionally equivalent for fact extraction — both extract facts and side-effects from the full transcript inline. The only differences are lifecycle ergonomics (B gives you an explicit session and supports async polling) and call count.
C is a different shape: Sonzai is part of every turn instead of seeing the conversation only at the end.
Don't mix shapes within one conversation
Calling .turn() per turn (C) and .end({ messages }) with the same transcript (B) extracts the same messages twice. Pick one shape per conversation. The pattern docs below show C and B/A separately.
/turn, /process, and sessions.end are intentionally lightweight. They extract facts and a session summary from the transcript and persist them — that's it. The expensive work (cross-session dedup, clustering, diary deepening, decay) is scheduled automatically by the platform and is rate-limited so it doesn't run on every call.
Deep consolidation (wakeup/habit dedup, decay, cluster reconcile, weekly summaries) runs daily or weekly on an automatic schedule and is the heavy tier of the pipeline.
This means you can call /turn per turn (Pattern 1), or /process once at the end (Pattern 2), without paying for heavy consolidation each time. The platform de-duplicates and consolidates in the background.
Practical implication
Don't try to "save calls" by skipping /turn between turns. Each call only does sync mood + queues deferred extraction (cheap). Skipping it means losing per-turn behavioral signal. The expensive consolidation runs on its own schedule no matter how many times you call.
When you call session.context({ query }) (or GET /context), the endpoint searches the agent's knowledge base and includes matching results in a knowledge field automatically.
{ "personality_prompt": "You are a helpful AI companion...", "big5": { "openness": 0.7, "conscientiousness": 0.6, "extraversion": 0.5, "agreeableness": 0.8, "neuroticism": 0.3 }, "current_mood": { "valence": 0.4, "arousal": 0.2, "tension": -0.1, "affiliation": 0.3 }, "loaded_facts": [{ "atomic_text": "User prefers morning workouts", "fact_type": "behavioral", "importance": 0.8 }], "active_goals": [{ "description": "Run a 5K by June" }], "habits": [{ "label": "Daily exercise" }], "knowledge": { "results": [ { "content": "Refund policy: customers can request a full refund within 30 days...", "label": "Refund Policy", "type": "policy", "source": "policies.pdf", "score": 0.92 } ] }}
After /turn or /process extracts side effects, it also searches the KB with topics found in the conversation. If relevant KB content exists that the agent missed, it stores these as proactive signals — the next session.context() call includes them automatically.
Turn 1: session.context() → (no KB results yet)
↓
chat with your LLM
↓
session.turn() → extracts "hiking gear" as topic
→ searches KB, finds "Hiking Equipment Guide"
→ stores as proactive signal
Turn 2: session.context() → includes "Hiking Equipment Guide" from KB
+ any direct search results for the new query
↓
chat with your LLM (now knows about hiking gear!)
Want to use your own model without managing the chat loop? Consider Custom LLM instead. It lets you point Sonzai at any OpenAI-compatible endpoint while keeping streaming, built-in tools, and per-message extraction fully automatic.
Managed mode calls built-in tools (web search, memory recall, image generation) automatically. In standalone mode you must implement tool calling yourself — the tool-calling loop is yours, but the resulting tool messages flow into /turn or /process for extraction. See the Tool Integration guide.
session.context(), /turn, and /process are synchronous request-response calls. Streaming is handled by your own LLM. Background extraction is asynchronous but you poll for state, not stream.
You must pick one of the three integration shapes per conversation: /process (one-shot batch), sessions.start → sessions.end({ messages }) (lifecycle batch), or sessions.start → session.turn() per turn → session.end() (real-time). Picking none means the transcript is never seen by the Context Engine and no behavioral data is captured. Picking two — for example calling .turn() per turn and passing messages on .end() — runs extraction twice on the same content. (Heavy consolidation runs on its own schedule and doesn't need to be triggered manually.)
Sonzai's extraction reads messages as text. Multimodal content (images, audio) must be bridged to text before submission — see Working with Images & Multimodal Input in Pattern 1.
What's the same in both modes
Extraction quality is identical — both modes use the same LLM pipeline for fact extraction, personality shifts, mood, habits, and consolidation. The 7-layer enriched context from session.context() is the same data the managed chat builds internally.
Pattern 1: Memory middleware (real-time)
You control the LLM. Sonzai handles what that LLM knows about the user.
Open a Session once. For every turn: call session.context({ query }) to pull the enriched user profile, build your system prompt, call your own LLM (with your own tools), then call session.turn({ messages }) to submit just the new exchange. Sync mood updates inline (~300–500ms); deeper extraction (facts, personality, habits) lands asynchronously 5–15 seconds later in the background.
This is the same data model mem0 provides (relevant memories injected before generation), extended with personality evolution, mood tracking, habit detection, goal tracking, proactive outreach scheduling, and relationship dynamics.
session.context() and sessions.start use no Sonzai LLM credits — they are pure reads. session.turn(), /process, and sessions.end({ messages }) use Sonzai's LLM for fact extraction + session summary (light, per-call, billed). Heavy background work — cross-session dedup, clustering, diary, decay — runs on auto-scheduled jobs (8h post-session, daily, weekly) and is billed against the same tenant but not per-call. Your chat LLM is entirely your cost.
Open the session once with your provider/model defaults. Then for every turn: get context → call your LLM (running tool calls in your own loop) → submit the turn. End the session when done.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function runConversation(agentId: string, userId: string) {
const sessionId = `session-${Date.now()}`;
const history: { role: string; content: string }[] = [];
// Open a Session handle. agentId/userId/sessionId and provider/model
// defaults live on the handle so you don't repeat them on every call.
const session = await sonzai.agents.sessions.start(agentId, {
userId,
sessionId,
toolDefinitions: yourTools, // optional — register session-scoped tool schemas
provider: "gemini", // optional — default for .turn()
model: "gemini-3.1-flash-lite-preview", // optional — default for .turn()
});
async function turn(userMessage: string): Promise<string> {
// Fresh enriched context for this specific message
const ctx = await session.context({ query: userMessage });
// Your LLM — swap in any provider you like
let reply = await yourLLM.chat({
system: buildSystemPrompt(ctx),
messages: [...history, { role: "user", content: userMessage }],
tools: yourTools,
});
// Tool-calling loop is entirely yours — Sonzai is OUT of the loop here.
const toolMessages: any[] = [];
while (reply.tool_calls?.length) {
for (const call of reply.tool_calls) {
const result = await runYourTool(call);
toolMessages.push(
{ role: "assistant", tool_calls: [call] },
{ role: "tool", tool_call_id: call.id, content: result },
);
}
reply = await yourLLM.chat({
system: buildSystemPrompt(ctx),
messages: [...history, { role: "user", content: userMessage }, ...toolMessages],
tools: yourTools,
});
}
sendToUser(reply.content); // send first; don't block on Sonzai
// Submit just the new turn. Sync mood ~300ms, deferred extraction
// (facts, personality, habits) runs asynchronously 5–15s later.
// Pass the FULL exchange — including tool calls and tool results —
// so Sonzai can extract facts from tool outputs too.
const { mood, extraction_id } = await session.turn({
messages: [
{ role: "user", content: userMessage },
...toolMessages, // assistant tool_calls + tool results
{ role: "assistant", content: reply.content },
],
});
history.push({ role: "user", content: userMessage });
history.push({ role: "assistant", content: reply.content });
return reply.content;
}
return { turn, end: () => session.end() };
}
// The /context response is a flat object — there is no nested
// `profile` / `behavioral` / `memory` envelope.
function buildSystemPrompt(ctx: any): string {
const facts = (ctx.loaded_facts ?? []).map((f: any) => `- ${f.atomic_text}`).join("\n");
const goals = (ctx.active_goals ?? []).map((g: any) => g.description).join(", ");
return `${ctx.personality_prompt ?? "You are a helpful AI companion."}
Personality (Big5): ${JSON.stringify(ctx.big5 ?? {})}
Current mood: ${JSON.stringify(ctx.current_mood ?? {})}
Active goals: ${goals || "none"}
Relevant memories:
${facts || "none yet"}`;
}
The single most important habit in Pattern 1 is calling session.context(query=user_msg) before every LLM call. This is the load-bearing piece that closes the loop — without it, the LLM doesn't get the fresh mood (which lands inline on .turn()) or the freshly-extracted facts (which land 5–15 seconds after .turn()).
while (conversationActive) {
const userMsg = await getUserInput();
// 1. PULL FRESH CONTEXT — happens every turn, before the LLM call.
// ctx is a flat object — no `profile` / `behavioral` / `memory` envelope.
const ctx = await session.context({ query: userMsg });
// 2. Build system prompt from the context layers
const systemPrompt = renderPromptFromContext(ctx);
// 3. Run YOUR LLM — Sonzai is OUT of the loop here
const reply = await yourLLM.chat({
system: systemPrompt,
messages: [...history, { role: "user", content: userMsg }],
});
// 4. Submit the just-completed turn — sync mood + async deferred extraction
await session.turn({
messages: [
{ role: "user", content: userMsg },
{ role: "assistant", content: reply.content },
],
  });
  // 5. Keep local history in sync so the next LLM call sees this turn
  history.push({ role: "user", content: userMsg });
  history.push({ role: "assistant", content: reply.content });
}
function renderPromptFromContext(ctx: any): string {
const parts: string[] = [];
if (ctx.personality_prompt) parts.push(ctx.personality_prompt);
if (ctx.big5) parts.push(`Personality (Big5): ${JSON.stringify(ctx.big5)}`);
if (ctx.speech_patterns?.length) parts.push(`Speech patterns: ${ctx.speech_patterns.join(", ")}`);
if (ctx.current_mood) parts.push(`Current mood: ${JSON.stringify(ctx.current_mood)}`);
const facts = (ctx.loaded_facts ?? []).slice(0, 5).map((f: any) => `- ${f.atomic_text ?? ""}`).join("\n");
if (facts) parts.push(`Relevant memories:\n${facts}`);
const kb = (ctx.knowledge?.results ?? []).slice(0, 3).map((r: any) => `- ${r.label}: ${(r.content ?? "").slice(0, 120)}`).join("\n");
if (kb) parts.push(`Knowledge base:\n${kb}`);
return parts.join("\n\n");
}
Save a roundtrip with fetchNextContext
session.turn() accepts a fetch_next_context={"query": next_user_message} argument (TS: fetchNextContext). When set, the server runs the deferred extraction trigger AND fetches the next /context payload in the same response, returning it under next_context. This eliminates the second roundtrip on the next turn — your client already has the context for turn N+1 by the time turn N has finished. Use this when you can predict the next query, for example when the next context render reuses the current message as its query.
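A minimal sketch, reusing the current message as the predicted next query (that predictor is an assumption; substitute whatever fits your app):
const { mood, next_context } = await session.turn({
  messages: [
    { role: "user", content: userMessage },
    { role: "assistant", content: reply.content },
  ],
  fetchNextContext: { query: userMessage },
});
// next_context has the same shape as session.context(): hold onto it and
// skip the .context() roundtrip at the start of turn N+1.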
Context freshness. Mood updates inline on each .turn() call (~300ms), so the very next .context() reflects the new mood. Personality / facts / inventory land 5–15 seconds after .turn() in the background, so they appear within a turn or two of being mentioned.
Why per-turn. State changes between turns. A user mentioning a new pet on turn 3 means turn 4's context should carry that fact. Skipping .context() between turns means the LLM works from stale state — and the value of a memory layer collapses.
Pass the actual user message as query. session.context() uses the query for memory recall, KB search, and proactive signal selection. Passing the raw user message gives the most relevant pull; passing a static placeholder gives generic context regardless of what the user asked.
The /turn schema accepts OpenAI/Anthropic-style tool messages: role: "tool" for tool results and tool_calls arrays on assistant messages. Pass the entire intermediate exchange — Sonzai's extractor reads tool results and can capture facts that only appeared in tool output (e.g. "user's last order shipped from Tokyo" from an order-lookup tool).
await session.turn({
messages: [
{ role: "user", content: "Where did my last order ship from?" },
{
role: "assistant",
tool_calls: [{ id: "call_1", type: "function", function: { name: "order-lookup", arguments: "{}" } }],
},
{
role: "tool",
tool_call_id: "call_1",
content: '{"order_id":"42","origin":"Tokyo","carrier":"DHL"}',
},
{ role: "assistant", content: "Your last order shipped from Tokyo via DHL." },
],
});
/turn returns immediately after the sync mood pass. The deeper extraction runs asynchronously and reaches done in 5–15s. You can poll the status if you need to gate something on it:
const { extraction_id } = await session.turn({ messages });
// Optional — only poll if you need to wait for facts/personality before doing something
let status = await session.status(extraction_id);
while (status.state !== "done" && status.state !== "failed") {
await new Promise((r) => setTimeout(r, 1000));
status = await session.status(extraction_id);
}
Pattern 1 hands the tool-calling loop entirely to you. Sonzai never executes a tool — but it does read tool calls and tool results out of the messages you submit on /turn, so the extractor can capture facts that surfaced inside a tool output. There are two flavors of tools you'll typically wire up.
Use whatever your agent framework provides — @function_tool in the OpenAI Agents SDK, tools= on Anthropic, function declarations on Gemini, @tool in LangChain. The pattern is the same: register the tool with your LLM, run the tool-calling loop on your side, and forward the full exchange (including the assistant's tool_calls message and the role: "tool" result message) to session.turn().
When the assistant says "It's 7:30 AM" and the user replies "Set my morning standup for 8", Sonzai's extractor sees the tool's actual output, not just the assistant's paraphrase — and can capture "user prefers 8 AM standups" with the right grounding.
You can also wrap Sonzai's own REST endpoints as tools your LLM can call mid-turn. The two most useful are knowledge base search and memory search — both let the LLM pull additional context on demand without you having to inject everything up-front through session.context().
// TypeScript — agents.memory.search is available directly
import { Sonzai } from "@sonzai-labs/agents";
import { tool } from "ai";
import { z } from "zod";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const kbSearch = tool({
description: "Search the agent's knowledge base.",
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => {
const res = await sonzai.agents.knowledgeSearch("agent-id", { query, limit: 5 });
return res.results.map((r) => `- ${r.label}: ${r.content}`).join("\n") || "No matching knowledge.";
},
});
const memorySearch = tool({
description: "Search the user's long-term memory.",
parameters: z.object({ query: z.string() }),
execute: async ({ query }) => {
const res = await sonzai.agents.memory.search("agent-id", {
query,
user_id: "user-123",
limit: 5,
});
return res.results.map((r) => `- ${r.text}`).join("\n") || "No matching memories.";
},
});
Why expose Sonzai endpoints as tools?
session.context() returns the most relevant facts for the current query — a strong default. Exposing kb_search and memory_search as tools lets the LLM decide for itself when to dig deeper (e.g., when the user asks "what did I tell you last week about X?"). It's especially useful for agent frameworks that already think in terms of tools.
When the LLM calls these tools, the result lands in your tool-calling loop just like any other tool. Forward the full exchange to session.turn() and Sonzai's extractor will see the search results too — but be aware that re-extracting facts from a memory_search tool result can create echoes (the user's own past fact resurfaces as if it were new). Either skip extraction for those tool messages on your side, or trust the dedup pass.
Sonzai's memory pipeline is text-based today. The /turn and /process endpoints accept string content only — DialogueMessage.content is string. Your LLM can be fully multimodal (Gemini, Claude, GPT-4o all accept image URLs and audio natively) but to get image-related facts into Sonzai you need to bridge the multimodal content into text in the messages you send to /turn.
The recommended pattern is dual-output: have your vision-capable LLM produce both (a) the warm reply you show the user and (b) a hidden [MEMORY: ...] line with a detailed factual description. Strip the [MEMORY: ...] line out before showing the user, and embed it in the bridged text you submit to Sonzai.
import OpenAI from "openai";
import { Sonzai } from "@sonzai-labs/agents";
const gemini = new OpenAI({
baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
apiKey: process.env.GEMINI_API_KEY!,
});
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const SYSTEM_PROMPT_IMAGE_AWARE = `You are a friendly companion. When the user shares an image, respond warmly
to what's emotionally important to THEM.
After your reply, ALWAYS include a single line:
[MEMORY: <detailed factual description of the image — setting, objects,
people, mood, time of day, what the user appears to be doing>]
The user does NOT see the [MEMORY: ...] line.`;
async function processImageTurn(session: any, userMsg: string, imageUrl: string): Promise<string> {
const result = await gemini.chat.completions.create({
model: "gemini-3.1-flash-lite-preview",
messages: [
{ role: "system", content: SYSTEM_PROMPT_IMAGE_AWARE },
{
role: "user",
content: [
{ type: "text", text: userMsg },
{ type: "image_url", image_url: { url: imageUrl } },
],
},
],
});
const raw = result.choices[0].message.content ?? "";
// Split the dual output
const m = raw.match(/\[MEMORY:\s*([\s\S]+?)\]/);
const memoryNote = m ? m[1].trim() : "";
const reply = raw.replace(/\[MEMORY:[\s\S]+?\]/, "").trim();
sendToUser(reply);
await session.turn({
messages: [
{ role: "user", content: `${userMsg}\n\n[Image attached: ${memoryNote}, URL: ${imageUrl}]` },
{ role: "assistant", content: reply },
],
});
return reply;
}
Why this pattern:
No backend multimodal yet. /turn accepts string content. Text-bridging through your same vision-capable LLM is the cleanest workaround.
Why dual-output (vs. a separate vision call). The same LLM call serves both purposes — no extra cost, no extra latency, no second roundtrip. You're already paying for vision on the assistant turn; let it produce the description too.
Why a hidden line. Keeps user-facing replies emotionally warm — "Oh you have such nice shoulders!" — while still capturing the factual detail (gym, tank top, mirror, time of day) that memory extraction needs.
It's a developer pattern, not a Sonzai field. The [MEMORY: ...] convention is yours to define. Sonzai just sees text. You can use any sentinel — <<MEM>>...<</MEM>>, JSON, whatever your prompt and parser agree on.
Including the URL. Embedding the URL in the bridged text isn't required, but it lets Sonzai later surface the image as a memory artifact ("the photo you shared last week") without re-running vision on the image. Your app keeps using its own image storage; Sonzai just remembers the link as text.
Audio & voice follow the same pattern
Run speech-to-text (STT) on your side and send the transcript in messages. Text-to-speech (TTS) is rendered after the assistant text exists, so you forward the assistant text to session.turn() exactly as you would for a text-only chat. See the Voice AI use case below.
Future direction
Sonzai may extend the /turn schema to accept OpenAI's multimodal content blocks directly (content: [{type: "text"}, {type: "image_url"}]) with platform-side vision extraction, removing the manual bridging step. Today, text-bridging via the dual-output pattern is the supported approach.
The canonical Pattern 1 example. You bring your own agent harness — here the OpenAI Agents SDK — and route it at Gemini via the OpenAI-compat endpoint, so no OPENAI_API_KEY is ever used. Sonzai sits outside the LLM/tool-calling loop entirely: it supplies the system prompt via session.context() and ingests the finished transcript via session.turn(). The Agents SDK does all multi-step reasoning and tool dispatch on your side; Sonzai does memory.
import os

from openai import AsyncOpenAI
from agents import (
    Agent,
    Runner,
    OpenAIChatCompletionsModel,
    function_tool,
    set_tracing_disabled,
)
from sonzai import Sonzai

# The Agents SDK ships traces to OpenAI by default — disable, since we
# have no OpenAI key and aren't talking to OpenAI's servers at all.
set_tracing_disabled(True)

# Point the Agents SDK's AsyncOpenAI client at Gemini's OpenAI-compat URL.
gemini = AsyncOpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
model = OpenAIChatCompletionsModel(
    model="gemini-3.1-flash-lite-preview",
    openai_client=gemini,
)

# Sonzai = memory layer only. It never sees the LLM client.
sonzai = Sonzai(api_key=os.environ["SONZAI_API_KEY"])
session = sonzai.agents.sessions.start(
    "agent-id",
    user_id="user-123",
    session_id="session-abc",
)

@function_tool
def get_current_time() -> str:
    """Return the current time."""
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

while True:
    user_msg = input("You: ")
    if not user_msg:
        break

    # 1) Pull enriched context (mood, personality, relevant facts, …) from Sonzai.
    ctx = session.context(query=user_msg)
    mood = ctx.get("current_mood") or "neutral"
    instructions = f"You are a friendly companion. Current mood: {mood}."

    # 2) Run the Agents SDK loop — it handles tool-calling and multi-step reasoning.
    agent = Agent(
        name="Companion",
        instructions=instructions,
        model=model,
        tools=[get_current_time],
    )
    result = Runner.run_sync(agent, user_msg)
    print(f"Assistant: {result.final_output}")

    # 3) Convert the run's items (assistant text + ToolCallItem + ToolCallOutputItem)
    #    into Sonzai's tool-aware messages format. See the demo for the implementation.
    sonzai_messages = run_result_to_sonzai_messages(user_msg, result)

    # 4) Submit the turn. `mood` comes back inline (~300ms); facts / personality /
    #    inventory are extracted asynchronously and land 5-15s later.
    turn_result = session.turn(messages=sonzai_messages)
    print(f" -> mood updated: {turn_result.mood}")

session.end()
What's happening on each turn:
Sonzai is out of the LLM loop. The OpenAI Agents SDK runs the model, dispatches tools, and produces result.final_output. Sonzai never sees the LLM client and has no opinion on which model answered.
Mood is real-time. session.turn() returns fresh mood inline in ~300ms — you can render it the moment the response arrives.
Facts, personality drift, and inventory are deferred (5–15s). They run async under the returned extraction_id. Re-poll agents.memory.list_facts, agents.personality.get, etc. on the next turn; whatever didn't land yet will be there shortly.
Tool calls flow through to extraction. Sonzai's tool-aware message format accepts assistant messages with tool_calls plus a tool message carrying the result. The conversion helper packages the Agents SDK's ToolCallItem + ToolCallOutputItem into that shape so extraction can pick up facts from tool outputs too.
Want a working version? See the OpenAI Agents companion demo — a two-pane Streamlit app showing live mood, Big5, recent facts, inventory, and the constellation graph as you chat.
STT → enrich → LLM → TTS. Sonzai holds the memory; you own the audio pipeline. Submit the turn while TTS is synthesizing — sync mood is fast enough not to block, and deferred extraction never blocks.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processVoiceTurn(
session: any, // Session handle from sonzai.agents.sessions.start
audioBuffer: Buffer
): Promise<Buffer> {
// Your STT
const transcript = await yourSTT.transcribe(audioBuffer);
// Inject memory into a concise voice-friendly system prompt
const ctx = await session.context({ query: transcript });
const systemPrompt = `${ctx.personality_prompt ?? "You are a voice companion."} Keep replies under 2 sentences for voice.
Mood: ${JSON.stringify(ctx.current_mood)}.
Key memory: ${ctx.loaded_facts?.[0]?.atomic_text ?? "none"}.`;
const reply = await yourLLM.chat({
  system: systemPrompt,
  messages: [{ role: "user", content: transcript }],
});
// Submit the turn while TTS synthesizes (run in parallel)
const [audioResponse] = await Promise.all([
  yourTTS.synthesize(reply.content),
  session.turn({
    messages: [
      { role: "user", content: transcript },
      { role: "assistant", content: reply.content },
    ],
  }),
]);
return audioResponse;
}
Sonzai injects user context into the agent's system prompt. The framework handles tool calling, multi-step reasoning, and memory of the current conversation; Sonzai handles what the agent knows about the user across sessions. Send the full transcript including any tool messages to session.turn() so extraction can pick up facts from tool results.
import { ChatOpenAI } from "@langchain/openai";
import { SystemMessage, HumanMessage, AIMessage } from "@langchain/core/messages";
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
// bindTools registers tool schemas; the ChatOpenAI constructor has no tools option.
const llm = new ChatOpenAI({ model: "gpt-4o" }).bindTools(yourToolSchemas);
async function agentTurn(
session: any,
userInput: string,
messageHistory: (HumanMessage | AIMessage)[]
): Promise<string> {
const ctx = await session.context({ query: userInput });
const messages = [
new SystemMessage(buildSystemPrompt(ctx)),
...messageHistory,
new HumanMessage(userInput),
];
// Run the agent's full tool-calling loop on your side, then surface
// every intermediate message (assistant tool_calls + tool results)
// to Sonzai so it can extract from them.
const { reply, intermediate } = await runLangchainAgent(llm, messages);
await session.turn({
messages: [
{ role: "user", content: userInput },
...intermediate,
{ role: "assistant", content: reply },
],
});
return reply;
}
Route to different models based on task type while Sonzai stitches user memory across all of them. The Session-level provider/model default is just a default — every .turn() can override.
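For instance, a session opened with Gemini defaults can still submit an individual turn under a different provider. A sketch, assuming .turn() accepts the same provider/model fields it defaults from the session handle:
// Session default (set on sessions.start) is gemini; override just this turn.
await session.turn({
  messages: [
    { role: "user", content: userMsg },
    { role: "assistant", content: reply.content },
  ],
  provider: "openai",
  model: "gpt-4o-mini", // illustrative model choice
});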
Endpoint walkthrough — full reference for sessions.start, context, turn, process, end, and read endpoints
KB & limitations — knowledge base behavior in standalone mode and what's not supported
Pattern 2: Post-Session Batch Processing
You own the entire conversation. Sonzai never sees it in real time. When the conversation ends, you send the full transcript to either /process or sessions.end({ messages }). Sonzai extracts facts, updates the user's behavioral profile, and makes the insights available via the API — ready for personalization, analytics, push notifications, or next-session context.
This pattern is ideal when Sonzai being in the hot path is undesirable (or impossible) — latency-sensitive real-time interactions, apps with their own LLM loop already in production, or cases where you want to process transcripts in bulk after the fact.
/process and sessions.end({ messages }) are functionally equivalent for batch ingest — both extract facts and side-effects from the full transcript inline. Don't do both for the same transcript or extraction runs twice. Use /process if you want a single call (it auto-creates the session and surfaces the generated session_id in the response). Use sessions.start + sessions.end({ messages }) if you want explicit lifecycle, async polling, or session-scoped tools.
Option A — /process only. One call. Auto-creates a session if you don't pass one. Returns the auto-generated session_id so you can correlate later.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processTranscript(
agentId: string,
userId: string,
transcript: { role: "user" | "assistant" | "tool"; content: string; tool_calls?: any[] }[]
) {
const result = await sonzai.agents.process(agentId, {
userId,
messages: transcript, // tool messages allowed
provider: "gemini", // optional override
model: "gemini-3.1-flash-lite-preview", // optional override
});
// result.session_id is the auto-created session id when none was passed.
// Read the extracted facts/mood/etc. via the dedicated endpoints below.
return result;
}
Option B — Explicit sessions.start + sessions.end({ messages }). Use this when you want async processing, session-scoped tools, or explicit lifecycle ownership.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processTranscript(
agentId: string,
userId: string,
transcript: { role: "user" | "assistant" | "tool"; content: string }[]
) {
const sessionId = `session-${Date.now()}`;
const session = await sonzai.agents.sessions.start(agentId, { userId, sessionId });
// Pass the full transcript on end — extraction happens here, not via /process.
// sessions.end({ messages }) is functionally equivalent to /process({ messages }).
const result = await session.end({
messages: transcript,
totalMessages: transcript.length,
});
return result;
}
Pick one. The two options are equivalent for fact extraction — chaining them just runs extraction twice on the same messages.
Before the session, pull the student's profile to personalize the curriculum. After the session, extract what was learned and generate targeted practice exercises. One call to /process is enough.
Pull the user's fitness context before the workout for a personalized greeting. After the workout, send the session log to Sonzai to track habits, mood, and progress — without Sonzai ever being in the real-time exercise loop.
Your sales team runs calls through their existing tooling (Gong, Zoom, your own recorder). After each call, send the transcript to Sonzai to build a persistent customer profile.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function processSalesCall(
agentId: string,
customerId: string,
callId: string,
callTranscript: { role: "user" | "assistant"; content: string }[],
durationSeconds: number
) {
// Use the explicit lifecycle so we can pass durationSeconds.
const session = await sonzai.agents.sessions.start(agentId, {
userId: customerId,
sessionId: `call-${callId}`,
});
const result = await session.end({
messages: callTranscript,
totalMessages: callTranscript.length,
durationSeconds,
});
// Read extractions back from the analytics endpoints.
const personality = await sonzai.agents.personality.get(agentId);
// ...build CRM update from result + dedicated read endpoints
return result;
}
Your app handles the journaling conversation. After each session, send to Sonzai to track mood trends, detect emotional breakthroughs, and surface proactive insights.
import { Sonzai } from "@sonzai-labs/agents";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
async function afterJournalingSession(
agentId: string,
userId: string,
journalTranscript: { role: "user" | "assistant"; content: string }[]
) {
await sonzai.agents.process(agentId, { userId, messages: journalTranscript });
const [mood, pending] = await Promise.all([
  sonzai.agents.getMood(agentId, { userId }),
  sonzai.agents.notifications.list(agentId, { userId }),
]);
if ((mood?.valence ?? 0) < -0.4) {
  await sendWellnessAlert(userId, {
    message: "It sounds like you're going through a tough time. We're here for you.",
  });
}
// list() returns { notifications: [...] }; filtering by userId on the call
// avoids scanning every user's notifications client-side.
for (const notif of pending.notifications) {
  await scheduleReminder(userId, notif.generated_message, notif.scheduled_for);
}
await updateMoodDashboard(userId, { valence: mood?.valence, energy: mood?.arousal });
}
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const agent = await client.agents.generation.generateAndCreate({
name: "Luna",
description: "Luna is a warm, creative dreamer who speaks poetically. She loves stargazing, coffee shops at 2am, and asking the question beneath the question.",
language: "en",
});
console.log(agent.agent_id);
console.log(agent.personality); // full Big5 profile derived from the description
await client.agents.priming.primeUser("agent-id", "user-123", {
displayName: "Sam",
interests: ["astronomy", "lo-fi music", "photography"],
context: "Sam is a night-owl grad student who tends to overthink. They came to Luna after a tough week.",
});
// Poll periodically (or register a webhook).
const pending = await client.agents.notifications.list("agent-id", {
userId: "user-123",
status: "pending",
});
for (const n of pending.notifications) {
// Render n.content in your UI; mark consumed when shown.
await client.agents.notifications.consume("agent-id", n.notificationId);
}
// Trigger from your backend when something notable happens
await client.agents.triggerBackendEvent(AGENT_ID, {
userId: USER_ID,
eventType: "task_complete",
payload: {
task_name: "Q1 Revenue Analysis",
deliverable: "Revenue Report",
category: "Analytics",
time_taken: "3h 42m",
},
});
// Next time the user opens a conversation:
// Agent: "I see you finished the Q1 Revenue Analysis! That report is a key
// deliverable. Want to discuss the findings or start the next task?"
// Delete by key (finds and removes the state)
await client.agents.customStates.deleteByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
// Or delete by state_id if you already have it
await client.agents.customStates.delete(AGENT_ID, stateId);
This tutorial walks through a full medication-reminder implementation: define a medication entity type in your knowledge base, seed medications per user, create a Scheduled Reminder linking each medication to a cadence, and the agent proactively messages the user at the scheduled time — naming the medication and dosage in its own voice.
Tenant-agnostic primitive. The Sonzai platform has no medication-specific code. This tutorial wires two generic primitives — Inventory and Scheduled Reminders — into a medication use case. The same pattern works for watering plants, exercise reminders, bill payments, or any recurring-with-structured-data use case.
This is not a medical device. Reminders are a user-experience feature, not a clinical safety mechanism. Do not rely on Sonzai scheduled reminders as the sole adherence path for patients where missed doses cause harm.
Create a schema for the medication entity type so the platform knows how to store and index each drug's properties. The medication_name and ndc_code fields are indexed for fast lookup; dosage, instructions, and prescribed_by are stored but not indexed (they are fetched whole at fire time).
You only need to create the schema once per project. All subsequent medication items written for any user will be validated and indexed against this definition.
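A sketch of that one-time definition. The registration method name and field layout here are assumptions; confirm them in the Resource Inventory + Knowledge Base reference.
// Assumption: schema registration lives under agents.inventory.
// Confirm the method name and field layout in the Inventory reference.
await client.agents.inventory.createSchema(AGENT_ID, {
  project_id: PROJECT_ID,
  item_type: "medication",
  indexed_fields: ["medication_name", "ndc_code"],
  stored_fields: ["dosage", "instructions", "prescribed_by"],
});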
Insert one medication into the user's inventory using inventory.create. Store the returned inventory_item_id — you will pass it to the schedule in the next step.
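Mirroring the amoxicillin example later in this tutorial (the prescriber value is illustrative):
const item = await client.agents.inventory.create(AGENT_ID, USER_ID, {
  item_type: "medication",
  label: "Ibuprofen",
  project_id: PROJECT_ID,
  properties: {
    medication_name: "ibuprofen",
    dosage: "500mg",
    instructions: "take with food",
    prescribed_by: "Dr. Tan", // illustrative
  },
});
const inventoryItemId = item.inventory_item_id; // needed for step 3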
Create a twice-daily schedule at 08:00 and 20:00 Asia/Singapore, with active_window.hours set as a belt-and-braces quiet-hours guard. Pass the inventory_item_id returned in step 2. The platform will fetch the live item properties at every fire — no re-registration required when the dosage changes.
const schedule = await client.schedules.create(AGENT_ID, USER_ID, {
cadence: {
simple: { frequency: "daily", times: ["08:00", "20:00"] },
timezone: "Asia/Singapore",
},
active_window: {
hours: { start: "07:00", end: "22:00" },
},
intent: "remind the user to take their ibuprofen at the correct dose",
check_type: "reminder",
inventory_item_id: inventoryItemId,
metadata: { reminder_category: "medication" },
});
const scheduleId = schedule.schedule_id;
console.log(scheduleId); // "sched_01HX..."
console.log(schedule.next_fire_at); // "2026-05-02T00:00:00Z"
console.log(schedule.next_fire_at_local); // "2026-05-02T08:00:00+08:00"
What each field controls:
cadence.simple.times: wall-clock fire times in the schedule's timezone.
cadence.timezone: per-user IANA zone; the platform does not auto-detect the user's location.
active_window.hours: quiet-hours guard; fires computed outside the window are skipped, not deferred.
intent: the "why" the agent grounds its message in, written as a short natural-language instruction.
inventory_item_id: links to the medication's structured properties, fetched live at every fire.
metadata: opaque developer tags surfaced to the agent as "Additional context" in the wakeup block.
When the schedule fires at 08:00 Singapore time, the platform assembles a structured intent block and delivers it to the agent as a proactive wakeup. The agent composes its opening message in its own voice using the intent and the injected inventory properties. A typical output might look like:
"Morning — quick reminder, it's 8 o'clock. Time for your 500mg of ibuprofen, and remember to take it with food."
Exact wording depends on the agent's personality configuration. The agent is not given a fixed template — it receives the intent and inventory data and decides how to phrase it naturally.
Updating the dosage. When a doctor reduces the ibuprofen dose from 500mg to 250mg, update the inventory item:
await client.agents.inventory.update(AGENT_ID, USER_ID, inventoryItemId, {
properties: {
dosage: "250mg",
},
});
// No schedule edit required.
// The next scheduled fire automatically reads "250mg" from the live item.
This separation is intentional: inventory is the source of truth for the what; the schedule is the source of truth for the when. They change independently. Changing the dose never touches the schedule row; moving a reminder time never touches the medication item.
For medications with a fixed course length, use starts_at and ends_at to auto-disable the schedule when the course completes. Here is a 3x/day amoxicillin course that fires every 8 hours over 14 days:
const amoxItem = await client.agents.inventory.create(AGENT_ID, USER_ID, {
item_type: "medication",
label: "Amoxicillin",
project_id: PROJECT_ID,
properties: {
medication_name: "amoxicillin",
dosage: "500mg",
instructions: "complete the full course even if you feel better",
prescribed_by: "Dr. Tan",
},
});
const amoxSchedule = await client.schedules.create(AGENT_ID, USER_ID, {
cadence: {
simple: { frequency: "interval_hours", interval_hours: 8 },
timezone: "Asia/Singapore",
},
active_window: {
hours: { start: "07:00", end: "23:00" },
},
intent: "remind the user to take their amoxicillin — emphasise completing the full course",
check_type: "reminder",
inventory_item_id: amoxItem.inventory_item_id,
metadata: { reminder_category: "medication" },
starts_at: "2026-05-01T00:00:00Z",
ends_at: "2026-05-15T00:00:00Z",
});
After ends_at passes, the schedule is automatically disabled (enabled flips to false). The inventory item for amoxicillin remains as a historical record and can be queried via the Memory API. No cleanup is required.
Create one schedule per medication. Three daily medications = three schedules. Fires that land at the same wall-clock time produce separate proactive messages by design — each message is grounded in its own medication's inventory item.
Avoid simultaneous fires. If you want the user to receive distinct messages rather than a burst, stagger the times across schedules:
For example:
Metformin: ["08:00", "20:00"]
Atorvastatin: ["08:15"]
Vitamin D: ["08:30"]
Alternative: compose a "morning routine" item. If you prefer a single message covering all morning medications, create one inventory item of type medication_routine (define its own schema) with a medications property that lists all drugs and doses. Attach that single item to a single 08:00 schedule. The agent receives all the structured data in one wakeup block and can address all medications in a single message.
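A sketch of that composite item, assuming you have already defined a medication_routine schema (the drug names and doses below are illustrative):
const routine = await client.agents.inventory.create(AGENT_ID, USER_ID, {
  item_type: "medication_routine",
  label: "Morning medications",
  project_id: PROJECT_ID,
  properties: {
    medications: "Metformin 500mg; Atorvastatin 20mg; Vitamin D 1000 IU",
  },
});
// Attach routine.inventory_item_id to a single 08:00 schedule.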
When the user replies "I took it, thanks" or similar, the agent's memory layer auto-captures this as a fact on the user. You can query recent user responses to a medication reminder via the Memory API:
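A sketch, reusing the memory search call shown in the tools section earlier (the query string is illustrative):
const adherence = await client.agents.memory.search(AGENT_ID, {
  query: "took ibuprofen medication reminder",
  user_id: USER_ID,
  limit: 5,
});
for (const r of adherence.results) console.log(r.text);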
For a harder signal, add a POST /adherence/{scheduleId} endpoint in your tenant backend that your mobile or web app calls when the user taps a confirmation button. This gives you a structured event log independent of the conversational memory layer. Sonzai does not provide this endpoint — it lives in your own backend and stores data in your own database.
Patch the schedule's cadence.timezone whenever the user's preferred timezone changes. Future fires are immediately recomputed in the new zone; past fire history is not modified.
// User travelling from Singapore to Los Angeles
await client.schedules.patch(AGENT_ID, USER_ID, scheduleId, {
cadence: {
simple: { frequency: "daily", times: ["08:00", "20:00"] },
timezone: "America/Los_Angeles",
},
});
// Next fire: 08:00 PDT (Los Angeles) — not 08:00 SGT
With an active window of {"start": "07:00", "end": "21:00"}, any cadence tick after 21:00 or before 07:00 is discarded. A twice-daily schedule with times ["08:00", "20:00"] would still fire at both times; adding a 22:00 dose would be silently skipped.
Night-shift user — active overnight, sleeping during the day.
When start is greater than end, the window wraps midnight. This user receives reminders from 22:00 to 05:59 the next morning, and any cadence ticks during daytime hours are skipped.
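A sketch of the wrapped window (the cadence values are illustrative):
// start > end wraps midnight: fires allowed 22:00–05:59, daytime ticks skipped.
const nightSchedule = await client.schedules.create(AGENT_ID, USER_ID, {
  cadence: {
    simple: { frequency: "interval_hours", interval_hours: 2 },
    timezone: "Asia/Singapore",
  },
  active_window: {
    hours: { start: "22:00", end: "06:00" },
  },
  intent: "check in during the user's night shift",
  check_type: "check_in",
});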
// Call once during onboarding or after a CRM import
await client.agents.memory.seed(AGENT_ID, {
userId: USER_ID,
memories: [
{
content: "Mia is a 32-year-old UX designer based in Berlin.",
type: "user_fact",
},
{
content: "Mia subscribed to the Pro plan on 2024-11-03.",
type: "shared_experience",
occurred_at: "2024-11-03T00:00:00Z",
},
{
content: "Mia prefers email over SMS for notifications.",
type: "user_preference",
},
{
content: "Mia mentioned she wants to get into trail running.",
type: "user_goal",
},
],
});
By the end of this page you will have:
A daily 09:00 Asia/Singapore check-in schedule that fires a proactive agent message every morning
An every-4-hours schedule with a quiet-hours active window that skips fires outside allowed hours
A bounded interval_hours course constrained by starts_at and ends_at — useful for multi-week programs
An understanding of how the same primitive powers the full Medication Reminders worked example
Scheduled Reminders are a first-class primitive: the platform recomputes next_fire_at after every fire, respects DST transitions automatically, and injects inventory context live at fire time so your agent always has current data.
Register a schedule by calling POST /api/v1/agents/{agentId}/users/{userId}/schedules. The body describes when to fire (cadence), what the agent should do (intent), and optional scoping fields (active_window, inventory_item_id, starts_at, ends_at).
Here is a minimal daily 09:00 SGT check-in:
{ "cadence": { "simple": { "frequency": "daily", "times": ["09:00"] }, "timezone": "Asia/Singapore" }, "intent": "check in on how the user is feeling", "check_type": "reminder"}
And a full example with all optional fields:
{ "cadence": { "simple": { "frequency": "daily", "times": ["09:00"] }, "timezone": "Asia/Singapore" }, "active_window": { "hours": { "start": "08:00", "end": "22:00" }, "days_of_week": ["mon", "tue", "wed", "thu", "fri"] }, "intent": "check in on how the user is feeling", "check_type": "reminder", "inventory_item_id": "01HX8F...", "metadata": { "campaign": "daily_checkin_v2" }, "starts_at": "2026-05-01T00:00:00Z", "ends_at": "2026-05-14T23:59:59Z"}
The response includes schedule_id, next_fire_at (UTC), and next_fire_at_local (the same instant expressed in the schedule's timezone — useful for displaying to the user).
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const AGENT_ID = "agent_abc";
const USER_ID = "user_123";
const schedule = await client.schedules.create(AGENT_ID, USER_ID, {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Asia/Singapore",
},
intent: "check in on how the user is feeling",
check_type: "reminder",
});
console.log(schedule.schedule_id); // "sched_01HX..."
console.log(schedule.next_fire_at); // "2026-05-02T01:00:00Z"
console.log(schedule.next_fire_at_local); // "2026-05-02T09:00:00+08:00"
Cadence fields:
times (string[]; required for daily and weekly): wall-clock times in HH:MM (24-hour), evaluated in the schedule's timezone.
days_of_week (string[]; required for weekly): "mon", "tue", "wed", "thu", "fri", "sat", "sun".
interval_hours (number; required for interval_hours): minimum 1, maximum 24.
timezone (IANA string; required): applied to times and days_of_week evaluation.
A weekly schedule fires on the specified days at each listed time. A daily schedule fires every day at each listed time. An interval_hours schedule fires repeatedly at that interval starting from starts_at (or schedule creation if starts_at is omitted), bounded by the active window.
Cron cadence. Standard 5-field cron is also supported — no seconds field. Example: "0 9 * * 1-5" fires at 09:00 on weekdays.
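The request shape for a cron cadence is not shown on this page; here is a sketch under the assumption that the cron string replaces the simple block inside cadence (confirm the field name in the API reference):
// Assumption: cron cadences ride on cadence.cron alongside the timezone.
const weekdaySchedule = await client.schedules.create(AGENT_ID, USER_ID, {
  cadence: {
    cron: "0 9 * * 1-5", // 09:00 on weekdays, evaluated in the timezone below
    timezone: "Asia/Singapore",
  },
  intent: "check in on how the user is feeling",
  check_type: "reminder",
});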
Rate limits. Cadences that resolve to more than one fire per minute are rejected with CADENCE_TOO_FREQUENT. Cadences that produce more than 96 raw ticks per 24-hour rolling window (before active-window filtering) are rejected with CADENCE_TOO_DENSE. For most use cases interval_hours: 1 (24 raw ticks/day) is the densest practical setting.
Every schedule requires a timezone field containing a valid IANA timezone name (e.g. "Asia/Singapore", "America/New_York", "Europe/London"). Offsets like "+08:00" are not accepted.
All cadence math — wall-clock time evaluation, days_of_week membership, DST skip logic — runs in the schedule's own timezone. The result is stored and returned as next_fire_at in UTC. next_fire_at_local is a convenience field that expresses the same instant with the zone offset applied.
When a user travels or changes their preferred timezone, patch the schedule timezone directly:
// User moved from Singapore to London
await client.schedules.update(AGENT_ID, USER_ID, scheduleId, {
cadence: {
simple: { frequency: "daily", times: ["09:00"] },
timezone: "Europe/London",
},
});
DST handling. On spring-forward transitions, a wall time that falls into the clocks-forward gap (e.g. 02:30 in a zone that jumps 02:00 → 03:00) is non-existent. The platform skips that occurrence and fires at the next valid occurrence. On fall-back transitions, a wall time that exists twice is never double-fired — the platform fires once and advances.
The active_window field restricts which fires actually produce a proactive wakeup. Fires computed by the cadence that land outside the window are skipped, not deferred — the cadence grid stays perfectly predictable and no backlog accumulates.
Both sub-fields are optional within active_window. You may specify hours only, days_of_week only, or both.
Overnight windows. When start is greater than end, the window wraps midnight. For example {"start": "22:00", "end": "06:00"} allows fires from 22:00 to 05:59 the next morning. This is useful for night-shift workers or schedules targeting early-morning time zones where local midnight matters.
Allowed days. Values must be lowercase three-letter abbreviations: "mon", "tue", "wed", "thu", "fri", "sat", "sun". Day membership is evaluated in the schedule's timezone, so a fire at 23:30 Friday Singapore time stays Friday even when stored as 15:30 UTC (Saturday in some zones).
Empty days array. Passing "days_of_week": [] (an explicit empty list) is rejected with INVALID_ACTIVE_WINDOW — it would produce a schedule that can never fire. To allow all days, omit the days_of_week field entirely.
Pass inventory_item_id on the create (or patch) body to associate a schedule with a specific item from the user's resource inventory. The item's properties are injected live at fire time — not at schedule creation — so any mid-program updates to the item (e.g. a medication dosage change, a price update) are automatically reflected in the agent's proactive message without requiring any schedule modification.
{ "cadence": { "simple": { "frequency": "daily", "times": ["08:00"] }, "timezone": "Asia/Singapore" }, "intent": "remind the user to take their morning medication", "check_type": "reminder", "inventory_item_id": "01HX8FKZQ3..."}
At fire time the platform fetches the current item properties and appends them to the intent block the agent receives. The Medication Reminders tutorial shows a complete worked example including how to structure medication inventory items for maximum agent context.
Graceful degradation. If the referenced inventory item is deleted before a fire occurs, the schedule continues firing. The intent block is delivered without the Reference item section — the agent receives the intent and metadata fields as normal. No error is surfaced to the user; the schedule itself is not affected.
Use starts_at and ends_at to create a time-bounded program. Both fields are optional and accept RFC 3339 UTC timestamps.
{ "cadence": { "simple": { "frequency": "interval_hours", "interval_hours": 4 }, "timezone": "Asia/Singapore" }, "active_window": { "hours": { "start": "08:00", "end": "22:00" } }, "intent": "prompt the user to log a pain score", "check_type": "check_in", "starts_at": "2026-05-01T00:00:00Z", "ends_at": "2026-05-14T23:59:59Z"}
starts_at — no fire is produced before this timestamp. Cadence expansion begins from this point. If omitted, the schedule starts immediately.
ends_at — once this timestamp passes, the schedule is automatically disabled (enabled flips to false). The row is not deleted, so the audit trail and historical fire log remain accessible.
Passing ends_at that is less than or equal to starts_at returns INVALID_WINDOW. Passing a past ends_at at creation time also returns INVALID_WINDOW — a schedule that has already expired cannot be created.
GET /api/v1/agents/{agentId}/users/{userId}/schedules/{id}/upcoming?limit=N returns the next N computed fire times as an array of UTC timestamps. The preview applies the active window, so what you see is exactly what will fire.
For example, a 4-hourly schedule (interval_hours: 4) with an 08:00–22:00 active window produces at most 4 fires per calendar day (08:00, 12:00, 16:00, 20:00 local) — not 6 (which would be the raw cadence count before filtering). The preview array reflects this.
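A sketch of the preview call using the documented REST path (the base URL and bearer-token auth header are placeholders; use whatever your deployment uses, or an SDK helper if one exists):
// Preview the next 8 fires; the active window is already applied server-side.
const res = await fetch(
  `https://api.sonzai.example/api/v1/agents/${AGENT_ID}/users/${USER_ID}/schedules/${scheduleId}/upcoming?limit=8`,
  { headers: { Authorization: `Bearer ${process.env.SONZAI_API_KEY}` } },
);
const upcoming: string[] = await res.json(); // next 8 fire times, UTC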
When a schedule fires, the platform constructs a structured intent block and delivers it to the agent as a proactive wakeup. The block looks like this:
[PROACTIVE WAKEUP — SCHEDULED REMINDER]
Why you're reaching out: check in on how the user is feeling
Scheduled fire time (user's local): 2026-05-02T09:00:00+08:00
Reference item (from inventory): Daily Vitamin D
  dosage: 1000 IU
  form: softgel
  timing_notes: take with food
Additional context:
  campaign: daily_checkin_v2
Key points:
[PROACTIVE WAKEUP — SCHEDULED REMINDER] — the stable header the agent detects to know it is initiating a conversation, not responding to one.
Why you're reaching out — verbatim content of the intent field you set on the schedule. Write this as a short natural-language instruction to the agent. The agent composes the actual opening message in its own voice — no prompt template is exposed; you control intent, not wording.
Scheduled fire time (user's local) — the next_fire_at_local value at fire time. Useful for agents that want to acknowledge the time explicitly ("Good morning" vs "Good afternoon").
Reference item (from inventory) — present only if inventory_item_id was set and the item still exists. The item's label and all of its properties are included. Item properties are fetched live at fire time.
Additional context — present only if metadata was set. All metadata key-value pairs are rendered here. Use this for campaign tracking, A/B variant labels, or any additional instruction to the agent that doesn't belong in the core intent.
There is no prompt template field. Clients control agent behavior through intent, inventory_item_id, and metadata. The agent is free to adapt its tone, greeting, and language based on the user's personality and the conversation history it already has.
Medication Reminders — a full worked example using Scheduled Reminders to drive a medication adherence program, including inventory schema design for medication items and multi-dose daily schedules.
Resource Inventory + Knowledge Base — how to design inventory schemas and push live data, powering the inventory_item_id linkage described above.
Memory-Aware Chat — how the agent remembers user responses from previous proactive conversations and incorporates them into future interactions.
await client.agents.sessions.setTools("agent-id", {
  userId: "user-123",
  tools: [
    {
      name: "change_scene",
      description: "Move to a new location in the story. Use when the scene has run its course or a new chapter begins.",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"],
      },
    },
  ],
});