Public HTTP endpoints for agent lifecycle, real-time agent interaction, and proactive delivery. Memory, mood, relationship, and context-management internals are handled by the platform.
Server-side only. The API does not accept browser requests. For web apps, proxy through your backend. See the Integration Guide.
Generates or regenerates an AI-created avatar for the agent. Uses LLM to create an image prompt from personality data, then generates and uploads the image. Costs 1 credit. Avatars are auto-generated on agent creation unless disabled.
Request:
agent_id (string): Agent UUID (URL param)
style (string): Optional style hint (e.g. 'watercolor anime', 'realistic portrait')
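A minimal request sketch for this endpoint. The route path (`/agents/{agentId}/avatar`) and the request body shape are assumptions inferred from the parameter list above, not taken from a published spec:

```typescript
// Hypothetical sketch — the avatar route and body field names are assumed
// from the parameter list above, not from a published spec.
const API_BASE = "https://api.sonz.ai/api/v1";

// agent_id is a URL param; style is an optional body field.
function buildAvatarRequest(agentId: string, style?: string) {
  return {
    method: "POST",
    url: `${API_BASE}/agents/${agentId}/avatar`,
    body: JSON.stringify(style ? { style } : {}),
  };
}

// Usage (server-side, with a real API key):
// const req = buildAvatarRequest("agent-uuid", "watercolor anime");
// await fetch(req.url, {
//   method: req.method,
//   headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
//   body: req.body,
// });
```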
Primary public conversation RPC. Send the agent, user, application context, and message history; the platform handles context assembly and state updates automatically.
Bidirectional streaming voice chat with server-side VAD (voice activity detection). Client streams audio chunks continuously; server handles speech detection, transcription, AI response, and TTS.
Notify the platform about significant application events. The platform may generate diary entries, update goals, or take other AI actions. Fires OnDiaryGenerated webhook when diary is created.
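As a sketch, an event notification body might be composed like this. The field names (`user_id`, `event_type`, `payload`) are assumptions based on the description above, not a confirmed schema:

```typescript
// Hedged sketch: the event payload shape is an assumption, not a confirmed schema.
interface AppEvent {
  userId: string;
  eventType: string; // e.g. "level_up", "purchase_completed"
  payload?: Record<string, unknown>;
}

// Compose the JSON body for notifying the platform about an application event.
function buildEventBody(event: AppEvent): string {
  return JSON.stringify({
    user_id: event.userId,
    event_type: event.eventType,
    payload: event.payload ?? {},
  });
}

// Usage (assumed endpoint): POST this body to /api/v1/agents/{agentId}/events.
// If the platform decides to write a diary entry, OnDiaryGenerated fires.
```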
Project-scoped knowledge graph. Upload documents or push structured data via the API — the platform extracts entities, builds a graph, and gives agents a knowledge_search tool to query it during conversations.
Pre-load user metadata and content so AI agents already "know" users from their first conversation. Metadata (name, company, title) becomes instant facts; content blocks (text, chat transcripts) are processed asynchronously via LLM extraction.
The Mind Layer is a standalone platform that separates agent intelligence (personality, memory, mood) from your application logic. Any backend integrates via REST API or the official SDKs.
Your Backend                                Mind Layer Platform
     |                                            |
     |--- Create Agent -------------------------->|
     |<-- Agent ID + Profile ---------------------|
     |                                            |
     |--- Chat (SSE streaming) ------------------>|
     |    (messages + app context)                |-- Build context
     |<-- Streaming AI response ------------------|-- Stream AI response
     |                                            |-- Update memory, mood, personality
     |<-- Proactive notifications ----------------|   (automatic, no extra calls)
User-facing application. Sends messages to your backend and renders agent responses. Examples: React, Next.js, Vue, mobile app.
Your Backend
Handles auth, application state, user sessions, and business logic. Calls the Mind Layer via SDK, REST API, MCP, or OpenClaw plugin for AI interactions. Examples: Express, Django, Go, OpenClaw.
Sonzai Mind Layer
Owns agent intelligence: personality, memory, mood, habits, goals, and relationships. A single chat call handles context assembly, AI streaming, and post-chat learning. Examples: api.sonz.ai.
On each chat call, the platform automatically assembles relevant context from personality, memory, mood, and relationship data before generating the AI response. Post-chat state updates happen automatically — no extra API calls needed.
Context Assembly
Personality, mood, memories, relationship narrative, and application state — all assembled per request.
Memory Extraction
Facts, events, and commitments are extracted from each conversation and stored automatically.
Mood & Personality Evolution
Mood and Big5 personality drift naturally based on interaction patterns.
Proactive Notifications
Agents can schedule proactive outreach between sessions. Deliver via polling or webhook.
Every conversation follows a simple three-step lifecycle:
1. Send a chat request — Pass agent ID, user ID, application context, and messages
2. Receive stream — Render tokens to the user in real time
3. Platform auto-updates — Memory, mood, relationships, and personality evolve
Pass application state per request so the agent can reference it during conversation. The platform doesn't cache this state — send it on every chat call.
for await (const event of client.agents.chatStream("agent-id", {
  messages: [{ role: "user", content: "What should I do next?" }],
  userId: "user-123",
  gameContext: {
    customFields: {
      department: "Engineering",
      currentTask: "Q2 roadmap review",
      ticketsOpen: 12,
      role: "Senior Developer",
    },
  },
})) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Use the dialogue API to orchestrate conversations between multiple agents, or to let agents respond to shared scene context without a direct user message.
const response = await client.agents.dialogue("agent-id", {
  userId: "user-123",
  messages: [
    { role: "user", content: "Tell the group about your weekend plans" },
  ],
  sceneGuidance: "A casual team standup meeting",
});
console.log(response.content);
Tool Calling
Internal tools (memory, state) run automatically. You only configure opt-in tools for end-user capabilities:
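A sketch of what an opt-in custom tool definition might look like. The registration call (`client.agents.tools.create`) is hypothetical; the JSON-Schema `parameters` shape follows the common OpenAI-style function-tool convention, which is an assumption here:

```typescript
// Hedged sketch — the registration method is hypothetical; only the
// JSON-Schema-style parameter convention is illustrated.
const checkOrderStatusTool = {
  name: "check_order_status", // your backend executes this when called
  description: "Look up the status of a customer's order by order number",
  parameters: {
    type: "object",
    properties: {
      orderNumber: { type: "string", description: "The order number, e.g. ORD-1234" },
    },
    required: ["orderNumber"],
  },
};

// Usage (hypothetical registration call):
// await client.agents.tools.create("agent-id", checkOrderStatusTool);
```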
Configure an OpenAI-compatible API endpoint for your project. Sonzai routes all chat generation through your endpoint while handling everything else: context assembly, tool execution, side-effect extraction, memory storage, personality tracking, and consolidation.
Full Managed Experience
Built-in tools (web search, memory recall, image generation, inventory), streaming SSE, per-message side effects — everything works exactly as with our default providers.
Your Model, Your Control
Use fine-tuned models, self-hosted endpoints, or any OpenAI-compatible provider (vLLM, Ollama, Together, Groq, Azure OpenAI, etc.).
Encrypted at Rest
Your API key is encrypted with AES-256 before storage. Only the first 8 characters are visible in the dashboard for identification.
Per-Project Configuration
Each project can have its own custom LLM endpoint. Toggle it on/off without deleting the config.
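A configuration sketch under stated assumptions: the method name (`client.projects.customLLM.configure`) and the exact field names are hypothetical, but the shape reflects the capabilities described above (OpenAI-compatible endpoint, per-project scope, on/off toggle):

```typescript
// Hedged sketch — field and method names are assumptions.
const customLLMConfig = {
  baseUrl: "https://my-vllm.internal:8000/v1", // any OpenAI-compatible endpoint
  model: "my-finetuned-llama-70b",
  apiKey: "sk-local-key",  // encrypted with AES-256 at rest (per the docs above)
  enabled: true,           // toggle off without deleting the config
};

// Usage (hypothetical):
// await client.projects.customLLM.configure(customLLMConfig);
```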
Custom LLM is the right choice when you want to use your own model but still want the full Sonzai experience (tools, streaming, per-message extraction). Standalone Memory is for when you need to control the entire chat loop yourself — e.g., for privacy preprocessing, data anonymization, or deep integration with an agent framework. See the Standalone Memory docs for the tradeoffs.
Once configured, here is what happens when a chat request is made:
Context assembly — Sonzai builds the 7-layer enriched context (personality, memory, mood, habits, goals, relationships, game state) exactly as with default providers.
Tool injection — Built-in tools (sonzai_memory_recall, sonzai_web_search, etc.) and any custom tools are added to the request.
Your endpoint called — The request is sent to your configured endpoint with your model name, API key, and the full message history including system prompt.
Streaming proxy — SSE chunks from your endpoint are streamed back to the client in real time.
Post-stream processing — After the stream completes, Sonzai extracts side effects (memory facts, mood changes, personality shifts, habits, tool calls) and stores them — same as with default providers.
Background tasks like fact extraction, memory consolidation, diary generation, and summarization automatically use the same model family you configured. Sonzai tracks the last-used provider/model for each agent and routes background LLM calls accordingly.
Custom LLM usage is billed at a flat per-token rate under the custom_llm billing model, regardless of which actual model your endpoint serves. Sonzai tracks input/output tokens from your endpoint's usage response. Your own endpoint costs (API fees, compute) are entirely yours.
Flexible key-value storage injected into the LLM context at chat time. Use it to pass environment state, task progress, resources, or any application data directly into every AI response.
Global State
Per Instance — Shared across all users in an instance. Use for environment state, configuration, agent status, global events.
Per-User State
Per Instance + User — Scoped to one user in an instance. Use for assigned tasks, workflow progress, user preferences, active tools.
Instances
All states are scoped to an instanceId — one deployment context of your agent (e.g. a workspace or environment). Omit instanceId to use the default instance. See Instances for managing multiple contexts.
// All global states for an instance
const globals = await client.agents.customStates.list("agent-id", {
  scope: "global",
  instanceId: "workspace-1",
});

// All per-user states for a specific user
const userStates = await client.agents.customStates.list("agent-id", {
  scope: "user",
  userId: "user-123",
});
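Writing state is symmetric. The payload shapes below are sketches: the `set` method name is assumed by symmetry with the `list` call and may differ in the actual SDK:

```typescript
// Hedged sketch — customStates.set is assumed by symmetry with list().
const globalStatePayload = {
  scope: "global" as const,
  instanceId: "workspace-1",
  key: "office_status",
  value: { open: true, occupancy: 42 }, // injected into LLM context at chat time
};

const userStatePayload = {
  scope: "user" as const,
  instanceId: "workspace-1",
  userId: "user-123",
  key: "active_task",
  value: { id: "T-88", title: "Q2 roadmap review" },
};

// Usage (hypothetical):
// await client.agents.customStates.set("agent-id", globalStatePayload);
// await client.agents.customStates.set("agent-id", userStatePayload);
```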
Tools let the LLM call functions during inference. Sonzai handles sonzai_-prefixed built-in tools. Custom tools are defined by you and executed by your backend — Sonzai surfaces the call as a side effect.
Using your own LLM?
If you use standalone memory mode (BYO-LLM), Sonzai exposes tool schemas you can wire into your agent framework (LangChain, Vercel AI SDK, Gemini function calling, etc.). See the Tool Integration guide for details.
Inject tools dynamically for a specific session. Session tools merge with agent-level tools — same-name session tools take precedence. Discarded when the session ends.
When the LLM decides to call a custom tool, it appears as a side effect in the SSE stream. Your backend executes the tool and returns the result in the next message.
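A minimal dispatcher sketch for that loop. The side-effect event shape (tool name plus JSON-encoded arguments) is an assumption based on the description above; the handler here is a stub:

```typescript
// Hedged sketch — the side-effect shape (name + JSON args) is assumed.
type ToolHandler = (args: Record<string, unknown>) => unknown;

const handlers: Record<string, ToolHandler> = {
  // Stub implementation; your backend would hit its own database here.
  check_order_status: ({ orderNumber }) => ({ orderNumber, status: "shipped" }),
};

// Execute a tool call surfaced in the stream; return the result so it can be
// included in the next message to the agent.
function executeToolCall(name: string, rawArgs: string): unknown {
  const handler = handlers[name];
  if (!handler) throw new Error(`Unknown tool: ${name}`);
  return handler(JSON.parse(rawArgs));
}
```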
Seed what an agent knows about a user before their first conversation. Use this for CRM data, user profiles, previous purchase history, or any user-specific knowledge that should be available from day one.
Mood, emotions, and goals are all managed automatically by the context engine. Every conversation, application event, and time-based decay is processed without any code on your side.
The APIs and SDK calls on this page are for reading state (dashboards, analytics) or manually overriding values when your application needs to drive a specific emotional state or goal — for example, triggering a mood boost after an in-app achievement, or setting a goal based on a workflow milestone.
Retrieve a historical snapshot of an agent's mood at any point in time.
const snapshot = await client.agents.getTimeMachine("agent-id", {
  at: "2026-02-14T12:00:00Z",
  userId: "user-123",
});
console.log(snapshot.mood);        // mood state at that timestamp
console.log(snapshot.personality); // personality state at that timestamp
Constellations are automatically detected clusters of related memories that form meaningful patterns. The platform identifies these from recurring themes across conversations.
const constellations = await client.agents.getConstellation("agent-id", {
  userId: "user-123",
});
for (const cluster of constellations.clusters) {
  console.log(cluster.theme, cluster.memories.length);
}
Breakthroughs are significant personality shifts or emotional milestones detected by the platform. They represent moments where an agent's understanding or relationship with a user meaningfully evolved.
const breakthroughs = await client.agents.getBreakthroughs("agent-id", {
  userId: "user-123",
});
for (const b of breakthroughs.items) {
  console.log(b.type, b.description, b.timestamp);
}
Goals represent what the agent is working toward with a user — they are detected and updated automatically by the context engine as conversations unfold. You do not need to set or manage them manually.
Common use cases for manual goal management:
Seeding a goal when a user starts a new workflow (e.g., "complete onboarding")
Marking a goal achieved after a business event (e.g., a purchase or milestone)
Abandoning a goal when the user changes direction
// Goals are managed automatically — override only when needed

// List current goals
const goals = await client.agents.listGoals("agent-id", { userId: "user-123" });

// Manually create a goal (optional)
const goal = await client.agents.createGoal("agent-id", {
  userId: "user-123",
  title: "Onboarding",
  description: "Complete onboarding checklist",
  type: "task",
  priority: 1,
});

// Mark a goal achieved after a business event
await client.agents.updateGoal("agent-id", goal.goal_id, {
  userId: "user-123",
  status: "achieved",
});
// List runs
const runs = await client.evalRuns.list({ agentId: "agent-id" });

// Get a specific run
const run = await client.evalRuns.get("run-id");

// Reconnect to a streaming run
for await (const event of client.evalRuns.streamEvents("run-id")) {
  console.log(event.type, event.message);
}
Async Simulations
Simulations support async mode via simulateAsync() which returns a RunRef immediately, allowing you to poll or reconnect later.
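A polling sketch for waiting on an async run. Since the exact status-fetch signature is not specified above, it is injected as a parameter here; the status values are assumptions:

```typescript
// Hedged sketch — status values and the getStatus signature are assumptions,
// so the fetch function is injected rather than hard-coded to the SDK.
type RunStatus = "queued" | "running" | "completed" | "failed";

async function waitForRun(
  getStatus: (runId: string) => Promise<RunStatus>,
  runId: string,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<RunStatus> {
  for (let i = 0; i < maxAttempts; i++) {
    const status = await getStatus(runId);
    if (status === "completed" || status === "failed") return status;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Run ${runId} did not finish after ${maxAttempts} attempts`);
}

// Usage (hypothetical wiring):
// const ref = await client.evalRuns.simulateAsync({ agentId: "agent-id" });
// const final = await waitForRun(
//   (id) => client.evalRuns.get(id).then((r) => r.status as RunStatus),
//   ref.runId,
// );
```

Alternatively, reconnect to the live event stream with `streamEvents()` as shown above instead of polling.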
Generate or regenerate a biography for an existing agent.
const bio = await client.generation.generateBio("agent-id", {
  description: "A friendly barista who remembers every customer's order",
  style: "warm and conversational",
});
console.log(bio.bio);
For precise control, create an agent with explicit Big5 scores. The platform derives a full personality profile, speech patterns, and emotional tendencies from your scores.
import { Sonzai } from "@sonzai-labs/agents";
import { v5 as uuidv5 } from "uuid";

const client = new Sonzai({ apiKey: "sk-..." });

// Derive a stable UUID from your own entity ID
const MY_NAMESPACE = "your-uuid-namespace-here";
const agentId = uuidv5("support-agent-001", MY_NAMESPACE);

const agent = await client.agents.create({
  agentId, // pass your own UUID — safe to repeat
  name: "Luna",
  gender: "female",
  big5: {
    openness: 0.75,
    conscientiousness: 0.60,
    extraversion: 0.80,
    agreeableness: 0.70,
    neuroticism: 0.30,
  },
  language: "en",
});
console.log(agent.agentId); // same UUID every time
Idempotent by Design
Agent creation is always a create-or-update. Calling it twice with the same ID updates the existing agent — it never errors or creates a duplicate. This means your startup code, CI pipelines, and provisioning scripts can call agents.create() unconditionally.
With agentId: Server uses your UUID directly. Recommended — link agents to your own entity IDs (agents, assistants, employees) for a deterministic mapping you control.
Without agentId: Server derives a UUID from your project ID + agent name. The same name always maps to the same agent within your project.
Use streaming chat to get real-time AI responses. The platform automatically handles context, memory, and state updates.
for await (const event of client.agents.chatStream("agent-id", {
  messages: [{ role: "user", content: "I had a great day hiking!" }],
  userId: "user-123",
})) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Server-Side Only
The SDK is for server-side use only. Never expose API keys in client-side code. For web apps, proxy through your backend. See the Integration Guide for examples.
Browse the API Reference for all available endpoints
Set up a Knowledge Base so agents can query your domain data
GETTING STARTED
Sonzai Mind Layer
The Mind Layer tracks personality, memory, mood, habits, goals, and relationships for AI agents, updating them through user interactions. Your application connects via REST API, sends events, and gets back enriched context for AI conversations.

Your backend handles business logic and user sessions while the Mind Layer owns agent intelligence — personality, memory, mood, and relationships. Connect via REST API, pass application context per request, and let the platform manage everything else.
An Instance is a deployment context for an agent. The agent itself (personality, memory, tools) is shared — but custom state is isolated per instance.
Agent "Luna"
├── Instance: default ← used when instanceId is omitted
├── Instance: ws-us-east ← US-East workspace
├── Instance: ws-eu-west ← EU-West workspace
└── Instance: ws-staging ← separate deployment
Each instance has its own:
• Global custom states (environment state, configuration)
• Per-user custom states scoped to this instance
• Isolated from other instances
Default Instance
Every agent has a default instance. If you don't pass instanceId to chat or state operations, the default instance is used. You only need multiple instances if you run the same agent in parallel isolated contexts.
Pass instanceId to chat calls to scope state reads to that instance. The agent will see global custom states for that instance and per-user states scoped to it.
for await (const event of client.agents.chatStream("agent-id", {
  messages: [{ role: "user", content: "What's the current status?" }],
  userId: "user-123",
  instanceId: "ws-us-east", // scopes state reads to this instance
})) {
  process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
Your backend manages business logic and user sessions. Call the Mind Layer for agent intelligence — it owns memory, personality, mood, relationships, and context assembly.
Integrate via the REST API using official SDKs for Go, TypeScript, and Python.
Official SDKs for Go, TypeScript, and Python, plus an OpenClaw plugin. Each SDK wraps the full REST API with typed methods, SSE streaming, automatic retries, and error handling.
All REST requests use Bearer authentication with your project API key:
# All REST requests use Bearer auth with your project API key
curl -H "Authorization: Bearer sk_your_api_key" \
  https://api.sonz.ai/api/v1/agents/{agentId}/chat
For Node.js backends, serverless functions, and server-side frameworks. Works with Node.js >= 18, Bun, and Deno. Not for browser/client-side use — API keys would be exposed.
The Sonzai API does not accept browser (client-side) requests. API keys must never be exposed in frontend code. This is the same pattern used by OpenAI, Anthropic, and other AI API providers.
For web apps (React, Next.js, Vue, etc.), create a backend API route that proxies to Sonzai. Your frontend calls your server; your server calls Sonzai with the API key.
import sonzai "github.com/sonz-ai/sonzai-go"

// Connects to api.sonz.ai by default — just provide your API key
client := sonzai.NewClient("sk_your_api_key")

// Override the base URL for local dev / self-hosted
client := sonzai.NewClient("sk_your_api_key",
    sonzai.WithBaseURL("http://localhost:8090"),
)
When a user creates a new agent in your application, call CreateAgent with their personality configuration:
resp, err := client.Agents.Create(ctx, sonzai.CreateAgentRequest{
    Name:   "Luna",
    Gender: "female",
    Big5: sonzai.Big5Scores{
        Openness:          0.75,
        Conscientiousness: 0.60,
        Extraversion:      0.80,
        Agreeableness:     0.70,
        Neuroticism:       0.30,
    },
    Language: "en",
})
// resp.AgentID is the platform-generated UUID
// Store this in your user record
Agents can reach out to users between conversations. When triggered, the platform generates a contextual message using the agent's full state and stores it as "pending". Your app polls and marks notifications consumed after delivery.
# Poll for pending proactive messages
GET /api/v1/agents/{agentId}/notifications?status=pending&user_id=user-123

# Response
{
  "notifications": [{
    "message_id": "msg-uuid",
    "user_id": "user-123",
    "check_type": "check_in",
    "intent": "Ask about yesterday's hiking trip",
    "generated_message": "Hey! How was the hike at Mount Rainier?",
    "status": "pending",
    "created_at": "2026-03-07T10:00:00Z"
  }]
}

# After delivering to user, mark consumed
POST /api/v1/agents/{agentId}/notifications/{messageId}/consume
Delivery Best Practice
Poll every 30-60 seconds. Always mark consumed after delivery to prevent re-delivery.
Your backend translates application events into Mind Layer API calls. You can swap the backend without changing agent behavior, or reuse agents across applications.
Pre-load user metadata and content so AI agents already know users from their first conversation. Metadata (name, company, title) becomes instant facts; content blocks are extracted asynchronously via LLM.
Metadata facts (name, company, title) are created synchronously. Content blocks (text, chat transcripts) are processed in the background via LLM extraction. Poll the job status to track progress.
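A payload sketch for seeding. The `metadata` and `contentBlocks` field names are assumptions drawn from the description above; the actual SDK method may differ:

```typescript
// Hedged sketch — field names are assumptions based on the prose above.
const seedPayload = {
  // Becomes instant facts, synchronously:
  metadata: { name: "Jane Smith", company: "Acme Corp", title: "Senior PM" },
  // Processed in the background via LLM extraction:
  contentBlocks: [
    { type: "text", content: "Jane led the 2025 platform migration project." },
    { type: "chat_transcript", content: "user: I prefer async standups\nagent: Noted!" },
  ],
};

// Usage (hypothetical method and job-polling names):
// const job = await client.agents.users.seed("agent-id", "user-123", seedPayload);
// poll client.jobs.get(job.jobId) until its status reports completion
```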
The Knowledge Base is a domain-agnostic system that turns your data into a structured knowledge graph. AI agents search this graph during conversations to provide accurate, grounded responses instead of hallucinating.
Upload document / API push
|
v
Extract entities + relationships
|
v
Build knowledge graph (deduplicated nodes + edges)
|
v
Run analytics rules (recommendations, trends)
|
v
Agent queries graph during conversations
Upload PDFs, DOCX, Markdown, or plain text files. The platform automatically processes the document, extracts entities and relationships, and adds them to the knowledge graph.
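An upload sketch using the Fetch API's `FormData` (global in Node 18+). The route (`/knowledge/documents`) and the `file` field name are assumptions; only the multipart construction is illustrated:

```typescript
// Hedged sketch — the upload route and field name are assumptions.
function buildUploadForm(filename: string, contents: string, mime = "text/markdown"): FormData {
  const form = new FormData();
  // Wrap the raw contents in a Blob so the server receives a named file part.
  form.append("file", new Blob([contents], { type: mime }), filename);
  return form;
}

// Usage (hypothetical route):
// await fetch("https://api.sonz.ai/api/v1/knowledge/documents", {
//   method: "POST",
//   headers: { Authorization: `Bearer ${apiKey}` },
//   body: buildUploadForm("pricing.md", "# Pricing\n..."),
// });
```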
If an entity with the same label + type already exists, properties are merged and the version is incremented. Every change is recorded in the version history with the source and timestamp.
Define rules that match source entities to target entities based on field matching. The engine prevents feedback loops and improves recommendations over time using conversion feedback.
Field Matching
Match by exact value, range tolerance, minimum threshold, array overlap, or numeric proximity. Each rule has configurable weights.
Conversion Tracking
Record when recommendations lead to actions (purchases, signups). The system learns and improves over time.
Staleness Filter
Entities not updated within a configurable window are automatically excluded from recommendations.
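To make the matching modes concrete, here is a sketch of their semantics. The real engine's rule format is not published, so the types and scoring below are purely illustrative:

```typescript
// Illustrative only — the actual rule format and scoring are not published.
type Matcher =
  | { kind: "exact" }                      // values must be identical
  | { kind: "range"; tolerance: number }   // |source - target| <= tolerance
  | { kind: "min"; threshold: number }     // target >= threshold
  | { kind: "overlap" }                    // arrays share at least one item
  | { kind: "proximity"; scale: number };  // score decays linearly with distance

function matchScore(source: unknown, target: unknown, m: Matcher): number {
  switch (m.kind) {
    case "exact":
      return source === target ? 1 : 0;
    case "range":
      return Math.abs((source as number) - (target as number)) <= m.tolerance ? 1 : 0;
    case "min":
      return (target as number) >= m.threshold ? 1 : 0;
    case "overlap": {
      const a = source as unknown[], b = target as unknown[];
      return a.some((x) => b.includes(x)) ? 1 : 0;
    }
    case "proximity": {
      const d = Math.abs((source as number) - (target as number));
      return Math.max(0, 1 - d / m.scale); // 1 when identical, 0 beyond scale
    }
  }
}
```

Per-rule weights (mentioned above) would then combine these per-field scores into an overall recommendation score.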
During conversations, AI agents have access to a knowledge_search tool that queries the knowledge base. Instead of hallucinating facts, agents pull accurate data from your graph — including entity properties, relationships, and recommendations.
User: "What's the best card under $500?"
|
v
Agent calls knowledge_search("cards under 500")
|
v
KB returns: Charizard ($450, +12%), Blastoise ($380, +8%)
|
v
Agent: "Charizard Base Set at $450 is trending up 12%
this month -- great investment pick under $500."
Grounded Responses
Knowledge-backed responses are grounded in your real data. The agent knows current prices, stock levels, specs, and relationships because it queries the graph in real-time.
The Sonzai MCP server exposes the entire Mind Layer API as tools that AI assistants can use directly. Instead of writing code to call the REST API, you configure Claude Desktop or Claude Code to connect to the MCP server and it can create agents, chat with them, manage memories, track behavior, and more -- all through natural language.
The server implements the Model Context Protocol open standard and provides 34 tools, 4 resources, and 3 guided prompts.
stdio (default) runs the server as a local process -- best for Claude Desktop and Claude Code. SSE runs an HTTP server for remote or networked clients.
The MCP server is a thin translation layer between MCP clients and the Platform API. It converts MCP tool calls into HTTP requests and returns the results.
Claude / AI Assistant
        |
        | MCP Protocol (stdio or SSE)
        v
Sonzai MCP Server (Go binary)
        |
        | HTTP REST + SSE
        v
Sonzai Platform API
        |
        +-- Context Engine (memory, personality, behavior)
        +-- AI Service (LLM generation)
        +-- ScyllaDB, Redis, CockroachDB
Server-Side Only
The MCP server requires your API key and should only run on trusted machines. Never expose it to untrusted networks without proper authentication.
Next Steps
Read the API Reference for the full REST API that the MCP server wraps
Memory is fully automatic — you don't need to manage it. The platform analyzes each conversation and extracts facts, events, and commitments. Before each response, the most relevant memories are assembled and included in context automatically.
No orchestration needed
Just call chat. The platform handles memory extraction, storage, and retrieval on every interaction.
Pre-load what an agent knows about a user before their first conversation using memory.seed().
import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: "sk-..." });

await client.agents.memory.seed("agent-id", "user-123", {
  facts: [
    "User's name is Jane Smith",
    "Jane is a senior product manager at Acme Corp",
    "Jane lives in San Francisco and enjoys hiking",
  ],
});
Navigate the hierarchical memory structure for a user.
// Browse the memory tree at a given path
const nodes = await client.agents.memory.browse("agent-id", {
  userId: "user-123",
  path: "/facts", // optional: filter by path
});
OpenClaw is an open-source framework for building conversational AI agents. It uses a modular plugin system with named slots — each slot controls a specific part of the agent pipeline.
The most important slot is contextEngine. This is the plugin responsible for deciding what context gets injected into the system prompt before every LLM call. It controls what your agent remembers, knows, and feels.
OpenClaw's plugin system works like middleware. Each plugin implements lifecycle hooks that fire at specific points during a conversation turn:
bootstrap(sessionId): Called when a new chat session starts. The plugin initializes any connections or state it needs.
assemble(messages, tokenBudget): Called before every LLM call. The plugin returns a systemPromptAddition — extra context injected into the system prompt.
afterTurn(sessionId): Called after the LLM responds. The plugin processes the conversation (e.g., extract facts, update state).
compact(sessionId): Called when context needs to be consolidated (e.g., merging short-term memory into long-term).
dispose(): Called when the session ends. Clean up connections and state.
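The hooks above can be sketched as a toy in-memory engine. This is illustrative only: the real plugin calls the Mind Layer instead of a local `Map`, and the exact hook signatures may differ slightly (a session ID is threaded through `assemble` and `afterTurn` here for clarity):

```typescript
// Toy engine — a stand-in for a real context engine like the Sonzai plugin.
interface ChatMessage { role: string; content: string }

class InMemoryContextEngine {
  private notes = new Map<string, string[]>();

  bootstrap(sessionId: string): void {
    if (!this.notes.has(sessionId)) this.notes.set(sessionId, []);
  }

  // Returns extra context injected into the system prompt before each LLM call.
  assemble(sessionId: string, _messages: ChatMessage[], tokenBudget: number): { systemPromptAddition: string } {
    const lines = this.notes.get(sessionId) ?? [];
    let addition = "";
    for (const line of lines) {
      // Rough ~4 chars/token estimate, as used elsewhere in these docs.
      if ((addition.length + line.length) / 4 > tokenBudget) break;
      addition += line + "\n";
    }
    return { systemPromptAddition: addition.trimEnd() };
  }

  afterTurn(sessionId: string, lastUserMessage: string): void {
    // Stand-in for fact extraction: just remember the raw message.
    this.notes.get(sessionId)?.push(`User said: ${lastUserMessage}`);
  }

  compact(sessionId: string): void {
    // Naive consolidation: merge all notes into one line.
    const lines = this.notes.get(sessionId) ?? [];
    if (lines.length > 1) this.notes.set(sessionId, [lines.join(" | ")]);
  }

  dispose(): void {
    this.notes.clear();
  }
}
```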
By default, OpenClaw ships with a basic context engine that stores memories as local Markdown files. The Sonzai plugin replaces this with the Mind Layer — giving your agent persistent memory, personality evolution, mood tracking, and relationship modeling with zero additional code.
When you install @sonzai-labs/openclaw-context, the package exports a register() function as its default export. On startup, OpenClaw loads all installed plugins and calls their register functions. Ours registers a context engine factory under the name "sonzai":
// Inside @sonzai-labs/openclaw-context (you don't write this)
export default function register(api) {
  api.registerContextEngine("sonzai", () => {
    return new SonzaiContextEngine(client, config);
  });
}
Then in openclaw.json, you tell OpenClaw which registered engine to use for the contextEngine slot. The name "sonzai" must match what the plugin registered:
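A minimal `openclaw.json` fragment might look like the following. The exact key layout is an assumption; only the requirement that the `contextEngine` value matches the registered name `"sonzai"` comes from the text above:

```json
{
  "contextEngine": "sonzai",
  "plugins": {
    "@sonzai-labs/openclaw-context": {
      "apiKey": "sk_your_api_key"
    }
  }
}
```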
So the flow is: install the npm package → OpenClaw discovers and calls register() → the plugin registers under "sonzai" → your config assigns it to the contextEngine slot.
Why Sonzai as a Context Layer?
Sonzai serves as a pure context engine for OpenClaw. Instead of the framework managing its own memory files, every conversation flows through the Mind Layer — which handles fact extraction, semantic search, mood updates, and personality evolution automatically. Your OpenClaw agent gets rich, structured context without writing any memory logic.
# Install via OpenClaw CLI
openclaw plugins install @sonzai-labs/openclaw-context

# Or install directly with your package manager
npm install @sonzai-labs/openclaw-context
# bun add @sonzai-labs/openclaw-context
Your API key is stored in openclaw.json alongside your plugin config — no environment variables needed. Make sure openclaw.json is in your .gitignore to avoid committing secrets.
You can selectively disable specific context sources via the disable map. This is useful when you want the Mind Layer for memory but don't need mood tracking, or when you want to reduce token usage:
On each turn, the plugin injects a structured <sonzai-context> block into the system prompt. Sections are ordered by priority and dropped lowest-first if the token budget is exceeded:
Relevant Memories (priority 2): Semantically searched facts matching the latest user message
Current Mood (priority 3): 4D emotional state (valence, arousal, tension, affiliation)
Relationship (priority 4): Relationship narrative, love scores, chemistry with the current user
Goals (priority 5): Active goals (growth, mastery, relationship, discovery)
Interests (priority 6): Detected interests with confidence levels
Habits (priority 7, lowest): Behavioral patterns with strength scores
Token Budget
The default budget is 2000 tokens (~8000 characters). The plugin estimates token count at ~4 characters per token and drops the lowest-priority sections first when the budget is exceeded. Adjust with contextTokenBudget in your config.
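The budget behavior described above can be sketched as follows. Section names come from the priority list above; the plugin's actual trimming algorithm may differ in detail:

```typescript
// Sketch of priority-based trimming: ~4 chars/token, drop lowest-priority
// sections first once the budget is exceeded. Illustrative, not the plugin's code.
interface Section { name: string; priority: number; text: string } // lower number = higher priority

const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function trimToBudget(sections: Section[], tokenBudget: number): Section[] {
  // Keep highest-priority sections first (priority 2 before priority 7).
  const ordered = [...sections].sort((a, b) => a.priority - b.priority);
  const kept: Section[] = [];
  let used = 0;
  for (const s of ordered) {
    const cost = estimateTokens(s.text);
    if (used + cost > tokenBudget) continue; // drop sections that no longer fit
    kept.push(s);
    used += cost;
  }
  return kept;
}
```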
The plugin automatically extracts user identity from OpenClaw's session key format. This enables per-user memory and relationships without any configuration:
Agent IDs are generated deterministically from SHA1(tenantID + agentName). Calling setup multiple times with the same name returns the same agent — safe for restarts and redeployments.
The context engine handles all communication with the Mind Layer. During assemble, it fetches context sources (memory, personality, mood, relationships, goals, interests, habits), ranks them by priority, and trims to the token budget. During afterTurn, it sends the conversation back for fact extraction and state updates. The engine never runs LLM calls locally — all intelligence lives on the Sonzai side.
Graceful Degradation
All API calls are wrapped in error handlers. If the Mind Layer is unreachable, the engine returns empty context and never blocks OpenClaw — your agent continues working without enriched context.
Every agent has Big Five (OCEAN) personality scores. Behavioral traits, mood baselines, speech patterns, and interaction preferences all derive from these scores.
Openness (0.0 - 1.0): Curiosity, creativity, openness to experience. High = imaginative, adventurous. Low = practical, conventional.
Adjust personality behavior for specific users without affecting the base agent profile.
// Apply a per-user overlay (e.g. from assessment results)
await client.agents.personality.setUserOverlay("agent-id", "user-123", {
  big5: {
    extraversion: 0.55, // agent speaks more reservedly with this user
  },
  confidence: 0.6,
});
In standalone mode, you bring your own LLM for chat generation but route conversation transcripts through our Context Engine for extraction and behavioral processing. This lets you:
Anonymize or transform data before sending to your own LLM
Use any LLM provider (Gemini, Anthropic, local models, etc.)
Still get full behavioral intelligence — memory, personality evolution, mood tracking, habit detection, goal tracking, relationship dynamics, and proactive outreach
Bill extraction through our managed LLM (you choose the provider and model)
Steps 4 and 5 use Sonzai's LLM (billed to your account) — extraction and consolidation. Steps 1, 2, and 3 use no Sonzai LLM credits. Your own LLM costs in Step 3 are entirely yours.
Want to use your own model? Before choosing standalone mode, consider Custom LLM instead. It lets you point Sonzai at any OpenAI-compatible endpoint (vLLM, Ollama, Together, Groq, Azure OpenAI, etc.) while keeping the full managed experience — built-in tools, streaming, per-message extraction, and memory prewarming all work automatically.
Standalone mode is designed for the narrow set of scenarios where you must control the entire chat loop yourself. The managed mode (using Sonzai's LLM or Custom LLM) provides a significantly richer experience. Choose standalone only when:
Privacy & Data Preprocessing
You need to anonymize, redact PII, or transform conversation data before it reaches any LLM. Standalone lets you intercept and sanitize the enriched context before sending to your own model.
Regulatory Requirements
Compliance mandates that conversation data never leaves your infrastructure for chat generation, while still allowing metadata extraction via Sonzai's LLM.
Deep Agent Framework Integration
Your architecture requires an agent framework (LangChain, CrewAI, Vercel AI SDK) that manages its own LLM loop, tool orchestration, and multi-step reasoning.
Custom Prompt Pipeline
You need full control over prompt construction, few-shot examples, chain-of-thought, or multi-model routing that goes beyond what the managed chat supports.
Standalone mode trades convenience for control. If you just want to use your own model, use Custom LLM instead — it gives you the full managed experience with your endpoint. Only choose standalone if you need to preprocess data or control the entire chat loop.
Managed mode: Built-in tools (web search, memory recall, image generation, inventory) are called automatically by the LLM during chat and executed by the platform.
Standalone mode: Built-in tools are unavailable. You must implement tool calling yourself using the tool schemas endpoint. See the Tool Integration guide.
Managed mode: Side effects (memory, mood, personality, habits) are extracted inline after every message — the agent evolves in real time.
Standalone mode: Side effects are extracted in batch when you call /process. If you send multiple messages before calling /process, behavioral updates are delayed.
Managed mode: Chat responses stream via SSE with real-time deltas. Side effects (mood changes, emotional themes) appear as live events during the stream.
Standalone mode: /context and /process are synchronous request-response calls. There is no streaming — you handle streaming with your own LLM.
Managed mode: Memory bundles are prewarmed automatically on every chat request for near-instant retrieval (~10ms vs ~2000ms cold).
Standalone mode: Memory prewarming triggers when you call /sessions/start and caches for 2 hours. You must explicitly start a session to benefit — skipping session start means cold context builds every time.
Managed mode: Session start/end, message history caching, and consolidation triggers are handled automatically.
Standalone mode: You must explicitly call /sessions/start, /sessions/end, and /process at the right times. Missing these calls means lost behavioral data.
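The lifecycle above boils down to a fixed call order per turn. A minimal sketch — the method names here (startSession, getContext, process, endSession) simply mirror the endpoints named above and are assumptions about the SDK surface:

```typescript
// Sketch of one standalone turn. Method names mirror /sessions/start,
// /context, /process, and /sessions/end — treat the exact signatures
// as assumptions.
interface StandaloneClient {
  startSession(agentId: string, userId: string): Promise<{ sessionId: string }>;
  getContext(
    agentId: string,
    opts: { userId: string; sessionId: string; query: string }
  ): Promise<unknown>;
  process(
    agentId: string,
    opts: {
      userId: string;
      sessionId: string;
      messages: { role: string; content: string }[];
    }
  ): Promise<unknown>;
  endSession(agentId: string, sessionId: string): Promise<void>;
}

async function runStandaloneTurn(
  client: StandaloneClient,
  callLlm: (context: unknown, message: string) => Promise<string>,
  agentId: string,
  userId: string,
  userMessage: string
): Promise<string> {
  // 1. Start the session first — this triggers memory prewarming (2h cache)
  const { sessionId } = await client.startSession(agentId, userId);
  // 2. Fetch the enriched context for this turn
  const context = await client.getContext(agentId, {
    userId,
    sessionId,
    query: userMessage,
  });
  // 3. Chat with your own LLM, entirely on your infrastructure
  const reply = await callLlm(context, userMessage);
  // 4. Hand the transcript back — skipping this loses behavioral data
  await client.process(agentId, {
    userId,
    sessionId,
    messages: [
      { role: "user", content: userMessage },
      { role: "assistant", content: reply },
    ],
  });
  // 5. Close the session when the conversation ends
  await client.endSession(agentId, sessionId);
  return reply;
}
```

In a real integration you would start and end the session once per conversation, not once per message — the sketch compresses the lifecycle into a single function only to make the ordering explicit.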
What's the same in both modes?
The extraction quality is identical — both modes use the same LLM pipeline for fact extraction, personality shifts, mood changes, habit detection, and consolidation. The 7-layer enriched context returned by /context is the same data the managed chat builds internally. The difference is in tooling, streaming, and real-time responsiveness — not in the intelligence of the memory layer itself.
Fetch the full 7-layer enriched context. This includes personality traits, current mood, relevant memories, active goals, detected habits, relationship state, and proactive signals. Use this to construct your own system prompt.
const context = await client.agents.getContext("agent-id", {
userId: "user-123",
sessionId: "session-abc",
query: "What should we talk about?", // current user message
});
// context.layers contains:
// profile — agent identity, Big5 personality, speech patterns
// behavioral — current mood, habits, goals, interests
// relationship — love scores, narrative arc
// evolution — recent personality shifts
// memory — recalled facts, long-term summaries
// proactive — pending wakeup intents
// game — custom game state (if set)
// Build your own system prompt with this context
const systemPrompt = `You are ${context.profile.name}.
Personality: ${JSON.stringify(context.profile.big5)}
Current mood: ${JSON.stringify(context.behavioral.mood)}
Relevant memories: ${context.memory.facts.map(f => f.text).join("\n")}
`;
Send the enriched context to your own LLM. This step happens entirely on your infrastructure — you can anonymize, transform, or filter the context however you need.
// Example: using Google Gemini with Sonzai context
import { GoogleGenAI } from "@google/genai";

const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await gemini.models.generateContent({
  model: "gemini-3.1-flash-lite-preview",
  contents: [
    { role: "user", parts: [{ text: systemPrompt + "\n\n" + userMessage }] },
  ],
});

const assistantMessage = response.text;
Privacy Note
You have full control over what data reaches your LLM. Strip PII, redact sensitive facts, or anonymize the context before sending. The Context Engine only sees the transcript you send back in Step 4.
Send the conversation transcript back to the Context Engine. We extract memories, personality shifts, mood changes, habits, interests, relationship dynamics, and proactive signals using our managed LLM. You choose which provider and model to use.
const result = await client.agents.process("agent-id", {
userId: "user-123",
sessionId: "session-abc",
messages: [
{ role: "user", content: userMessage },
{ role: "assistant", content: assistantMessage },
],
provider: "gemini", // our LLM for extraction
model: "gemini-3.1-flash-lite-preview",
includeExtractions: true, // get full extraction details back
});
console.log(result.memories_created); // 3
console.log(result.side_effects); // { mood_updated: true, ... }
console.log(result.extractions); // full details (when requested)
// Extraction includes:
// memory_facts — new facts extracted from conversation
// personality_deltas — Big5 trait shifts with reasoning
// mood_delta — 4D mood change (happiness, energy, calmness, affection)
// habit_observations — detected behavioral patterns
// interests_detected — topics the user engaged with
// relationship_delta — love score change with reason
// proactive_suggestions — scheduled check-ins or follow-ups
// emotional_themes — detected emotional tones
After processing conversations, all behavioral data is available via dedicated endpoints. Use these to display agent state in your UI, drive game mechanics, or feed into your own systems.
The Context Engine schedules proactive outreach (check-ins, follow-ups) based on conversation patterns. Poll for pending notifications and consume them when delivered.
// List pending notifications
const notifications = await client.agents.notifications.list("agent-id");
for (const notif of notifications) {
// Deliver to user via your channel (push, email, in-app, etc.)
await deliverToUser(notif.user_id, notif.message);
// Mark as consumed
await client.agents.notifications.consume("agent-id", notif.message_id);
}
// View notification history
const history = await client.agents.notifications.history("agent-id");
When you call GET /context with a query parameter, the endpoint automatically searches the agent's knowledge base and includes matching results in a knowledge field:
{
  "profile": { ... },
  "memory": { ... },
  "knowledge": {
    "results": [
      {
        "content": "Refund policy: customers can request a full refund within 30 days...",
        "label": "Refund Policy",
        "type": "policy",
        "source": "policies.pdf",
        "score": 0.92
      }
    ]
  }
}
After /process extracts side effects, it also searches the KB using the topics and entities found in the conversation. If relevant KB content exists that the agent missed, it stores the results as proactive signals. The next /context call automatically includes them — so the agent gets smarter with each turn.
Turn 1: /context → (no KB results yet)
↓
chat with your LLM
↓
/process → extracts "hiking gear" as topic
→ searches KB, finds "Hiking Equipment Guide"
→ stores as proactive signal
Turn 2: /context → includes "Hiking Equipment Guide" from KB
+ any direct search results for the new query
↓
chat with your LLM (now knows about hiking gear!)
For frameworks like OpenClaw where the LLM can call tools, use the standalone knowledge search endpoint:
const results = await client.agents.knowledgeSearch("agent-id", {
query: "refund policy",
limit: 5,
});
for (const result of results.results) {
console.log(result.label, result.content);
}
How it all fits together
The automatic /context inclusion and learning loop handle most cases with zero configuration. The explicit tool endpoint is for advanced use cases where your LLM needs to search on-demand (e.g., RAG pipelines or agent frameworks with tool calling). See the Tool Integration guide for wiring these into agent frameworks like LangChain, Vercel AI SDK, and Gemini function calling.
Use any agent framework for orchestration and tool calling. Route conversation transcripts through Sonzai for persistent memory, personality, and proactive behavior across sessions.
Privacy-Sensitive Applications
Anonymize or redact conversation data before sending to your LLM. Only structured extractions (facts, mood deltas) are stored — no raw conversation text.
Custom LLM Providers
Run local models, fine-tuned models, or specialized providers for chat. Sonzai handles the intelligence layer regardless of your chat LLM.
Multi-Agent Systems
Each agent maintains its own memory tree, personality, and behavioral state. Cross-agent memory sharing enables collaborative intelligence.
There are two complementary ways your agent can access Sonzai knowledge and memory:
Automatic (Recommended)
Call GET /context with a query param. The endpoint automatically searches the knowledge base and injects recalled memories. The deferred learning loop primes the next context call with KB results that the agent missed. No tool calling needed.
Explicit Tool Calling
Register Sonzai tools with your LLM so it can search on demand mid-conversation. This is for agent frameworks (LangChain, Vercel AI SDK, CrewAI) where the LLM decides when to search. You fetch tool schemas from Sonzai and wire them into your framework.
When to use which?
Start with automatic enrichment — it covers most cases with zero configuration. Add explicit tool calling when your agent needs to search mid-conversation (e.g., the user asks a question not covered by the initial context fetch) or when your framework expects tool definitions.
Fetch the tool catalog for an agent. This returns JSON schemas in OpenAI function-calling format that you can pass directly to your LLM's tool configuration.
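Because the schemas are already OpenAI-format function definitions, wiring them into a chat call is mostly a wrapping step. A minimal sketch — the fetch method name in the usage comment is an assumption:

```typescript
// Hypothetical shape of a fetched schema (OpenAI function-calling format)
type FunctionSchema = { name: string; description?: string; parameters?: object };

// Wrap raw schemas into the `tools` array shape that OpenAI-compatible
// chat APIs expect
function toOpenAiTools(schemas: FunctionSchema[]) {
  return schemas.map((schema) => ({ type: "function" as const, function: schema }));
}

// Usage sketch (the tools.schemas method name is an assumption):
// const { schemas } = await client.agents.tools.schemas(AGENT_ID);
// const response = await openai.chat.completions.create({
//   model: "your-model",
//   messages,
//   tools: toOpenAiTools(schemas),
// });
```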
Search the agent's knowledge base for relevant documents and facts. Uses hybrid search (BM25 + semantic) when embeddings are available, falling back to BM25 full-text search.
Search the agent's memory for previously extracted facts about a user. This is a synchronous BM25 full-text search that returns immediately — no deferred processing.
Unlike KB enrichment (which has a deferred path), memory search returns immediately from BM25 indexes. There is no async component. The /context endpoint already includes the most relevant memories automatically — this tool is for cases where the LLM needs to search for additional facts mid-conversation.
from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent
from sonzai import Sonzai

sonzai_client = Sonzai(api_key="sk_your_api_key")
agent_id = "agent-id"
user_id = "user-123"

@tool
def knowledge_search(query: str, limit: int = 5) -> list[dict]:
    """Search the agent's knowledge base for relevant documents and facts.
    Use when the user asks about topics that may be in uploaded documents."""
    results = sonzai_client.agents.knowledge_search(agent_id, query=query, limit=limit)
    return [{"content": r.content, "label": r.label, "score": r.score} for r in results.results]

@tool
def memory_search(query: str) -> list[dict]:
    """Search agent memory for previously learned facts about the user.
    Use when the conversation references past interactions or personal details."""
    results = sonzai_client.agents.memory.search(agent_id, query=query, user_id=user_id)
    return [{"content": f.content, "type": f.fact_type} for f in results.results]

# Get enriched context
ctx = sonzai_client.agents.get_context(
    agent_id, user_id=user_id, session_id="session-abc", query=user_message
)

llm = ChatGoogleGenerativeAI(model="gemini-3.1-flash-lite-preview")
agent = create_react_agent(llm, [knowledge_search, memory_search])

result = agent.invoke({
    "messages": [
        {"role": "system", "content": build_system_prompt(ctx)},
        {"role": "user", "content": user_message},
    ]
})
The most powerful aspect of standalone mode is the self-improving learning loop. Even without explicit tool calls, the agent gets smarter each turn because /process detects knowledge gaps and primes the next /context call.
One-shot signals: Deferred KB results are consumed when /context reads them. They appear exactly once, preventing stale or repeated information.
TTL-based expiry: Deferred signals expire after 1 hour. If the user doesn't continue the conversation, stale signals are automatically cleaned up.
Deduplication: If the direct /context query matches the same KB document as a deferred signal, the duplicate is removed. You never get the same result twice.
Capped searches: /process runs at most 5 KB queries per call and stores at most 10 deferred results, preventing resource explosion on topic-heavy conversations.
Unlike KB enrichment, memory search has no deferred/async path. When /context is called, it recalls the most relevant memories immediately using the hierarchical memory tree and BM25 indexes. When you call GET /memory/search explicitly, results return immediately.
The deferred behavior only applies to knowledge base content, where /process proactively discovers KB documents the agent should have known about. Memory facts are always available synchronously because they are indexed at write time (during /process).
Not necessarily. /context automatically includes KB results and recalled memories. Tool calling is useful when the LLM needs to search for something specific mid-conversation that wasn't covered by the initial context fetch, or when your framework expects tool definitions.
No. Memory search is always synchronous. When you call GET /memory/search, results return immediately from BM25 indexes. The deferred/async flow only applies to knowledge base enrichment via the /process learning loop.
The deferred signals expire after 1 hour (TTL-based cleanup). No stale data persists. If the user resumes the conversation later, they get fresh results from the next /context call.
Absolutely. The Sonzai tool schemas are standard OpenAI function definitions. Mix them with your own tools in whatever framework you use. The LLM decides which tool to call based on the conversation.
Custom tools (created via POST /agents/{agentId}/tools or the dashboard) are for agent-side tool calling in Sonzai's managed chat mode. The tool schemas described here (/tools/schemas) are for BYO-LLM mode where your LLM calls Sonzai endpoints.
A custom state is a key-value record scoped to an agent + user (or just an agent). Values can be any JSON-serializable type: strings, numbers, booleans, arrays, or nested objects.
Unlike memory (which is unstructured text extracted from conversations), custom states are structured data you write explicitly from your backend. The agent can read them via the get_custom_state tool during conversation, so it always knows the user's current tier, streak, balance, etc.
When the agent has access to the get_custom_state tool (enabled automatically when custom states exist), it fetches current state at the start of a conversation. You can also read it from your backend at any time.
// Read by key from your backend
const state = await client.agents.customStates.getByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
const progress = state.value as {
tier: string; score: number; score_to_next: number; streak_days: number;
};
console.log(`${progress.tier} tier · ${progress.score}/${progress.score_to_next} pts · ${progress.streak_days}-day streak`);
During conversation, the agent calls get_custom_state("user_progress") and incorporates the progress data into its responses naturally — no prompt injection required.
Use upsert from your backend whenever the user's state changes — after a session ends, after a purchase, or on a schedule. upsert creates the state if it doesn't exist, or replaces it if it does.
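A minimal upsert sketch, assuming an upsert method on the customStates namespace used elsewhere on this page (the exact signature may differ):

```typescript
// Hedged sketch — upsert after a session ends so the agent sees fresh state
await client.agents.customStates.upsert(AGENT_ID, {
  userId: USER_ID,
  key: "user_progress",
  value: {
    tier: "gold",
    score: 1240,
    score_to_next: 1500,
    streak_days: 12,
    milestones: ["first_task", "10_day_streak"],
  },
});
```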
Workflow events let your backend tell the agent about something that happened outside the conversation. The next time the user chats, the agent sees the pending event and reacts naturally.
// Trigger from your backend when something notable happens
await client.agents.triggerGameEvent(AGENT_ID, {
userId: USER_ID,
eventType: "task_complete",
payload: {
task_name: "Q1 Revenue Analysis",
deliverable: "Revenue Report",
category: "Analytics",
time_taken: "3h 42m",
},
});
// Next time the user opens a conversation:
// Agent: "I see you finished the Q1 Revenue Analysis! That report is a key
// deliverable. Want to discuss the findings or start the next task?"
Event delivery
Workflow events are queued and delivered on the next conversation turn. They don't interrupt an active session. The agent consumes pending events at the start of the next chat or chatStream call and incorporates them into its opening message or first response.
Use update when you want to change a state by its state_id. Unlike upsert, update does a partial merge — you only need to pass the fields you want to change.
// Add a milestone without overwriting the whole state
const state = await client.agents.customStates.getByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
const progress = state.value as { milestones: string[]; [k: string]: unknown };
await client.agents.customStates.update(AGENT_ID, state.state_id, {
value: {
...progress,
milestones: [...progress.milestones, "100_tasks"],
},
});
Delete a state by its ID or by key. On next conversation, the agent won't have access to it.
// Delete by key (finds and removes the state)
await client.agents.customStates.deleteByKey(AGENT_ID, {
userId: USER_ID,
key: "user_progress",
});
// Or delete by state_id if you already have it
await client.agents.customStates.delete(AGENT_ID, stateId);
Create an onboarding state on sign-up with { step: 0, completed: false }. The agent checks it at the start of early conversations and guides the user through setup naturally.
Subscription context
Store { plan: 'pro', expires_at: '...' } so the agent knows which features to offer or upsell without you having to pass it in every chat request.
Daily summary cache
Write a daily_summary state at the end of each day with key metrics. The agent opens the next-day conversation referencing the user's activity — "Yesterday you completed 3 tasks and hit a 12-day streak. Ready to keep going?"
Schemas tell the KB what fields to extract and index for each entity type. Create one for software_license so the platform knows how to store and search your license data.
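A hedged sketch of what that schema definition could look like — the createSchema method name and field shape here are assumptions, not the confirmed API, so check the Knowledge Base reference for the exact call:

```typescript
// Assumed method name and payload shape
await client.knowledge.createSchema(PROJECT_ID, {
  entity_type: "software_license",
  fields: [
    { name: "vendor", type: "string" },
    { name: "plan", type: "string" },
    { name: "market_price", type: "number" },
    { name: "trend_30d", type: "string" },
  ],
});
```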
Insert your first set of entities using insertFacts. This is also how you load historical data before going live. Include relationships so the KB can surface alternative or complementary tool recommendations.
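A hedged insertFacts sketch — the payload shape (entities plus relationships) is an assumption modeled on the bulkUpdate call below:

```typescript
// Assumed payload shape — entities plus relationships between them
await client.knowledge.insertFacts(PROJECT_ID, {
  entities: [
    {
      entity_type: "software_license",
      label: "Figma Enterprise",
      properties: { vendor: "Figma", plan: "Enterprise", market_price: 75 },
    },
    {
      entity_type: "software_license",
      label: "Sketch Business",
      properties: { vendor: "Sketch", plan: "Business", market_price: 50 },
    },
  ],
  relationships: [
    { from: "Figma Enterprise", type: "alternative_to", to: "Sketch Business" },
  ],
});
```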
Run a cost-sync job on a schedule (e.g. daily cron) that fetches current pricing from your vendor data source and pushes it into the KB. bulkUpdate merges properties into existing nodes matched by label — no need to delete and re-insert.
// cost-sync.ts — run daily
import { Sonzai } from "@sonzai-labs/agents";
import { fetchLatestPricing } from "./vendor-api"; // your data source
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const PROJECT_ID = "proj_abc123";
async function syncPricing() {
const pricing = await fetchLatestPricing(); // [{ name, price, trend }]
await client.knowledge.bulkUpdate(PROJECT_ID, {
updates: pricing.map((license) => ({
entity_type: "software_license",
label: license.name,
properties: {
market_price: license.price,
trend_30d: license.trend,
last_synced: new Date().toISOString(),
},
// upsert: true — creates the node if it doesn't exist yet
upsert: true,
})),
});
console.log(`Synced ${pricing.length} license prices`);
}
syncPricing();
Batch size
Batches of ≤100 items are processed synchronously (immediate response). Larger batches are queued and processed asynchronously — the response includes a job ID you can poll for completion.
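For the async path, a small polling helper keeps the calling code simple. The job-status lookup is dependency-injected here because the actual job endpoint is not documented on this page — treat the usage comment as an assumption:

```typescript
// Poll a queued bulk job until it finishes, with a bounded number of attempts
async function waitForJob(
  getStatus: (jobId: string) => Promise<{ status: "queued" | "processing" | "done" | "failed" }>,
  jobId: string,
  { intervalMs = 2000, maxAttempts = 30 } = {}
): Promise<void> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { status } = await getStatus(jobId);
    if (status === "done") return;
    if (status === "failed") throw new Error(`Bulk job ${jobId} failed`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Bulk job ${jobId} did not complete within ${maxAttempts} attempts`);
}

// Usage sketch (the job-status endpoint is an assumption):
// const res = await client.knowledge.bulkUpdate(PROJECT_ID, { updates });
// if (res.job_id) {
//   await waitForJob((id) => client.knowledge.jobs.get(PROJECT_ID, id), res.job_id);
// }
```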
Enable the inventory and knowledge capabilities on your agent. This gives the agent the sonzai_inventory_update and sonzai_inventory tools automatically — no prompt engineering required.
const AGENT_ID = "agent_xyz";
await client.agents.updateCapabilities(AGENT_ID, {
inventory: true, // enables sonzai_inventory_update + sonzai_inventory tools
knowledge: true, // enables knowledge_search tool
project_id: PROJECT_ID, // which KB to join against
});
You can also set this from the dashboard: go to Agents → your agent → Capabilities and toggle Inventory on.
Once inventory is enabled, the agent calls sonzai_inventory_update on its own whenever a user mentions a tool or subscription they use. You just chat normally — the platform does the KB resolution and storage.
// Your backend chat endpoint
for await (const event of client.agents.chatStream(AGENT_ID, {
userId: "user_123",
messages: [
{
role: "user",
content: "We just provisioned 10 Figma Enterprise seats at $75/seat.",
},
],
})) {
// The agent streams its reply — and internally calls
// sonzai_inventory_update({ action: "add", item_type: "software_license",
// description: "Figma Enterprise", properties: { plan: "Enterprise",
// purchase_price: 75, quantity: 10 } })
// The platform resolves the KB node, stores the link, and the agent
// continues the conversation without interruption.
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
How KB resolution works
The platform searches the KB for the item description. If exactly one node matches, it links automatically. If there are multiple candidates, the response returns status: "disambiguation_needed" with a list of candidates so the agent can ask the user to clarify.
Use mode="value" to get each user resource joined with the latest KB pricing data. The platform computes gain_loss automatically: (market_price - purchase_price) × quantity.
You can also use mode="aggregate" with the aggregations parameter to get portfolio-level totals without listing every resource — useful for organizations with many subscriptions.
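The platform computes gain_loss server-side; purely to make the formula concrete, the per-resource arithmetic is:

```typescript
interface ValuedResource {
  purchase_price: number; // what the user paid per unit
  market_price: number;   // latest KB market price per unit
  quantity: number;
}

// gain_loss = (market_price - purchase_price) × quantity
function gainLoss(r: ValuedResource): number {
  return (r.market_price - r.purchase_price) * r.quantity;
}

// e.g. 10 Figma seats bought at $75/seat with a current market price of $60
// → (60 - 75) × 10 = -150 (a $150 paper loss)
```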
If a user already has an existing set of subscriptions (from a CSV, a procurement system export, etc.), import them in bulk rather than waiting for the agent to discover each resource in conversation.
The batch endpoint processes up to 1,000 items per call. For larger imports, split into multiple calls or use the CSV priming feature in the dashboard.
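The 1,000-item cap means larger imports need client-side chunking. A small helper (the batch endpoint call in the usage comment is an assumption):

```typescript
// Split a large import into API-sized batches
function chunk<T>(items: T[], size = 1000): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Usage sketch (batch method name assumed):
// for (const batch of chunk(importedResources)) {
//   await client.agents.inventory.batchAdd(AGENT_ID, { userId, items: batch });
// }
```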
Start chatting. Memory extraction happens automatically after the response streams. Nothing special needed on your end.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const AGENT_ID = "agent_abc";
const USER_ID = "user_123";
// First conversation — agent has no memory yet
for await (const event of client.agents.chatStream(AGENT_ID, {
userId: USER_ID,
messages: [
{ role: "user", content: "My name is Mia. I'm allergic to peanuts and I love hiking." },
],
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
// Platform extracts: name="Mia", allergy="peanuts", interest="hiking"
// Second conversation — agent recalls all of the above
for await (const event of client.agents.chatStream(AGENT_ID, {
userId: USER_ID,
messages: [
{ role: "user", content: "What snacks should I bring on my next hike?" },
],
})) {
process.stdout.write(event.choices?.[0]?.delta?.content ?? "");
}
// Agent knows Mia loves hiking and is allergic to peanuts — no re-intro needed.
Memory is per-user
Facts extracted from user A's conversation are never surfaced to user B. Always pass userId (or user_id / UserID) in every chat call so the platform scopes memory correctly.
If a user has history in your system — a CRM profile, onboarding answers, past orders — inject it before the first conversation so the agent feels like it already knows them.
// Call once during onboarding or after CRM import
await client.agents.memory.seed(AGENT_ID, {
userId: USER_ID,
memories: [
{
content: "Mia is a 32-year-old UX designer based in Berlin.",
type: "user_fact",
},
{
content: "Mia subscribed to the Pro plan on 2024-11-03.",
type: "shared_experience",
occurred_at: "2024-11-03T00:00:00Z",
},
{
content: "Mia prefers email over SMS for notifications.",
type: "user_preference",
},
{
content: "Mia mentioned she wants to get into trail running.",
type: "user_goal",
},
],
});
Query the memory store directly to find what the agent has extracted about a topic. Useful for building user-facing "what does my agent remember?" features or for debugging.
const results = await client.agents.memory.search(AGENT_ID, {
query: "diet restrictions food allergies",
userId: USER_ID,
limit: 10,
});
for (const fact of results.facts) {
console.log(`[${fact.type}] ${fact.content} (confidence: ${fact.confidence})`);
}
// [user_fact] Mia is allergic to peanuts (confidence: 0.97)
// [user_preference] Mia prefers nut-free snacks on hikes (confidence: 0.85)
The memory tree is a 7-level hierarchy that organizes facts by category (/identity/traits, /preferences/interests, /episodes/sessions, etc.). You can walk it node by node.
// Get top-level nodes
const tree = await client.agents.memory.list(AGENT_ID, {
userId: USER_ID,
includeContents: false, // just node metadata, no fact text
});
for (const node of tree.nodes) {
console.log(`${node.path} — ${node.fact_count} facts`);
}
// /identity/traits — 3 facts
// /preferences/interests — 5 facts
// /episodes/sessions — 12 facts
// /temporal — 2 facts
// Drill into a node
const identityNode = await client.agents.memory.list(AGENT_ID, {
userId: USER_ID,
parentId: "node_identity_traits_id",
includeContents: true, // include fact text
});
You can explore the memory tree interactively in the dashboard under Agents → your agent → Users → select user → Memory → Tree Explorer.
The timeline shows every fact in chronological order — when it was created, updated, or superseded. Use it to audit memory growth or build a "conversation history" view.
const timeline = await client.agents.memory.timeline(AGENT_ID, {
userId: USER_ID,
// Optional: narrow to a date range
start: "2025-01-01T00:00:00Z",
end: "2025-12-31T23:59:59Z",
});
for (const entry of timeline.entries) {
console.log(
`${new Date(entry.created_at).toLocaleDateString()} — ${entry.type}: ${entry.content}`
);
}
For admin UIs or compliance exports, list all raw facts for a user without going through the tree hierarchy. Supports filtering by category.
// All facts for this user (paginated)
const facts = await client.agents.memory.listFacts(AGENT_ID, {
userId: USER_ID,
limit: 50,
offset: 0,
category: "user_preference", // optional filter
});
console.log(`Total facts: ${facts.total}`);
for (const f of facts.facts) {
console.log(` ${f.content}`);
}
GDPR / right to erasure
To delete all memory for a user, call client.agents.memory.reset(agentId, { userId }). This creates tombstone records that prevent deleted facts from being re-surfaced; the data is removed from retrieval immediately.
The time machine lets you see what the agent knew about a user at any specific point in the past — useful for debugging why the agent said something, or for auditing how its understanding evolved.
const snapshot = await client.agents.getTimeMachine(AGENT_ID, {
userId: USER_ID,
at: "2025-03-01T00:00:00Z", // what did the agent know at this moment?
});
console.log("Known facts at 2025-03-01:");
for (const fact of snapshot.facts) {
console.log(` ${fact.content}`);
}
How supersession works
When a fact is updated, the old record is retired (not deleted) and a new one is created with a SupersedesID pointer. The time machine replays this chain to reconstruct the state at any timestamp.
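The replay happens server-side, but the idea is easy to illustrate: a fact is "known at time t" if it was created by t and no record created by t supersedes it. A purely conceptual sketch (field names are assumptions):

```typescript
interface FactRecord {
  id: string;
  content: string;
  created_at: string;      // ISO-8601 timestamp
  supersedes_id?: string;  // the record this one replaced, if any
}

// Reconstruct the set of facts visible at a given instant
function knownAt(records: FactRecord[], at: string): FactRecord[] {
  // Only records created on or before the target instant are candidates
  const visible = records.filter((r) => r.created_at <= at);
  // Any candidate pointed at by another candidate's supersedes_id is retired
  const superseded = new Set(
    visible.map((r) => r.supersedes_id).filter((id): id is string => !!id)
  );
  return visible.filter((r) => !superseded.has(r.id));
}
```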
const audio = await client.voice.tts("agent-id", {
text: "Hello! How can I help you today?",
voiceName: "aria",
language: "en",
outputFormat: "mp3",
});
// audio.data contains the audio bytes
Real-time duplex voice conversation. Get a token, then open a bidirectional stream.
// 1. Get a streaming token
const token = await client.voice.getToken("agent-id", {
voiceName: "aria",
userId: "user-123",
});
// 2. Connect to live stream
const stream = await client.voice.stream(token);
// Send audio chunks
stream.sendAudio(audioChunk);
// Or send text for the agent to speak
stream.sendText("Tell me about your day");
// Receive events
for await (const event of stream) {
if (event.type === "audio") {
playAudio(event.data);
} else if (event.type === "transcript") {
console.log(event.text);
}
}
// End session
stream.endSession();
WebSocket Transport
Live streaming is powered by WebSocket and supports real-time duplex audio. The client sends microphone audio chunks upstream while simultaneously receiving synthesized speech and transcripts downstream, enabling natural conversational flow.
Register webhook URLs to receive HTTP POST callbacks when events occur. The platform sends a signed JSON payload to your endpoint whenever a subscribed event fires, enabling real-time integrations without polling.
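When receiving callbacks, verify the signature before trusting the payload. The sketch below assumes an HMAC-SHA256 scheme over the raw body with a hex-encoded signature header — a common convention, but check the webhook reference for the actual header name and algorithm:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compare the received signature against one computed from the raw body
function verifySignature(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so guard first
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Express-style usage sketch (header name is an assumption):
// app.post("/webhooks/sonzai", (req, res) => {
//   const sig = req.headers["x-sonzai-signature"] as string;
//   if (!verifySignature(req.rawBody, sig, process.env.WEBHOOK_SECRET!)) {
//     return res.status(401).end();
//   }
//   handleEvent(JSON.parse(req.rawBody)); // e.g. OnDiaryGenerated, agent.wakeup
//   res.status(200).end();
// });
```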
Agents can generate proactive messages triggered by wakeups, mood shifts, or other internal events. Poll for pending notifications when push delivery is not feasible.
// List pending notifications
const notifications = await client.notifications.list("agent-id", {
userId: "user-123",
status: "pending",
});
for (const notif of notifications.items) {
console.log(notif.type, notif.content);
// Mark as consumed after processing
await client.notifications.consume("agent-id", notif.id);
}
// View notification history
const history = await client.notifications.history("agent-id", {
userId: "user-123",
limit: 50,
});
The agent automatically schedules its own wakeups based on relationship context, conversation patterns, and emotional signals — all handled by the context engine. An agent.wakeup webhook fires when one triggers.
Manual scheduling below is supplementary — use it when you want to trigger specific outreach tied to a business event (e.g., a follow-up after a purchase, a birthday check-in, or a recurring workflow reminder).
Schedule a proactive check-in so the agent reaches out to a user at a specific time:
const wakeup = await client.agents.scheduleWakeup("agent-id", {
userId: "user-123",
checkType: "interest_check",
intent: "Check in about the job interview preparation",
delayHours: 24, // schedule 24 hours from now
});
Webhooks vs. Notifications
Webhooks (push) deliver events to your server in real time via HTTP POST. Use webhooks when you have a server that can receive callbacks and you need instant notification of events.
Notifications (poll) queue proactive agent messages for you to fetch on demand. Use polling when your client cannot receive inbound HTTP requests (e.g., mobile apps, browser clients) or when you want to batch-process notifications on your own schedule.