Standalone Memory Layer
Use the Context Engine as a pure memory and behavioral intelligence layer while chatting with your own LLM. You control the conversation — we handle memory, personality evolution, mood, habits, goals, relationships, and proactive outreach.
How It Works
In standalone mode, you bring your own LLM for chat generation but route conversation transcripts through our Context Engine for extraction and behavioral processing. This lets you:
- Anonymize or transform data before sending to your own LLM
- Use any LLM provider (Gemini, Anthropic, local models, etc.)
- Still get full behavioral intelligence — memory, personality evolution, mood tracking, habit detection, goal tracking, relationship dynamics, and proactive outreach
- Run extraction on our managed LLM, billed to your account (you choose the provider and model)
Architecture Flow
┌─────────────┐ ┌──────────────────┐ ┌──────────────┐
│ Your App │ │ Sonzai API │ │ Your LLM │
└──────┬──────┘ └────────┬─────────┘ └──────┬───────┘
│ │ │
│ 1. Start Session │ │
│────────────────────>│ │
│ │ │
│ 2. Get Context │ │
│────────────────────>│ │
│ <── enriched ctx ──│ │
│ (personality, │ │
│ mood, memory, │ │
│ goals, habits) │ │
│ │ │
│ 3. Chat ───────────┼──────────────────────>│
│ <── LLM response ──┼───────────────────────│
│ │ │
│ 4. Process │ │
│────────────────────>│ │
│ (transcript) │── extract side ──> │
│ │ effects with │
│ │ our LLM │
│ <── extractions ───│ │
│ (facts, mood, │ │
│ personality, │ │
│ habits, goals) │ │
│ │ │
│ 5. End Session │ │
│────────────────────>│ │
│ │── consolidate │
│ │ long-term memory │
│ │ (Sonzai LLM) │
└─────────────────────┴───────────────────────┘
Billing Model
Steps 4 and 5 (extraction and consolidation) use Sonzai's LLM and are billed to your account. Steps 1, 2, and 3 consume no Sonzai LLM credits; your own LLM costs in Step 3 are entirely yours.
When to Use Standalone Mode
Want to use your own model? Before choosing standalone mode, consider Custom LLM instead. It lets you point Sonzai at any OpenAI-compatible endpoint (vLLM, Ollama, Together, Groq, Azure OpenAI, etc.) while keeping the full managed experience — built-in tools, streaming, per-message extraction, and memory prewarming all work automatically.
Standalone mode is designed for the narrow set of scenarios where you must control the entire chat loop yourself. Managed mode (using Sonzai's LLM or Custom LLM) provides a significantly richer experience. Choose standalone only when:
Privacy & Data Preprocessing
You need to anonymize, redact PII, or transform conversation data before it reaches any LLM. Standalone lets you intercept and sanitize the enriched context before sending to your own model.
Regulatory Requirements
Compliance mandates that conversation data never leaves your infrastructure for chat generation, while still allowing metadata extraction via Sonzai's LLM.
Deep Agent Framework Integration
Your architecture requires an agent framework (LangChain, CrewAI, Vercel AI SDK) that manages its own LLM loop, tool orchestration, and multi-step reasoning.
Custom Prompt Pipeline
You need full control over prompt construction, few-shot examples, chain-of-thought, or multi-model routing that goes beyond what the managed chat supports.
Limitations vs. Managed Mode
Standalone mode trades convenience for control. If you just want to use your own model, use Custom LLM instead — it gives you the full managed experience with your endpoint. Only choose standalone if you need to preprocess data or control the entire chat loop.
No built-in tool execution
Managed: Built-in tools (web search, memory recall, image generation, inventory) are called automatically by the LLM during chat and executed by the platform.
Standalone: Built-in tools are unavailable. You must implement tool calling yourself using the tool schemas endpoint. See the Tool Integration guide.
Batch extraction instead of per-message
Managed: Side effects (memory, mood, personality, habits) are extracted inline after every message, so the agent evolves in real time.
Standalone: Side effects are extracted in batch when you call /process. If you send multiple messages before calling /process, behavioral updates are delayed.
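You can narrow this gap by calling /process after every user/assistant exchange rather than once per session. As a sketch, a small helper (the `Msg` type and `turnPairs` function are names introduced here, not part of the SDK) that splits a running transcript into per-turn /process payloads:

```typescript
// Split a running transcript into user/assistant turn pairs so each pair
// can be sent to /process immediately after the exchange completes.
type Msg = { role: "user" | "assistant"; content: string };

function turnPairs(transcript: Msg[]): Msg[][] {
  const pairs: Msg[][] = [];
  for (let i = 0; i < transcript.length - 1; i++) {
    // A "turn" is a user message immediately followed by the assistant reply.
    if (transcript[i].role === "user" && transcript[i + 1].role === "assistant") {
      pairs.push([transcript[i], transcript[i + 1]]);
    }
  }
  return pairs;
}

// Each pair becomes one /process call:
// for (const messages of turnPairs(transcript)) {
//   await client.agents.process("agent-id", { userId, sessionId, messages });
// }
```

Calling /process per turn trades extra extraction calls (each is billed) for fresher behavioral state on the next /context call.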
No streaming
Managed: Chat responses stream via SSE with real-time deltas. Side effects (mood changes, emotional themes) appear as live events during the stream.
Standalone: /context and /process are synchronous request-response calls. There is no streaming from Sonzai; you handle streaming with your own LLM.
Deferred knowledge base enrichment
Managed: Knowledge base content is available to the LLM's built-in tools in real time during the conversation.
Standalone: KB enrichment is deferred. /process detects knowledge gaps, but results only appear in the next /context call, not the current turn.
Memory prewarming requires session lifecycle
Managed: Memory bundles are prewarmed automatically on every chat request for near-instant retrieval (~10ms vs ~2000ms cold).
Standalone: Memory prewarming triggers when you call /sessions/start and caches for 2 hours. You must explicitly start a session to benefit; skipping session start means cold context builds every time.
Manual session lifecycle
Managed: Session start/end, message history caching, and consolidation triggers are handled automatically.
Standalone: You must explicitly call /sessions/start, /sessions/end, and /process at the right times. Missing these calls means lost behavioral data.
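One way to avoid missed lifecycle calls is to wrap your chat loop so the session always closes, even when the loop throws. A minimal sketch, assuming you pass in the SDK's start/end calls yourself (`withSession` is a helper introduced here, not an SDK method):

```typescript
// Guarantee the session-end call runs even if the chat loop throws,
// so consolidation is never skipped.
async function withSession<T>(
  start: () => Promise<void>,
  end: () => Promise<void>,
  body: () => Promise<T>,
): Promise<T> {
  await start();
  try {
    return await body();
  } finally {
    await end(); // always close the session so consolidation still runs
  }
}
```

In practice, `start` and `end` would be closures over `client.agents.sessions.start(...)` and `client.agents.sessions.end(...)`, and you would call /process inside `body` before the session closes.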
What's the same in both modes?
The extraction quality is identical — both modes use the same LLM pipeline for fact extraction, personality shifts, mood changes, habit detection, and consolidation. The 7-layer enriched context returned by /context is the same data the managed chat builds internally. The difference is in tooling, streaming, and real-time responsiveness — not in the intelligence of the memory layer itself.
Step 1 — Start Session
Initialize a session to begin tracking memory and behavior for a user-agent pair.
import { Sonzai } from "@sonzai-labs/agents";
const client = new Sonzai({ apiKey: "sk_your_api_key" });
await client.agents.sessions.start("agent-id", {
userId: "user-123",
sessionId: "session-abc",
userDisplayName: "Alice",
});
Step 2 — Get Enriched Context
Fetch the full 7-layer enriched context. This includes personality traits, current mood, relevant memories, active goals, detected habits, relationship state, and proactive signals. Use this to construct your own system prompt.
const context = await client.agents.getContext("agent-id", {
userId: "user-123",
sessionId: "session-abc",
query: "What should we talk about?", // current user message
});
// context.layers contains:
// profile — agent identity, Big5 personality, speech patterns
// behavioral — current mood, habits, goals, interests
// relationship — love scores, narrative arc
// evolution — recent personality shifts
// memory — recalled facts, long-term summaries
// proactive — pending wakeup intents
// game — custom game state (if set)
// Build your own system prompt with this context
const systemPrompt = `You are ${context.profile.name}.
Personality: ${JSON.stringify(context.profile.big5)}
Current mood: ${JSON.stringify(context.behavioral.mood)}
Relevant memories: ${context.memory.facts.map(f => f.text).join("\n")}
`;
Step 3 — Chat with Your Own LLM
Send the enriched context to your own LLM. This step happens entirely on your infrastructure — you can anonymize, transform, or filter the context however you need.
// Example: using Google Gemini with Sonzai context
import { GoogleGenAI } from "@google/genai";
const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
const response = await gemini.models.generateContent({
model: "gemini-3.1-flash-lite-preview",
contents: [
{ role: "user", parts: [{ text: systemPrompt + "\n\n" + userMessage }] },
],
});
const assistantMessage = response.text;
Privacy Note
You have full control over what data reaches your LLM. Strip PII, redact sensitive facts, or anonymize the context before sending. The Context Engine only sees the transcript you send back in Step 4.
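As a concrete sketch of that sanitization step, here is a regex-based redaction pass you could run over memory facts before building the system prompt. The patterns are illustrative only; production PII detection should use a dedicated redaction library or NER model:

```typescript
// Illustrative redaction pass applied before Step 3. The SSN rule runs first
// so the broader phone-number pattern does not consume SSN-shaped strings.
function redactPII(text: string): string {
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[ssn]")       // US SSN pattern
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[email]")   // email addresses
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[phone]");    // phone-like numbers
}

// Apply to every memory fact before it enters your system prompt:
// const safeFacts = context.memory.facts.map(f => redactPII(f.text));
```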
Step 4 — Process the Conversation
Send the conversation transcript back to the Context Engine. We extract memories, personality shifts, mood changes, habits, interests, relationship dynamics, and proactive signals using our managed LLM. You choose which provider and model to use.
const result = await client.agents.process("agent-id", {
userId: "user-123",
sessionId: "session-abc",
messages: [
{ role: "user", content: userMessage },
{ role: "assistant", content: assistantMessage },
],
provider: "gemini", // our LLM for extraction
model: "gemini-3.1-flash-lite-preview",
includeExtractions: true, // get full extraction details back
});
console.log(result.memories_created); // 3
console.log(result.side_effects); // { mood_updated: true, ... }
console.log(result.extractions); // full details (when requested)
// Extraction includes:
// memory_facts — new facts extracted from conversation
// personality_deltas — Big5 trait shifts with reasoning
// mood_delta — 4D mood change (happiness, energy, calmness, affection)
// habit_observations — detected behavioral patterns
// interests_detected — topics the user engaged with
// relationship_delta — love score change with reason
// proactive_suggestions — scheduled check-ins or follow-ups
// emotional_themes — detected emotional tones
Step 5 — End Session
Close the session to trigger consolidation — the engine summarizes the conversation, detects breakthroughs, and stores long-term memories.
await client.agents.sessions.end("agent-id", {
userId: "user-123",
sessionId: "session-abc",
totalMessages: 12,
durationSeconds: 600,
});
Reading Behavioral Data
After processing conversations, all behavioral data is available via dedicated endpoints. Use these to display agent state in your UI, drive game mechanics, or feed into your own systems.
Memory & Facts
// List all memories
const memory = await client.agents.memory.list("agent-id", {
userId: "user-123",
});
// memory.facts — array of extracted atomic facts
// memory.tree — hierarchical memory structure
// Search memories
const results = await client.agents.memory.search("agent-id", {
userId: "user-123",
query: "hiking",
});
Personality & Mood
// Big5 personality
const personality = await client.agents.personality.get("agent-id");
// { openness: 72, conscientiousness: 65, extraversion: 80, ... }
// Current mood (4D)
const mood = await client.agents.getMood("agent-id", { userId: "user-123" });
// { happiness: 0.7, energy: 0.3, calmness: -0.1, affection: 0.5 }
// Recent personality shifts
const shifts = await client.agents.personality.recentShifts("agent-id");
// Significant moments (breakthroughs)
const moments = await client.agents.personality.significantMoments("agent-id");
Goals, Habits & Relationships
// Goals and breakthroughs
const goals = await client.agents.getGoals("agent-id");
const breakthroughs = await client.agents.getBreakthroughs("agent-id");
// Habits
const habits = await client.agents.getHabits("agent-id", { userId: "user-123" });
// Interests
const interests = await client.agents.getInterests("agent-id");
// Relationships
const relationships = await client.agents.getRelationships("agent-id");
Consolidation & Diary
// Daily/weekly consolidation summaries
const summaries = await client.agents.getConsolidation("agent-id", {
period: "daily",
});
// AI-generated diary entries
const diary = await client.agents.getDiary("agent-id", {
userId: "user-123",
});
Proactive Notifications
The Context Engine schedules proactive outreach (check-ins, follow-ups) based on conversation patterns. Poll for pending notifications and consume them when delivered.
// List pending notifications
const notifications = await client.agents.notifications.list("agent-id");
for (const notif of notifications) {
// Deliver to user via your channel (push, email, in-app, etc.)
await deliverToUser(notif.user_id, notif.message);
// Mark as consumed
await client.agents.notifications.consume("agent-id", notif.message_id);
}
// View notification history
const history = await client.agents.notifications.history("agent-id");
Choosing an Extraction Model
When calling /process, you can specify which of our LLM providers to use for extraction. List available models:
const models = await client.agents.getModels("agent-id");
// {
// default_provider: "gemini",
// default_model: "gemini-3.1-flash-lite-preview",
// providers: [
// { provider: "gemini", provider_name: "Google Gemini", default_model: "..." },
// { provider: "zhipu", provider_name: "Zhipu AI", default_model: "..." },
// ...
// ]
// }
Knowledge Base in Standalone Mode
When your agent has a knowledge base configured, it works automatically in standalone mode through two mechanisms:
Automatic: KB results in /context
When you call GET /context with a query parameter, the endpoint automatically searches the agent's knowledge base and includes matching results in a knowledge field:
{
"profile": { ... },
"memory": { ... },
"knowledge": {
"results": [
{
"content": "Refund policy: customers can request a full refund within 30 days...",
"label": "Refund Policy",
"type": "policy",
"source": "policies.pdf",
"score": 0.92
}
]
}
}
Learning Loop: /process detects knowledge gaps
After /process extracts side effects, it also searches the KB with topics and entities found in the conversation. If relevant KB content exists that the agent missed, it stores these as proactive signals. The next /context call automatically includes them — so the agent gets smarter with each turn.
Turn 1: /context → (no KB results yet)
↓
chat with your LLM
↓
/process → extracts "hiking gear" as topic
→ searches KB, finds "Hiking Equipment Guide"
→ stores as proactive signal
Turn 2: /context → includes "Hiking Equipment Guide" from KB
+ any direct search results for the new query
↓
chat with your LLM (now knows about hiking gear!)
Explicit: Tool endpoint for agent frameworks
For frameworks like OpenClaw where the LLM can call tools, use the standalone knowledge search endpoint:
const results = await client.agents.knowledgeSearch("agent-id", {
query: "refund policy",
limit: 5,
});
for (const result of results.results) {
console.log(result.label, result.content);
}
How it all fits together
The automatic /context inclusion and learning loop handle most cases with zero configuration. The explicit tool endpoint is for advanced use cases where your LLM needs to search on-demand (e.g., RAG pipelines or agent frameworks with tool calling). See the Tool Integration guide for wiring these into agent frameworks like LangChain, Vercel AI SDK, and Gemini function calling.
What Gets Extracted
The /process endpoint extracts the following from each conversation turn:
Memory Facts
Atomic facts (preferences, events, commitments) with importance scoring, deduplication, and topic tagging.
Personality Deltas
Big5 trait shifts (openness, conscientiousness, extraversion, agreeableness, neuroticism) with reasoning.
Mood Changes
4D mood delta (happiness, energy, calmness, affection) with trigger identification.
Habit Detection
New and reinforced behavioral patterns — exercise routines, reading habits, social patterns.
Interest Tracking
Topics the user engages with, categorized by domain with confidence and engagement scores.
Relationship Dynamics
Love score changes with reasoning — tracks rapport, trust, and emotional connection.
Proactive Outreach
Scheduled check-ins and follow-ups based on conversation context (e.g., 'ask about the hike tomorrow').
Emotional Themes
Detected emotional tones — joy, creative spark, feeling overwhelmed, seeking connection, etc.
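Taken together, the extraction payload returned by /process (with `includeExtractions: true`) can be sketched as a TypeScript shape. Field names follow the comments in Step 4; the nested field names are assumptions for illustration, not the authoritative schema:

```typescript
// Illustrative shape for result.extractions. Top-level keys match the
// /process comments in Step 4; inner fields are hypothetical examples.
interface Extractions {
  memory_facts: { text: string; importance: number; topics: string[] }[];
  personality_deltas: { trait: string; delta: number; reasoning: string }[];
  mood_delta: { happiness: number; energy: number; calmness: number; affection: number };
  habit_observations: { pattern: string; reinforced: boolean }[];
  interests_detected: { topic: string; domain: string; confidence: number }[];
  relationship_delta: { love_score_change: number; reason: string };
  proactive_suggestions: { message: string; scheduled_for: string }[];
  emotional_themes: string[];
}
```

Typing the payload this way lets downstream code (UI state, game mechanics) consume extractions without re-validating every field at each call site.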
Use Cases
OpenClaw / Agent Frameworks
Use any agent framework for orchestration and tool calling. Route conversation transcripts through Sonzai for persistent memory, personality, and proactive behavior across sessions.
Privacy-Sensitive Applications
Anonymize or redact conversation data before sending to your LLM. Only structured extractions (facts, mood deltas) are stored — no raw conversation text.
Custom LLM Providers
Run local models, fine-tuned models, or specialized providers for chat. Sonzai handles the intelligence layer regardless of your chat LLM.
Multi-Agent Systems
Each agent maintains its own memory tree, personality, and behavioral state. Cross-agent memory sharing enables collaborative intelligence.
Complete Example
Full BYO-LLM loop with Gemini for chat and Sonzai for memory:
import { Sonzai } from "@sonzai-labs/agents";
import { GoogleGenAI } from "@google/genai";
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });
async function chat(agentId: string, userId: string, userMessage: string) {
const sessionId = `session-${Date.now()}`;
// 1. Start session
await sonzai.agents.sessions.start(agentId, { userId, sessionId });
// 2. Get enriched context
const ctx = await sonzai.agents.getContext(agentId, {
userId,
sessionId,
query: userMessage,
});
// 3. Chat with your own LLM
const completion = await gemini.models.generateContent({
model: "gemini-3.1-flash-lite-preview",
contents: [
{ role: "user", parts: [{ text: buildPrompt(ctx) + "\n\n" + userMessage }] },
],
});
const reply = completion.text!;
// 4. Process — extract memories, mood, personality, habits
const result = await sonzai.agents.process(agentId, {
userId,
sessionId,
messages: [
{ role: "user", content: userMessage },
{ role: "assistant", content: reply },
],
provider: "gemini",
includeExtractions: true,
});
console.log(`Memories created: ${result.memories_created}`);
console.log(`Mood updated: ${result.side_effects.mood_updated}`);
// 5. End session
await sonzai.agents.sessions.end(agentId, {
userId,
sessionId,
totalMessages: 2,
durationSeconds: 30,
});
return reply;
}
function buildPrompt(ctx: any): string {
return `You are ${ctx.profile?.name || "an AI companion"}.
Personality: ${JSON.stringify(ctx.profile?.big5 || {})}
Mood: ${JSON.stringify(ctx.behavioral?.mood || {})}
Memories:\n${(ctx.memory?.facts || []).map((f: any) => `- ${f.text}`).join("\n")}
`;
}