
Standalone Memory Layer

Use the Context Engine as a pure memory and behavioral intelligence layer while chatting with your own LLM. You control the conversation — we handle memory, personality evolution, mood, habits, goals, relationships, and proactive outreach.

How It Works

In standalone mode, you bring your own LLM for chat generation but route conversation transcripts through our Context Engine for extraction and behavioral processing. This lets you:

  • Anonymize or transform data before sending to your own LLM
  • Use any LLM provider (Gemini, Anthropic, local models, etc.)
  • Still get full behavioral intelligence — memory, personality evolution, mood tracking, habit detection, goal tracking, relationship dynamics, and proactive outreach
  • Run extraction through our managed LLM, billed to your account (you choose the provider and model)

Architecture Flow

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│  Your App   │     │   Sonzai API     │     │   Your LLM   │
└──────┬──────┘     └────────┬─────────┘     └──────┬───────┘
       │                     │                       │
       │  1. Start Session   │                       │
       │────────────────────>│                       │
       │                     │                       │
       │  2. Get Context     │                       │
       │────────────────────>│                       │
       │  <── enriched ctx ──│                       │
       │    (personality,    │                       │
       │     mood, memory,   │                       │
       │     goals, habits)  │                       │
       │                     │                       │
       │  3. Chat ───────────┼──────────────────────>│
       │  <── LLM response ──┼───────────────────────│
       │                     │                       │
       │  4. Process         │                       │
       │────────────────────>│                       │
       │    (transcript)     │  (extract side        │
       │                     │   effects with        │
       │                     │   Sonzai's LLM)       │
       │  <── extractions ───│                       │
       │    (facts, mood,    │                       │
       │     personality,    │                       │
       │     habits, goals)  │                       │
       │                     │                       │
       │  5. End Session     │                       │
       │────────────────────>│                       │
       │                     │── consolidate         │
       │                     │   long-term memory    │
       │                     │   (Sonzai LLM)        │
       └─────────────────────┴───────────────────────┘

Billing Model

Steps 4 and 5 (extraction and consolidation) use Sonzai's LLM and are billed to your account. Steps 1, 2, and 3 consume no Sonzai LLM credits. Your own LLM costs in Step 3 are entirely yours.

When to Use Standalone Mode

Want to use your own model? Before choosing standalone mode, consider Custom LLM instead. It lets you point Sonzai at any OpenAI-compatible endpoint (vLLM, Ollama, Together, Groq, Azure OpenAI, etc.) while keeping the full managed experience — built-in tools, streaming, per-message extraction, and memory prewarming all work automatically.

Standalone mode is designed for the narrow set of scenarios where you must control the entire chat loop yourself. The managed mode (using Sonzai's LLM or Custom LLM) provides a significantly richer experience. Choose standalone only when:

Privacy & Data Preprocessing

You need to anonymize, redact PII, or transform conversation data before it reaches any LLM. Standalone lets you intercept and sanitize the enriched context before sending to your own model.

Regulatory Requirements

Compliance mandates that conversation data never leaves your infrastructure for chat generation, while still allowing metadata extraction via Sonzai's LLM.

Deep Agent Framework Integration

Your architecture requires an agent framework (LangChain, CrewAI, Vercel AI SDK) that manages its own LLM loop, tool orchestration, and multi-step reasoning.

Custom Prompt Pipeline

You need full control over prompt construction, few-shot examples, chain-of-thought, or multi-model routing that goes beyond what the managed chat supports.
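As a sketch of what such a pipeline can look like: the routing heuristic, model names, and few-shot format below are illustrative assumptions, not Sonzai APIs — the point is that in standalone mode this logic lives entirely in your code.

```typescript
// Hypothetical multi-model routing + few-shot assembly for a custom
// prompt pipeline. Model names and the length heuristic are assumptions.
interface FewShot { user: string; assistant: string }

function pickModel(userMessage: string): string {
  // Illustrative heuristic: longer, reasoning-heavy messages go to the
  // stronger (slower) model; short ones to the fast model.
  return userMessage.length > 200 ? "your-large-model" : "your-fast-model";
}

function buildFewShotPrompt(
  system: string,
  shots: FewShot[],
  userMessage: string,
): string {
  const examples = shots
    .map(s => `User: ${s.user}\nAssistant: ${s.assistant}`)
    .join("\n\n");
  return `${system}\n\n${examples}\n\nUser: ${userMessage}\nAssistant:`;
}
```

You would feed the enriched context from /context into `system`, then send the assembled prompt to whichever model `pickModel` selects.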

Limitations vs. Managed Mode

Standalone mode trades convenience for control. If you just want to use your own model, use Custom LLM instead — it gives you the full managed experience with your endpoint. Only choose standalone if you need to preprocess data or control the entire chat loop.

No built-in tool execution

Managed mode

Built-in tools (web search, memory recall, image generation, inventory) are called automatically by the LLM during chat and executed by the platform.

Standalone mode

Built-in tools are unavailable. You must implement tool calling yourself using the tool schemas endpoint. See the Tool Integration guide.

Batch extraction instead of per-message

Managed mode

Side effects (memory, mood, personality, habits) are extracted inline after every message — the agent evolves in real time.

Standalone mode

Side effects are extracted in batch when you call /process. If you send multiple messages before calling /process, behavioral updates are delayed.
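One way to bound that delay is to buffer turns and flush to /process every few messages. A minimal sketch — the flush threshold is an arbitrary assumption, and `client` is the Sonzai SDK instance shown in the steps below:

```typescript
// Accumulates conversation turns and signals when a flush to /process is due.
type Msg = { role: "user" | "assistant"; content: string };

class TranscriptBuffer {
  private messages: Msg[] = [];
  constructor(private flushEvery = 4) {} // threshold is an assumption

  // Returns true when enough turns have accumulated to justify a /process call.
  push(msg: Msg): boolean {
    this.messages.push(msg);
    return this.messages.length >= this.flushEvery;
  }

  // Empties the buffer and returns the pending transcript.
  drain(): Msg[] {
    const out = this.messages;
    this.messages = [];
    return out;
  }
}

// Usage with the SDK (not executed here):
// if (buffer.push({ role: "assistant", content: reply })) {
//   await client.agents.process("agent-id", {
//     userId, sessionId, messages: buffer.drain(),
//   });
// }
```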

No streaming

Managed mode

Chat responses stream via SSE with real-time deltas. Side effects (mood changes, emotional themes) appear as live events during the stream.

Standalone mode

/context and /process are synchronous request-response calls. There is no streaming — you handle streaming with your own LLM.
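Streaming therefore lives entirely on your side of the loop. A small sketch of accumulating streamed deltas so you can both display them live and keep the full reply for the later /process call — it works over any async iterable of text chunks, such as the one returned by your LLM SDK's streaming method:

```typescript
// Collects streamed text deltas: forwards each delta to a callback (your UI)
// and returns the full reply for the /process transcript.
async function collectStream(
  chunks: AsyncIterable<{ text?: string }>,
  onDelta: (delta: string) => void,
): Promise<string> {
  let full = "";
  for await (const chunk of chunks) {
    const delta = chunk.text ?? "";
    onDelta(delta);
    full += delta;
  }
  return full;
}

// Example wiring with @google/genai (not executed here):
// const stream = await gemini.models.generateContentStream({ model, contents });
// const reply = await collectStream(stream, d => process.stdout.write(d));
// ...then include `reply` as the assistant message when calling /process.
```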

Deferred knowledge base enrichment

Managed mode

Knowledge base content is available to the LLM's built-in tools in real time during the conversation.

Standalone mode

KB enrichment is deferred — /process detects knowledge gaps, but results only appear in the next /context call, not the current turn.

Memory prewarming requires session lifecycle

Managed mode

Memory bundles are prewarmed automatically on every chat request for near-instant retrieval (~10ms vs ~2000ms cold).

Standalone mode

Memory prewarming triggers when you call /sessions/start and caches for 2 hours. You must explicitly start a session to benefit — skipping session start means cold context builds every time.

Manual session lifecycle

Managed mode

Session start/end, message history caching, and consolidation triggers are handled automatically.

Standalone mode

You must explicitly call /sessions/start, /sessions/end, and /process at the right times. Missing these calls means lost behavioral data.
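A try/finally wrapper is one way to make the lifecycle hard to forget. This is a sketch, not part of the SDK — the `withSession` helper and its error-handling policy (skip /process when the chat step threw before producing a transcript) are assumptions:

```typescript
// Runs a chat body inside a Sonzai session, guaranteeing /process and
// /sessions/end fire even if the body throws. `sonzai` is the SDK client.
type Turn = { role: "user" | "assistant"; content: string };

async function withSession<T>(
  sonzai: any, // Sonzai SDK instance
  agentId: string,
  ids: { userId: string; sessionId: string },
  body: () => Promise<{ result: T; transcript: Turn[] }>,
): Promise<T> {
  await sonzai.agents.sessions.start(agentId, ids);
  const startedAt = Date.now();
  let transcript: Turn[] = [];
  try {
    const out = await body();
    transcript = out.transcript;
    return out.result;
  } finally {
    // If the body threw before returning, there is no transcript to process.
    if (transcript.length > 0) {
      await sonzai.agents.process(agentId, { ...ids, messages: transcript });
    }
    await sonzai.agents.sessions.end(agentId, {
      ...ids,
      totalMessages: transcript.length,
      durationSeconds: Math.round((Date.now() - startedAt) / 1000),
    });
  }
}
```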

What's the same in both modes?

The extraction quality is identical — both modes use the same LLM pipeline for fact extraction, personality shifts, mood changes, habit detection, and consolidation. The 7-layer enriched context returned by /context is the same data the managed chat builds internally. The difference is in tooling, streaming, and real-time responsiveness — not in the intelligence of the memory layer itself.

Step 1 — Start Session

Initialize a session to begin tracking memory and behavior for a user-agent pair.

import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: "sk_your_api_key" });

await client.agents.sessions.start("agent-id", {
  userId: "user-123",
  sessionId: "session-abc",
  userDisplayName: "Alice",
});

Step 2 — Get Enriched Context

Fetch the full 7-layer enriched context. This includes personality traits, current mood, relevant memories, active goals, detected habits, relationship state, and proactive signals. Use this to construct your own system prompt.

const context = await client.agents.getContext("agent-id", {
  userId: "user-123",
  sessionId: "session-abc",
  query: "What should we talk about?", // current user message
});

// context.layers contains:
//   profile     — agent identity, Big5 personality, speech patterns
//   behavioral  — current mood, habits, goals, interests
//   relationship — love scores, narrative arc
//   evolution   — recent personality shifts
//   memory      — recalled facts, long-term summaries
//   proactive   — pending wakeup intents
//   game        — custom game state (if set)

// Build your own system prompt with this context
const systemPrompt = `You are ${context.profile.name}.
Personality: ${JSON.stringify(context.profile.big5)}
Current mood: ${JSON.stringify(context.behavioral.mood)}
Relevant memories: ${context.memory.facts.map(f => f.text).join("\n")}
`;

Step 3 — Chat with Your Own LLM

Send the enriched context to your own LLM. This step happens entirely on your infrastructure — you can anonymize, transform, or filter the context however you need.

// Example: using Google Gemini with Sonzai context
import { GoogleGenAI } from "@google/genai";

const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

const response = await gemini.models.generateContent({
  model: "gemini-3.1-flash-lite-preview",
  contents: [
    { role: "user", parts: [{ text: systemPrompt + "\n\n" + userMessage }] },
  ],
});

const assistantMessage = response.text;

Privacy Note

You have full control over what data reaches your LLM. Strip PII, redact sensitive facts, or anonymize the context before sending. The Context Engine only sees the transcript you send back in Step 4.
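A minimal redaction sketch, assuming simple regex-based scrubbing is acceptable for your threat model — production PII handling should use a dedicated library, and these two patterns are illustrative only:

```typescript
// Strips email addresses and phone-like digit runs from text before it is
// included in the prompt sent to your own LLM. Regexes are illustrative.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[email]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[phone]");
}

// Apply to each memory fact before building the system prompt:
// const safeFacts = context.memory.facts.map(f => redactPII(f.text));
```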

Step 4 — Process the Conversation

Send the conversation transcript back to the Context Engine. We extract memories, personality shifts, mood changes, habits, interests, relationship dynamics, and proactive signals using our managed LLM. You choose which provider and model to use.

const result = await client.agents.process("agent-id", {
  userId: "user-123",
  sessionId: "session-abc",
  messages: [
    { role: "user", content: userMessage },
    { role: "assistant", content: assistantMessage },
  ],
  provider: "gemini",           // our LLM for extraction
  model: "gemini-3.1-flash-lite-preview",
  includeExtractions: true,     // get full extraction details back
});

console.log(result.memories_created);      // 3
console.log(result.side_effects);          // { mood_updated: true, ... }
console.log(result.extractions);           // full details (when requested)

// Extraction includes:
//   memory_facts          — new facts extracted from conversation
//   personality_deltas    — Big5 trait shifts with reasoning
//   mood_delta            — 4D mood change (happiness, energy, calmness, affection)
//   habit_observations    — detected behavioral patterns
//   interests_detected    — topics the user engaged with
//   relationship_delta    — love score change with reason
//   proactive_suggestions — scheduled check-ins or follow-ups
//   emotional_themes      — detected emotional tones

Step 5 — End Session

Close the session to trigger consolidation — the engine summarizes the conversation, detects breakthroughs, and stores long-term memories.

await client.agents.sessions.end("agent-id", {
  userId: "user-123",
  sessionId: "session-abc",
  totalMessages: 12,
  durationSeconds: 600,
});

Reading Behavioral Data

After processing conversations, all behavioral data is available via dedicated endpoints. Use these to display agent state in your UI, drive game mechanics, or feed into your own systems.

Memory & Facts

// List all memories
const memory = await client.agents.memory.list("agent-id", {
  userId: "user-123",
});
// memory.facts — array of extracted atomic facts
// memory.tree  — hierarchical memory structure

// Search memories
const results = await client.agents.memory.search("agent-id", {
  userId: "user-123",
  query: "hiking",
});

Personality & Mood

// Big5 personality
const personality = await client.agents.personality.get("agent-id");
// { openness: 72, conscientiousness: 65, extraversion: 80, ... }

// Current mood (4D)
const mood = await client.agents.getMood("agent-id", { userId: "user-123" });
// { happiness: 0.7, energy: 0.3, calmness: -0.1, affection: 0.5 }

// Recent personality shifts
const shifts = await client.agents.personality.recentShifts("agent-id");

// Significant moments (breakthroughs)
const moments = await client.agents.personality.significantMoments("agent-id");

Goals, Habits & Relationships

// Goals and breakthroughs
const goals = await client.agents.getGoals("agent-id");
const breakthroughs = await client.agents.getBreakthroughs("agent-id");

// Habits
const habits = await client.agents.getHabits("agent-id", { userId: "user-123" });

// Interests
const interests = await client.agents.getInterests("agent-id");

// Relationships
const relationships = await client.agents.getRelationships("agent-id");

Consolidation & Diary

// Daily/weekly consolidation summaries
const summaries = await client.agents.getConsolidation("agent-id", {
  period: "daily",
});

// AI-generated diary entries
const diary = await client.agents.getDiary("agent-id", {
  userId: "user-123",
});

Proactive Notifications

The Context Engine schedules proactive outreach (check-ins, follow-ups) based on conversation patterns. Poll for pending notifications and consume them when delivered.

// List pending notifications
const notifications = await client.agents.notifications.list("agent-id");

for (const notif of notifications) {
  // Deliver to user via your channel (push, email, in-app, etc.)
  await deliverToUser(notif.user_id, notif.message);

  // Mark as consumed
  await client.agents.notifications.consume("agent-id", notif.message_id);
}

// View notification history
const history = await client.agents.notifications.history("agent-id");

Choosing an Extraction Model

When calling /process, you can specify which of our LLM providers to use for extraction. List available models:

const models = await client.agents.getModels("agent-id");
// {
//   default_provider: "gemini",
//   default_model: "gemini-3.1-flash-lite-preview",
//   providers: [
//     { provider: "gemini", provider_name: "Google Gemini", default_model: "..." },
//     { provider: "zhipu", provider_name: "Zhipu AI", default_model: "..." },
//     ...
//   ]
// }

Knowledge Base in Standalone Mode

When your agent has a knowledge base configured, it works automatically in standalone mode through two mechanisms:

Automatic: KB results in /context

When you call GET /context with a query parameter, the endpoint automatically searches the agent's knowledge base and includes matching results in a knowledge field:

{
  "profile": { ... },
  "memory": { ... },
  "knowledge": {
    "results": [
      {
        "content": "Refund policy: customers can request a full refund within 30 days...",
        "label": "Refund Policy",
        "type": "policy",
        "source": "policies.pdf",
        "score": 0.92
      }
    ]
  }
}
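Folding those results into your system prompt is then a small formatting step. The field names follow the response shape above; the score threshold and prompt wording are assumptions:

```typescript
// Formats KB results from /context into a prompt section, dropping
// low-confidence matches. Threshold of 0.5 is an illustrative assumption.
type KBResult = { content: string; label: string; score: number };

function knowledgeSection(results: KBResult[], minScore = 0.5): string {
  const relevant = results.filter(r => r.score >= minScore);
  if (relevant.length === 0) return "";
  return (
    "Relevant knowledge:\n" +
    relevant.map(r => `- ${r.label}: ${r.content}`).join("\n")
  );
}

// const kb = knowledgeSection(context.knowledge?.results ?? []);
// const systemPrompt = basePrompt + (kb ? "\n\n" + kb : "");
```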

Learning Loop: /process detects knowledge gaps

After /process extracts side effects, it also searches the KB with topics and entities found in the conversation. If relevant KB content exists that the agent missed, it stores these as proactive signals. The next /context call automatically includes them — so the agent gets smarter with each turn.

Turn 1: /context → (no KB results yet)
         ↓
        chat with your LLM
         ↓
        /process → extracts "hiking gear" as topic
                   → searches KB, finds "Hiking Equipment Guide"
                   → stores as proactive signal

Turn 2: /context → includes "Hiking Equipment Guide" from KB
                   + any direct search results for the new query
         ↓
        chat with your LLM (now knows about hiking gear!)

Explicit: Tool endpoint for agent frameworks

For frameworks like OpenClaw where the LLM can call tools, use the standalone knowledge search endpoint:

const results = await client.agents.knowledgeSearch("agent-id", {
  query: "refund policy",
  limit: 5,
});

for (const result of results.results) {
  console.log(result.label, result.content);
}

How it all fits together

The automatic /context inclusion and learning loop handle most cases with zero configuration. The explicit tool endpoint is for advanced use cases where your LLM needs to search on-demand (e.g., RAG pipelines or agent frameworks with tool calling). See the Tool Integration guide for wiring these into agent frameworks like LangChain, Vercel AI SDK, and Gemini function calling.
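As a sketch of that on-demand wiring: the tool declaration below uses a generic JSON-schema style that you would adapt to your framework's tool format (Gemini functionDeclarations, Vercel AI SDK tools, etc.); the declaration shape and dispatcher are assumptions, while the `knowledgeSearch` call mirrors the standalone endpoint shown above.

```typescript
// A tool declaration your framework's LLM can call, plus a dispatcher that
// routes the call to Sonzai's standalone knowledge search endpoint.
const knowledgeSearchTool = {
  name: "knowledge_search",
  description: "Search the agent's knowledge base for relevant documents.",
  parameters: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search query" },
      limit: { type: "number", description: "Max results" },
    },
    required: ["query"],
  },
};

// Invoked when the LLM emits a tool call. `client` is the Sonzai SDK instance.
async function handleToolCall(
  client: any,
  agentId: string,
  name: string,
  args: { query: string; limit?: number },
): Promise<string> {
  if (name !== "knowledge_search") throw new Error(`unknown tool: ${name}`);
  const res = await client.agents.knowledgeSearch(agentId, {
    query: args.query,
    limit: args.limit ?? 5,
  });
  // Return a plain-text tool result for the LLM to read.
  return res.results.map((r: any) => `${r.label}: ${r.content}`).join("\n");
}
```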

What Gets Extracted

The /process endpoint extracts the following from each conversation turn:

Memory Facts

Atomic facts (preferences, events, commitments) with importance scoring, deduplication, and topic tagging.

Personality Deltas

Big5 trait shifts (openness, conscientiousness, extraversion, agreeableness, neuroticism) with reasoning.

Mood Changes

4D mood delta (happiness, energy, calmness, affection) with trigger identification.

Habit Detection

New and reinforced behavioral patterns — exercise routines, reading habits, social patterns.

Interest Tracking

Topics the user engages with, categorized by domain with confidence and engagement scores.

Relationship Dynamics

Love score changes with reasoning — tracks rapport, trust, and emotional connection.

Proactive Outreach

Scheduled check-ins and follow-ups based on conversation context (e.g., 'ask about the hike tomorrow').

Emotional Themes

Detected emotional tones — joy, creative spark, feeling overwhelmed, seeking connection, etc.

Use Cases

OpenClaw / Agent Frameworks

Use any agent framework for orchestration and tool calling. Route conversation transcripts through Sonzai for persistent memory, personality, and proactive behavior across sessions.

Privacy-Sensitive Applications

Anonymize or redact conversation data before sending to your LLM. Only structured extractions (facts, mood deltas) are stored — no raw conversation text.

Custom LLM Providers

Run local models, fine-tuned models, or specialized providers for chat. Sonzai handles the intelligence layer regardless of your chat LLM.

Multi-Agent Systems

Each agent maintains its own memory tree, personality, and behavioral state. Cross-agent memory sharing enables collaborative intelligence.

Complete Example

Full BYO-LLM loop with Gemini for chat and Sonzai for memory:

import { Sonzai } from "@sonzai-labs/agents";
import { GoogleGenAI } from "@google/genai";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

async function chat(agentId: string, userId: string, userMessage: string) {
  const sessionId = `session-${Date.now()}`;

  // 1. Start session
  await sonzai.agents.sessions.start(agentId, { userId, sessionId });

  // 2. Get enriched context
  const ctx = await sonzai.agents.getContext(agentId, {
    userId,
    sessionId,
    query: userMessage,
  });

  // 3. Chat with your own LLM
  const completion = await gemini.models.generateContent({
    model: "gemini-3.1-flash-lite-preview",
    contents: [
      { role: "user", parts: [{ text: buildPrompt(ctx) + "\n\n" + userMessage }] },
    ],
  });
  const reply = completion.text!;

  // 4. Process — extract memories, mood, personality, habits
  const result = await sonzai.agents.process(agentId, {
    userId,
    sessionId,
    messages: [
      { role: "user", content: userMessage },
      { role: "assistant", content: reply },
    ],
    provider: "gemini",
    includeExtractions: true,
  });

  console.log(`Memories created: ${result.memories_created}`);
  console.log(`Mood updated: ${result.side_effects.mood_updated}`);

  // 5. End session
  await sonzai.agents.sessions.end(agentId, {
    userId,
    sessionId,
    totalMessages: 2,
    durationSeconds: 30,
  });

  return reply;
}

function buildPrompt(ctx: any): string {
  return `You are ${ctx.profile?.name || "an AI companion"}.
Personality: ${JSON.stringify(ctx.profile?.big5 || {})}
Mood: ${JSON.stringify(ctx.behavioral?.mood || {})}
Memories:\n${(ctx.memory?.facts || []).map((f: any) => `- ${f.text}`).join("\n")}
`;
}