Standalone Memory Layer

Knowledge Base & Limitations

How the knowledge base behaves in standalone mode, plus what isn't supported vs. managed mode.

Knowledge base in standalone mode

Automatic — KB results in /context

When you call session.context({ query }) (or GET /context), the endpoint searches the agent's knowledge base and includes matching results in a knowledge field automatically.

{
  "personality_prompt": "You are a helpful AI companion...",
  "big5": { "openness": 0.7, "conscientiousness": 0.6, "extraversion": 0.5, "agreeableness": 0.8, "neuroticism": 0.3 },
  "current_mood": { "valence": 0.4, "arousal": 0.2, "tension": -0.1, "affiliation": 0.3 },
  "loaded_facts": [{ "atomic_text": "User prefers morning workouts", "fact_type": "behavioral", "importance": 0.8 }],
  "active_goals": [{ "description": "Run a 5K by June" }],
  "habits": [{ "label": "Daily exercise" }],
  "knowledge": {
    "results": [
      {
        "content": "Refund policy: customers can request a full refund within 30 days...",
        "label": "Refund Policy",
        "type": "policy",
        "source": "policies.pdf",
        "score": 0.92
      }
    ]
  }
}
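Application code can fold the knowledge field into the prompt like any other context layer. A minimal sketch — the ContextResponse type below is an illustrative subset of the response above, not the SDK's own type:

```typescript
// Illustrative types mirroring the /context response shown above
// (not the SDK's actual exported types).
interface KnowledgeResult {
  content: string;
  label: string;
  type: string;
  source: string;
  score: number;
}

interface ContextResponse {
  personality_prompt: string;
  knowledge: { results: KnowledgeResult[] };
}

// Fold KB hits into the system prompt before calling your own LLM.
function buildSystemPrompt(ctx: ContextResponse): string {
  const kb = ctx.knowledge.results
    .map((r) => `- [${r.label}] ${r.content}`)
    .join("\n");
  return kb.length > 0
    ? `${ctx.personality_prompt}\n\nRelevant knowledge:\n${kb}`
    : ctx.personality_prompt;
}
```

How you inject KB results is up to you — appending them to the system prompt, as here, is one common choice.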

Learning loop — extraction detects knowledge gaps

After /turn or /process extracts side effects, it also searches the KB using topics found in the conversation. If relevant KB content exists that the agent missed, the matches are stored as proactive signals — the next session.context() call includes them automatically.

Turn 1: session.context() → (no KB results yet)
       ↓
      chat with your LLM
       ↓
      session.turn() → extracts "hiking gear" as topic
                     → searches KB, finds "Hiking Equipment Guide"
                     → stores as proactive signal

Turn 2: session.context() → includes "Hiking Equipment Guide" from KB
                        + any direct search results for the new query
       ↓
      chat with your LLM (now knows about hiking gear!)
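The loop above maps to code roughly like this. The Session interface and chatWithLLM callback are placeholders for your SDK object and your own model call; the real signatures may differ:

```typescript
// Placeholder shapes for illustration; your SDK's types may differ.
interface Session {
  context(args: { query: string }): Promise<{ knowledge: { results: { label: string }[] } }>;
  turn(args: { messages: { role: string; content: string }[] }): Promise<void>;
}

async function runTurn(
  session: Session,
  userText: string,
  chatWithLLM: (systemNote: string, user: string) => Promise<string>,
): Promise<string> {
  // 1. Pull enriched context — includes KB hits for this query plus any
  //    proactive signals stored by the previous turn's extraction.
  const ctx = await session.context({ query: userText });
  const kbNote = ctx.knowledge.results.map((r) => r.label).join(", ");

  // 2. Chat with your own LLM, injecting whatever KB content you pulled.
  const reply = await chatWithLLM(`Known topics: ${kbNote}`, userText);

  // 3. Submit the exchange for extraction; knowledge-gap detection runs
  //    here and feeds the NEXT context() call.
  await session.turn({
    messages: [
      { role: "user", content: userText },
      { role: "assistant", content: reply },
    ],
  });
  return reply;
}
```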

Explicit — tool endpoint for agent frameworks

const results = await client.agents.knowledgeSearch("agent-id", {
  query: "refund policy",
  limit: 5,
});

for (const result of results.results) {
  console.log(result.label, result.content);
}

You can also expose this as a function tool to your LLM — see Tool Calling in Pattern 1.
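As a sketch of that wiring, here is an OpenAI-style function-tool definition wrapping the KB search endpoint, plus a handler to run when the model calls it. The tool-schema shape follows the common OpenAI convention; the handler's client typing is an illustrative stub, not the SDK's own:

```typescript
// OpenAI-style function-tool definition for the KB search endpoint.
const knowledgeSearchTool = {
  type: "function" as const,
  function: {
    name: "knowledge_search",
    description: "Search the agent's knowledge base for relevant documents.",
    parameters: {
      type: "object",
      properties: {
        query: { type: "string", description: "Search query" },
        limit: { type: "number", description: "Max results (default 5)" },
      },
      required: ["query"],
    },
  },
};

// Handler to invoke when the model emits a knowledge_search tool call.
// The client shape here is a stub matching the snippet above.
async function handleKnowledgeSearch(
  client: {
    agents: {
      knowledgeSearch: (
        id: string,
        args: { query: string; limit?: number },
      ) => Promise<{ results: { label: string; content: string }[] }>;
    };
  },
  agentId: string,
  args: { query: string; limit?: number },
): Promise<string> {
  const res = await client.agents.knowledgeSearch(agentId, args);
  // Return a compact string the model can read as the tool result.
  return res.results.map((r) => `${r.label}: ${r.content}`).join("\n");
}
```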

Limitations vs. managed mode

Want to use your own model without managing the chat loop? Consider Custom LLM instead. It lets you point Sonzai at any OpenAI-compatible endpoint while keeping streaming, built-in tools, and per-message extraction fully automatic.

No built-in tool execution

Managed mode calls built-in tools (web search, memory recall, image generation) automatically. In standalone mode you must implement tool calling yourself — the tool-calling loop is yours, but the resulting tool messages flow into /turn or /process for extraction. See the Tool Integration guide.
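For example, a transcript that includes tool output might be assembled like this before submission. The message roles follow the common OpenAI convention ("tool" for tool output); Sonzai's accepted wire format may differ:

```typescript
// Message roles follow the common OpenAI convention; the accepted wire
// format for /turn and /process may differ.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

// Assemble the transcript you pass to session.turn({ messages }) or
// /process, so extraction also sees what your tools returned.
function transcriptWithTools(
  userText: string,
  toolResult: string,
  finalReply: string,
): Msg[] {
  return [
    { role: "user", content: userText },
    { role: "tool", content: toolResult },
    { role: "assistant", content: finalReply },
  ];
}
```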

No streaming on extraction

session.context(), /turn, and /process are synchronous request-response calls; streaming is handled by your own LLM. Background extraction runs asynchronously, but you poll for its state rather than streaming it.

Deferred knowledge base enrichment

KB enrichment is deferred — extraction detects knowledge gaps, but they surface in the next session.context() call, not in the current turn.

Manual extraction trigger

You must pick exactly one of the three integration shapes per conversation: /process (one-shot batch), sessions.start() → session.end({ messages }) (lifecycle batch), or sessions.start() → session.turn() per turn → session.end() (real-time). Picking none means the Context Engine never sees the transcript and no behavioral data is captured. Picking two — for example calling .turn() per turn and also passing messages on .end() — runs extraction twice on the same content. (Heavy consolidation runs on its own schedule and doesn't need to be triggered manually.)
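One way to guard against mixing the per-turn and lifecycle shapes is a thin wrapper around the session. This is purely illustrative — OneShapeSession and the RawSession interface are not SDK types:

```typescript
// Illustrative guard, not an SDK type: if per-turn extraction was used,
// drop `messages` from end() so the transcript isn't extracted twice.
interface Msg {
  role: string;
  content: string;
}
interface RawSession {
  turn(args: { messages: Msg[] }): Promise<void>;
  end(args?: { messages?: Msg[] }): Promise<void>;
}

class OneShapeSession {
  private usedTurn = false;
  constructor(private inner: RawSession) {}

  async turn(args: { messages: Msg[] }): Promise<void> {
    this.usedTurn = true;
    await this.inner.turn(args);
  }

  async end(args?: { messages?: Msg[] }): Promise<void> {
    // Per-turn extraction already ran: end the session without messages.
    if (this.usedTurn) {
      await this.inner.end();
    } else {
      await this.inner.end(args);
    }
  }
}
```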

Text-only memory pipeline

Sonzai's extraction reads messages as text. Multimodal content (images, audio) must be bridged to text before submission — see Working with Images & Multimodal Input in Pattern 1.
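A minimal bridging sketch: replace non-text parts with a text placeholder before submission. The Part union and the caption source (e.g. a description from your own vision model) are assumptions here, not part of the Sonzai API:

```typescript
// Illustrative multimodal message part; not a Sonzai type. The caption
// is assumed to come from your own vision model or alt text.
type Part =
  | { type: "text"; text: string }
  | { type: "image"; caption?: string };

// Flatten mixed parts into the plain text you submit for extraction.
function toTextOnly(parts: Part[]): string {
  return parts
    .map((p) =>
      p.type === "text" ? p.text : `[image: ${p.caption ?? "no description available"}]`,
    )
    .join(" ");
}
```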

What's the same in both modes

Extraction quality is identical — both modes use the same LLM pipeline for fact extraction, personality shifts, mood, habits, and consolidation. The 7-layer enriched context from session.context() is the same data the managed chat builds internally.
