
Pattern 1: Memory Middleware (Real-Time)

Turn-by-turn integration. You control the LLM and the tool-calling loop; Sonzai makes sure that LLM knows the user. Covers tool calling and multimodal/image handling.

You control the LLM. Sonzai handles what that LLM knows about the user.

Open a Session once. For every turn: call session.context({ query }) to pull the enriched user profile, build your system prompt, call your own LLM (with your own tools), then call session.turn({ messages }) to submit just the new exchange. Sync mood updates inline (~300–500ms); deeper extraction (facts, personality, habits) lands asynchronously 5–15 seconds later in the background.

This is the same data model mem0 provides (relevant memories injected before generation), extended with personality evolution, mood tracking, habit detection, goal tracking, proactive outreach scheduling, and relationship dynamics.

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│  Your App   │     │   Sonzai API     │     │   Your LLM   │
└──────┬──────┘     └────────┬─────────┘     └──────┬───────┘
       │                     │                      │
       │  sessions.start     │                      │
       │────────────────────>│ (prewarms memory)    │
       │  <── Session ───────│                      │
       │                     │                      │
       │  ─── Per turn ──────────────────────────── │
       │                     │                      │
       │  session.context()  │                      │
       │────────────────────>│                      │
       │  <── enriched ctx ──│                      │
       │    personality, mood│                      │
       │    memories, goals  │                      │
       │                     │                      │
       │  Your LLM loop ─────┼─────────────────────>│
       │  + your tools       │                      │
       │  <── reply ─────────┼──────────────────────│
       │                     │                      │
       │  sendToUser(reply) (no waiting on Sonzai)  │
       │                     │                      │
       │  session.turn()     │                      │
       │────────────────────>│ ⇒ sync mood ~300ms   │
       │  <── mood, status ──│ ⇒ background         │
       │                     │   extraction (5–15s) │
       │                     │                      │
       │  ─── Repeat ────────────────────────────── │
       │                     │                      │
       │  session.end()      │                      │
       │────────────────────>│── consolidate        │
       │                     │   long-term memory   │
       └─────────────────────┴──────────────────────┘

What Sonzai's LLM is used for

session.context() and sessions.start use no Sonzai LLM credits — they are pure reads. session.turn(), /process, and sessions.end({ messages }) use Sonzai's LLM for fact extraction + session summary (light, per-call, billed). Heavy background work — cross-session dedup, clustering, diary, decay — runs on auto-scheduled jobs (8h post-session, daily, weekly) and is billed against the same tenant but not per-call. Your chat LLM is entirely your cost.

The Core Loop

Open the session once with your provider/model defaults. Then for every turn: get context → call your LLM (running tool calls in your own loop) → submit the turn. End the session when done.

import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

async function runConversation(agentId: string, userId: string) {
  const sessionId = `session-${Date.now()}`;
  const history: { role: string; content: string }[] = [];

  // Open a Session handle. agentId/userId/sessionId and provider/model
  // defaults live on the handle so you don't repeat them on every call.
  const session = await sonzai.agents.sessions.start(agentId, {
    userId,
    sessionId,
    toolDefinitions: yourTools,                   // optional — register session-scoped tool schemas
    provider: "gemini",                           // optional — default for .turn()
    model: "gemini-3.1-flash-lite-preview",       // optional — default for .turn()
  });

  async function turn(userMessage: string): Promise<string> {
    // Fresh enriched context for this specific message
    const ctx = await session.context({ query: userMessage });

    // Your LLM — swap in any provider you like
    let reply = await yourLLM.chat({
      system: buildSystemPrompt(ctx),
      messages: [...history, { role: "user", content: userMessage }],
      tools: yourTools,
    });

    // Tool-calling loop is entirely yours — Sonzai is OUT of the loop here.
    const toolMessages: any[] = [];
    while (reply.tool_calls?.length) {
      for (const call of reply.tool_calls) {
        const result = await runYourTool(call);
        toolMessages.push(
          { role: "assistant", tool_calls: [call] },
          { role: "tool", tool_call_id: call.id, content: result },
        );
      }
      reply = await yourLLM.chat({
        system: buildSystemPrompt(ctx),
        messages: [...history, { role: "user", content: userMessage }, ...toolMessages],
        tools: yourTools,
      });
    }

    sendToUser(reply.content); // send first; don't block on Sonzai

    // Submit just the new turn. Sync mood ~300ms, deferred extraction
    // (facts, personality, habits) runs asynchronously 5–15s later.
    // Pass the FULL exchange — including tool calls and tool results —
    // so Sonzai can extract facts from tool outputs too.
    const { mood, extraction_id } = await session.turn({
      messages: [
        { role: "user", content: userMessage },
        ...toolMessages,                          // assistant tool_calls + tool results
        { role: "assistant", content: reply.content },
      ],
    });

    history.push({ role: "user", content: userMessage });
    history.push({ role: "assistant", content: reply.content });

    return reply.content;
  }

  return { turn, end: () => session.end() };
}

// The /context response is a flat object — there is no nested
// `profile` / `behavioral` / `memory` envelope.
function buildSystemPrompt(ctx: any): string {
  const facts = (ctx.loaded_facts ?? []).map((f: any) => `- ${f.atomic_text}`).join("\n");
  const goals = (ctx.active_goals ?? []).map((g: any) => g.description).join(", ");
  return `${ctx.personality_prompt ?? "You are a helpful AI companion."}
Personality (Big5): ${JSON.stringify(ctx.big5 ?? {})}
Current mood: ${JSON.stringify(ctx.current_mood ?? {})}
Active goals: ${goals || "none"}
Relevant memories:
${facts || "none yet"}`;
}

Pull Fresh Context Every Turn

The single most important habit in Pattern 1 is calling session.context(query=user_msg) before every LLM call. This is the load-bearing piece that closes the loop — without it, the LLM doesn't get the fresh mood (which lands inline on .turn()) or the freshly-extracted facts (which land 5–15 seconds after .turn()).

while (conversationActive) {
  const userMsg = await getUserInput();

  // 1. PULL FRESH CONTEXT — happens every turn, before the LLM call.
  //    ctx is a flat object — no `profile` / `behavioral` / `memory` envelope.
  const ctx = await session.context({ query: userMsg });

  // 2. Build system prompt from the context layers
  const systemPrompt = renderPromptFromContext(ctx);

  // 3. Run YOUR LLM — Sonzai is OUT of the loop here
  const reply = await yourLLM.chat({
    system: systemPrompt,
    messages: [...history, { role: "user", content: userMsg }],
  });

  // 4. Submit the just-completed turn — sync mood + async deferred extraction
  await session.turn({
    messages: [
      { role: "user", content: userMsg },
      { role: "assistant", content: reply.content },
    ],
  });
}

function renderPromptFromContext(ctx: any): string {
  const parts: string[] = [];
  if (ctx.personality_prompt) parts.push(ctx.personality_prompt);
  if (ctx.big5) parts.push(`Personality (Big5): ${JSON.stringify(ctx.big5)}`);
  if (ctx.speech_patterns?.length) parts.push(`Speech patterns: ${ctx.speech_patterns.join(", ")}`);
  if (ctx.current_mood) parts.push(`Current mood: ${JSON.stringify(ctx.current_mood)}`);
  const facts = (ctx.loaded_facts ?? []).slice(0, 5).map((f: any) => `- ${f.atomic_text ?? ""}`).join("\n");
  if (facts) parts.push(`Relevant memories:\n${facts}`);
  const kb = (ctx.knowledge?.results ?? []).slice(0, 3).map((r: any) => `- ${r.label}: ${(r.content ?? "").slice(0, 120)}`).join("\n");
  if (kb) parts.push(`Knowledge base:\n${kb}`);
  return parts.join("\n\n");
}

Save a roundtrip with fetchNextContext

session.turn() accepts a fetch_next_context={"query": next_user_message} argument (TS: fetchNextContext). When set, the server runs the deferred extraction trigger AND fetches the next /context payload in the same response, returning it under next_context. This eliminates the second roundtrip on the next turn — your client already has the context for turn N+1 by the time turn N has finished. Use this when you can predict the next user query (e.g., for the very next render of context).
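The roundtrip saving can be sketched client-side. Only the fetchNextContext / next_context field names come from the docs above; everything else here is illustrative, including the stub session that stands in for a real Session handle so the caching flow is self-contained:

```typescript
// Stub standing in for a real Sonzai Session handle, so the caching flow
// below is runnable on its own. Only the shape matters here.
type Ctx = { query: string };
let contextCalls = 0;

const session = {
  async context({ query }: { query: string }): Promise<Ctx> {
    contextCalls++; // each call here is a real roundtrip in production
    return { query };
  },
  async turn(opts: { messages: unknown[]; fetchNextContext?: { query: string } }) {
    // A real server runs extraction AND prefetches the next /context
    // payload; the stub just echoes the prefetch back as next_context.
    return opts.fetchNextContext
      ? { next_context: { query: opts.fetchNextContext.query } as Ctx }
      : {};
  },
};

// Cache next_context from .turn() so the following turn skips .context().
let cached: Ctx | null = null;

async function turnWithPrefetch(userMsg: string, predictedNext?: string): Promise<Ctx> {
  const ctx = cached ?? (await session.context({ query: userMsg }));
  cached = null;
  // ...run your LLM with ctx, then submit the finished turn...
  const res = await session.turn({
    messages: [{ role: "user", content: userMsg }],
    fetchNextContext: predictedNext ? { query: predictedNext } : undefined,
  });
  if ("next_context" in res && res.next_context) cached = res.next_context;
  return ctx;
}
```

Turn N pays only an extra flag on .turn(); turn N+1 then starts with its context already in hand instead of making a second roundtrip.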

Context freshness. Mood updates inline on each .turn() call (~300ms), so the very next .context() reflects the new mood. Personality / facts / inventory land 5–15 seconds after .turn() in the background, so they appear within a turn or two of being mentioned.

Why per-turn. State changes between turns. A user mentioning a new pet on turn 3 means turn 4's context should carry that fact. Skipping .context() between turns means the LLM works from stale state — and the value of a memory layer collapses.

Pass the actual user message as query. session.context() uses the query for memory recall, KB search, and proactive signal selection. Passing the raw user message gives the most relevant pull; passing a static placeholder gives generic context regardless of what the user asked.
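Short follow-ups ("why?", "yes") make weak recall queries on their own. One hypothetical mitigation, not a Sonzai feature, is to fold the previous user message into the query when the new one is very short:

```typescript
// If the new message is very short, prepend the previous user message so
// memory recall still has something to match against. Heuristic sketch.
function buildRecallQuery(prevUserMsg: string | undefined, userMsg: string): string {
  if (userMsg.trim().split(/\s+/).length >= 4 || !prevUserMsg) return userMsg;
  return `${prevUserMsg}\n${userMsg}`;
}
```

Pass the result as the query to session.context(); the full raw message still goes to your LLM and to session.turn() unchanged.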

Tool Messages Flow Into Extraction

The /turn schema accepts OpenAI/Anthropic-style tool messages: role: "tool" for tool results and tool_calls arrays on assistant messages. Pass the entire intermediate exchange — Sonzai's extractor reads tool results and can capture facts that only appeared in tool output (e.g. "user's last order shipped from Tokyo" from an order-lookup tool).

await session.turn({
  messages: [
    { role: "user", content: "Where did my last order ship from?" },
    {
      role: "assistant",
      tool_calls: [{ id: "call_1", type: "function", function: { name: "order-lookup", arguments: "{}" } }],
    },
    {
      role: "tool",
      tool_call_id: "call_1",
      content: '{"order_id":"42","origin":"Tokyo","carrier":"DHL"}',
    },
    { role: "assistant", content: "Your last order shipped from Tokyo via DHL." },
  ],
});

Polling Background Extraction

/turn returns immediately after the sync mood pass. The deeper extraction runs asynchronously and reaches done in 5–15s. You can poll the status if you need to gate something on it:

const { extraction_id } = await session.turn({ messages });

// Optional — only poll if you need to wait for facts/personality before
// doing something. Bound the wait so a stalled job can't hang the caller.
let status = await session.status(extraction_id);
for (let i = 0; i < 30 && status.state !== "done" && status.state !== "failed"; i++) {
  await new Promise((r) => setTimeout(r, 1000));
  status = await session.status(extraction_id);
}

Tool Calling

Pattern 1 hands the tool-calling loop entirely to you. Sonzai never executes a tool — but it does read tool calls and tool results out of the messages you submit on /turn, so the extractor can capture facts that surfaced inside a tool output. There are two flavors of tools you'll typically wire up.

A. Your own tools

Use whatever your agent framework provides — @function_tool in the OpenAI Agents SDK, tools= on Anthropic, function declarations on Gemini, @tool in LangChain. The pattern is the same: register the tool with your LLM, run the tool-calling loop on your side, and forward the full exchange (including the assistant's tool_calls message and the role: "tool" result message) to session.turn().

from agents import Agent, Runner, function_tool

@function_tool
def get_current_time() -> str:
    """Return the current time."""
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

agent = Agent(name="Companion", tools=[get_current_time], model=gemini_model)
result = Runner.run_sync(agent, user_msg)

# Build the tool-aware messages array Sonzai expects.
sonzai_messages = [
    {"role": "user", "content": user_msg},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_current_time", "arguments": "{}"},
        }],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "2026-05-07T07:30:00Z"},
    {"role": "assistant", "content": result.final_output},
]
session.turn(messages=sonzai_messages)

When the assistant says "It's 7:30 AM" and the user replies "Set my morning standup for 8", Sonzai's extractor sees the tool's actual output, not just the assistant's paraphrase — and can capture "user prefers 8 AM standups" with the right grounding.

B. Sonzai's capabilities as tools

You can also wrap Sonzai's own REST endpoints as tools your LLM can call mid-turn. The two most useful are knowledge base search and memory search — both let the LLM pull additional context on demand without you having to inject everything up-front through session.context().

// TypeScript — agents.memory.search is available directly
import { Sonzai } from "@sonzai-labs/agents";
import { tool } from "ai";
import { z } from "zod";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

const kbSearch = tool({
  description: "Search the agent's knowledge base.",
  parameters: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    const res = await sonzai.agents.knowledgeSearch("agent-id", { query, limit: 5 });
    return res.results.map((r) => `- ${r.label}: ${r.content}`).join("\n") || "No matching knowledge.";
  },
});

const memorySearch = tool({
  description: "Search the user's long-term memory.",
  parameters: z.object({ query: z.string() }),
  execute: async ({ query }) => {
    const res = await sonzai.agents.memory.search("agent-id", {
      query,
      user_id: "user-123",
      limit: 5,
    });
    return res.results.map((r) => `- ${r.text}`).join("\n") || "No matching memories.";
  },
});

Why expose Sonzai endpoints as tools?

session.context() returns the most relevant facts for the current query — a strong default. Exposing kb_search and memory_search as tools lets the LLM decide for itself when to dig deeper (e.g., when the user asks "what did I tell you last week about X?"). It's especially useful for agent frameworks that already think in terms of tools.

When the LLM calls these tools, the result lands in your tool-calling loop just like any other tool. Forward the full exchange to session.turn() and Sonzai's extractor will see the search results too — but be aware that re-extracting facts from a memory_search tool result can create echoes (the user's own past fact resurfaces as if it were new). Either skip extraction for those tool messages on your side, or trust the dedup pass.
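The skip can be sketched as a pure filter applied before session.turn(). The memory_search tool name and the message shapes are assumptions; adapt them to your own loop:

```typescript
type Msg = {
  role: string;
  content?: string | null;
  tool_calls?: { id: string; function: { name: string } }[];
  tool_call_id?: string;
};

// Drop assistant tool_calls to memory_search plus their matching
// role:"tool" results, so recalled facts aren't re-extracted as new.
function stripMemoryEchoes(messages: Msg[], toolName = "memory_search"): Msg[] {
  const echoIds = new Set<string>();
  for (const m of messages) {
    for (const c of m.tool_calls ?? []) {
      if (c.function.name === toolName) echoIds.add(c.id);
    }
  }
  return messages.filter((m) => {
    if (m.tool_call_id && echoIds.has(m.tool_call_id)) return false;
    if (m.tool_calls?.length && m.tool_calls.every((c) => echoIds.has(c.id))) return false;
    return true;
  });
}
```

kb_search results are usually safe to keep, since they carry agent knowledge rather than user facts.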

For deeper coverage of Sonzai's tool endpoints, see the Tool Integration guide.

What's available as a tool

| Sonzai endpoint | SDK method | Useful as an LLM tool? |
| --- | --- | --- |
| Knowledge base search | agents.knowledge_search(agent_id, query, limit) | Yes — LLM looks up policies, products, docs |
| Memory search | agents.memory.search(agentId, { query, userId }) (TS/Go); agents.memory.list_facts(agent_id, user_id) (Python) | Yes — LLM looks up past user statements |
| Mood / personality / habits / goals reads | agents.get_mood, agents.personality.get, agents.list_habits, agents.list_goals | Mostly inject via session.context() instead — read-only state changes rarely with the user query |
| Image generation | generation.generate_image | Possible, but typically your app exposes this as its own UI action, not as an LLM tool |

Handling Images & Multimodal Input

Sonzai's memory pipeline is text-based today. The /turn and /process endpoints accept string content only — DialogueMessage.content is string. Your LLM can be fully multimodal (Gemini, Claude, GPT-4o all accept image URLs and audio natively) but to get image-related facts into Sonzai you need to bridge the multimodal content into text in the messages you send to /turn.

The recommended pattern is dual-output: have your vision-capable LLM produce both (a) the warm reply you show the user and (b) a hidden [MEMORY: ...] line with a detailed factual description. Strip the [MEMORY: ...] line out before showing the user, and embed it in the bridged text you submit to Sonzai.

import OpenAI from "openai";
import { Sonzai } from "@sonzai-labs/agents";

const gemini = new OpenAI({
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
  apiKey: process.env.GEMINI_API_KEY!,
});
const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

const SYSTEM_PROMPT_IMAGE_AWARE = `You are a friendly companion. When the user shares an image, respond warmly
to what's emotionally important to THEM.

After your reply, ALWAYS include a single line:
[MEMORY: <detailed factual description of the image — setting, objects,
people, mood, time of day, what the user appears to be doing>]

The user does NOT see the [MEMORY: ...] line.`;

async function processImageTurn(session: any, userMsg: string, imageUrl: string): Promise<string> {
  const result = await gemini.chat.completions.create({
    model: "gemini-3.1-flash-lite-preview",
    messages: [
      { role: "system", content: SYSTEM_PROMPT_IMAGE_AWARE },
      {
        role: "user",
        content: [
          { type: "text", text: userMsg },
          { type: "image_url", image_url: { url: imageUrl } },
        ],
      },
    ],
  });
  const raw = result.choices[0].message.content ?? "";

  // Split the dual output
  const m = raw.match(/\[MEMORY:\s*([\s\S]+?)\]/);
  const memoryNote = m ? m[1].trim() : "";
  const reply = raw.replace(/\[MEMORY:[\s\S]+?\]/, "").trim();

  sendToUser(reply);

  await session.turn({
    messages: [
      { role: "user", content: `${userMsg}\n\n[Image attached: ${memoryNote}, URL: ${imageUrl}]` },
      { role: "assistant", content: reply },
    ],
  });
  return reply;
}

Why this pattern:

  • No backend multimodal yet. /turn accepts string content. Text-bridging through your same vision-capable LLM is the cleanest workaround.
  • Why dual-output (vs. a separate vision call). The same LLM call serves both purposes — no extra cost, no extra latency, no second roundtrip. You're already paying for vision on the assistant turn; let it produce the description too.
  • Why a hidden line. Keeps user-facing replies emotionally warm — "Oh you have such nice shoulders!" — while still capturing the factual detail (gym, tank top, mirror, time of day) that memory extraction needs.
  • It's a developer pattern, not a Sonzai field. The [MEMORY: ...] convention is yours to define. Sonzai just sees text. You can use any sentinel — <<MEM>>...<</MEM>>, JSON, whatever your prompt and parser agree on.
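The inline split can be factored into a small tolerant helper. A sketch (the [MEMORY: ...] sentinel is the convention used above, not a Sonzai field), which degrades gracefully when the model forgets the sentinel:

```typescript
// Split a dual-output completion into the user-facing reply and the
// hidden memory note. Returns an empty note when no sentinel is present.
function splitDualOutput(raw: string): { reply: string; memoryNote: string } {
  const m = raw.match(/\[MEMORY:\s*([\s\S]+?)\]/);
  return {
    memoryNote: m ? m[1].trim() : "",
    reply: raw.replace(/\[MEMORY:[\s\S]+?\]/, "").trim(),
  };
}
```

When memoryNote comes back empty, fall back to bridging just the user's text plus the image URL.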

Including the URL. Embedding the URL in the bridged text isn't required, but it lets Sonzai later surface the image as a memory artifact ("the photo you shared last week") without re-running vision on the image. Your app keeps using its own image storage; Sonzai just remembers the link as text.

Audio & voice follow the same pattern

Run speech-to-text (STT) on your side and send the transcript in messages. Text-to-speech (TTS) is rendered after the assistant text exists, so you forward the assistant text to session.turn() exactly as you would for a text-only chat. See the Voice AI use case below.

Future direction

Sonzai may extend the /turn schema to accept OpenAI's multimodal content blocks directly (content: [{type: "text"}, {type: "image_url"}]) with platform-side vision extraction, removing the manual bridging step. Today, text-bridging via the dual-output pattern is the supported approach.

Use Case: Chat Companion (OpenAI Agents SDK + Gemini)

The canonical Pattern 1 example. You bring your own agent harness — here the OpenAI Agents SDK — and route it to Gemini via the OpenAI-compat endpoint, so no OPENAI_API_KEY is ever used. Sonzai sits outside the LLM/tool-calling loop entirely: it supplies the system prompt via session.context() and ingests the finished transcript via session.turn(). The Agents SDK does all multi-step reasoning and tool dispatch on your side; Sonzai does memory.

import os
from openai import AsyncOpenAI
from agents import (
    Agent,
    Runner,
    OpenAIChatCompletionsModel,
    function_tool,
    set_tracing_disabled,
)
from sonzai import Sonzai

# The Agents SDK ships traces to OpenAI by default — disable, since we
# have no OpenAI key and aren't talking to OpenAI's servers at all.
set_tracing_disabled(True)

# Point the Agents SDK's AsyncOpenAI client at Gemini's OpenAI-compat URL.
gemini = AsyncOpenAI(
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    api_key=os.environ["GEMINI_API_KEY"],
)
model = OpenAIChatCompletionsModel(
    model="gemini-3.1-flash-lite-preview",
    openai_client=gemini,
)

# Sonzai = memory layer only. It never sees the LLM client.
sonzai = Sonzai(api_key=os.environ["SONZAI_API_KEY"])
session = sonzai.agents.sessions.start(
    "agent-id",
    user_id="user-123",
    session_id="session-abc",
)

@function_tool
def get_current_time() -> str:
    """Return the current time."""
    from datetime import datetime, timezone
    return datetime.now(timezone.utc).isoformat(timespec="seconds")

while True:
    user_msg = input("You: ")
    if not user_msg:
        break

    # 1) Pull enriched context (mood, personality, relevant facts, …) from Sonzai.
    ctx = session.context(query=user_msg)

    mood = ctx.get("current_mood") or "neutral"
    instructions = f"You are a friendly companion. Current mood: {mood}."

    # 2) Run the Agents SDK loop — it handles tool-calling and multi-step reasoning.
    agent = Agent(
        name="Companion",
        instructions=instructions,
        model=model,
        tools=[get_current_time],
    )
    result = Runner.run_sync(agent, user_msg)
    print(f"Assistant: {result.final_output}")

    # 3) Convert the run's items (assistant text + ToolCallItem + ToolCallOutputItem)
    # into Sonzai's tool-aware messages format. See the demo for the implementation.
    sonzai_messages = run_result_to_sonzai_messages(user_msg, result)

    # 4) Submit the turn. `mood` comes back inline (~300ms); facts / personality /
    # inventory are extracted asynchronously and land 5-15s later.
    turn_result = session.turn(messages=sonzai_messages)
    print(f"  -> mood updated: {turn_result.mood}")

session.end()

What's happening on each turn:

  • Sonzai is out of the LLM loop. The OpenAI Agents SDK runs the model, dispatches tools, and produces result.final_output. Sonzai never sees the LLM client and has no opinion on which model answered.
  • Mood is real-time. session.turn() returns fresh mood inline in ~300ms — you can render it the moment the response arrives.
  • Facts, personality drift, and inventory are deferred (5-15s). They run async under the returned extraction_id. Re-poll agents.memory.list_facts, agents.personality.get, etc. on the next turn; whatever didn't land yet will be there shortly.
  • Tool calls flow through to extraction. Sonzai's tool-aware message format accepts assistant messages with tool_calls plus a tool message carrying the result. The conversion helper packages the Agents SDK's ToolCallItem + ToolCallOutputItem into that shape so extraction can pick up facts from tool outputs too.

Want a working version? See the OpenAI Agents companion demo — a two-pane Streamlit app showing live mood, Big5, recent facts, inventory, and the constellation graph as you chat.

Use Case: Voice AI Assistant

STT → enrich → LLM → TTS. Sonzai holds the memory; you own the audio pipeline. Submit the turn while TTS is synthesizing — sync mood is fast enough not to block, and deferred extraction never blocks.

import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

async function processVoiceTurn(
  session: any, // Session handle from sonzai.agents.sessions.start
  audioBuffer: Buffer
): Promise<Buffer> {
  // Your STT
  const transcript = await yourSTT.transcribe(audioBuffer);

  // Inject memory into a concise voice-friendly system prompt
  const ctx = await session.context({ query: transcript });

  const systemPrompt = `${ctx.personality_prompt ?? "You are a voice companion."} Keep replies under 2 sentences for voice.
Mood: ${JSON.stringify(ctx.current_mood)}.
Key memory: ${ctx.loaded_facts?.[0]?.atomic_text ?? "none"}.`;

  const reply = await yourLLM.chat({ system: systemPrompt, message: transcript });

  // Submit the turn while TTS synthesizes (run in parallel)
  const [audioResponse] = await Promise.all([
    yourTTS.synthesize(reply),
    session.turn({
      messages: [
        { role: "user", content: transcript },
        { role: "assistant", content: reply },
      ],
    }),
  ]);

  return audioResponse;
}

Use Case: Agent Frameworks (LangChain / LlamaIndex)

Sonzai injects user context into the agent's system prompt. The framework handles tool calling, multi-step reasoning, and memory of the current conversation; Sonzai handles what the agent knows about the user across sessions. Send the full transcript including any tool messages to session.turn() so extraction can pick up facts from tool results.

import { ChatOpenAI } from "@langchain/openai";
import { SystemMessage, HumanMessage, AIMessage } from "@langchain/core/messages";
import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const llm = new ChatOpenAI({ model: "gpt-4o", tools: yourToolSchemas });

async function agentTurn(
  session: any,
  userInput: string,
  messageHistory: (HumanMessage | AIMessage)[]
): Promise<string> {
  const ctx = await session.context({ query: userInput });

  const messages = [
    new SystemMessage(buildSystemPrompt(ctx)),
    ...messageHistory,
    new HumanMessage(userInput),
  ];

  // Run the agent's full tool-calling loop on your side, then surface
  // every intermediate message (assistant tool_calls + tool results)
  // to Sonzai so it can extract from them.
  const { reply, intermediate } = await runLangchainAgent(llm, messages);

  await session.turn({
    messages: [
      { role: "user", content: userInput },
      ...intermediate,
      { role: "assistant", content: reply },
    ],
  });

  return reply;
}

Use Case: Multi-LLM Routing

Route to different models based on task type while Sonzai stitches user memory across all of them. The Session-level provider/model default is just a default — every .turn() can override.

import Anthropic from "@anthropic-ai/sdk";
import { GoogleGenAI } from "@google/genai";
import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const claude = new Anthropic();
const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

type TaskType = "creative" | "analytical" | "casual";

function classifyTask(message: string): TaskType {
  if (/write|story|poem|imagine/i.test(message)) return "creative";
  if (/analyze|compare|explain|why/i.test(message)) return "analytical";
  return "casual";
}

async function routedTurn(session: any, userMessage: string): Promise<string> {
  const ctx = await session.context({ query: userMessage });
  const systemPrompt = buildSystemPrompt(ctx);
  const task = classifyTask(userMessage);

  let reply: string;

  if (task === "creative") {
    const response = await claude.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      system: systemPrompt,
      messages: [{ role: "user", content: userMessage }],
    });
    reply = response.content[0].type === "text" ? response.content[0].text : "";
  } else {
    const response = await gemini.models.generateContent({
      model: "gemini-2.5-flash",
      contents: [{ role: "user", parts: [{ text: systemPrompt + "\n\n" + userMessage }] }],
    });
    reply = response.text ?? "";
  }

  // Same .turn() call regardless of which chat model answered.
  await session.turn({
    messages: [
      { role: "user", content: userMessage },
      { role: "assistant", content: reply },
    ],
  });

  return reply;
}

Use Case: Privacy-First (Anonymize Before the LLM)

Redact PII from the enriched context before it reaches your LLM. Only structured extracted facts are stored by Sonzai — never raw text.

async function privacyTurn(session: any, userMessage: string): Promise<string> {
  const ctx = await session.context({ query: userMessage });

  // Scrub PII from facts before they reach your LLM
  const sanitizedFacts = (ctx.loaded_facts ?? []).map((f: any) => ({
    ...f,
    atomic_text: redactPII(f.atomic_text), // your PII redaction logic
  }));

  const sanitizedCtx = { ...ctx, loaded_facts: sanitizedFacts };
  const systemPrompt = buildSystemPrompt(sanitizedCtx);

  const reply = await yourLLM.chat({ system: systemPrompt, message: userMessage });

  // Send unredacted transcript to Sonzai for extraction
  // (Sonzai stores structured facts, not raw text)
  await session.turn({
    messages: [
      { role: "user", content: userMessage },
      { role: "assistant", content: reply },
    ],
  });

  return reply;
}
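The redactPII helper above is yours to supply. A minimal regex-based sketch for illustration only; a production system should use a dedicated PII detector or NER model:

```typescript
// Naive regex pass over common PII shapes. Redact emails first, then
// SSNs, then generic phone-like digit runs (which would otherwise also
// swallow SSNs). Illustrative only, not a complete PII list.
function redactPII(text: string): string {
  return text
    .replace(/[\w.+-]+@[\w-]+\.[\w.]+/g, "[EMAIL]")
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, "[SSN]")
    .replace(/\+?\d[\d\s().-]{7,}\d/g, "[PHONE]");
}
```

Note the ordering matters: the broad phone pattern would also match an SSN, so the more specific rule runs first.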

Next Steps
