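You own the LLM and the chat loop. Sonzai owns memory, mood, personality, and relationships. Per session: sessions.start → a per-turn loop of session.context() + your LLM + session.turn() → sessions.end.

You keep your existing chat loop. Before each LLM call, pull rich context relevant to the user's message from Sonzai; after the LLM replies, submit just that exchange to session.turn(). Mood lands synchronously within ~300–500 ms. Deeper extraction (facts, personality drift, habit detection, goal updates) runs asynchronously in the background for 5–15 seconds. Sonzai never sees your tool execution and never picks a model for you.

A good fit for companions, voice agents, agent frameworks (OpenAI Agents SDK, LangChain, LiveKit), and any setup that already had a production-grade LLM loop before adopting Sonzai.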

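When to use

You already have an LLM loop in production: custom tools, evals, prompt templates, or a specific provider you are locked into.
You need fresh context on every turn, not a single pull for the whole conversation.
You want mood, fact, personality, habit, goal, and relationship signals without giving up control over LLM choice and tool execution.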

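When to switch

If the conversation cannot wait for each .turn() call to return synchronously, switch to Mode 5: Standalone batch.
If you are fine with Sonzai owning the LLM call, switch to Mode 1: Managed runtime and delete most of this code.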

Architecture

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│  Your App   │     │   Sonzai API     │     │   Your LLM   │
└──────┬──────┘     └────────┬─────────┘     └──────┬───────┘
     │                     │                       │
     │  sessions.start     │                       │
     │────────────────────>│ (prewarms memory)     │
     │  <── Session ───────│                       │
     │                     │                       │
     │  ─── Per turn ──────────────────────────── │
     │                     │                       │
     │  session.context()  │                       │
     │────────────────────>│                       │
     │  <── enriched ctx ──│                       │
     │    personality, mood│                       │
     │    memories, goals  │                       │
     │                     │                       │
     │  Your LLM loop ─────┼──────────────────────>│
     │  + your tools       │                       │
     │  <── reply ─────────┼───────────────────────│
     │                     │                       │
     │  sendToUser(reply)  (no waiting on Sonzai)  │
     │                     │                       │
     │  session.turn()     │                       │
     │────────────────────>│ ⇒ sync mood ~300ms    │
     │  <── mood, status ──│ ⇒ background extract  │
     │                     │   (5–15s)             │
     │                     │                       │
     │  ─── Repeat ────────────────────────────── │
     │                     │                       │
     │  session.end()      │                       │
     │────────────────────>│── consolidate         │
     │                     │   long-term memory    │
     └─────────────────────┴───────────────────────┘

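End-to-end example

A minimal runnable loop: open a session; on each turn pull context, call your LLM, and submit the exchange; close the session when the conversation ends.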

import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

// yourLLM and sendToUser below stand in for your own LLM client and
// delivery function; they are not part of the Sonzai SDK.
async function runConversation(agentId: string, userId: string) {
  const sessionId = `session-${Date.now()}`;
  const history: { role: string; content: string }[] = [];

  // Session handle bundles agentId/userId/sessionId + provider/model
  // defaults so you don't repeat them on every call.
  const session = await sonzai.agents.sessions.start(agentId, {
    userId,
    sessionId,
    provider: "gemini",
    model:    "gemini-3.1-flash-lite-preview",
  });

  async function turn(userMessage: string): Promise<string> {
    // 1. Pull fresh, query-relevant context BEFORE the LLM call.
    const ctx = await session.context({ query: userMessage });

    // 2. Your LLM, your tools. Sonzai is OUT of the loop here.
    const reply = await yourLLM.chat({
      system:   buildSystemPrompt(ctx),
      messages: [...history, { role: "user", content: userMessage }],
    });

    // Deliver the reply first so the user never waits on Sonzai.
    sendToUser(reply.content);

    // 3. Submit the exchange. Sync mood ~300ms; deeper extraction
    //    (facts, personality, habits) runs asynchronously 5–15s later.
    await session.turn({
      messages: [
        { role: "user",      content: userMessage },
        { role: "assistant", content: reply.content },
      ],
    });

    history.push({ role: "user",      content: userMessage });
    history.push({ role: "assistant", content: reply.content });
    return reply.content;
  }

  return { turn, end: () => session.end() };
}

// /context returns a flat object: read what you need, drop the rest.
function buildSystemPrompt(ctx: any): string {
  const facts = (ctx.loaded_facts ?? []).map((f: any) => `- ${f.atomic_text}`).join("\n");
  return [
    ctx.personality_prompt ?? "You are a helpful AI companion.",
    `Personality (Big5): ${JSON.stringify(ctx.big5 ?? {})}`,
    `Current mood: ${JSON.stringify(ctx.current_mood ?? {})}`,
    facts ? `Relevant memories:\n${facts}` : "",
  ].filter(Boolean).join("\n\n");
}
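
The object returned by runConversation exposes turn and end. As a usage sketch, the helper below drives a scripted list of user messages and always closes the session afterwards, so the end-of-session consolidation shown in the diagram is not skipped when a turn throws. The Conversation interface and the runScript name are illustrative assumptions, not part of the Sonzai SDK.

```typescript
// Assumed minimal shape of what runConversation returns (illustrative only).
interface Conversation {
  turn(userMessage: string): Promise<string>;
  end(): Promise<unknown>;
}

// Runs each user message in order and collects the replies.
// The finally block closes the session even if a turn fails.
async function runScript(
  conversation: Conversation,
  userMessages: string[],
): Promise<string[]> {
  const replies: string[] = [];
  try {
    for (const message of userMessages) {
      replies.push(await conversation.turn(message));
    }
  } finally {
    await conversation.end();
  }
  return replies;
}
```

With the example above, this could be called as `runScript(await runConversation(agentId, userId), ["Hi", "How was my week?"])`.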

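The most critical step

Always call session.context(query=user_msg) before calling the LLM, on every turn. This is the step that closes the loop. Skipping it means the LLM works from stale state, and the value of the memory layer collapses.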
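
One way to make the context fetch hard to skip is to route every LLM call through a small wrapper that fetches context first and hands it to the model callback. The sketch below assumes only that the session exposes context({ query }), as in the example above; the ContextSource interface, the LLMCall type, and the replyWithFreshContext name are hypothetical.

```typescript
// Hypothetical minimal shape; the real SDK session type may differ.
interface ContextSource<C> {
  context(args: { query: string }): Promise<C>;
}

// Your own LLM call, expressed as a function of the fetched context.
type LLMCall<C> = (ctx: C, userMessage: string) => Promise<string>;

// Fetches context for this exact user message, then calls the LLM.
// The callback only receives ctx as an argument, so the model cannot
// be reached through this helper without a context fetch for the turn.
async function replyWithFreshContext<C>(
  session: ContextSource<C>,
  userMessage: string,
  callLLM: LLMCall<C>,
): Promise<string> {
  const ctx = await session.context({ query: userMessage });
  return callLLM(ctx, userMessage);
}
```

The design choice is to pass the context in as a parameter instead of reading it from shared state, which keeps a stale context from being silently reused across turns.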

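Save a round trip with fetchNextContext

session.turn() accepts fetchNextContext: { query: nextMessage } (Python: fetch_next_context={"query": ...}). When set, the response carries the payload of the next /context call in next_context, so the client already has the context for turn N+1 before turn N has finished.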
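
A minimal sketch of how a client might consume next_context, assuming the next user message is already known when the current turn is submitted (for example, when messages are queued). Only fetchNextContext and next_context come from the text above; the PrefetchingSession interface, the NextContextCache class, and both helper functions are hypothetical, and the real response shape may differ.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Hypothetical minimal session shape built from the names in the text.
interface PrefetchingSession<C> {
  context(args: { query: string }): Promise<C>;
  turn(args: {
    messages: Message[];
    fetchNextContext?: { query: string };
  }): Promise<{ next_context?: C }>;
}

// Holds at most one prefetched context, keyed by the query it was built for.
class NextContextCache<C> {
  private entry: { query: string; ctx: C } | undefined;

  store(query: string, ctx: C | undefined): void {
    this.entry = ctx === undefined ? undefined : { query, ctx };
  }

  // Returns the prefetched context only if it matches this query.
  // The entry is cleared either way so a stale context is never reused.
  take(query: string): C | undefined {
    const entry = this.entry;
    this.entry = undefined;
    return entry !== undefined && entry.query === query ? entry.ctx : undefined;
  }
}

// Uses the prefetched context when available, else falls back to /context.
async function contextFor<C>(
  session: PrefetchingSession<C>,
  cache: NextContextCache<C>,
  userMessage: string,
): Promise<C> {
  const cached = cache.take(userMessage);
  if (cached !== undefined) return cached;
  return session.context({ query: userMessage });
}

// Submits the exchange and, if the next message is known, asks for its context.
async function submitTurn<C>(
  session: PrefetchingSession<C>,
  cache: NextContextCache<C>,
  exchange: Message[],
  nextMessage?: string,
): Promise<void> {
  const result = await session.turn(
    nextMessage === undefined
      ? { messages: exchange }
      : { messages: exchange, fetchNextContext: { query: nextMessage } },
  );
  if (nextMessage !== undefined) cache.store(nextMessage, result.next_context);
}
```

If the incoming message differs from the one the prefetch was built for, the cache misses and the code pays the normal round trip, so correctness does not depend on the prediction being right.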

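Next steps

Mode 1: Memory middleware (in-depth guide)

Tool calling, multimodal/image bridging, dual-output prompts, exposing the Sonzai knowledge base and memory search as LLM tools, and polling for deferred extraction.

Endpoint-by-endpoint breakdown

Complete reference for sessions.start, session.context, session.turn, /process, sessions.end, and the read endpoints.

Mode 5: Standalone batch

The same data model, but you submit the whole conversation once at the end instead of submitting each turn separately.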
