Pattern 5: Standalone Memory (Batch)

One call after the conversation is done. Ship the full transcript to /process (or sessions.end with messages) and let Sonzai extract facts, mood, personality, and habits in the background.

You own the entire conversation. Sonzai never sees it in real time. When the conversation ends — call wraps, support case closes, journaling session finishes — you POST the transcript once and Sonzai's extractor turns it into facts, mood updates, personality drift, habit detection, and proactive-outreach signal. Best for tutoring, fitness, CRM, voice calls, journaling, and any flow where Sonzai in the hot path is undesirable or impossible.

When to use this

  • Latency budget can't tolerate a per-turn /turn round-trip.
  • The transcript already exists (recorded calls, Gong/Zoom exports, journal entries).
  • You want bulk ingest after the fact — replay logs, migrate users, benchmark agent quality.
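For the bulk-ingest case (replaying logs, migrating users), the orchestration is just your own loop around the per-transcript call. A minimal sketch of a concurrency-limited worker pool — the per-transcript Sonzai call is injected as a function so the batching logic stands on its own, and the `Transcript` shape, pool size, and `bulkIngest` name are illustrative, not SDK types:

```typescript
// Sketch: bounded-concurrency bulk ingest over recorded transcripts.
// `processOne` would wrap sonzai.agents.process(...) in a real integration
// and return the created session_id; here it is injected for illustration.
type Transcript = { userId: string; messages: unknown[] };

async function bulkIngest(
  transcripts: Transcript[],
  processOne: (t: Transcript) => Promise<string>, // returns session_id
  concurrency = 4,
): Promise<string[]> {
  const results: string[] = new Array(transcripts.length);
  let next = 0;
  // Simple worker pool: each worker pulls the next unclaimed index until done.
  const workers = Array.from(
    { length: Math.min(concurrency, transcripts.length) },
    async () => {
      while (next < transcripts.length) {
        const i = next++; // claim an index synchronously, then await
        results[i] = await processOne(transcripts[i]);
      }
    },
  );
  await Promise.all(workers);
  return results; // session ids, in input order
}
```

Keeping the concurrency bounded matters here: each call triggers an extraction LLM pass on Sonzai's side, so firing thousands of transcripts with unbounded `Promise.all` is an easy way to hit rate limits.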

When to switch

Architecture

┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│  Your App   │     │   Sonzai API     │     │   Your LLM   │
└──────┬──────┘     └────────┬─────────┘     └──────┬───────┘
       │                     │                      │
       │  GET /context       │                      │
       │────────────────────>│ (optional pre-session│
       │  <── user profile ──│  personalization)    │
       │                     │                      │
       │  ══ Your conversation (Sonzai not involved) ══════
       │                     │                      │
       │  Chat ──────────────┼─────────────────────>│
       │  <── reply ─────────┼──────────────────────│
       │  [N turns, your loop, your tools]          │
       │                     │                      │
       │  ══════════════════════════════════════════════════
       │                     │                      │
       │  /process or sessions.end({ messages })    │
       │────────────────────>│── extract facts,     │
       │  (full transcript)  │   personality, mood, │
       │                     │   habits, interests  │
       │  <── extractions ───│   (Sonzai LLM)       │
       │                     │                      │
       │  Use insights       │                      │
       │  (push notif,       │                      │
       │   dashboard,        │                      │
       │   exercises, …)     │                      │
       └─────────────────────┴──────────────────────┘

End-to-end snippet

The simplest path is /process: one call, auto-creates the session, returns the generated session_id for correlation. Use the explicit sessions.start → end({ messages }) lifecycle when you need session-scoped tools, durations, or async polling.

import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

async function ingestTranscript(
  agentId: string,
  userId: string,
  transcript: { role: "user" | "assistant" | "tool"; content: string; tool_calls?: any[] }[],
) {
  // One call. Auto-creates a session. Tool messages allowed.
  const result = await sonzai.agents.process(agentId, {
    userId,
    messages: transcript,
    provider: "gemini",                          // optional override
    model:    "gemini-3.1-flash-lite-preview",   // optional override
  });

  // result.session_id is the auto-created session id.
  // Pull extractions from the read endpoints when ready:
  const memory = await sonzai.agents.memory.list(agentId, { userId });
  const mood   = await sonzai.agents.getMood(agentId, { userId });

  return { sessionId: result.session_id, memory, mood };
}

Pick one trigger, not both

/process and sessions.end({ messages }) are functionally equivalent for batch ingest — both extract facts and side effects from the full transcript inline. Don't call both for the same transcript, or extraction runs twice. Use /process for the simple one-call shape. Use sessions.start + sessions.end({ messages }) when you want an explicit lifecycle, async polling, or session-scoped tools.
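The explicit lifecycle can be sketched as below. The client interface is a structural stand-in for the SDK — the method names follow the docs (sessions.start, sessions.end), but the exact signatures and return shapes are assumptions; check the SDK reference before relying on them:

```typescript
// Sketch of the explicit lifecycle: sessions.start → sessions.end({ messages }).
// SessionsClient is an assumed shape, not the real SDK type.
type Msg = { role: "user" | "assistant" | "tool"; content: string };

interface SessionsClient {
  start(args: { agentId: string; userId: string }): Promise<{ session_id: string }>;
  end(args: { session_id: string; messages: Msg[] }): Promise<{ status: string }>;
}

async function ingestWithLifecycle(
  sessions: SessionsClient,
  agentId: string,
  userId: string,
  transcript: Msg[],
) {
  // start() gives you a stable session_id up front — useful for correlating
  // logs and for polling extraction status asynchronously later.
  const { session_id } = await sessions.start({ agentId, userId });
  // end() ships the full transcript; extraction kicks off from here.
  const { status } = await sessions.end({ session_id, messages: transcript });
  return { session_id, status };
}
```

The practical difference from /process is that you hold the session_id before extraction begins, so a crash between the two calls still leaves you something to reconcile against.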

What runs when

/process and sessions.end are intentionally lightweight: they extract facts and a session summary inline (one LLM call per chunk). The expensive cross-session work (dedup, clustering, diary, decay) is scheduled automatically by the platform — you don't pay for it on every call.

Where to next
