Standalone Memory Layer

Three integration shapes — one-shot batch (/process), lifecycle-scoped batch (sessions.start → end with messages), or real-time turn-by-turn (sessions.start → turn → end). All three give you Sonzai's behavioral intelligence without giving up control of the chat loop.

Choosing Your Integration Shape

There are three ways to feed conversations into Sonzai. The first two are batch (you send a transcript after the conversation); the third is real-time (you submit each turn as it happens). Pick exactly one per conversation — chaining them runs extraction twice on the same messages.

A. /process — one-shot batch
Single call. Auto-creates a session if you don't pass one. Best for external LLM transcripts, benchmarks, and any flow without a long-lived session lifecycle.
B. sessions.start → end({ messages }) — lifecycle batch
Open a session, do your full conversation off-platform, then close with the transcript on .end(). Use when you want explicit session boundaries, async polling, or session-scoped tools — but still ingest in one shot.
C. sessions.start → turn() × N → end() — real-time
Open a session and submit each exchange via .turn() as the conversation happens. Sync mood lands inline (~300–500ms); deeper extraction runs asynchronously 5–15s later. Best for chat companions, voice AI, and agent frameworks.

| | A. /process | B. sessions.end({ messages }) | C. sessions.turn() × N |
|---|---|---|---|
| Calls per conversation | 1 | 2 (start + end) | 2 + N (start + N × turn + end) |
| Sonzai in the hot path? | No | No | Yes — .context() and .turn() flank each turn |
| Context per turn | Pre-session only (optional getContext call) | Pre-session only (optional getContext call) | Fresh, query-specific via .context() |
| Extraction timing | Whole transcript, inline | Whole transcript, inline (or async on tenants where enabled) | Per-turn — sync mood inline, deeper extraction 5–15s later |
| Lifecycle ownership | Implicit (auto-session) | Explicit | Explicit |
| Best for | External transcripts, benchmarks, no-lifecycle ingest | Explicit boundaries + async processing, session-scoped tools, batch ingest | Chat companions, voice AI, agent frameworks |
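The three shapes can be sketched side by side. Only the operation names (/process, sessions.start / turn / end) come from this page; the client construction, signatures, and return fields below are assumptions, so the sketch runs against an in-memory stub rather than a real Sonzai client.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Minimal in-memory stand-in for the SDK surface described above,
// so the three shapes can be compared without network calls.
function makeStubClient() {
  const ingested: Message[][] = []; // each entry = one batch of ingested messages
  return {
    process: async (opts: { messages: Message[] }) => {
      ingested.push(opts.messages);       // Shape A: one-shot batch
      return { sessionId: "auto-1" };     // session auto-created
    },
    sessions: {
      start: async () => ({ sessionId: "s-1" }),
      end: async (_id: string, opts?: { messages?: Message[] }) => {
        if (opts?.messages) ingested.push(opts.messages); // Shape B: transcript on .end()
      },
      turn: async (_id: string, exchange: Message[]) => {
        ingested.push(exchange);          // Shape C: per-turn submission
        return { mood: "neutral" };       // sync mood returned inline
      },
    },
    ingested,
  };
}

async function demo(): Promise<number> {
  const sonzai = makeStubClient();
  const transcript: Message[] = [
    { role: "user", content: "hi" },
    { role: "assistant", content: "hello" },
  ];

  // A. /process: single call, auto-session
  await sonzai.process({ messages: transcript });

  // B. lifecycle batch: explicit boundaries, transcript on .end()
  const b = await sonzai.sessions.start();
  await sonzai.sessions.end(b.sessionId, { messages: transcript });

  // C. real-time: submit each exchange as it happens, close without a transcript
  const c = await sonzai.sessions.start();
  await sonzai.sessions.turn(c.sessionId, transcript); // one user/assistant exchange
  await sonzai.sessions.end(c.sessionId); // no messages: turns were already ingested

  return sonzai.ingested.length;
}

demo().then((n) => console.log(n)); // one batch each for A, B, and C's single turn
```

Note that shape C's `.end()` carries no messages; passing the transcript there as well would be the double-extraction mistake the warning below describes.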

A and B are functionally equivalent for fact extraction — both extract facts and side-effects from the full transcript inline. The only differences are lifecycle ergonomics (B gives you an explicit session and supports async polling) and call count.

C is a different shape: Sonzai is part of every turn instead of seeing the conversation only at the end.

Don't mix shapes within one conversation

Calling .turn() per turn (C) and .end({ messages }) with the same transcript (B) extracts the same messages twice. Pick one shape per conversation. The pattern docs below show C and B/A separately.

The rest of this section groups A and B together as Pattern 2: Post-Session Processing (since they share the same "extract a transcript at the end" semantics) and treats C as Pattern 1: Memory Middleware (real-time turn submission).
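For Pattern 1 (shape C), "Sonzai in the hot path" means each turn is flanked by two calls: fresh context before your model call, turn submission after it. A minimal middleware sketch, assuming hypothetical `context()` / `turn()` signatures (only the method names appear on this page) and a placeholder `callLLM` for your own model:

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Assumed interface; the real SDK's signatures may differ.
interface MemoryClient {
  context(sessionId: string, query: string): Promise<string>;
  turn(sessionId: string, exchange: Msg[]): Promise<{ mood: string }>;
}

async function chatTurn(
  memory: MemoryClient,
  sessionId: string,
  userText: string,
  callLLM: (systemContext: string, userText: string) => Promise<string>,
): Promise<string> {
  // 1. Pull fresh, query-specific context before the model call
  const ctx = await memory.context(sessionId, userText);

  // 2. The chat loop stays yours; the memory layer only flanks it
  const reply = await callLLM(ctx, userText);

  // 3. Submit the exchange: sync mood lands inline (~300–500ms),
  //    deeper extraction runs in the background 5–15s later
  await memory.turn(sessionId, [
    { role: "user", content: userText },
    { role: "assistant", content: reply },
  ]);

  return reply;
}
```

The key design point is step 3: `.turn()` is awaited per exchange, and the session is later closed with a bare `.end()` rather than a transcript, which is what keeps each message from being extracted twice.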

What runs when — extraction is light, consolidation is automatic

/turn, /process, and sessions.end are intentionally lightweight. They extract facts and a session summary from the transcript and persist them — that's it. The expensive work (cross-session dedup, clustering, diary deepening, decay) is scheduled automatically by the platform and is rate-limited so it doesn't run on every call.

| Layer | When it runs | Triggered by | Cost |
|---|---|---|---|
| Sync mood update (Pattern 1 /turn only) | Inline, ~300–500ms | Your .turn() call | Light — one short LLM call |
| Background extraction (facts, personality, habits) | 5–15 seconds after /turn | Automatic — no caller action | Light — one LLM call per chunk |
| Fact extraction + session summary (batch) | Inline, on every /process or sessions.end({ messages }) | Your call | Light — one LLM call per chunk |
| Post-session consolidation (dedup, crossref, bundle precompute, pattern detection) | ~8 hours after the session ends | Automatic | Medium |
| Daily consolidation + diary | Once per day | Automatic schedule | Medium |
| Deep consolidation (wakeup/habit dedup, decay, cluster reconcile, weekly summaries) | Daily / weekly | Automatic schedule | Heavy |

This means you can call /turn per turn (Pattern 1), or /process once at the end (Pattern 2), without paying for heavy consolidation each time. The platform de-duplicates and consolidates in the background.

Practical implication

Don't try to "save calls" by skipping /turn between turns. Each call only does sync mood + queues deferred extraction (cheap). Skipping it means losing per-turn behavioral signal. The expensive consolidation runs on its own schedule no matter how many times you call.
