Standalone Memory Layer
Three integration shapes: one-shot batch (/process), lifecycle batch (sessions.start → end carrying messages), or real-time per-turn (sessions.start → turn → end). All three give you Sonzai's behavioral intelligence without giving up control of the conversation loop.
Choosing an integration shape
There are three ways to feed conversations into Sonzai. The first two are batch (you send a transcript after the conversation); the third is real-time (you submit each turn as it happens). Pick exactly one per conversation — chaining them runs extraction twice on the same messages.
| | A. /process | B. sessions.end({ messages }) | C. sessions.turn() × N |
|---|---|---|---|
| Calls per conversation | 1 | 2 (start + end) | 2 + N (start + N × turn + end) |
| Sonzai in the hot path? | No | No | Yes — .context() and .turn() flank each turn |
| Context per turn | Pre-session only (optional getContext call) | Pre-session only (optional getContext call) | Fresh, query-specific via .context() |
| Extraction timing | Whole transcript, inline | Whole transcript, inline (or async on tenants where enabled) | Per-turn — sync mood inline, deeper extraction 5–15s later |
| Lifecycle ownership | Implicit (auto-session) | Explicit | Explicit |
| Best for | External transcripts, benchmarks, no-lifecycle ingest | Explicit boundaries + async processing, session-scoped tools, batch ingest | Chat companions, voice AI, agent frameworks |
A and B are functionally equivalent for fact extraction — both extract facts and side-effects from the full transcript inline. The only differences are lifecycle ergonomics (B gives you an explicit session and supports async polling) and call count.
C is a different shape: Sonzai is part of every turn instead of seeing the conversation only at the end.
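As a sketch of the two batch shapes, here is what A and B look like side by side. The `SonzaiBatch` interface below is illustrative only (the real client's method names and signatures may differ); it exists to show that both shapes submit the same full transcript, and differ only in whether the session lifecycle is implicit or explicit.

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Illustrative client surface — NOT the real SDK shape.
interface SonzaiBatch {
  process(userId: string, messages: Message[]): Promise<void>;        // Shape A
  sessionsStart(userId: string): Promise<{ sessionId: string }>;      // Shape B
  sessionsEnd(sessionId: string, messages: Message[]): Promise<void>; // Shape B
}

// Shape A: one call, implicit session, inline extraction of the whole transcript.
async function ingestOneShot(client: SonzaiBatch, userId: string, transcript: Message[]) {
  await client.process(userId, transcript);
}

// Shape B: explicit session boundaries, same extraction semantics, 2 calls.
async function ingestWithLifecycle(client: SonzaiBatch, userId: string, transcript: Message[]) {
  const { sessionId } = await client.sessionsStart(userId);
  await client.sessionsEnd(sessionId, transcript);
}
```

Either function ingests a conversation exactly once; per the warning below, never run both on the same transcript.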
Don't mix shapes within one conversation
Calling .turn() per turn (C) and .end({ messages }) with the same transcript (B) extracts the same messages twice. Pick one shape per conversation. The pattern docs below show C and B/A separately.
The rest of this section groups A and B together as Pattern 2: Post-Session Processing (since they share the same "extract a transcript at the end" semantics) and treats C as Pattern 1: Memory Middleware (real-time turn submission).
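A minimal sketch of Pattern 1's per-turn loop, assuming an illustrative `SonzaiSession` handle (the `.context()` and `.turn()` names come from the table above; everything else here, including the prompt shape and the `llm` callback, is a placeholder for your own stack):

```typescript
type Message = { role: "user" | "assistant"; content: string };

// Illustrative session handle — the real SDK surface may differ.
interface SonzaiSession {
  context(query: string): Promise<string>;                 // fresh, query-specific context
  turn(user: Message, assistant: Message): Promise<void>;  // sync mood inline; deeper extraction deferred
  end(): Promise<void>;
}

// Pattern 1: Sonzai flanks each turn, but your code still owns the loop.
async function chatTurn(
  session: SonzaiSession,
  llm: (prompt: string) => Promise<string>,  // your model call, whatever it is
  userText: string,
): Promise<string> {
  const memory = await session.context(userText);            // 1. pull context for this query
  const reply = await llm(`${memory}\n\nUser: ${userText}`); // 2. generate with your own model
  await session.turn(                                        // 3. submit the turn (cheap)
    { role: "user", content: userText },
    { role: "assistant", content: reply },
  );
  return reply;
}
```

When the conversation finishes you would call `session.end()` once, without passing messages again, since every turn has already been submitted.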
When each layer runs: extraction is lightweight, consolidation is automatic
/turn, /process, and sessions.end are intentionally lightweight. They extract facts and a session summary from the transcript and persist them — that's it. The expensive work (cross-session dedup, clustering, diary deepening, decay) is scheduled automatically by the platform and is rate-limited so it doesn't run on every call.
| Layer | When it runs | Triggered by | Cost |
|---|---|---|---|
| Sync mood update (Pattern 1 /turn only) | Inline, ~300–500ms | Your .turn() call | Light — one short LLM call |
| Background extraction (facts, personality, habits) | 5–15 seconds after /turn | Automatic — no caller action | Light — one LLM call per chunk |
| Fact extraction + session summary (batch) | Inline, on every /process or sessions.end({ messages }) | Your call | Light — one LLM call per chunk |
| Post-session consolidation (dedup, crossref, bundle precompute, pattern detection) | ~8 hours after the session ends | Automatic | Medium |
| Daily consolidation + diary | Once per day | Automatic schedule | Medium |
| Deep consolidation (wakeup/habit dedup, decay, cluster reconcile, weekly summaries) | Daily / weekly | Automatic schedule | Heavy |
This means you can call /turn per turn (Pattern 1), or /process once at the end (Pattern 2), without paying for heavy consolidation each time. The platform de-duplicates and consolidates in the background.
Practical implication
Don't try to "save calls" by skipping /turn between turns. Each call only does sync mood + queues deferred extraction (cheap). Skipping it means losing per-turn behavioral signal. The expensive consolidation runs on its own schedule no matter how many times you call.