
Technology · Architecture Reference

The Relationship Layer for consumer AI agents — how it actually works.

Sonzai is the persistent cognitive substrate that sits between any LLM and any application. Eight composable modules — memory, personality, mood, relationships, knowledge, learning, media, and an agent runtime — on managed infrastructure most teams would otherwise spend a year building. This page is the system, end to end.

Modules: 8 composable
Surfaces: 6 SDKs / protocols
Adoption: 5 deploy modes
Providers: Gemini · GPT · Claude · Grok
01 · What Sonzai is · in one diagram

A composable substrate between any LLM and any application.

Sonzai is a multi-tenant AI Relationship Layer: a managed substrate that sits between any LLM and any application. Stateless at the edge, stateful at the core, and provider-agnostic — drop it in front of Gemini, GPT, Grok, Claude, or any combination, and the agent retains memory, personality, mood, relationships, and learning across every session.

Overview · Fig. 1: Three-tier view
Your application
Web · mobile · API · your product
Holds the user, the UX, the business logic.
↓ SDK · MCP · REST · plugins ↓
Sonzai · managed mind layer
Mind layer · 8 modules
Memory Layer
Personality
Knowledge Base
Learning Systems
Relationship
Media Generation
Mood & Emotion
Agent Runtime
Each module independently consumable. Memory Layer stands alone.
Managed infrastructure
  • Transactional + columnar stores
  • Vector / entity / temporal index
  • Message queue · DLQ · retries
  • Distributed compute (extract/embed)
  • Per-user model-weight store
  • Background scheduler · key vault
  • Cost ledger · observability · evals
↓ routed · failover · BYOK or hosted keys ↓
Providers: Gemini · OpenAI GPT · Anthropic Claude · xAI Grok

Core principle

Memory is the floor of an agent's mind, not the whole of it. Sonzai treats memory, personality, mood, relationships, knowledge, learning, media, and orchestration as one composable substrate — because in practice they're entangled, and pretending otherwise leaks complexity into every application that tries to do it itself.

02 · The Stack · 8 modules, independently consumable

Eight modules. Pick one, pick all.

Each layer is independently consumable through the same SDK. The Memory Layer can stand alone. Personality, Mood, and Relationships compose on top. Learning Systems rewrite every layer below them. Agent Runtime caps the stack with provider-agnostic orchestration.

The 8-module stack · Fig. 2: Modular by design
Provider · LLM
Gemini · GPT · Claude · Grok
08
Agent Runtime
Provider-agnostic orchestration · tool calling · SSE streaming · multi-agent scenes · auto-failover
07
Personality
Big-5 (OCEAN) traits · evolution over time · per-scenario overlays · cross-agent composition
06
Mood & Emotion
4-D affective vector (happiness · energy · calmness · affection) · theme detection · contagion
05
Relationship
Directional bond scores · shared-memory channels · privacy tiers (private / shared / public)
04
Knowledge Base
Tenant- + project-scoped knowledge store · ACL-gated retrieval & write tools
03
Learning Systems
Reinforcement (shadow → live) · self-learning · federated / cross-agent concept catalog
02
Memory Layer · standalone-usable
Atomic facts · hierarchical tree · confidence decay · embedding-dedup · nightly consolidation
01
Media Generation
Image · video · TTS · music · SFX through best-of-breed providers, one orchestration layer
Your application
Via SDK (Python · TS · Go) · MCP · REST · framework plugins
Module | What it does | What it enables
Memory Layer | Cite-and-verify extraction of atomic facts, ranked by confidence, decayed on the Ebbinghaus curve, deduped by embedding similarity, consolidated nightly. | Recall that ages gracefully.
Personality | Big-5 trait tracking with evolution deltas, full history, and per-scenario overlays. | A self that drifts with experience.
Mood & Emotion | Live 4-D affective vector with theme detection per turn and contagion across multi-agent scenes. | Affective state, not robotic transcript.
Relationship | Directional love/trust scores per pair · shared-memory channels · 3 privacy tiers. | Multi-agent dynamics, not just calls.
Knowledge Base | Tenant- and project-scoped knowledge store with retrieval + write tools, gated by access control. | Grounded, ACL-aware retrieval.
Learning Systems | RL with shadow-model promotion · nightly self-learning · federated cross-agent concept catalog. | Agents that improve in production.
Agent Runtime | Provider-agnostic LLM orchestration with tool calling, SSE streaming, and priority failover. | One integration, zero SPOF.
Media Generation | Image, video, TTS, music, and SFX through best-of-breed providers. | Multi-modal output, one client.
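The Memory Layer row mentions confidence decay on the Ebbinghaus curve and dedup by embedding similarity. A minimal sketch of both ideas follows, assuming the common exponential form of the forgetting curve and plain cosine similarity. The function names, the stability_days parameter, and the 0.95 threshold are illustrative assumptions, not Sonzai's actual parameters or API.

```python
import math

def decayed_confidence(confidence: float, age_days: float, stability_days: float = 30.0) -> float:
    """Exponential forgetting curve: confidence * exp(-age / stability)."""
    return confidence * math.exp(-age_days / stability_days)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedup(facts: list[tuple[str, list[float]]], threshold: float = 0.95) -> list[str]:
    """Keep a fact only if no already-kept fact has a near-identical embedding."""
    kept: list[tuple[str, list[float]]] = []
    for text, emb in facts:
        if all(cosine(emb, kept_emb) < threshold for _, kept_emb in kept):
            kept.append((text, emb))
    return [text for text, _ in kept]

# Toy 3-d embeddings (hypothetical); real embeddings have hundreds of dimensions.
facts = [
    ("User lives in Lisbon", [1.0, 0.0, 0.1]),
    ("The user is based in Lisbon", [0.99, 0.01, 0.1]),
    ("User has a dog named Miso", [0.0, 1.0, 0.0]),
]
unique_facts = dedup(facts)
```

In this sketch the second fact is dropped because its embedding is almost parallel to the first, while the unrelated third fact survives.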
03 · The Managed Platform · the part beneath the API

A production agent stack is a dozen distributed systems wired together. Sonzai operates all of them.

The Relationship Layer is the part visible through the API. Underneath sits the infrastructure that makes a multi-tenant, evolving, learning agent platform actually work in production. Transactional and columnar stores for state, a hybrid index for retrieval, a cache for hot context, a queue with DLQ and retries for async work, a worker pool for extraction and embedding, a versioned per-user weight store for RL policies, a scheduler for cadence-driven jobs, a key vault for credentials, a cost ledger for guardrails, observability for traces and metrics, and an eval gate for releases.
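To make "a queue with DLQ and retries" concrete, here is a small in-process sketch of the pattern: idempotent handling, exponential backoff between attempts, and a dead-letter list for jobs that exhaust their retries. The RetryQueue class and its fields are hypothetical; a production system would use a real broker and would actually sleep or reschedule rather than just record the delays.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class RetryQueue:
    max_attempts: int = 3
    base_delay_s: float = 0.5
    done: set[str] = field(default_factory=set)             # idempotency keys already handled
    dead_letters: list[dict] = field(default_factory=list)  # jobs that exhausted retries
    delays: list[float] = field(default_factory=list)       # backoff delays (recorded, not slept)

    def process(self, job: dict, handler: Callable[[dict], None]) -> str:
        if job["id"] in self.done:
            return "skipped"                                 # duplicate delivery is a no-op
        for attempt in range(self.max_attempts):
            try:
                handler(job)
                self.done.add(job["id"])
                return "ok"
            except Exception as exc:
                if attempt + 1 < self.max_attempts:
                    # Exponential backoff: base, 2x base, 4x base, ...
                    self.delays.append(self.base_delay_s * 2 ** attempt)
                else:
                    self.dead_letters.append({"job": job, "error": str(exc)})
        return "dead"

calls = {"n": 0}

def flaky(job: dict) -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient")

def always_fails(job: dict) -> None:
    raise RuntimeError("bad payload")

q = RetryQueue()
r1 = q.process({"id": "a"}, flaky)          # fails twice, succeeds on the third attempt
r2 = q.process({"id": "a"}, flaky)          # same id again: skipped
r3 = q.process({"id": "b"}, always_fails)   # ends up in the dead-letter list
```

Jobs in dead_letters keep their original payload, which is what makes inspection and manual replay possible.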

DATA · TRANSACTIONAL
Transactional store
Multi-region · ACID · async replication
3 replicas · RPO ≪ 1s
DATA · COLUMNAR
Columnar memory
Per-tenant · time-series · append-only
Shard by tenant · auto-rebalance
DATA · HYBRID INDEX
Vector + entity + time
Cosine · BM25 · entity graph · range
Hybrid retrieval · reranked per-tenant
CACHE · IN-MEMORY
In-memory cache
Hot context · read-through · TTL
p99 < 5ms · > 98% hit ratio
QUEUE · MESSAGES + DLQ
Message queue + DLQ
Idempotent · ordered · exp. backoff
DLQ inspection · manual replay
COMPUTE · DISTRIBUTED
Distributed compute
Extract · embed · infer · CPU + GPU
Spot-capable · autoscaled
STATE · PER-USER WEIGHTS
Per-user weight store
RL policy · overlays · LoRA deltas
Versioned · hot-load · < 1s rollback
JOBS · SCHEDULER
Background scheduler
Nightly · hourly · decay sweeps
Adaptive cadence · dormant ≈ 0 cost
SECURITY · KEY VAULT
Multi-tenant key vault
BYOK + hosted · per-project isolation
Envelope-encrypted at rest
FINOPS · COST LEDGER
Cost ledger + limiter
Per-user · per-day · per-month caps
Hard caps · soft alerts before bills
QUALITY · OBSERVABILITY
Observability
Tracing · metrics · logs · per-tenant
OTel-native · OpenSearch sink
QUALITY · EVAL GATE
Release eval gate
Believability · KB · EQ · social · goal
SOTOPIA 6-dim · blocks regressions

If you DIY vs. with Sonzai

If you DIY | With Sonzai
Pick a queue, tune retries, run dead-letter handling. | Already wired and idempotent.
Shard a database per tenant, replicate cross-region. | Multi-tenant isolation out of the box.
Stand up a vector index and keep embeddings fresh. | Vector + entity + temporal indexes managed.
Cluster GPU/CPU compute for extraction and embedding. | Distributed compute pool, autoscaled.
Store and version per-user RL weights and overlays. | Per-user weight store, hot-swappable.
Schedule nightly consolidation, hourly decay, sweep jobs. | Background scheduler runs the full cadence.
Build a cost ledger before usage explodes. | Per-user / per-day / per-month caps included.
Wire eval gates so quality doesn't regress on model swaps. | SOTOPIA 6-dim gate runs on every release.

Order of magnitude

The left column is the work a platform team typically takes 12+ months to stand up and another 12 to harden. Every row of that work is already running under the API.

04 · Beyond Vector RAG · how Sonzai actually retrieves

Plain RAG embeds, top-k's, and hopes. Sonzai's retrieval is agentic.

The model reasons about what it needs to know, chooses which memory tools to call, inspects the results, and iterates until it has enough context. This is the ReAct loop applied to memory — not just to web search.

The ReAct loop — applied to memory
  1. REASON: LLM judges the user turn and decides which memory tools it needs.
  2. ACT: It calls one or more tools, by entity, by time, by relationship, by mood.
  3. OBSERVE: Hybrid index returns ranked, deduped, confidence-scored hits.
  4. REFINE: If under-confident, the LLM picks another tool and iterates.
  5. RESPOND: Grounded reply. Every claim traceable to a source memory.
Post-turn (async): atomic facts extracted · mood updated · personality drifted · retrieval reinforced · consolidation queued.
Memory tools · chosen per turn
  • recall(query, top_k, filters)
  • recall_shared_memories(with_id)
  • recall_by_entity(entity_id)
  • recall_by_time(start, end)
  • check_emotional_alignment(topic)
  • check_relationship_state(user_id)
  • recall_personality_drift(window)
  • search_knowledge(query, project)
  • remember_fact(text, refs, confidence)
  • // + project-defined custom tools
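The loop and tool list above can be sketched in a few lines. This is a toy under stated assumptions: the memories are an in-memory list, recall and recall_by_entity are simplified stand-ins that only borrow their names from the tool list, and choose_tool is a hard-coded placeholder for the LLM's REASON step. None of it is the actual Sonzai implementation.

```python
from typing import Callable, Optional

MEMORIES = [
    {"text": "Maya's sister is named Priya", "entity": "priya", "confidence": 0.9},
    {"text": "Priya is visiting in June", "entity": "priya", "confidence": 0.6},
    {"text": "Maya prefers tea over coffee", "entity": "maya", "confidence": 0.8},
]

def recall(query: str) -> list[dict]:
    """Naive keyword recall: match on query words longer than three characters."""
    words = {w for w in query.lower().split() if len(w) > 3}
    return [m for m in MEMORIES if words & set(m["text"].lower().split())]

def recall_by_entity(entity_id: str) -> list[dict]:
    return [m for m in MEMORIES if m["entity"] == entity_id]

TOOLS: dict[str, Callable[[str], list[dict]]] = {
    "recall": recall,
    "recall_by_entity": recall_by_entity,
}

def choose_tool(turn: str, tried: list[str]) -> Optional[tuple[str, str]]:
    """Placeholder for the LLM's REASON step: return the next untried (tool, argument)."""
    plan = [("recall", turn), ("recall_by_entity", "priya")]
    for name, arg in plan:
        if name not in tried:
            return name, arg
    return None

def react_retrieve(turn: str, min_confidence: float = 1.2, max_steps: int = 4) -> dict:
    hits: dict[str, dict] = {}
    tried: list[str] = []
    for _ in range(max_steps):
        choice = choose_tool(turn, tried)            # REASON
        if choice is None:
            break
        name, arg = choice
        tried.append(name)
        for m in TOOLS[name](arg):                   # ACT + OBSERVE
            hits[m["text"]] = m                      # dedupe by text
        if sum(m["confidence"] for m in hits.values()) >= min_confidence:
            break                                    # confident enough; otherwise REFINE
    return {"tools_used": tried, "sources": sorted(hits)}

result = react_retrieve("who is coming in June")
```

Here the first tool returns a single low-confidence hit, so the loop refines with a second tool before stopping. The sources list is what a grounded reply would cite.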

Vector RAG vs. agentic retrieval

Plain Vector RAG | Sonzai agentic retrieval
Single embedding query, top-k dump. | ReAct loop: reason → choose tool → observe → refine.
Whole-document chunks, semantic-only. | Atomic facts with entity, temporal, and confidence dimensions.
One index, one signal. | Hybrid: vector + BM25 + entity graph + temporal range.
Static: every query treated the same. | Tool-calling agent picks recall, recall_shared, check_emotional_alignment per turn.
Hallucinations leak through. | Cite-and-verify: every fact traceable, filtered before storage.
Stale or contradictory facts coexist silently. | Polarity groups form on contradiction; confidence decays; consolidation resolves.
Same answer regardless of relationship or mood. | Retrieval is context-conditioned on relationship, mood, personality, goals.
No learning. | Retrieval reinforces: hits boost confidence, misses decay it.
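The page does not say how the vector, BM25, entity, and temporal signals are combined. One common way to merge several ranked lists is reciprocal rank fusion (RRF), shown here purely as an illustrative assumption. The fact ids and the constant k = 60 are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Fuse ranked id lists: each list contributes 1 / (k + rank) to an id's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in rankings.values():
        for rank, fact_id in enumerate(ranked_ids, start=1):
            scores[fact_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda fact_id: (-scores[fact_id], fact_id))

rankings = {
    "vector":   ["f2", "f1", "f3"],   # semantic similarity
    "bm25":     ["f1", "f2"],         # keyword match
    "entity":   ["f1"],               # entity-graph lookup
    "temporal": ["f4", "f1"],         # time-range filter
}
fused = reciprocal_rank_fusion(rankings)
```

A fact that appears in every list (f1) outranks one that tops a single list (f4), which is the point of using more than one signal.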

The mental model

“Sonzai treats memory the way a reasoning agent treats the world — as something to interrogate, not something to flush into the prompt.”

05 · Integration Patterns · five flows, one platform

Five patterns. Same Relationship Layer. Pick the shape that fits your stack.

Each pattern is independently usable. Adopt one and you can graduate to another without re-platforming — the surface area changes, the substrate doesn't. Start with Process for the lightest touch, move to Sessions when you want lifecycle, or go straight to Agent Chat for the full hosted runtime. Hermes and OpenClaw are config-flips if you're already on those agent frameworks.

Flow 01

Process Endpoint

Memory layered onto an existing chat stack — one POST per turn

Use when

You already run your own LLM and chat — you want memory, personality, and learning layered on top without replacing what you have.

You own

The LLM call · the response stream · the UI.

Sonzai owns

Fact extraction · memory persistence · mood / personality / relationship deltas.

Python
import os

from sonzai import Client

sz = Client(api_key=os.environ["SONZAI_API_KEY"])

async def handle_turn(user_id, agent_id, messages):
    # Sonzai extracts facts, persists memory, applies deltas, then returns audit info.
    deltas = await sz.agents.process(
        agent_id=agent_id, user_id=user_id, messages=messages,
    )
    return deltas  # { memories_created, facts_extracted, mood, personality, relationship }
Flow 02

Real-Time Sessions

Your chat UI, our memory lifecycle — explicit start / per-turn / end

Use when

You want explicit per-conversation lifecycle — a clean start, per-turn enrichment and extraction, end-of-session consolidation.

You own

The LLM call · the message stream.

Sonzai owns

Context retrieval · per-turn extraction · async consolidation on close.

TypeScript
const s = await client.agents.sessions.start({ agent, user, sessionId });
const messages = [];                                  // full transcript, passed to end()

for (const message of stream) {
  const ctx = await s.context({ query: message });   // 7-layer enriched system block
  const reply = await yourLLM([ctx.systemBlock, message]);
  await s.turn({ messages: [message, reply] });      // async extract + learn
  messages.push(message, reply);
}
await s.end({ messages });                            // triggers consolidation
Flow 03

Agent Chat Endpoint

Full hosted runtime — SSE deltas, tools, multi-provider failover

Use when

Greenfield apps that want a complete agent in one call — streaming, tool calling, side-effect events for memory mutations.

You own

UI only.

Sonzai owns

Context assembly · LLM orchestration · tool dispatch · memory persistence · provider fallback.

Python
async for evt in client.agents.chat(
    agent=agent, messages=[...], stream=True, tools=[...]
):
    if   evt.type == "delta":     render(evt.text)
    elif evt.type == "tool_call": handle_tool(evt)
    elif evt.type == "complete":  show_usage(evt.usage)
Flow 04

Hermes Plugin

Drop-in for Nous Research's Hermes Agent — two lines of YAML

Use when

You already run Hermes Agent and want the Relationship Layer added with two lines of YAML and zero handler changes.

You own

Hermes config.

Sonzai owns

Memory recall on prefetch · fact extraction after each turn · intelligent context compression on overflow.

config.yaml
# Two plugins, cooperating:
# Memory Provider runs every turn; Context Engine fires only on token-budget hit.
plugins:
  memory: sonzai
  context: sonzai
sonzai:
  api_key: ${SONZAI_API_KEY}
Flow 05

OpenClaw Plugin

Drop-in for OpenClaw agents — config-flip, zero code

Use when

You run OpenClaw and want server-backed enrichment instead of the default local Markdown memory.

You own

OpenClaw config.

Sonzai owns

The full Context Engine lifecycle — bootstrap, assemble, afterTurn, compact, dispose.

openclaw.json
{
  "contextEngine": "sonzai",
  "sonzai": {
    "apiKey": "<your-key>",
    "audit": true           // composio_app + request_id captured
  }
}

Choosing a flow

Match your existing stack to the flow that touches it least. All five share the same backend, the same primitives, and the same per-user state — moving between them is a code change, not a re-platforming.

If your stack looks like… | Use | Why
Your own LLM and chat, just need memory. | 01 · Process | Lowest surface area. One call per turn.
Your own chat, but you want session lifecycle. | 02 · Sessions | Explicit start / turn / end for clean conversation boundaries.
Hosted agent with streaming, tools, the whole runtime. | 03 · Agent Chat | Full runtime in one SSE call.
Hermes Agent. | 04 · Hermes | Two YAML lines, zero code.
OpenClaw. | 05 · OpenClaw | Config-flip, zero code.

Design choice

All five flows share the same Relationship Layer underneath. Moving between them is a code-level change, not a re-platforming — per-user state, learned weights, and accumulated memory all carry across.

06 · Per-User Model Weights · inference personalised at the weight level

The agent on day 90 is not the agent on day 1. It has learned this user specifically.

Most platforms ship a single model that serves every user the same. Sonzai stores per-user reinforcement-learning policy weights and personality overlays, hot-loaded into the inference path. The substrate to do this safely — shadow rollouts, promotion gates, versioning, rollback — is the kind of thing teams spend a year building. Sonzai ships it.

Inference path · request time
User turn → load user_id's policy + overlay → assemble prompt → LLM call
Hot-loaded from weight store · <5ms overhead · cache-aware
Per-user weight store
user_id → versions
  • v18 · shadow ░░░░░
  • v17 · live ▓▓▓▓▓
  • v16 · prev ▓▓▓▓▓
  • v15 · prev ▓▓▓▓▓
RL policy heads · personality overlays · LoRA deltas. Hot-swappable. Rollback in <1s.
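One way to read "versioned, hot-swappable, fast rollback" is that every version stays in the store and only a live pointer moves. The sketch below illustrates that reading. UserWeightStore and its methods are hypothetical names, and the byte strings stand in for real weight blobs.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class UserWeightStore:
    versions: dict[int, bytes] = field(default_factory=dict)  # version -> weight blob
    live: Optional[int] = None                                 # version currently served
    previous: list[int] = field(default_factory=list)         # earlier live versions

    def publish(self, version: int, blob: bytes) -> None:
        """Store a new version without serving it (a shadow candidate)."""
        self.versions[version] = blob

    def promote(self, version: int) -> None:
        if version not in self.versions:
            raise KeyError(f"unknown version {version}")
        if self.live is not None:
            self.previous.append(self.live)
        self.live = version

    def rollback(self) -> int:
        """Move the live pointer back one step; no blob is copied or deleted."""
        if not self.previous:
            raise RuntimeError("no earlier version to roll back to")
        self.live = self.previous.pop()
        return self.live

store = UserWeightStore()
store.publish(16, b"weights-v16")
store.publish(17, b"weights-v17")
store.promote(16)
store.promote(17)
rolled_back_to = store.rollback()
```

Because rollback only changes which version is live, its cost does not depend on the size of the weights.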
Shadow · Live
Graduated promotion
  • Shadow model scored vs current live on real traffic.
  • Confidence ranking with turn-by-turn deltas tracked.
  • Promotion at 1% → 10% → 50% → 100%.
Auto-rollback on regression.
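The 1% → 10% → 50% → 100% ladder with auto-rollback can be sketched as a deterministic traffic split plus a stage-advance rule. The hashing scheme, the per-turn bucketing, and the "candidate must not score below live" rule are assumptions chosen for illustration. The page does not specify how traffic is split or how regression is measured.

```python
import hashlib

STAGES_BP = [100, 1_000, 5_000, 10_000]   # 1%, 10%, 50%, 100% in basis points

def bucket(user_id: str, turn_id: str) -> int:
    """Stable bucket in [0, 10000) derived from a hash of the user and turn ids."""
    digest = hashlib.sha256(f"{user_id}:{turn_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 10_000

def serve_candidate(user_id: str, turn_id: str, stage: int) -> bool:
    """True if this turn falls inside the candidate's traffic share for the stage."""
    return bucket(user_id, turn_id) < STAGES_BP[stage]

def next_stage(stage: int, candidate_score: float, live_score: float) -> int:
    """Advance one stage if the candidate holds up; return -1 to signal rollback."""
    if candidate_score < live_score:
        return -1
    return min(stage + 1, len(STAGES_BP) - 1)

# Hypothetical eval scores per stage: two clean stages, then a regression.
stage = 0
for candidate_score, live_score in [(0.82, 0.80), (0.81, 0.80), (0.74, 0.80)]:
    stage = next_stage(stage, candidate_score, live_score)
```

Hashing instead of random sampling keeps the split reproducible, so a given turn can be replayed against the same model during debugging.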
Promotion gate
SOTOPIA 6-dim eval
  • Believability
  • Relationships
  • Knowledge
  • Social Rules
  • EQ
  • Goal Completion
Every release. Every user.

What this changes

Most platforms ship one model that serves every user identically. With per-user policies, the effective model becomes a different one per user over time — safely, with shadow rollout, eval-gated promotion, and sub-second rollback. Personalisation at the weight level, not just the prompt.

07 · A Single Turn, End-to-End · what actually happens

One request in. One response out. Eleven things in between.

The full lifecycle of a single user turn in Managed Runtime mode. Steps 1–6 are synchronous (in the request path). Steps 7–11 are asynchronous (queued, eventually consistent).

Step | Sync | What happens
1 · Auth & route | sync | Tenant + user resolved. Rate limiter checked. Provider keys fetched from the key vault.
2 · Load per-user weights | sync | RL policy + personality overlay hot-loaded from weight store (§06).
3 · Agentic retrieval | sync | ReAct loop: the LLM picks memory tools, queries the hybrid index, refines (§04).
4 · Context assembly | sync | Memory + mood + relationship + personality + knowledge composed into prompt.
5 · LLM call with failover | sync | Multi-provider router; priority list; cascade on quota exhaustion.
6 · Stream response + tool calls | sync | SSE to your app. Tool calls intercepted, audited, returned.
7 · Cite-and-verify extract | async | New atomic facts extracted, verified against turn source, scored, stored.
8 · Mood + personality drift | async | Affective vector updated. Big-5 deltas applied.
9 · Relationship update | async | Bond scores adjusted. Shared-memory channels checked.
10 · Reinforcement learning | async | RL signal recorded. Shadow model scored. Promotion considered.
11 · Consolidation queue | async | Turn queued for nightly consolidation, decay sweeps, polarity-group formation.
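The sync/async split in the steps above can be illustrated with asyncio: the first six steps are awaited before the reply is returned, and the remaining five are only enqueued for a background worker. The step names are shorthand for the rows above and the work itself is stubbed out, so treat this as a sketch of the control flow, not of the platform's internals.

```python
import asyncio

SYNC_STEPS = ("auth_route", "load_weights", "agentic_retrieval",
              "context_assembly", "llm_call", "stream_response")
ASYNC_STEPS = ("extract_facts", "mood_personality_drift", "relationship_update",
               "reinforcement_learning", "consolidation_queue")

async def handle_turn(user_turn: str, background: asyncio.Queue) -> str:
    for _step in SYNC_STEPS:
        await asyncio.sleep(0)                  # placeholder for in-request work
    reply = f"reply to: {user_turn}"
    for step in ASYNC_STEPS:
        background.put_nowait((step, user_turn))   # queued; the reply does not wait
    return reply

async def worker(background: asyncio.Queue, done: list) -> None:
    while True:
        step, _turn = await background.get()
        done.append(step)                       # placeholder for post-turn work
        background.task_done()

async def main():
    background: asyncio.Queue = asyncio.Queue()
    done: list = []
    reply = await handle_turn("hi", background)
    queued_at_reply = background.qsize()        # post-turn work still pending here
    task = asyncio.create_task(worker(background, done))
    await background.join()                     # drain the queue for the demo
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return reply, queued_at_reply, done

reply, queued_at_reply, done = asyncio.run(main())
```

At the moment the reply is available, all five post-turn steps are still sitting in the queue, which is what "eventually consistent" means in practice for a caller.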
08 · SDKs & Integration Surfaces · six ways to plug in

Same primitives. Six surfaces. Pick what fits your stack.

Surface | For | Shape
Python SDK | Backend services · batch jobs · eval pipelines | client.agents.chat(...), sync & async
TypeScript SDK | Node · Bun · Deno · edge | Zero-dependency, isomorphic. Same surface area.
Go SDK | High-throughput infrastructure | Native client for Go runtimes.
MCP Server | Any MCP-compatible host | Memory, knowledge, and tool primitives as MCP servers.
Framework Plugins | Hermes · OpenClaw · similar | Drop-in plugin auto-injects <sonzai-context>. No code change.
REST API | Anything else | OpenAPI-spec'd, language-agnostic.
TypeScript
import { Sonzai } from "@sonzai-labs/agents";
const sz = new Sonzai({ apiKey: process.env.SONZAI_API_KEY });

const stream = await sz.agents.stream({
  userId,
  message,
  scene: "front_of_house",
  providers: ["claude-3.5", "gpt-4o"],
  tools: ["composio.gmail", "kb.search"],
});
for await (const chunk of stream) console.log(chunk.text);
Go
import sonzai "github.com/sonz-ai/sonzai-go"

sz, _ := sonzai.New(sonzai.WithAPIKey(os.Getenv("SONZAI_API_KEY")))

facts, _ := sz.Memory.Recall(ctx, &sonzai.RecallReq{ UserID: uid, Query: msg })
// ... your LLM call, with facts injected ...
sz.Memory.ExtractAsync(ctx, uid, transcript)

Deployment modes — adopt what you need

Mode | Sonzai owns | You own
Standalone Memory | Memory · Personality · Mood (via 2 calls / turn) | LLM call · orchestration · UX
Drop-In Runtime | The full request loop · all 8 modules · failover · tools | UX · auth · business logic
Edge / Local | On-device semantic memory · privacy-sensitive flows | Everything else
Research / Benchmark | Eval harness · SOTOPIA scoring | Your candidate memory backend
Bring-Your-Own-Key | Routing · failover · all behavioral systems | Provider keys · provider billing
09 · Architectural Choices · the nine that compound

None of this is one feature. It's nine choices that compound.

Each item below is a deliberate design choice in the substrate. None of them is novel in isolation — retrieval, evals, RL, fallback all exist elsewhere. The substrate is what's hard: making them work together, per-tenant, under production load, with rollback. Read these as the nine commitments the platform has already made, so applications built on top don't have to.

Agentic, multi-signal retrieval

ReAct loop over hybrid vector + BM25 + entity + temporal indexes. The LLM picks tools per turn. Not RAG-on-vector-soup.

Confidence-aware memory ranking

Facts carry decay curves. Retrieval reinforces them. Contradictions form polarity groups instead of silently overwriting.
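A rough sketch of the two behaviours named here: retrieval hits and misses nudging a fact's confidence, and contradictory claims about the same subject being kept side by side in a group instead of overwriting each other. The Fact shape, the boost and decay amounts, and the "same subject, different claim" test for contradiction are all simplifying assumptions.

```python
from dataclasses import dataclass

@dataclass
class Fact:
    subject: str
    claim: str
    confidence: float

def reinforce(fact: Fact, hit: bool, boost: float = 0.1, decay: float = 0.05) -> None:
    """A retrieval hit raises confidence; a miss lowers it. Clamped to [0, 1]."""
    if hit:
        fact.confidence = min(1.0, fact.confidence + boost)
    else:
        fact.confidence = max(0.0, fact.confidence - decay)

def polarity_groups(facts: list[Fact]) -> dict[str, list[Fact]]:
    """Group facts by subject; keep only subjects with conflicting claims."""
    by_subject: dict[str, list[Fact]] = {}
    for f in facts:
        by_subject.setdefault(f.subject, []).append(f)
    return {
        subject: sorted(group, key=lambda f: -f.confidence)   # strongest claim first
        for subject, group in by_subject.items()
        if len({f.claim for f in group}) > 1
    }

facts = [
    Fact("diet", "vegetarian", 0.7),
    Fact("diet", "eats meat", 0.6),
    Fact("city", "Lisbon", 0.9),
]
reinforce(facts[1], hit=True)     # the newer claim keeps getting retrieved and confirmed
reinforce(facts[1], hit=True)
reinforce(facts[0], hit=False)    # the older claim misses
groups = polarity_groups(facts)
```

Both diet claims remain stored; the group simply orders them, leaving the final resolution to a later consolidation pass.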

Adaptive consolidation cadence

Dormant users pay near-zero. Heavy users get more passes. Cost scales with engagement, not headcount.

Cross-tenant concept catalog

Cheap models inherit frontier-model quality via grounded retrieval. The largest economic lever in the stack.

Cite-and-verify pipeline

Every extracted fact is traceable to its source turn. Hallucinated facts are filtered before storage.

Multi-provider failover by priority

Automatic cascade on quota exhaustion. Single point of integration, zero single point of failure.
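The cascade can be pictured as a loop over providers in priority order that moves on whenever one reports quota exhaustion. The QuotaExhausted exception and the two stub providers are hypothetical. A real router would map each provider's rate-limit errors onto something like it.

```python
from typing import Callable

class QuotaExhausted(Exception):
    """Raised by a provider call when its quota or rate limit is hit."""

def call_with_failover(
    providers: list[tuple[str, Callable[[str], str]]], prompt: str
) -> tuple[str, str]:
    """Try providers in priority order; return (provider_name, reply) from the first that works."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except QuotaExhausted as exc:
            errors.append(f"{name}: {exc}")      # remember why we cascaded
    raise RuntimeError("all providers exhausted: " + "; ".join(errors))

def primary(prompt: str) -> str:
    raise QuotaExhausted("429")                  # stub: always rate-limited

def secondary(prompt: str) -> str:
    return f"echo: {prompt}"                     # stub: always succeeds

used, reply = call_with_failover([("primary", primary), ("secondary", secondary)], "hello")
```

Only quota errors trigger the cascade in this sketch; any other exception propagates, so genuine bugs are not silently retried against the next provider.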

Per-user model weights, hot-loaded

Each user's agent becomes a different model over time. Shadow rollout, promotion gates, rollback all managed.

SOTOPIA-gated releases

6-dim behavioral scoring — Believability, Relationships, Knowledge, Social Rules, EQ, Goal Completion — on every release.
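A release gate over the six dimensions listed here reduces to a per-dimension comparison against a baseline. The sketch below assumes one numeric score per dimension and an optional tolerance. The score scale, the tolerance, and the function name are illustrative assumptions, and the scoring itself (running the eval) is out of scope.

```python
DIMENSIONS = ("believability", "relationships", "knowledge",
              "social_rules", "eq", "goal_completion")

def release_gate(
    baseline: dict[str, float], candidate: dict[str, float], tolerance: float = 0.0
) -> tuple[bool, list[str]]:
    """Pass only if no dimension drops below baseline by more than the tolerance."""
    regressions = [d for d in DIMENSIONS if candidate[d] < baseline[d] - tolerance]
    return (not regressions, regressions)

# Hypothetical scores: knowledge improves, EQ regresses.
baseline = {d: 7.0 for d in DIMENSIONS}
candidate = dict(baseline, knowledge=7.4, eq=6.5)
passed, regressions = release_gate(baseline, candidate)
```

Checking each dimension separately means a gain in one area cannot mask a loss in another, which an averaged score would allow.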

Workbench = production, accelerated

What you evaluate in minutes of simulated time is exactly what runs in production. Same code path.

10 · Enterprise Readiness · for the architect doing the due diligence

Built so the on-call rotation doesn't become your problem.

The questions that actually decide whether a stack ships to production aren't about features. They're about what happens at p99, what happens when a provider 429s, what happens when finance asks where the bill came from, and what happens when legal asks where the data sits. Sonzai answers each one before you have to.

p99 < 5ms
Hot context cache
98%+ hit ratio · in-memory · per-tenant
< 200ms
Context assembly p95
Full 7-layer context · not just vector lookup
< 1s
Weight rollback
Any user · any version · hot-swap
RPO ≪ 1s
Transactional store
3-replica · multi-region · async replication
≈ $0
Dormant cost floor
Adaptive cadence · idle agents charge nothing
Zero
Single points of failure
Multi-provider failover · DLQ · auto-retry

Security & isolation by construction

Isolation lives in the storage layer, not the API. BYOK belongs in the platform, not the wrapper. Audit lives in the substrate, not the integration. The boundaries an architect cares about are drawn before the request reaches the model.

Request boundary · Fig. 10: Isolation walls per request
Tenant A
User a1 · agent persona X
namespace: t_A · u_a1
Tenant B
User b7 · agent persona X
namespace: t_B · u_b7
↓ enforced at storage, key vault, retrieval ↓
Storage layer
Queries scoped to (tenant_id, user_id) at the index level. Cross-tenant reads are physically impossible.
Key vault
Provider keys envelope-encrypted, scoped per project. Plaintext keys never persist.
Retrieval
Hybrid index reranked per tenant. Shared-memory channels require explicit ACL grant.

Per-tenant isolation

Every tenant gets its own data namespace across the transactional store, columnar memory, vector index, and key vault. Queries can never cross the tenant boundary — enforced at the storage layer, not just at the API.

BYOK & envelope encryption

Bring your own provider keys. They land in a multi-tenant key vault, envelope-encrypted at rest, with per-project isolation. We never persist plaintext provider credentials. Sonzai's 25% service fee still applies on routed work units; you keep the provider relationship.

Per-user memory isolation

User A's conversations, mood history, and relationship state are completely walled off from User B's — even when both talk to the same shared persona. Memory namespaces are per-user under the persona; the persona is shared, the state is not.
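The "shared persona, separate state" idea can be illustrated with a store whose only key is the full (tenant, user, persona) tuple, so there is no read path that omits the tenant or the user. This in-memory class is a hypothetical illustration of the scoping rule, not a description of the actual storage engine.

```python
class NamespacedMemory:
    """Toy store: every read and write must name tenant, user, and persona."""

    def __init__(self) -> None:
        self._rows: dict[tuple[str, str, str], list[str]] = {}

    def write(self, tenant_id: str, user_id: str, persona_id: str, text: str) -> None:
        self._rows.setdefault((tenant_id, user_id, persona_id), []).append(text)

    def read(self, tenant_id: str, user_id: str, persona_id: str) -> list[str]:
        # Returns a copy so callers cannot mutate another namespace by reference.
        return list(self._rows.get((tenant_id, user_id, persona_id), []))

mem = NamespacedMemory()
mem.write("t_A", "u_a1", "persona_X", "likes jazz")
mem.write("t_B", "u_b7", "persona_X", "allergic to peanuts")

a_view = mem.read("t_A", "u_a1", "persona_X")
b_view = mem.read("t_B", "u_b7", "persona_X")
cross = mem.read("t_A", "u_b7", "persona_X")   # wrong tenant for this user: nothing
```

Both users talk to persona_X, yet each read returns only the rows written under its own namespace.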

Cite-and-verify audit trail

Every extracted fact is traceable to the source turn that produced it. Every tool call has a request_id. Every promotion goes through an evaluated gate. Audit trail is built into the substrate — you don't bolt one on.

Cost guardrails

Per-user, per-day, per-month caps stop runaway spend before it happens. Soft alerts fire before hard caps trigger. Cost ledger surfaces every line of consumption — by tenant, by user, by agent, by provider.
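A minimal sketch of per-user daily and monthly caps with a soft alert ahead of the hard cap. Amounts are integer cents to avoid float drift. The CostLedger class, the 80% soft threshold, and the three return values are assumptions for illustration only.

```python
from collections import defaultdict

class CostLedger:
    def __init__(self, daily_cap_cents: int, monthly_cap_cents: int, soft_pct: int = 80):
        self.daily_cap = daily_cap_cents
        self.monthly_cap = monthly_cap_cents
        self.soft_pct = soft_pct
        # (user_id, period) -> cents spent; period is "YYYY-MM-DD" or "YYYY-MM"
        self.spent: dict[tuple[str, str], int] = defaultdict(int)

    def charge(self, user_id: str, day: str, cents: int) -> str:
        """day is an ISO date such as '2025-01-15'; its first 7 chars are the month."""
        day_key, month_key = (user_id, day), (user_id, day[:7])
        new_day = self.spent[day_key] + cents
        new_month = self.spent[month_key] + cents
        if new_day > self.daily_cap or new_month > self.monthly_cap:
            return "blocked"                      # hard cap: nothing is recorded
        self.spent[day_key], self.spent[month_key] = new_day, new_month
        soft_hit = (new_day * 100 >= self.soft_pct * self.daily_cap
                    or new_month * 100 >= self.soft_pct * self.monthly_cap)
        return "alert" if soft_hit else "ok"      # soft alert fires before the cap

ledger = CostLedger(daily_cap_cents=100, monthly_cap_cents=500)
results = [ledger.charge("u1", "2025-01-15", cents) for cents in (50, 40, 20)]
```

The third charge would take the day to 110 cents, so it is refused and the recorded spend stays at 90.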

Data residency & exit

Your data is yours. Single-region pinning available; full export via API at any time; deletion enforced down to the columnar store and weight overlays. No proprietary file format — exports are documented JSON.

The architect's promise

We've already drawn the lines the security review is going to ask about. The boundary between tenants is in the storage, not the API. The boundary between you and the model provider is in BYOK, not in our trust assertions. The audit trail is the substrate, not a feature we'll add later.

11 · Open by Default · no lock-in lives anywhere in the stack

The fastest way to die in this space is to lock customers in. So we didn't.

Every place a vendor could insert a wedge — the model, the provider keys, the SDK, the data format, the protocol — we left open. Sonzai is worth using because it's better, not because you can't leave.

Provider-agnostic

Bring your own LLM (BYOLLM)

Route across Gemini, GPT, Claude, Grok — or any OpenAI-compatible endpoint. Failover is configurable; we don't pick winners.

Recommended

Bring your own key (BYOK)

Pay the provider directly at your negotiated rates. Use your volume discounts and rate limits. Audit our routing line-by-line against the provider invoice.

On-prem friendly

Bring your own model (BYOM)

Run self-hosted Llama, Qwen, DeepSeek, or your own fine-tunes via vLLM, Ollama, or TGI. Sonzai handles orchestration; you keep the weights inside your perimeter.

Spec-first

Open protocol surfaces

REST + OpenAPI. MCP servers for memory, knowledge, and tool primitives. Framework plugins for Hermes and OpenClaw. No SDK lock-in — every primitive is reachable over the wire.

Your data is yours

Exit on your terms

Full data export at any time: memories, knowledge, user weight overlays, evaluation history. Documented JSON, not a black box. Migrate in, migrate out.

Graduate, don't re-platform

Composable adoption

Start with Standalone Memory, graduate to the full Runtime, drop in plugins where they fit. Adoption is not all-or-nothing — and the per-user state carries across modes.

Why this matters to your CIO

The procurement question isn't “will it work?”, it's “what happens in year three when we want to switch?” Sonzai is designed so the answer is short: you export your data, point your traffic somewhere else, and the switching cost is the integration time — not a hostage negotiation over your own state.

12 · Decision Framework · the architect's checklist, answered

The nine questions your architect is going to ask.

We've been through enough security reviews and vendor assessments to know the questions before they're asked. Here's the short answer to each — read this as the section you forward to your platform-ops counterpart.

Q · 01
Where is per-user state stored, and is it isolated?
Answer
Per-user namespaces across transactional, columnar, vector, and weight stores. Enforced at storage.
Q · 02
Can we keep our provider relationships and billing?
Answer
Yes. BYOK is recommended. Provider billing stays with the provider. We charge a 25% service fee on routed work units.
Q · 03
What happens when a provider rate-limits or fails?
Answer
Multi-provider failover by priority. Cascade is automatic; no app-side retry logic.
Q · 04
How is quality protected on model swaps?
Answer
SOTOPIA 6-dim eval gate on every release. Shadow → 1% → 10% → 50% → 100% promotion. Auto-rollback on regression.
Q · 05
How do we prevent runaway cost?
Answer
Per-user, per-day, per-month caps. Hard caps + soft alerts. Cost ledger surfaces every consumption line by tenant / user / agent / provider.
Q · 06
Is observability standards-based?
Answer
OpenTelemetry-native. Traces, metrics, logs per tenant. Sink to your existing OpenSearch / Datadog / Honeycomb.
Q · 07
How is dormant capacity priced?
Answer
Near zero. Adaptive scheduler cadence — idle agents and quiet users don't burn cost. You pay for work units consumed, not seats provisioned.
Q · 08
Can we self-host the model?
Answer
Yes — BYOM via vLLM / Ollama / TGI. Sonzai orchestrates; weights stay inside your perimeter.
Q · 09
What's the exit plan?
Answer
Full data export via API. Documented JSON. Deletion enforced down to weight overlays. Your data is yours.

Build vs. buy, plainly

Standing this up internally is twelve to twenty-four months of platform work — multi-tenant isolation, per-user weight stores, consolidation cadence, eval gates, BYOK plumbing, cost ledgers, DLQ wiring — before you've shipped a single agent feature. The product team is going to ask you for a date. Sonzai turns that date into next quarter.

The Relationship Layer

Give any LLM a mind.

One SDK. Five integration patterns. The same Relationship Layer underneath whether you adopt it as a memory sidecar, a session runtime, a hosted agent, or a plugin in Hermes or OpenClaw.