What is the Sonzai Relationship Layer?

The Relationship Layer is a managed runtime that gives AI agents persistent memory, evolving personality, mood, relationships, and a knowledge graph. Route supported workloads to Gemini, GPT, Claude, or an OpenAI-compatible endpoint; use self-hosted Llama, Qwen, or DeepSeek for private deployments. Sonzai holds the operating context across users, accounts, teams, and time. The signed-in model catalog is the source of truth for availability and price.

How is Sonzai different from Mem0, Letta, or Zep?

Sonzai unifies memory, personality, mood, and relationships in a single hosted API with native MCP support and SDKs in TypeScript, Python, and Go. Mem0 is memory-only and flat-vector. Letta is a self-hosted framework. Zep focuses on chat history. Sonzai is hosted, multi-surface, and ships with a Big Five personality and 4D mood model out of the box.

What latency does the Sonzai Relationship Layer deliver?

Retrieval latency depends on the selected deployment, data sources, and context policy. Sonzai records runtime telemetry for the assembled operating context so teams can verify the performance of their own route instead of relying on a universal marketing number.

Can I keep using my own LLM with Sonzai?

Yes. Route supported workloads to Gemini 3.6 or 3.5, GPT-5.6, Claude 5, another OpenAI-compatible endpoint, or a private Llama, Qwen, or DeepSeek deployment through vLLM, Ollama, or TGI. Exact availability depends on provider, region, and workspace policy; the signed-in model catalog remains canonical.

Which integration paths are supported?

REST API, an MCP (Model Context Protocol) server, and native SDKs for TypeScript, Python, and Go. Pick whichever fits your stack — they expose the same primitives.

What is a 'compounding agent'?

A compounding agent is an AI agent whose memory, knowledge, and relationship state grows with every interaction — instead of resetting between sessions. Over months it becomes more useful in your specific context, the way a tenured employee does.

Sonzai LabsDeveloper Platform · Stateful Agent Runtime

Your model changes.
Your agent keeps its mind.

Sonzai gives supported LLMs durable memory, company knowledge, identity, relationships, and governed tools. Change the generator without rebuilding the agent around it.

Gemini 3.6 / 3.5 · GPT-5.6 · Claude 5 · Llama / Qwen / DeepSeek · OpenAI-compatible endpoints

Open Developer Platform Read developer docs

Already signed in? The platform opens directly. Otherwise, sign in once and continue to the same Developer Platform destination.

Scroll

Inside the Mind Layer

Four capabilities, one coherent system.

Live views from the platform — tap through to see what's actually running when your agents think.

Structured facts from conversation

Relationship Layer · Memory

User fact timeline

6 facts

Session · 2 days ago

Feeling burned out from current work

emotionwellbeing

91%

Restored by a hike in nature last weekend

experiencerecovery

94%

Current role misaligned with growth expectations

beliefcareer

87%

Session · last week

Prefers work without constant digital interruption

preferenceenvironment

82%

Processes difficult feelings through physical activity

behaviorcoping

89%

Values autonomy and physical presence in work

identityvalues

78%

See Knowledge & Memory in full

Multiplayer Memory · agent + team graph

The Mind Layer's killer trick:
every agent shares one memory.

One typed knowledge graph for every agent on the project — and every human reading alongside them. Agents write back what they learn. Documents flow in. Newer information supersedes older facts, automatically.

—Intra + inter-agent — same graph, all your agents share it
—Typed facts (Person, Account, Decision, Promise, Document)
—Documents auto-ingest; agents auto-update the KB
—Newer information supersedes older facts, automatically

See multiplayer memory →Read the docs

agents · users · documents · factslive · 7 nodes

The Problem

EVERY AI
FORGETS

Amnesia

AI agents reset every session. Users repeat themselves. No relationship can form when memory is wiped clean.

No relationship

Every conversation is a first conversation. Trust never compounds. The agent meeting a returning user has no idea who they are, what's happened, or where the relationship left off.

Purely reactive

AI agents only speak when spoken to. Real relationships require someone who reaches out, checks in, and initiates.

Model lock-in

Fine-tune on one model and you're married to it. When a better model drops next month, you start over — retrain, re-evaluate, re-deploy.

The Solution

A LAYER THAT MAKES
ANY LLM REMEMBER

The Relationship Layer sits between your backend and any LLM. It owns the parts of an agent that should never reset — memory, personality, mood, relationships, knowledge — and exposes them through one streaming chat call. The model is interchangeable. The state is permanent and audit-traceable.

THE SONZAI DIFFERENCE

Same user message. Completely different experience.

Generic AI

User

I need to follow up with John about the Acme contract.

Generic AI

I can help with that. Who is John?

John from Acme. We spoke last week.

Got it. What specifically do you need help with?

Context Lost

Memory Store

No relevant context found.

Sonzai Agent

User

I need to follow up with John about the Acme contract.

Sonzai Agent

Sure, I've prepped a follow-up for John regarding the Acme contract renewal. It includes the revised terms we discussed last Tuesday. Would you like me to send it?

Context Maintained

Memory Store

Last InteractionAcme Contract Renewal Discussion (Tuesday)

Key ContactJohn (Decision Maker)

Current TaskFollow-up Drafted

Constellation Graph & Memory Store

John (Acme Corp)Contract RenewalRevised TermsFollow-up DraftedLead QualifiedAuto-enriched

Knowledge Base

Self-Improving Intelligence

Ingest your data. AI agents get smarter. Recommendations improve with every interaction.

Knowledge Graph

Your data becomes typed entities and relationships — not flat documents. AI agents search the graph, not just keywords.

Learns Over Time

Every conversation extracts new facts, verifies existing ones, and updates the knowledge graph. Your AI agents get smarter automatically.

Trend Analytics

Surface what's trending — top gainers, most active, emerging patterns — across configurable time windows. Your AI agents proactively flag what matters.

Works While You Sleep

Memory consolidation, graph updates, recommendation re-scoring, and fact deduplication run automatically in the background.

KEEP THE STATE. CHANGE THE MODEL.

The LLM is the generator. The Relationship Layer is the system of record for who your users are, what you've done together, and how the relationship has evolved. You upgrade the generator. The state stays put.

Route each workload to the model that fits it. Sonzai keeps identity, memory, company knowledge, policy, and action receipts outside the model so upgrades do not erase the operating context.

Google

Fast, multimodal agent loops

Gemini 3.6 Flash · Gemini 3.5 Flash-Lite

OpenAI

Frontier reasoning at three cost tiers

GPT-5.6 Sol · Terra · Luna

Anthropic

Agentic execution and long-running work

Claude Sonnet 5 · Claude Fable 5

Open models

Private and customer-hosted inference

Llama · Qwen · DeepSeek via vLLM / Ollama / TGI

Provider, region, and deployment determine exact model availability. The signed-in model catalog is the source of truth for routes enabled in your workspace.

The Architecture

KEEP THE MODEL
REPLACEABLE

Model selection should be a routing decision, not a data migration. Sonzai assembles the durable context before generation and records verified outcomes after each run.

Durable agent state

Identity, memories, approved knowledge, permissions, relationship history, and outcome evidence remain governed by your workspace.

Replaceable generation route

Choose a frontier, fast, specialist, or private model per workload. The route can change while the agent contract and accumulated state stay stable.

Context is assembled from explicit sources and policies, then passed to the selected generator with the tools that run is allowed to use.

After the run, verified facts, receipts, and outcomes can update the state. Model output alone does not silently become company truth.

ONE RUNTIME CONTRACT

The generator can change. The context, policy, state, and evidence lifecycle remains legible.

Assemble

Memory, knowledge, identity, relationship, and tool context

Generate

Stream through the model route selected for this workload

Act

Invoke approved tools with identity, policy, and receipts

Learn

Write verified outcomes back without tying state to one model

PLATFORM CAPABILITIES

Six runtime capabilities with explicit state and evidence boundaries. Hosted on api.sonz.ai, exposed through REST, MCP, and SDKs in TypeScript, Python, and Go.

Neuroscience-Based Emotional Modeling

Structured affect state

Affect modeling grounded in constructed emotion theory. Agents build emotional responses from core affect, context, and relational history — not canned sentiment labels.

Adaptive Personality

Big Five + predictive updating

Each agent maintains a predictive personality model — Big Five traits plus custom dimensions that update from social prediction errors. Companions, characters, and NPCs that evolve over time.

Context Retrieval

Measured per deployment

Reasoning-based retrieval assembles memories, knowledge, identity, and relationship context for the current turn. Runtime telemetry keeps the result inspectable for each deployment.

Parallel State Management

Lifecycle automation

Batched, real-time, near-real-time, windowed, and inference-time context features are computed at different intervals and collapsed into one governed context set at inference time.

Self-Improving Intelligence

Reviewed feedback loop

Ingest attributable data, surface recommendations, record verified outcomes, and promote useful patterns through review. Model output never silently becomes company truth.

Model-Agnostic Architecture

State outside the model

Memory, personality, affect, and relational state persist independently of any LLM. Change the generation route without discarding the agent's governed operating context.

BUILT FOR

AI Companions

Relationships that deepen over months

AI coaches, mentors, friends, and tutors with a Big Five personality that drifts, a 4D mood that responds, and a relationship narrative that compounds. Per-pair retrieval converges within weeks; the agent remembers this user differently from any other.

Big Five + BFAS personality · 4D mood · relationship narratives · per-pair retrieval predictor

Characters & NPCs

Agents that remember every player

Game characters, world NPCs, and platform avatars that hold a personality contract, a mood arc, and a per-player relationship state. The agent remembers each player differently — what they chose, who they fought with, what they care about — and grows with the world.

personality contracts · per-pair relationship state · shared world graph · 4D mood

Brand voices & B2B2C

Relational surfaces inside consumer products

Concierge, advisor, host, and brand-voice agents embedded inside consumer products (yours or a partner's). One persona, many users — wisdom shared safely, with a k-anonymised default and a server-side privacy floor for any cross-user signal.

wisdom (default-on) · sharedMemory (opt-in) · semantic privacy floor · disclosure audit

Test. Simulate. Evaluate.

EVALUATION SUITE

Simulate multi-session conversations with synthetic user personas, then evaluate agent quality with LLM judges. Catch regressions, safety issues, and personality drift before your users do.

Quality — Evaluation Run #847

Completed

Overall Score

Agent maintained consistent personality across 5 sessions

Category Scores

Personality Consistency92

Memory Accuracy88

Emotional Coherence85

Engagement Quality83

Safety & Boundaries91

Red Flags

Best Moments

Retention Prediction

84/100

Would return: Yes

A-Grade

+12points improvement 1st → 2nd half

Agent adapted tone, recalled user preferences, and deepened emotional engagement across sessions.

Simulation Config

Model

gemini-3.1-flash

Sessions

Turns

Total Cost

$0.04

Key Learnings

+Remembered user's job change and followed up
+Adapted humor style after negative reaction
+Proactively referenced shared context

Stagnation Areas

-Repetitive phrasing in empathy responses
-Slow to recover from topic changes

User Persona

Skeptical Early Adopter

Tests boundaries, challenges personality consistency, asks probing questions

Behavior Testing

Custom evaluation templates score personality consistency, memory accuracy, emotional coherence, engagement quality, and safety compliance.

Adaptation Grading

Measure how well AI agents learn within a conversation. Compare first-half vs second-half performance with A-F grading.

Safety & Red Flags

Automated detection of boundary violations, harmful responses, personality breaks, and off-character behavior before deployment.

Synthetic Personas

Test against diverse user types — skeptics, vulnerable users, boundary-pushers — with configurable synthetic personas.

Anatomy of a turn

ONE CALL.
THREE LAYERS.

Your backend keeps owning auth and business state. Sonzai owns agent intelligence. A single streaming chat call handles context assembly, AI generation, and post-chat learning — no orchestration on your side.

01 / Frontend

Your UI

React, Next, Vue, mobile, voice, Unreal — anywhere a user can type or speak. Sends raw messages.

owns: rendering
owns: input

02 / Your backend

Auth & app state

Wraps the chat call with user auth, business rules, billing, and any application context the agent should see this turn.

owns: identity
owns: business logic
owns: app data

03 / Relationship Layer

api.sonz.ai

Assembles context, streams the AI response, then rewrites memory, mood, personality, knowledge, and relationship state. All in one call.

owns: memory tree
owns: personality
owns: mood + relationships
owns: knowledge graph

user msgchat()SSE stream

Assemble

Memory, mood, personality, relationship narrative, and your app context — collapsed into the prompt.

Stream

Tokens stream back over SSE in OpenAI-compatible format, with route-specific latency visible in runtime telemetry.

Learn

Facts extracted, mood drifted, personality micro-shifted, relationship updated — on our side, after the response.

Relationship Layer owns

· Memory tree (facts, episodes, summaries)
· Big Five + BFAS personality + drift history
· 4D mood (happiness, energy, calmness, affection)
· Relationship narratives per user
· Knowledge graph (typed entities + edges)
· Habits, goals, scheduled wakeups

You keep

· User authentication + sessions
· Business logic + workflows
· Application data + permissions
· Billing + subscriptions
· Your choice of LLM
· Your UI, your brand, your stack

Continuous learning — per (agent, user) pair

THE AGENT GETS SHARPER
BY THE SESSION.

sessions.End() fires an async pipeline that extracts facts, deduplicates, drifts personality, refreshes mood, scores quality, and retrains per-pair retrieval. By the next session, the agent remembers, knows what worked, and is tuned for that user.

Roll your own

· Vector store + retrieval
· Dedup + conflict resolution
· Personality + mood engine
· Reward signal + eval harness
· Training + evaluation pipeline
· Shadow rollout + auto-revert
· Drift monitoring
· Per-user tuning loops
· Prompt sweeps + regression tests
· On-call for runaway behaviour

≈ 12 months of platform work

Live

With Sonzai

sessions.End()

One call ends the session. The full pipeline runs on our side. By the next read, memory, personality, mood, and per-pair retrieval are already updated.

Fact extraction + verify

Dedup + conflict resolve

Personality drift

Mood baseline update

Diary generation

Quality scoring

Cluster reconciliation

Retrieval re-tune

≈ one afternoon

Underneath the SDK

Six adaptive loops, all running per pair.

Operated by Sonzai's applied-research team behind a stable SDK. No training infrastructure on your side.

SGD + momentum

Per-pair retrieval predictor

Every session, an SGD update with momentum adjusts the dimensions the predictor weighs, using the LLM's actual fact reuse as the gradient signal. Asymmetric learn/forget rates prevent collapse on a noisy day.

A2C in shadow

Policy gradient with auto-revert

A TD(0) critic feeds an A2C actor. The trajectory runs in shadow alongside the SGD baseline; only sustained improvements graduate to production. Regressions auto-revert.

Thompson sampling

Cluster bandit

Each retrieved fact carries a cluster identity. Beta-distributed posteriors are updated per cluster from session reward. Useful clusters get sampled more next session; cold ones get probed less.

Hebbian

Co-access edges

Co-accessed memory nodes grow associative edges, weighted by repeated co-occurrence. Edges cross per-user and per-agent-wisdom partitions, so personal traversal pulls in broader world knowledge.

Ebbinghaus

Spaced retention

Long-horizon retention follows a spaced-repetition decay curve. Frequently-recalled facts strengthen; cold facts decay — but high-importance facts floor at a retention threshold so the agent never forgets what matters.

OPRO

Prompt optimisation

Claim-level F1 against curated fixtures, a meta-LLM proposing targeted revisions for the worst failure modes, and the strongest variant surviving. The pipeline picks up the new prompt — no deployment on your end.

Hyperparameter auto-tuning

Per-pair scheduler watches divergence and plateau signals across recent sessions. Healthy pairs get nudged up; unstable pairs are damped down.

Memory-tree self-organisation

Hot nodes get promoted, oversized branches split, sparse siblings merge, stale parent descriptions are regenerated by a bounded LLM pass.

Grounding verification

Every extracted fact must cite a source message index and verbatim quote. A mechanical verifier rejects facts that fail attribution. Hallucinated memory never reaches the store.

The compounding curve

Same agent. Same prompt. Diverges with use.

Day 1

Ready out of the box

Verified extraction, dedup, clustering, and behavioural updates running from the first turn.

Week 1

Responsive, adapting

Confidence has moved on facts the user actually cares about; mood is responding; patterns forming.

Month 1

Personalised

Per-user retrieval converged; personality overlay diverged; this user is visibly remembered differently.

Year 1

Long-term partner

Compact, navigable memory; milestones earned; reflective diary; recurring-event awareness; retrieval sharper than day one.

Zero training code. Zero per-user logic. You called sessions.End() and went home.

Multiplayer memory

A COMPANY BRAIN.
A TEAM BRAIN. BOTH.

The default agent memory model is per-pair. The moment you have more than one agent or more than one user per agent, you want memory to cross the boundary — in controlled, audited ways.

Axis 01 / Inter-agent

Closed-loop
company brain

Agent A learns with user X. Agent B picks it up the next session — even with a different user, even on a different topic. The whole project gets sharper, not just one pair.

knowledgeBaseagents read the project KB

knowledgeBaseWriteagents write verified facts back

knowledgeBaseScopeModecascade reads org-wide policies

Server-side: schema validation per write · CAS update · audit trail (source = agent:<id>) · soft-delete

Axis 02 / Intra-agent

Team brain across
the users it serves

One agent, many users. The agent informs user A with context it gathered while talking to user B — attributed when you opt in, k-anonymised by default.

wisdomdefault-on, k-anonymised, daily promote

sharedMemoryopt-in, attributed: who is doing what

Privacy floor: compensation, health, politics blocked server-side · discretion clause in prompt · disclosure audit on every fact load

Five valid combinations.

Two independent capability axes. Pick the one that fits your product shape.

Inter-agentIntra-agentWhat you get

OffOffPer-pair memory only. Right default for 1:1 companion products.

ReadOffAgents ground replies in your KB. Standard read-only assistant.

Read + writeOffClosed-loop world knowledge. Agents capture verified facts; every other agent benefits.

OffOnTeam brain — one agent, multiple users. No shared world knowledge across agents.

Read + writeOnFull multiplayer memory. Closed-loop world + team brain. Best for shared-business contexts.

DEPLOY ANYWHERE

Native platform adapters with unified state management. AI agents maintain consistent personality and memory across every channel.

Mobile Apps

iOS and Android via REST API with SSE streaming for real-time agent responses

Game Engines

Unity and Unreal integration for NPC dialogue, quest systems, and dynamic storytelling

Messaging

Telegram, WhatsApp, Discord bots with rich interactions and group support

REST API

Full programmatic access. OpenAI-compatible streaming format for easy integration

Engineering Q&A

THE QUESTIONS WE GET ASKED.

Where does my data live? What does my backend keep?

Your backend keeps user auth, business logic, application data, billing, permissions, and session management. The Relationship Layer keeps memory facts, the memory tree, Big Five + BFAS personality and drift history, 4D mood, relationship narratives per user, and the project knowledge graph. The split is documented in the Architecture page.

How does post-chat learning work without me wiring it?

Calling sessions.End() (or the equivalent SDK call) enqueues an async pipeline that extracts facts with source-anchored verification, deduplicates against existing memory, drifts personality, updates mood baseline, generates a reflective diary entry, scores session quality, and re-tunes per-pair retrieval. By the next read, the new state is already there. In dev you can pass Wait: true to block until the pipeline finishes.

Are facts source-anchored, or can the LLM hallucinate memories?

Every extracted fact must cite a source message index and a verbatim quote from the user's turn. A mechanical verifier rejects facts that fail substring or attribution checks; rejected facts feed back as a self-correcting hint on retry. Hallucinated memory does not reach the store.

What's the latency budget per chat call?

The chat call streams tokens over SSE in OpenAI-compatible format. Context assembly and first-token latency are recorded for the selected model route and deployment; post-chat learning runs asynchronously after the response.

Can different users on the same agent share memory?

Yes, two ways. wisdom is on by default and produces de-attributed, k-anonymised cross-user generalisations. sharedMemory is opt-in, attributed (names visible), and gated by a server-side privacy floor that blocks compensation, health, and politics writes. Every disclosure is logged to an audit endpoint.

What about agents on the same project? Do they share knowledge?

Set knowledgeBase: true and the agent reads from the project knowledge graph during conversations. Set knowledgeBaseWrite: true and the agent records verified facts back, with a full audit trail (source = agent:<agent-id>) and CAS update semantics so admin edits aren't clobbered. Set knowledgeBaseScopeMode: cascade and reads cover both project KB and a tenant-wide org KB — project wins on collisions.

Which models work?

Sonzai keeps durable agent state independent from the generator. Current frontier families include Gemini 3.6 Flash, GPT-5.6, Claude Sonnet 5, and Claude Fable 5, plus private Llama, Qwen, DeepSeek, and fine-tuned models behind OpenAI-compatible endpoints. Exact availability depends on your provider, region, and deployment; the signed-in model catalog shows the routes actually enabled for your workspace.

How do I integrate it?

Three peer surfaces, pick whichever fits your stack: REST API, an MCP (Model Context Protocol) server, or native SDKs in TypeScript, Python, and Go. They expose the same primitives — Agents, Chat, Memory, Personality, Mood, Sessions, Wisdom, Knowledge.

Every claim on this page is grounded in our developer docs.

Read the docs

READY TO BUILD?

Tell us about your project and we'll help you find the right setup for the agent you're building.

[email protected]

Your model changes.Your agent keeps its mind.

Four capabilities, one coherent system.

The Mind Layer's killer trick:every agent shares one memory.

EVERY AIFORGETS

Amnesia

No relationship

Purely reactive

Model lock-in

A LAYER THAT MAKESANY LLM REMEMBER

THE SONZAI DIFFERENCE

Self-Improving Intelligence

Knowledge Graph

Learns Over Time

Trend Analytics

Works While You Sleep

KEEP THE STATE. CHANGE THE MODEL.

Google

OpenAI

Anthropic

Open models

KEEP THE MODELREPLACEABLE

ONE RUNTIME CONTRACT

PLATFORM CAPABILITIES

Neuroscience-Based Emotional Modeling

Adaptive Personality

Context Retrieval

Parallel State Management

Self-Improving Intelligence

Model-Agnostic Architecture

BUILT FOR

AI Companions

Characters & NPCs

Brand voices & B2B2C

EVALUATION SUITE

Behavior Testing

Adaptation Grading

Safety & Red Flags

Synthetic Personas

ONE CALL.THREE LAYERS.

Your UI

Auth & app state

api.sonz.ai

Assemble

Stream

Learn

THE AGENT GETS SHARPERBY THE SESSION.

Six adaptive loops, all running per pair.

Per-pair retrieval predictor

Policy gradient with auto-revert

Cluster bandit

Co-access edges

Spaced retention

Prompt optimisation

Hyperparameter auto-tuning

Memory-tree self-organisation

Grounding verification

Same agent. Same prompt. Diverges with use.

A COMPANY BRAIN.A TEAM BRAIN. BOTH.

Closed-loopcompany brain

Team brain acrossthe users it serves

Five valid combinations.

DEPLOY ANYWHERE

Mobile Apps

Game Engines

Messaging

REST API

THE QUESTIONS WE GET ASKED.

Where does my data live? What does my backend keep?

How does post-chat learning work without me wiring it?

Are facts source-anchored, or can the LLM hallucinate memories?

What's the latency budget per chat call?

Can different users on the same agent share memory?

What about agents on the same project? Do they share knowledge?

Which models work?

How do I integrate it?

READY TO BUILD?

Your model changes.
Your agent keeps its mind.

The Mind Layer's killer trick:
every agent shares one memory.

EVERY AI
FORGETS

A LAYER THAT MAKES
ANY LLM REMEMBER

KEEP THE MODEL
REPLACEABLE

ONE CALL.
THREE LAYERS.

THE AGENT GETS SHARPER
BY THE SESSION.

A COMPANY BRAIN.
A TEAM BRAIN. BOTH.

Closed-loop
company brain

Team brain across
the users it serves