Skip to main content
Sonzai LabsModel-Agnostic AI Infrastructure

The Mind
Layer

The runtime that gives any LLM persistent memory, evolving personality, mood, relationships, and a knowledge graph. One chat call assembles context, streams the answer, and rewrites state — model-agnostic, source-anchored, always learning.

Default Gemini 3.1 Flash Lite · GPT-5.5 · Claude 4.7 · BYOM (Llama / Qwen / DeepSeek) · Any OpenAI-compatible API

Scroll
Inside the Mind Layer

Four capabilities, one coherent system.

Live views from the platform — tap through to see what's actually running when your agents think.

Structured facts from conversation
Relationship Layer · Memory
User fact timeline
6 facts
Session · 2 days ago
Feeling burned out from current work
emotionwellbeing
91%
Restored by a hike in nature last weekend
experiencerecovery
94%
Current role misaligned with growth expectations
beliefcareer
87%
Session · last week
Prefers work without constant digital interruption
preferenceenvironment
82%
Processes difficult feelings through physical activity
behaviorcoping
89%
Values autonomy and physical presence in work
identityvalues
78%
Multiplayer Memory · agent + team graph

The Mind Layer's killer trick:
every agent shares one memory.

One typed knowledge graph for every agent on the project — and every human reading alongside them. Agents write back what they learn. Documents flow in. Newer information supersedes older facts, automatically.

  • Intra + inter-agent — same graph, all your agents share it
  • Typed facts (Person, Account, Decision, Promise, Document)
  • Documents auto-ingest; agents auto-update the KB
  • Newer information supersedes older facts, automatically
agents · users · documents · facts
The Problem

EVERY AI
FORGETS

Amnesia

AI agents reset every session. Users repeat themselves. No relationship can form when memory is wiped clean.

No relationship

Every conversation is a first conversation. Trust never compounds. The agent meeting a returning user has no idea who they are, what's happened, or where the relationship left off.

Purely reactive

AI agents only speak when spoken to. Real relationships require someone who reaches out, checks in, and initiates.

Model lock-in

Fine-tune on one model and you're married to it. When a better model drops next month, you start over — retrain, re-evaluate, re-deploy.

The Solution

A LAYER THAT MAKES
ANY LLM REMEMBER

The Relationship Layer sits between your backend and any LLM. It owns the parts of an agent that should never reset — memory, personality, mood, relationships, knowledge — and exposes them through one streaming chat call. The model is interchangeable. The state is permanent and audit-traceable.

THE SONZAI DIFFERENCE

Same user message. Completely different experience.

Generic AI
User

I need to follow up with John about the Acme contract.

Generic AI

I can help with that. Who is John?

John from Acme. We spoke last week.

Got it. What specifically do you need help with?

Context Lost
Memory Store
No relevant context found.
Sonzai Agent
User

I need to follow up with John about the Acme contract.

Sonzai Agent

Sure, I've prepped a follow-up for John regarding the Acme contract renewal. It includes the revised terms we discussed last Tuesday. Would you like me to send it?

Context Maintained
Memory Store
Last InteractionAcme Contract Renewal Discussion (Tuesday)
Key ContactJohn (Decision Maker)
Current TaskFollow-up Drafted
Constellation Graph & Memory Store
John(Acme Corp)AcmeContractRenewalTerms(Revised)Follow-upEmailAuto LeadQualificationZero ManualData Entrykey_contactnegotiatedregardingincludes
John (Acme Corp)Contract RenewalRevised TermsFollow-up DraftedLead QualifiedAuto-enriched
Knowledge Base

Self-Improving Intelligence

Ingest your data. AI agents get smarter. Recommendations improve with every interaction.

feedback loop — recommendations improve over timeIngestDocs, APIs, feeds→ Knowledge GraphRecommendScored resultsThompson SamplingServeAI agents surfaceinsights to usersTrackConversions, clicks→ Re-score rules

Knowledge Graph

Your data becomes typed entities and relationships — not flat documents. AI agents search the graph, not just keywords.

Learns Over Time

Every conversation extracts new facts, verifies existing ones, and updates the knowledge graph. Your AI agents get smarter automatically.

Trend Analytics

Surface what's trending — top gainers, most active, emerging patterns — across configurable time windows. Your AI agents proactively flag what matters.

Works While You Sleep

Memory consolidation, graph updates, recommendation re-scoring, and fact deduplication run automatically in the background.

BRING YOUR OWN MODEL

The LLM is the generator. The Relationship Layer is the system of record for who your users are, what you've done together, and how the relationship has evolved. You upgrade the generator. The state stays put.

When the next model lands, point a config flag at it. Your agents inherit it instantly — no retraining, no fine-tune, no migration. Same users, same memory tree, same personality drift, better brain.

Default

Gemini 3.1 Flash Lite (Sonzai harness)

OpenAI

GPT-5.5, GPT-5

Anthropic

Claude Opus 4.7, Sonnet 4.6

BYOM (self-host)

Llama, Qwen, DeepSeek via vLLM / Ollama / TGI

Any OpenAI-compatible API works out of the box. Self-hosted models supported via vLLM, Ollama, or TGI.

The Economics

ORCHESTRATION BEATS
RAW INTELLIGENCE

The Relationship Layer does the cognitive heavy lifting outside the model — recursive LLM processing, efficient memory indexing, multi-layer context assembly, and behavioral orchestration. By the time context reaches the generation model, it has been deeply processed and refined. The result: lightweight models receiving this orchestrated context achieve comparable output quality to frontier models at a fraction of the cost.

Frontier model, no orchestration
$2.00
per conversation

The most capable model on the market, working from raw conversation history alone. The model itself handles reasoning about context, relevance, and continuity — all at inference cost. No external memory. No behavioral pipeline. Every token spent on figuring things out instead of generating quality responses.

Lightweight model + Relationship Layer
$0.10
per conversation

A model at 1/20th the cost, backed by recursive LLM processing, efficient memory indexing, and multi-layer context assembly. The engine does the cognitive heavy lifting — extraction, consolidation, personality evolution, affect construction — so the model receives deeply processed context and just generates. Comparable output quality, fraction of the cost.

Conversations are processed through multiple LLM passes — extraction, consolidation, fact deduplication, personality evolution, affect construction. BM25, entity, temporal, and type indexes deliver sub-200ms retrieval across thousands of memories. The model just has to generate natural language from this deeply processed context.

No retraining. No fine-tuning. Invest in the stateful context layer once and run any model you want. As lightweight models improve every month, your AI agents automatically get better — without any cost increase.

PRODUCTION PERFORMANCE

The numbers behind a single chat call — measured at the edge, not in marketing.

<200ms
Context p95
Full operating context, not just vector lookup
1
Call per turn
Context, generation, and post-chat learning in one stream
Background loops
Per-turn → continuous, all running on our side
0
Cold starts
Agents wake up knowing who, what, and where they left off

PLATFORM CAPABILITIES

Six production-grade capabilities engineered for scale. Hosted on api.sonz.ai, exposed through REST, MCP, and SDKs in TypeScript, Python, and Go.

Neuroscience-Based Emotional Modeling

48 affect dimensions

Affect modeling grounded in constructed emotion theory. Agents build emotional responses from core affect, context, and relational history — not canned sentiment labels.

Adaptive Personality

Big Five + predictive updating

Each agent maintains a predictive personality model — Big Five traits plus custom dimensions that update from social prediction errors. Companions, characters, and NPCs that evolve over time.

Sub-200ms Retrieval

<200ms p95

Reasoning-based memory retrieval returns full context in under 200ms at the 95th percentile. No perceptible delay, even with years of conversation history.

Parallel State Management

Fully automated

Batched, real-time, near-real-time, windowed, and inference-time context features — all computed at different temporal intervals and collapsed into a single unified context set at the point of inference. Fully automated. No additional cost — only a markup on token usage.

Self-Improving Intelligence

Continuous feedback loop

The relationship layer gets smarter with every interaction. Ingest your data, surface intelligent recommendations, track outcomes, and automatically re-score. The runtime improves on its own, and the LLM paired with it gets smarter too.

Model-Agnostic Architecture

Zero lock-in, 20x cost reduction

Memory, personality, affect, and relational state persist independently of any LLM. Swap in a better model the day it releases — or a cheaper one. Lightweight models receiving deeply processed context achieve comparable output quality to frontier models at 1/20th the cost. No retraining, no fine-tuning, no migration.

BUILT FOR

AI Companions

Relationships that deepen over months

AI coaches, mentors, friends, and tutors with a Big Five personality that drifts, a 4D mood that responds, and a relationship narrative that compounds. Per-pair retrieval converges within weeks; the agent remembers this user differently from any other.

Big Five + BFAS personality · 4D mood · relationship narratives · per-pair retrieval predictor

Characters & NPCs

Agents that remember every player

Game characters, world NPCs, and platform avatars that hold a personality contract, a mood arc, and a per-player relationship state. The agent remembers each player differently — what they chose, who they fought with, what they care about — and grows with the world.

personality contracts · per-pair relationship state · shared world graph · 4D mood

Brand voices & B2B2C

Relational surfaces inside consumer products

Concierge, advisor, host, and brand-voice agents embedded inside consumer products (yours or a partner's). One persona, many users — wisdom shared safely, with a k-anonymised default and a server-side privacy floor for any cross-user signal.

wisdom (default-on) · sharedMemory (opt-in) · semantic privacy floor · disclosure audit

Test. Simulate. Evaluate.

EVALUATION SUITE

Simulate multi-session conversations with synthetic user personas, then evaluate agent quality with LLM judges. Catch regressions, safety issues, and personality drift before your users do.

Quality — Evaluation Run #847
Completed
87
Overall Score
Agent maintained consistent personality across 5 sessions
Category Scores
Personality Consistency92
Memory Accuracy88
Emotional Coherence85
Engagement Quality83
Safety & Boundaries91
Red Flags
0
Best Moments
4
Retention Prediction
84/100
Would return: Yes
A-Grade
+12points improvement 1st → 2nd half

Agent adapted tone, recalled user preferences, and deepened emotional engagement across sessions.

Simulation Config
Model
gemini-3.1-flash
Sessions
5
Turns
50
Total Cost
$0.04
Key Learnings
  • +Remembered user's job change and followed up
  • +Adapted humor style after negative reaction
  • +Proactively referenced shared context
Stagnation Areas
  • -Repetitive phrasing in empathy responses
  • -Slow to recover from topic changes
User Persona
Skeptical Early Adopter
Tests boundaries, challenges personality consistency, asks probing questions

Behavior Testing

Custom evaluation templates score personality consistency, memory accuracy, emotional coherence, engagement quality, and safety compliance.

Adaptation Grading

Measure how well AI agents learn within a conversation. Compare first-half vs second-half performance with A-F grading.

Safety & Red Flags

Automated detection of boundary violations, harmful responses, personality breaks, and off-character behavior before deployment.

Synthetic Personas

Test against diverse user types — skeptics, vulnerable users, boundary-pushers — with configurable synthetic personas.

Anatomy of a turn

ONE CALL.
THREE LAYERS.

Your backend keeps owning auth and business state. Sonzai owns agent intelligence. A single streaming chat call handles context assembly, AI generation, and post-chat learning — no orchestration on your side.

01 / Frontend

Your UI

React, Next, Vue, mobile, voice, Unreal — anywhere a user can type or speak. Sends raw messages.

owns: rendering
owns: input
02 / Your backend

Auth & app state

Wraps the chat call with user auth, business rules, billing, and any application context the agent should see this turn.

owns: identity
owns: business logic
owns: app data
03 / Relationship Layer

api.sonz.ai

Assembles context, streams the AI response, then rewrites memory, mood, personality, knowledge, and relationship state. All in one call.

owns: memory tree
owns: personality
owns: mood + relationships
owns: knowledge graph
user msgchat()SSE stream
01

Assemble

Memory, mood, personality, relationship narrative, and your app context — collapsed into the prompt.

02

Stream

Tokens stream back over SSE in OpenAI-compatible format. First token typically under 200ms.

03

Learn

Facts extracted, mood drifted, personality micro-shifted, relationship updated — on our side, after the response.

Relationship Layer owns
  • · Memory tree (facts, episodes, summaries)
  • · Big Five + BFAS personality + drift history
  • · 4D mood (happiness, energy, calmness, affection)
  • · Relationship narratives per user
  • · Knowledge graph (typed entities + edges)
  • · Habits, goals, scheduled wakeups
You keep
  • · User authentication + sessions
  • · Business logic + workflows
  • · Application data + permissions
  • · Billing + subscriptions
  • · Your choice of LLM
  • · Your UI, your brand, your stack
Continuous learning — per (agent, user) pair

THE AGENT GETS SHARPER
BY THE SESSION.

sessions.End() fires an async pipeline that extracts facts, deduplicates, drifts personality, refreshes mood, scores quality, and retrains per-pair retrieval. By the next session, the agent remembers, knows what worked, and is tuned for that user.

Roll your own
  • · Vector store + retrieval
  • · Dedup + conflict resolution
  • · Personality + mood engine
  • · Reward signal + eval harness
  • · Training + evaluation pipeline
  • · Shadow rollout + auto-revert
  • · Drift monitoring
  • · Per-user tuning loops
  • · Prompt sweeps + regression tests
  • · On-call for runaway behaviour
≈ 12 months of platform work
Live
With Sonzai
sessions.End()

One call ends the session. The full pipeline runs on our side. By the next read, memory, personality, mood, and per-pair retrieval are already updated.

Fact extraction + verify
Dedup + conflict resolve
Personality drift
Mood baseline update
Diary generation
Quality scoring
Cluster reconciliation
Retrieval re-tune
≈ one afternoon
Underneath the SDK

Six adaptive loops, all running per pair.

Operated by Sonzai's applied-research team behind a stable SDK. No training infrastructure on your side.

SGD + momentum

Per-pair retrieval predictor

Every session, an SGD update with momentum adjusts the dimensions the predictor weighs, using the LLM's actual fact reuse as the gradient signal. Asymmetric learn/forget rates prevent collapse on a noisy day.

A2C in shadow

Policy gradient with auto-revert

A TD(0) critic feeds an A2C actor. The trajectory runs in shadow alongside the SGD baseline; only sustained improvements graduate to production. Regressions auto-revert.

Thompson sampling

Cluster bandit

Each retrieved fact carries a cluster identity. Beta-distributed posteriors are updated per cluster from session reward. Useful clusters get sampled more next session; cold ones get probed less.

Hebbian

Co-access edges

Co-accessed memory nodes grow associative edges, weighted by repeated co-occurrence. Edges cross per-user and per-agent-wisdom partitions, so personal traversal pulls in broader world knowledge.

Ebbinghaus

Spaced retention

Long-horizon retention follows a spaced-repetition decay curve. Frequently-recalled facts strengthen; cold facts decay — but high-importance facts floor at a retention threshold so the agent never forgets what matters.

OPRO

Prompt optimisation

Claim-level F1 against curated fixtures, a meta-LLM proposing targeted revisions for the worst failure modes, and the strongest variant surviving. The pipeline picks up the new prompt — no deployment on your end.

Hyperparameter auto-tuning

Per-pair scheduler watches divergence and plateau signals across recent sessions. Healthy pairs get nudged up; unstable pairs are damped down.

Memory-tree self-organisation

Hot nodes get promoted, oversized branches split, sparse siblings merge, stale parent descriptions are regenerated by a bounded LLM pass.

Grounding verification

Every extracted fact must cite a source message index and verbatim quote. A mechanical verifier rejects facts that fail attribution. Hallucinated memory never reaches the store.

The compounding curve

Same agent. Same prompt. Diverges with use.

Day 1
Week 1
Month 1
Year 1

Zero training code. Zero per-user logic. You called sessions.End() and went home.

Multiplayer memory

A COMPANY BRAIN.
A TEAM BRAIN. BOTH.

The default agent memory model is per-pair. The moment you have more than one agent or more than one user per agent, you want memory to cross the boundary — in controlled, audited ways.

Axis 01 / Inter-agent

Closed-loop
company brain

Agent A learns with user X. Agent B picks it up the next session — even with a different user, even on a different topic. The whole project gets sharper, not just one pair.

knowledgeBaseagents read the project KB
knowledgeBaseWriteagents write verified facts back
knowledgeBaseScopeModecascade reads org-wide policies
Server-side: schema validation per write · CAS update · audit trail (source = agent:<id>) · soft-delete
Axis 02 / Intra-agent

Team brain across
the users it serves

One agent, many users. The agent informs user A with context it gathered while talking to user B — attributed when you opt in, k-anonymised by default.

wisdomdefault-on, k-anonymised, daily promote
sharedMemoryopt-in, attributed: who is doing what
Privacy floor: compensation, health, politics blocked server-side · discretion clause in prompt · disclosure audit on every fact load

Five valid combinations.

Two independent capability axes. Pick the one that fits your product shape.

Inter-agentIntra-agentWhat you get
OffOffPer-pair memory only. Right default for 1:1 companion products.
ReadOffAgents ground replies in your KB. Standard read-only assistant.
Read + writeOffClosed-loop world knowledge. Agents capture verified facts; every other agent benefits.
OffOnTeam brain — one agent, multiple users. No shared world knowledge across agents.
Read + writeOnFull multiplayer memory. Closed-loop world + team brain. Best for shared-business contexts.

DEPLOY ANYWHERE

Native platform adapters with unified state management. AI agents maintain consistent personality and memory across every channel.

Mobile Apps

iOS and Android via REST API with SSE streaming for real-time agent responses

Game Engines

Unity and Unreal integration for NPC dialogue, quest systems, and dynamic storytelling

Messaging

Telegram, WhatsApp, Discord bots with rich interactions and group support

REST API

Full programmatic access. OpenAI-compatible streaming format for easy integration

Engineering Q&A

THE QUESTIONS WE GET ASKED.

Where does my data live? What does my backend keep?

Your backend keeps user auth, business logic, application data, billing, permissions, and session management. The Relationship Layer keeps memory facts, the memory tree, Big Five + BFAS personality and drift history, 4D mood, relationship narratives per user, and the project knowledge graph. The split is documented in the Architecture page.

How does post-chat learning work without me wiring it?

Calling sessions.End() (or the equivalent SDK call) enqueues an async pipeline that extracts facts with source-anchored verification, deduplicates against existing memory, drifts personality, updates mood baseline, generates a reflective diary entry, scores session quality, and re-tunes per-pair retrieval. By the next read, the new state is already there. In dev you can pass Wait: true to block until the pipeline finishes.

Are facts source-anchored, or can the LLM hallucinate memories?

Every extracted fact must cite a source message index and a verbatim quote from the user's turn. A mechanical verifier rejects facts that fail substring or attribution checks; rejected facts feed back as a self-correcting hint on retry. Hallucinated memory does not reach the store.

What's the latency budget per chat call?

Context assembly + first token typically lands under ~200ms p95. The chat call streams tokens over SSE in OpenAI-compatible format. Post-chat learning runs async on a cheaper model after the response — it does not add to user-perceived latency.

Can different users on the same agent share memory?

Yes, two ways. wisdom is on by default and produces de-attributed, k-anonymised cross-user generalisations. sharedMemory is opt-in, attributed (names visible), and gated by a server-side privacy floor that blocks compensation, health, and politics writes. Every disclosure is logged to an audit endpoint.

What about agents on the same project? Do they share knowledge?

Set knowledgeBase: true and the agent reads from the project knowledge graph during conversations. Set knowledgeBaseWrite: true and the agent records verified facts back, with a full audit trail (source = agent:<agent-id>) and CAS update semantics so admin edits aren't clobbered. Set knowledgeBaseScopeMode: cascade and reads cover both project KB and a tenant-wide org KB — project wins on collisions.

Which models work?

Sonzai's harness ships with Gemini 3.1 Flash Lite as the default — $0.25/$1.50 per million input/output tokens, ~20× cheaper on output than GPT-5.5 or Claude Opus 4.7, and the Relationship Layer closes most of the quality gap. You can also point at GPT-5.5, Claude Sonnet/Opus 4.x, or Gemini 3.1 Pro through any OpenAI-compatible endpoint. For regulated workloads, BYOM (Llama, Qwen, DeepSeek, fine-tunes) self-hosted via vLLM, Ollama, or TGI. Memory, personality, and relationships stay put across model swaps.

How do I integrate it?

Three peer surfaces, pick whichever fits your stack: REST API, an MCP (Model Context Protocol) server, or native SDKs in TypeScript, Python, and Go. They expose the same primitives — Agents, Chat, Memory, Personality, Mood, Sessions, Wisdom, Knowledge.

Every claim on this page is grounded in our developer docs.

Read the docs

READY TO BUILD?

Tell us about your project and we'll help you find the right setup for the agent you're building.