The Mind
Layer
The runtime that gives any LLM persistent memory, evolving personality, mood, relationships, and a knowledge graph. One chat call assembles context, streams the answer, and rewrites state — model-agnostic, source-anchored, always learning.
Default Gemini 3.1 Flash Lite · GPT-5.5 · Claude 4.7 · BYOM (Llama / Qwen / DeepSeek) · Any OpenAI-compatible API
Four capabilities, one coherent system.
Live views from the platform — tap through to see what's actually running when your agents think.
The Mind Layer's killer trick:
every agent shares one memory.
One typed knowledge graph for every agent on the project — and every human reading alongside them. Agents write back what they learn. Documents flow in. Newer information supersedes older facts, automatically.
- —Intra + inter-agent — same graph, all your agents share it
- —Typed facts (Person, Account, Decision, Promise, Document)
- —Documents auto-ingest; agents auto-update the KB
- —Newer information supersedes older facts, automatically
EVERY AI
FORGETS
Amnesia
AI agents reset every session. Users repeat themselves. No relationship can form when memory is wiped clean.
No relationship
Every conversation is a first conversation. Trust never compounds. The agent meeting a returning user has no idea who they are, what's happened, or where the relationship left off.
Purely reactive
AI agents only speak when spoken to. Real relationships require someone who reaches out, checks in, and initiates.
Model lock-in
Fine-tune on one model and you're married to it. When a better model drops next month, you start over — retrain, re-evaluate, re-deploy.
A LAYER THAT MAKES
ANY LLM REMEMBER
The Relationship Layer sits between your backend and any LLM. It owns the parts of an agent that should never reset — memory, personality, mood, relationships, knowledge — and exposes them through one streaming chat call. The model is interchangeable. The state is permanent and audit-traceable.
THE SONZAI DIFFERENCE
Same user message. Completely different experience.
I need to follow up with John about the Acme contract.
I can help with that. Who is John?
John from Acme. We spoke last week.
Got it. What specifically do you need help with?
I need to follow up with John about the Acme contract.
Sure, I've prepped a follow-up for John regarding the Acme contract renewal. It includes the revised terms we discussed last Tuesday. Would you like me to send it?
Self-Improving Intelligence
Ingest your data. AI agents get smarter. Recommendations improve with every interaction.
Knowledge Graph
Your data becomes typed entities and relationships — not flat documents. AI agents search the graph, not just keywords.
Learns Over Time
Every conversation extracts new facts, verifies existing ones, and updates the knowledge graph. Your AI agents get smarter automatically.
Trend Analytics
Surface what's trending — top gainers, most active, emerging patterns — across configurable time windows. Your AI agents proactively flag what matters.
Works While You Sleep
Memory consolidation, graph updates, recommendation re-scoring, and fact deduplication run automatically in the background.
BRING YOUR OWN MODEL
The LLM is the generator. The Relationship Layer is the system of record for who your users are, what you've done together, and how the relationship has evolved. You upgrade the generator. The state stays put.
When the next model lands, point a config flag at it. Your agents inherit it instantly — no retraining, no fine-tune, no migration. Same users, same memory tree, same personality drift, better brain.
Default
Gemini 3.1 Flash Lite (Sonzai harness)
OpenAI
GPT-5.5, GPT-5
Anthropic
Claude Opus 4.7, Sonnet 4.6
BYOM (self-host)
Llama, Qwen, DeepSeek via vLLM / Ollama / TGI
Any OpenAI-compatible API works out of the box. Self-hosted models supported via vLLM, Ollama, or TGI.
ORCHESTRATION BEATS
RAW INTELLIGENCE
The Relationship Layer does the cognitive heavy lifting outside the model — recursive LLM processing, efficient memory indexing, multi-layer context assembly, and behavioral orchestration. By the time context reaches the generation model, it has been deeply processed and refined. The result: lightweight models receiving this orchestrated context achieve comparable output quality to frontier models at a fraction of the cost.
The most capable model on the market, working from raw conversation history alone. The model itself handles reasoning about context, relevance, and continuity — all at inference cost. No external memory. No behavioral pipeline. Every token spent on figuring things out instead of generating quality responses.
A model at 1/20th the cost, backed by recursive LLM processing, efficient memory indexing, and multi-layer context assembly. The engine does the cognitive heavy lifting — extraction, consolidation, personality evolution, affect construction — so the model receives deeply processed context and just generates. Comparable output quality, fraction of the cost.
Conversations are processed through multiple LLM passes — extraction, consolidation, fact deduplication, personality evolution, affect construction. BM25, entity, temporal, and type indexes deliver sub-200ms retrieval across thousands of memories. The model just has to generate natural language from this deeply processed context.
No retraining. No fine-tuning. Invest in the stateful context layer once and run any model you want. As lightweight models improve every month, your AI agents automatically get better — without any cost increase.
PRODUCTION PERFORMANCE
The numbers behind a single chat call — measured at the edge, not in marketing.
PLATFORM CAPABILITIES
Six production-grade capabilities engineered for scale. Hosted on api.sonz.ai, exposed through REST, MCP, and SDKs in TypeScript, Python, and Go.
Neuroscience-Based Emotional Modeling
Affect modeling grounded in constructed emotion theory. Agents build emotional responses from core affect, context, and relational history — not canned sentiment labels.
Adaptive Personality
Each agent maintains a predictive personality model — Big Five traits plus custom dimensions that update from social prediction errors. Companions, characters, and NPCs that evolve over time.
Sub-200ms Retrieval
Reasoning-based memory retrieval returns full context in under 200ms at the 95th percentile. No perceptible delay, even with years of conversation history.
Parallel State Management
Batched, real-time, near-real-time, windowed, and inference-time context features — all computed at different temporal intervals and collapsed into a single unified context set at the point of inference. Fully automated. No additional cost — only a markup on token usage.
Self-Improving Intelligence
The relationship layer gets smarter with every interaction. Ingest your data, surface intelligent recommendations, track outcomes, and automatically re-score. The runtime improves on its own, and the LLM paired with it gets smarter too.
Model-Agnostic Architecture
Memory, personality, affect, and relational state persist independently of any LLM. Swap in a better model the day it releases — or a cheaper one. Lightweight models receiving deeply processed context achieve comparable output quality to frontier models at 1/20th the cost. No retraining, no fine-tuning, no migration.
BUILT FOR
AI Companions
Relationships that deepen over months
AI coaches, mentors, friends, and tutors with a Big Five personality that drifts, a 4D mood that responds, and a relationship narrative that compounds. Per-pair retrieval converges within weeks; the agent remembers this user differently from any other.
Big Five + BFAS personality · 4D mood · relationship narratives · per-pair retrieval predictor
Characters & NPCs
Agents that remember every player
Game characters, world NPCs, and platform avatars that hold a personality contract, a mood arc, and a per-player relationship state. The agent remembers each player differently — what they chose, who they fought with, what they care about — and grows with the world.
personality contracts · per-pair relationship state · shared world graph · 4D mood
Brand voices & B2B2C
Relational surfaces inside consumer products
Concierge, advisor, host, and brand-voice agents embedded inside consumer products (yours or a partner's). One persona, many users — wisdom shared safely, with a k-anonymised default and a server-side privacy floor for any cross-user signal.
wisdom (default-on) · sharedMemory (opt-in) · semantic privacy floor · disclosure audit
EVALUATION SUITE
Simulate multi-session conversations with synthetic user personas, then evaluate agent quality with LLM judges. Catch regressions, safety issues, and personality drift before your users do.
Agent adapted tone, recalled user preferences, and deepened emotional engagement across sessions.
- +Remembered user's job change and followed up
- +Adapted humor style after negative reaction
- +Proactively referenced shared context
- -Repetitive phrasing in empathy responses
- -Slow to recover from topic changes
Behavior Testing
Custom evaluation templates score personality consistency, memory accuracy, emotional coherence, engagement quality, and safety compliance.
Adaptation Grading
Measure how well AI agents learn within a conversation. Compare first-half vs second-half performance with A-F grading.
Safety & Red Flags
Automated detection of boundary violations, harmful responses, personality breaks, and off-character behavior before deployment.
Synthetic Personas
Test against diverse user types — skeptics, vulnerable users, boundary-pushers — with configurable synthetic personas.
ONE CALL.
THREE LAYERS.
Your backend keeps owning auth and business state. Sonzai owns agent intelligence. A single streaming chat call handles context assembly, AI generation, and post-chat learning — no orchestration on your side.
Your UI
React, Next, Vue, mobile, voice, Unreal — anywhere a user can type or speak. Sends raw messages.
owns: input
Auth & app state
Wraps the chat call with user auth, business rules, billing, and any application context the agent should see this turn.
owns: business logic
owns: app data
api.sonz.ai
Assembles context, streams the AI response, then rewrites memory, mood, personality, knowledge, and relationship state. All in one call.
owns: personality
owns: mood + relationships
owns: knowledge graph
Assemble
Memory, mood, personality, relationship narrative, and your app context — collapsed into the prompt.
Stream
Tokens stream back over SSE in OpenAI-compatible format. First token typically under 200ms.
Learn
Facts extracted, mood drifted, personality micro-shifted, relationship updated — on our side, after the response.
- · Memory tree (facts, episodes, summaries)
- · Big Five + BFAS personality + drift history
- · 4D mood (happiness, energy, calmness, affection)
- · Relationship narratives per user
- · Knowledge graph (typed entities + edges)
- · Habits, goals, scheduled wakeups
- · User authentication + sessions
- · Business logic + workflows
- · Application data + permissions
- · Billing + subscriptions
- · Your choice of LLM
- · Your UI, your brand, your stack
THE AGENT GETS SHARPER
BY THE SESSION.
sessions.End() fires an async pipeline that extracts facts, deduplicates, drifts personality, refreshes mood, scores quality, and retrains per-pair retrieval. By the next session, the agent remembers, knows what worked, and is tuned for that user.
- · Vector store + retrieval
- · Dedup + conflict resolution
- · Personality + mood engine
- · Reward signal + eval harness
- · Training + evaluation pipeline
- · Shadow rollout + auto-revert
- · Drift monitoring
- · Per-user tuning loops
- · Prompt sweeps + regression tests
- · On-call for runaway behaviour
sessions.End()One call ends the session. The full pipeline runs on our side. By the next read, memory, personality, mood, and per-pair retrieval are already updated.
Six adaptive loops, all running per pair.
Operated by Sonzai's applied-research team behind a stable SDK. No training infrastructure on your side.
Per-pair retrieval predictor
Every session, an SGD update with momentum adjusts the dimensions the predictor weighs, using the LLM's actual fact reuse as the gradient signal. Asymmetric learn/forget rates prevent collapse on a noisy day.
Policy gradient with auto-revert
A TD(0) critic feeds an A2C actor. The trajectory runs in shadow alongside the SGD baseline; only sustained improvements graduate to production. Regressions auto-revert.
Cluster bandit
Each retrieved fact carries a cluster identity. Beta-distributed posteriors are updated per cluster from session reward. Useful clusters get sampled more next session; cold ones get probed less.
Co-access edges
Co-accessed memory nodes grow associative edges, weighted by repeated co-occurrence. Edges cross per-user and per-agent-wisdom partitions, so personal traversal pulls in broader world knowledge.
Spaced retention
Long-horizon retention follows a spaced-repetition decay curve. Frequently-recalled facts strengthen; cold facts decay — but high-importance facts floor at a retention threshold so the agent never forgets what matters.
Prompt optimisation
Claim-level F1 against curated fixtures, a meta-LLM proposing targeted revisions for the worst failure modes, and the strongest variant surviving. The pipeline picks up the new prompt — no deployment on your end.
Hyperparameter auto-tuning
Per-pair scheduler watches divergence and plateau signals across recent sessions. Healthy pairs get nudged up; unstable pairs are damped down.
Memory-tree self-organisation
Hot nodes get promoted, oversized branches split, sparse siblings merge, stale parent descriptions are regenerated by a bounded LLM pass.
Grounding verification
Every extracted fact must cite a source message index and verbatim quote. A mechanical verifier rejects facts that fail attribution. Hallucinated memory never reaches the store.
Same agent. Same prompt. Diverges with use.
Zero training code. Zero per-user logic. You called sessions.End() and went home.
A COMPANY BRAIN.
A TEAM BRAIN. BOTH.
The default agent memory model is per-pair. The moment you have more than one agent or more than one user per agent, you want memory to cross the boundary — in controlled, audited ways.
Closed-loop
company brain
Agent A learns with user X. Agent B picks it up the next session — even with a different user, even on a different topic. The whole project gets sharper, not just one pair.
knowledgeBaseagents read the project KBknowledgeBaseWriteagents write verified facts backknowledgeBaseScopeModecascade reads org-wide policiesTeam brain across
the users it serves
One agent, many users. The agent informs user A with context it gathered while talking to user B — attributed when you opt in, k-anonymised by default.
wisdomdefault-on, k-anonymised, daily promotesharedMemoryopt-in, attributed: who is doing whatFive valid combinations.
Two independent capability axes. Pick the one that fits your product shape.
DEPLOY ANYWHERE
Native platform adapters with unified state management. AI agents maintain consistent personality and memory across every channel.
Mobile Apps
iOS and Android via REST API with SSE streaming for real-time agent responses
Game Engines
Unity and Unreal integration for NPC dialogue, quest systems, and dynamic storytelling
Messaging
Telegram, WhatsApp, Discord bots with rich interactions and group support
REST API
Full programmatic access. OpenAI-compatible streaming format for easy integration
THE QUESTIONS WE GET ASKED.
Where does my data live? What does my backend keep?
Your backend keeps user auth, business logic, application data, billing, permissions, and session management. The Relationship Layer keeps memory facts, the memory tree, Big Five + BFAS personality and drift history, 4D mood, relationship narratives per user, and the project knowledge graph. The split is documented in the Architecture page.
How does post-chat learning work without me wiring it?
Calling sessions.End() (or the equivalent SDK call) enqueues an async pipeline that extracts facts with source-anchored verification, deduplicates against existing memory, drifts personality, updates mood baseline, generates a reflective diary entry, scores session quality, and re-tunes per-pair retrieval. By the next read, the new state is already there. In dev you can pass Wait: true to block until the pipeline finishes.
Are facts source-anchored, or can the LLM hallucinate memories?
Every extracted fact must cite a source message index and a verbatim quote from the user's turn. A mechanical verifier rejects facts that fail substring or attribution checks; rejected facts feed back as a self-correcting hint on retry. Hallucinated memory does not reach the store.
What's the latency budget per chat call?
Context assembly + first token typically lands under ~200ms p95. The chat call streams tokens over SSE in OpenAI-compatible format. Post-chat learning runs async on a cheaper model after the response — it does not add to user-perceived latency.
Can different users on the same agent share memory?
Yes, two ways. wisdom is on by default and produces de-attributed, k-anonymised cross-user generalisations. sharedMemory is opt-in, attributed (names visible), and gated by a server-side privacy floor that blocks compensation, health, and politics writes. Every disclosure is logged to an audit endpoint.
What about agents on the same project? Do they share knowledge?
Set knowledgeBase: true and the agent reads from the project knowledge graph during conversations. Set knowledgeBaseWrite: true and the agent records verified facts back, with a full audit trail (source = agent:<agent-id>) and CAS update semantics so admin edits aren't clobbered. Set knowledgeBaseScopeMode: cascade and reads cover both project KB and a tenant-wide org KB — project wins on collisions.
Which models work?
Sonzai's harness ships with Gemini 3.1 Flash Lite as the default — $0.25/$1.50 per million input/output tokens, ~20× cheaper on output than GPT-5.5 or Claude Opus 4.7, and the Relationship Layer closes most of the quality gap. You can also point at GPT-5.5, Claude Sonnet/Opus 4.x, or Gemini 3.1 Pro through any OpenAI-compatible endpoint. For regulated workloads, BYOM (Llama, Qwen, DeepSeek, fine-tunes) self-hosted via vLLM, Ollama, or TGI. Memory, personality, and relationships stay put across model swaps.
How do I integrate it?
Three peer surfaces, pick whichever fits your stack: REST API, an MCP (Model Context Protocol) server, or native SDKs in TypeScript, Python, and Go. They expose the same primitives — Agents, Chat, Memory, Personality, Mood, Sessions, Wisdom, Knowledge.
Every claim on this page is grounded in our developer docs.
Read the docsREADY TO BUILD?
Tell us about your project and we'll help you find the right setup for the agent you're building.