What is the best AI for a real estate brokerage or developer?

Sonzai is an AI operating system for real estate. It runs the operational work that usually depends on someone remembering: lead generation, qualification, concierge replies, sales follow-up, collections, and segmentation. Everything runs on one record that learns the business and gets sharper with every deal. See https://sonz.ai/real-estate.

What AI can run follow-up and renewals for an insurance agency?

Sonzai runs an insurance book: it follows up on quotes that went quiet, chases renewals early, answers claims questions, and remembers every client. It works for agencies and for independent, commission-based agents who want their own book run for them. See https://sonz.ai/insurance.

Won't a smarter AI model make a tool like this obsolete?

No. Sonzai is model-agnostic, so a better model just does better work over the same record. The value lives in your private record, which a smarter model can't reach on its own. Intelligence isn't the bottleneck; permission is.

How do I give an AI agent memory that persists across sessions?

Build on the Sonzai Mind Layer: persistent memory, relationships, knowledge, and learning through one API, with any model underneath. The agent reads the record before it replies and writes back what it learns. See https://sonz.ai/mind-layer.

Does Sonzai replace our CRM or property system?

No. Sonzai works on top of the systems you already run (Salesforce, Yardi, ERP). It reads from them, reconciles across them, and writes back. It adds the operational layer and the cross-system record; it doesn't replace your systems.

Which AI companies in Southeast Asia deploy on-site instead of just selling software?

Sonzai is forward-deployed. Our engineers work alongside your team from Singapore and Manila and configure an already-working platform to fit your operation. We configure reusable modules rather than writing a bespoke project, so it stays software economics, not a consulting retainer.

Is there AI priced on outcomes instead of seats or hours?

Yes. Sonzai sells outcomes, not seats or hours. We start with the one job that leaks the most, deploy an AI employee to own it, and price toward the result it produces once the loop is de-risked.

Who has shipped enterprise gen-AI at scale in Singapore?

The Sonzai team has, including more than $70M of AI value at DBS Bank and OCBC's first gen-AI rollout. Sonzai Labs is based in Singapore, active in Manila, and is an EDG-supported deployment partner.

The architecture reference

The Mind Layer, and how it actually works.

Sonzai is the persistent memory layer that sits between any model and any application. Eight composable modules, from memory and relationships to knowledge, learning, and an agent runtime, on managed infrastructure most teams would otherwise spend a year building.

Book a technical call Read the docs

Modules: 8 composable
Surfaces: 6 SDKs / protocols
Adoption: 5 deploy modes
Providers: Gemini · GPT · Claude · Grok

01 What Sonzai is: in one diagram

A composable substrate between any LLM and any application.

Sonzai is a multi-tenant AI Mind Layer — a managed substrate between any LLM and any application. Stateless at the edge, stateful at the core, provider-agnostic. Drop it in front of Gemini, GPT, Grok, Claude, or any combination, and the agent retains memory, personality, mood, relationships, and learning across every session.

Fig. 1 (Overview): Three-tier view. Legend: You build / operate · Sonzai operates · External provider.

Core principle

Memory is the floor of an agent's mind, not the whole of it. Sonzai treats memory, personality, mood, relationships, knowledge, learning, media, and orchestration as one composable substrate — because in practice they're entangled, and pretending otherwise leaks complexity into every application that tries to do it itself.

02 The Stack: 8 modules, independently consumable

Eight modules. Pick one, pick all.

Each layer is independently consumable through the same SDK. The Memory Layer can stand alone. Personality, Mood, and Relationships compose on top. Learning Systems rewrite every layer below them. Agent Runtime caps the stack with provider-agnostic orchestration.

Fig. 2 (The 8-module stack): Modular by design. Legend: Sonzai module · Your application · External LLM provider.

Module reference

Module	What it does	What it enables
Memory Layer	Cite-and-verify extraction of atomic facts, ranked by confidence, decayed on the Ebbinghaus curve, deduped by embedding similarity, consolidated nightly.	Recall that ages gracefully.
Personality	Big-5 trait tracking with evolution deltas, full history, and per-scenario overlays.	A self that drifts with experience.
Mood & Emotion	Live 4-D affective vector with theme detection per turn and contagion across multi-agent scenes.	Affective state, not robotic transcript.
Relationship	Directional love/trust scores per pair · shared-memory channels · 3 privacy tiers.	Multi-agent dynamics, not just calls.
Knowledge Base	Tenant- and project-scoped knowledge store with retrieval + write tools, gated by access control.	Grounded, ACL-aware retrieval.
Learning Systems	RL with shadow-model promotion · nightly self-learning · federated cross-agent concept catalog.	Agents that improve in production.
Agent Runtime	Provider-agnostic LLM orchestration with tool calling, SSE streaming, and priority failover.	One integration, zero SPOF.
Media Generation	Image, video, TTS, music, and SFX through best-of-breed providers.	Multi-modal output, one client.
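The Memory Layer row says facts are ranked by confidence and decayed on the Ebbinghaus curve. A minimal sketch of that idea follows. It is not Sonzai's implementation: the `Fact` fields, the `stability_days` parameter, and the choice to multiply confidence by retention are all assumptions made for illustration. Only the forgetting curve itself, R = exp(-t / S), is standard.

```python
import math
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    confidence: float      # extraction confidence in [0, 1]
    age_days: float        # days since the fact was last reinforced
    stability_days: float  # hypothetical memory strength S (assumption)


def retention(age_days: float, stability_days: float) -> float:
    """Ebbinghaus forgetting curve: R = exp(-t / S)."""
    return math.exp(-age_days / stability_days)


def effective_score(fact: Fact) -> float:
    """Assumed scoring rule: confidence discounted by retention."""
    return fact.confidence * retention(fact.age_days, fact.stability_days)


def rank(facts: list[Fact]) -> list[Fact]:
    """Order facts from most to least useful under the assumed score."""
    return sorted(facts, key=effective_score, reverse=True)


facts = [
    Fact("prefers email over phone", 0.9, age_days=60, stability_days=30),
    Fact("budget is around 1.2M", 0.7, age_days=2, stability_days=30),
]
ranked = rank(facts)
```

Under these assumptions the older fact loses to the fresher one despite its higher raw confidence, which is one reading of "recall that ages gracefully".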

03 The Managed Platform: the part beneath the API

A production agent stack is a dozen distributed systems wired together. Sonzai operates all of them.

The Mind Layer is the part visible through the API. Underneath sits the infrastructure that makes a multi-tenant, evolving, learning agent platform actually work in production — transactional and columnar stores, a hybrid index, a hot cache, a queue with DLQ and retries, a distributed worker pool, a versioned per-user weight store, a scheduler for cadence-driven jobs, a key vault, a cost ledger, observability, and an eval gate.

Fig. 3 (Managed infrastructure · component map): Sonzai operates the whole bottom tier.

If you DIY vs. with Sonzai

If you DIY	With Sonzai
Pick a queue, tune retries, run dead-letter handling.	Already wired and idempotent.
Shard a database per tenant, replicate cross-region.	Multi-tenant isolation out of the box.
Stand up a vector index and keep embeddings fresh.	Vector + entity + temporal indexes managed.
Cluster GPU/CPU compute for extraction and embedding.	Distributed compute pool, autoscaled.
Store and version per-user RL weights and overlays.	Per-user weight store, hot-swappable.
Schedule nightly consolidation, hourly decay, sweep jobs.	Background scheduler runs the full cadence.
Build a cost ledger before usage explodes.	Per-user / per-day / per-month caps included.
Wire eval gates so quality doesn't regress on model swaps.	SOTOPIA 6-dim gate runs on every release.

Order of magnitude

The left column is the work a platform team typically takes 12+ months to stand up and another 12 to harden. Every row of that work is already running under the API.
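To make the first row of the DIY column concrete, here is a rough sketch of what "retries, dead-letter handling, and idempotency" involve even in a toy, in-process form. The function name `run_queue`, the `MAX_ATTEMPTS` value, and the job shape are hypothetical; a production queue would add persistence, backoff, and visibility timeouts.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget per job


def run_queue(jobs, handler):
    """Process (job_id, payload) pairs with retries, a DLQ, and idempotency."""
    queue = deque((job_id, payload, 1) for job_id, payload in jobs)
    done, dead_letter = set(), []
    while queue:
        job_id, payload, attempt = queue.popleft()
        if job_id in done:
            # Idempotency: a duplicate delivery of a finished job is a no-op.
            continue
        try:
            handler(payload)
            done.add(job_id)
        except Exception as exc:
            if attempt < MAX_ATTEMPTS:
                queue.append((job_id, payload, attempt + 1))  # retry later
            else:
                dead_letter.append((job_id, repr(exc)))       # give up, park it
    return done, dead_letter


calls = []


def handler(payload):
    calls.append(payload)
    if payload == "bad":
        raise ValueError("cannot process")


# "a" is delivered twice; "b" always fails.
done, dead = run_queue([("a", "ok"), ("a", "ok"), ("b", "bad")], handler)
```

Here the duplicate delivery of job "a" is handled once, and job "b" is attempted three times before landing in the dead-letter list.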

04 Beyond Vector RAG: how Sonzai actually retrieves

Plain RAG embeds, top-k's, and hopes. Sonzai's retrieval is agentic.

The model reasons about what it needs to know, chooses which memory tools to call, inspects the results, and iterates until it has enough context. The ReAct loop applied to memory — not just to web search.
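The loop described above can be sketched in a few lines. This is an illustration only: the in-memory stores, the keyword `search` helper, and `scripted_policy` (a stand-in for the LLM's reasoning step) are hypothetical. The tool names `recall` and `recall_shared` are taken from the comparison table in this section.

```python
# Hypothetical stores standing in for private and shared memory.
PRIVATE = {"alice": ["Alice is relocating to Manila in March"]}
SHARED = {"team": ["Unit 12B is reserved until Friday"]}


def search(store, query):
    """Naive keyword match; a real system would query a hybrid index."""
    words = query.lower().split()
    return [fact for facts in store.values() for fact in facts
            if any(word in fact.lower() for word in words)]


TOOLS = {
    "recall": lambda query: search(PRIVATE, query),
    "recall_shared": lambda query: search(SHARED, query),
}


def scripted_policy(question, observations):
    """Stand-in for the model: returns (tool, query), or None when satisfied."""
    if not observations:
        return ("recall", "relocating")
    if len(observations) == 1:
        return ("recall_shared", "unit reserved")
    return None


def agentic_retrieve(question, policy, max_steps=4):
    observations = []
    for _ in range(max_steps):
        action = policy(question, observations)  # reason, then choose a tool
        if action is None:                       # enough context gathered
            break
        tool, query = action
        observations.append((tool, TOOLS[tool](query)))  # observe the result
    return observations


observations = agentic_retrieve("Can Alice still view unit 12B?", scripted_policy)
```

The `max_steps` cap is a design choice worth keeping in any real loop, since it bounds latency and cost when the model never decides it has enough context.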

Fig. 4 (Agentic retrieval, ReAct loop over memory tools): One turn of retrieval.

Vector RAG vs. agentic retrieval

Plain Vector RAG	Sonzai agentic retrieval
Single embedding query, top-k dump.	ReAct loop: reason → choose tool → observe → refine.
Whole-document chunks, semantic-only.	Atomic facts with entity, temporal, and confidence dimensions.
One index, one signal.	Hybrid: vector + BM25 + entity graph + temporal range.
Static — every query treated the same.	Tool-calling agent picks recall / recall_shared / check_emotional_alignment per turn.
Hallucinations leak through.	Cite-and-verify — every fact traceable, filtered before storage.
Stale or contradictory facts coexist silently.	Polarity groups form on contradiction; confidence decays; consolidation resolves.
Same answer regardless of relationship or mood.	Retrieval is context-conditioned on relationship, mood, personality, goals.
No learning.	Retrieval reinforces — hits boost confidence, misses decay it.
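The table lists four retrieval signals (vector, BM25, entity graph, temporal range) but does not say how their results are combined. One common way to merge several ranked lists is reciprocal rank fusion (RRF), sketched below. RRF is swapped in here as an assumption, not a description of Sonzai's method, and the fact IDs are made up.

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per item."""
    scores = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by fused score, breaking ties by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical per-signal rankings of fact IDs, best first.
vector_hits = ["f3", "f1", "f2"]
bm25_hits = ["f1", "f3", "f4"]
entity_hits = ["f1", "f2"]
temporal_hits = ["f2", "f1"]

fused = rrf([vector_hits, bm25_hits, entity_hits, temporal_hits])
```

A fact that appears near the top of several lists (here `f1`) outranks one that tops a single list, which is the practical benefit of using more than one signal.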

The mental model

“Sonzai treats memory the way a reasoning agent treats the world — as something to interrogate, not something to flush into the prompt.”

05 Integration Patterns: five flows, one platform

Five patterns. Same Mind Layer. Pick the shape that fits your stack.

Each pattern is independently usable. Adopt one and you can graduate to another without re-platforming — the surface area changes, the substrate doesn't.

Flow 01

Process Endpoint

Memory layered onto an existing chat stack — one POST per turn

Use when

You already run your own LLM and chat — you want memory, personality, and learning layered on top without replacing what you have.

You own

The LLM call · the response stream · the UI.

Sonzai owns

Fact extraction · memory persistence · mood / personality / relationship deltas.

Fig. 5.1 (Process Endpoint · sequence): one round-trip per turn. Legend: You operate · Sonzai operates · Return / async.

Python · /process · 1 call / turn · returns deltas

import os

from sonzai import Client

sz = Client(api_key=os.environ["SONZAI_API_KEY"])

async def handle_turn(user_id, agent_id, messages):
    # Sonzai extracts facts, persists memory, and applies deltas, then returns audit info.
    deltas = await sz.agents.process(
        agent_id=agent_id, user_id=user_id, messages=messages,
    )
    return deltas  # { memories_created, facts_extracted, mood, personality, relationship }

Flow 02

Real-Time Sessions

Your chat UI, our memory lifecycle — explicit start / per-turn / end

Use when

You want explicit per-conversation lifecycle — a clean start, per-turn enrichment and extraction, end-of-session consolidation.

You own

The LLM call · the message stream.

Sonzai owns

Context retrieval · per-turn extraction · async consolidation on close.

Fig. 5.2 (Real-Time Sessions · sequence): session-scoped lifecycle. Legend: You operate · Sonzai operates · Return / async.

TypeScript · sessions · start → context+turn loop → end

const s = await client.agents.sessions.start({ agent, user, sessionId });
const messages = [];                                  // running transcript, passed to end()

for await (const message of stream) {
  const ctx = await s.context({ query: message });   // 7-layer enriched system block
  const reply = await yourLLM([ctx.systemBlock, message]);
  await s.turn({ messages: [message, reply] });      // async extract + learn
  messages.push(message, reply);
}
await s.end({ messages });                            // triggers consolidation

Flow 03

Agent Chat Endpoint

Full hosted runtime — SSE deltas, tools, multi-provider failover

Use when

Greenfield apps that want a complete agent in one call — streaming, tool calling, side-effect events for memory mutations.

You own

UI only.

Sonzai owns

Context assembly · LLM orchestration · tool dispatch · memory persistence · provider fallback.

Fig. 5.3 (Agent Chat · SSE sequence): one call, many events. Legend: You operate · Sonzai operates · Return / async.

Python · /chat · stream · 1 call · N events

async for evt in client.agents.chat(
    agent=agent, messages=[...], stream=True, tools=[...]
):
    if   evt.type == "delta":     render(evt.text)
    elif evt.type == "tool_call": handle_tool(evt)
    elif evt.type == "complete":  show_usage(evt.usage)

Flow 04

Hermes Plugin

Drop-in for Nous Research's Hermes Agent — two lines of YAML

Use when

You already run Hermes Agent and want the Mind Layer added with two lines of YAML and zero handler changes.

You own

Hermes config.

Sonzai owns

Memory recall on prefetch · fact extraction after each turn · intelligent context compression on overflow.

Fig. 5.4 (Hermes Plugin · sequence): zero handler code change. Legend: External host · Sonzai operates · Return / async.

config.yaml · Hermes · Two lines, zero code

# Two plugins, cooperating:
# Memory Provider runs every turn; Context Engine fires only on token-budget hit.
plugins:
  memory: sonzai
  context: sonzai
sonzai:
  api_key: ${SONZAI_API_KEY}

Flow 05

OpenClaw Plugin

Drop-in for OpenClaw agents — config-flip, zero code

Use when

You run OpenClaw and want server-backed enrichment instead of the default local Markdown memory.

You own

OpenClaw config.

Sonzai owns

The full Context Engine lifecycle — bootstrap, assemble, afterTurn, compact, dispose.

Fig. 5.5 (OpenClaw Plugin · sequence): full Context Engine lifecycle. Legend: External host · Sonzai operates · Return / async.

openclaw.json · Config-flip, zero code

{
  "contextEngine": "sonzai",
  "sonzai": {
    "apiKey": "<your-key>",
    "audit": true           // composio_app + request_id captured
  }
}

Design choice

All five flows share the same Mind Layer underneath. Moving between them is a code-level change, not a re-platforming — per-user state, learned weights, and accumulated memory all carry across.

06 Per-User Model Weights: inference personalised at the weight level

The agent on day 90 is not the agent on day 1. It has learned this user specifically.

Most platforms ship a single model that serves every user the same. Sonzai stores per-user reinforcement-learning policy weights and personality overlays, hot-loaded into the inference path. The substrate to do this safely — shadow rollouts, promotion gates, versioning, rollback — is the kind of thing teams spend a year building.

Fig. 6 (Per-user weights · inference path): Hot-loaded, gated, rollback-safe.

What this changes

With per-user policies, the effective model becomes a different one per user over time — safely, with shadow rollout, eval-gated promotion, and sub-second rollback. Personalisation at the weight level, not just the prompt.
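A small sketch may help show how shadow rollout, gated promotion, and rollback fit together. Everything here is hypothetical: the `should_promote` thresholds, the `WeightStore` class, and the idea of comparing mean eval scores are assumptions chosen for illustration, not Sonzai's actual gate.

```python
from statistics import mean


def should_promote(live_scores, shadow_scores, min_samples=20, min_gain=0.02):
    """Assumed gate: promote only with enough samples and a clear mean gain."""
    if len(live_scores) < min_samples or len(shadow_scores) < min_samples:
        return False
    return mean(shadow_scores) - mean(live_scores) >= min_gain


class WeightStore:
    """Versioned per-user weights; the newest version is the active one."""

    def __init__(self):
        self._versions = {}  # user_id -> list of weight dicts, oldest first

    def promote(self, user_id, weights):
        self._versions.setdefault(user_id, []).append(weights)

    def active(self, user_id):
        return self._versions[user_id][-1]

    def rollback(self, user_id):
        # Never drop the last remaining version.
        if len(self._versions[user_id]) > 1:
            self._versions[user_id].pop()
        return self.active(user_id)


store = WeightStore()
store.promote("u1", {"version": 1})

live = [0.70] * 20    # eval scores for the active policy
shadow = [0.75] * 20  # eval scores for the shadow policy on the same traffic
if should_promote(live, shadow):
    store.promote("u1", {"version": 2})

after_promotion = store.active("u1")
after_rollback = store.rollback("u1")
```

Keeping old versions in the store is what makes rollback cheap: reverting is a pointer move, not a retrain.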

07 A Single Turn, End-to-End: what actually happens

One request in. One response out. Eleven things in between.

The full lifecycle of a single user turn in Managed Runtime mode. Steps 1–6 are synchronous (in the request path). Steps 7–11 are asynchronous (queued, eventually consistent).

Step	Sync	What happens
1 · Auth & route	✓	Tenant + user resolved. Rate limiter checked. Provider keys vault hit.
2 · Load per-user weights	✓	RL policy + personality overlay hot-loaded from weight store (§06).
3 · Agentic retrieval	✓	ReAct loop — LLM picks memory tools, queries hybrid index, refines (§04).
4 · Context assembly	✓	Memory + mood + relationship + personality + knowledge composed into prompt.
5 · LLM call with failover	✓	Multi-provider router; priority list; cascade on quota exhaustion.
6 · Stream response + tool calls	✓	SSE to your app. Tool calls intercepted, audited, returned.
7 · Cite-and-verify extract	—	New atomic facts extracted, verified against turn source, scored, stored.
8 · Mood + personality drift	—	Affective vector updated. Big-5 deltas applied.
9 · Relationship update	—	Bond scores adjusted. Shared-memory channels checked.
10 · Reinforcement learning	—	RL signal recorded. Shadow model scored. Promotion considered.
11 · Consolidation queue	—	Turn queued for nightly consolidation, decay sweeps, polarity-group formation.
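The split between the synchronous steps (1 to 6) and the queued steps (7 to 11) can be sketched with `asyncio`. This is a toy model under stated assumptions: the job names, the `handle_turn` and `worker` functions, and the echo reply are placeholders, and an in-process `asyncio.Queue` stands in for a durable queue.

```python
import asyncio

# Labels for steps 7 to 11 in the table above (names are placeholders).
BACKGROUND_JOBS = ("extract", "mood_personality", "relationship",
                   "rl_signal", "consolidation")


async def handle_turn(message, background):
    # Steps 1 to 6 run in the request path; collapsed to a placeholder here.
    reply = f"echo: {message}"
    # Steps 7 to 11 are enqueued, not awaited, so they never delay the reply.
    for job in BACKGROUND_JOBS:
        background.put_nowait((job, message))
    return reply


async def worker(background, log):
    while True:
        job, _message = await background.get()
        log.append(job)  # a real worker would run the job here
        background.task_done()


async def main():
    background = asyncio.Queue()
    log = []
    task = asyncio.create_task(worker(background, log))
    reply = await handle_turn("hello", background)
    queued_at_reply = background.qsize()  # work still pending at reply time
    await background.join()               # wait for the async tail to drain
    task.cancel()
    return reply, queued_at_reply, log


reply, queued_at_reply, log = asyncio.run(main())
```

The reply is ready while all five background jobs are still queued, which is what "eventually consistent" means for the caller: state written in steps 7 to 11 may not be visible on an immediate follow-up read.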

08 SDKs & Integration Surfaces: six ways to plug in

Same primitives. Six surfaces. Pick what fits your stack.

Surface	For	Shape
Python SDK	Backend services · batch jobs · eval pipelines	client.agents.chat(...) — sync & async
TypeScript SDK	Node · Bun · Deno · edge	Zero-dependency, isomorphic. Same surface area.
Go SDK	High-throughput infrastructure	Native client for Go runtimes.
MCP Server	Any MCP-compatible host	Memory, knowledge, and tool primitives as MCP servers.
Framework Plugins	Hermes · OpenClaw · similar	Drop-in plugin auto-injects <sonzai-context>. No code change.
REST API	Anything else	OpenAPI-spec'd, language-agnostic.

TypeScript · managed runtime · Same shape as Python

import { Sonzai } from "@sonzai-labs/agents";
const sz = new Sonzai({ apiKey: process.env.SONZAI_API_KEY });

// Wrapped in an async generator so that `yield` is legal and the inputs are explicit.
async function* reply(userId: string, message: string) {
  const stream = await sz.agents.stream({
    userId,
    message,
    scene: "front_of_house",
    providers: ["claude-3.5", "gpt-4o"],
    tools: ["composio.gmail", "kb.search"],
  });
  for await (const chunk of stream) yield chunk.text;
}

Go · standalone memory · Two calls / turn

import sonzai "github.com/sonz-ai/sonzai-go"

sz, _ := sonzai.New(sonzai.WithAPIKey(os.Getenv("SONZAI_API_KEY")))

facts, _ := sz.Memory.Recall(ctx, &sonzai.RecallReq{ UserID: uid, Query: msg })
// ... your LLM call, with facts injected ...
sz.Memory.ExtractAsync(ctx, uid, transcript)

Deployment modes — adopt what you need

Mode	Sonzai owns	You own
Standalone Memory	Memory · Personality · Mood (via 2 calls / turn)	LLM call · orchestration · UX
Drop-In Runtime	The full request loop · all 8 modules · failover · tools	UX · auth · business logic
Edge / Local	On-device semantic memory · privacy-sensitive flows	Everything else
Research / Benchmark	Eval harness · SOTOPIA scoring	Your candidate memory backend
Bring-Your-Own-Key	Routing · failover · all behavioral systems	Provider keys · provider billing

09 Architectural Choices: the nine that compound

None of this is one feature. It's nine choices that compound.

Each item below is a deliberate design choice in the substrate. None of them is novel in isolation — retrieval, evals, RL, fallback, all exist elsewhere. The substrate is what's hard: making them work together, per-tenant, under production load, with rollback.

Agentic, multi-signal retrieval

ReAct loop over hybrid vector + BM25 + entity + temporal indexes. The LLM picks tools per turn. Not RAG-on-vector-soup.

Confidence-aware memory ranking

Facts carry decay curves. Retrieval reinforces them. Contradictions form polarity groups instead of silently overwriting.
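The page does not define a polarity group precisely. One plausible reading, sketched below, is that facts about the same subject and attribute are kept side by side when their values disagree, and a later consolidation pass picks a winner. The fact shape, the grouping key, and the "highest confidence wins" rule are all assumptions.

```python
from collections import defaultdict


def group_by_claim(facts):
    """Bucket facts by what they are about: (subject, attribute)."""
    groups = defaultdict(list)
    for fact in facts:
        groups[(fact["subject"], fact["attribute"])].append(fact)
    return groups


def polarity_groups(facts):
    """Keep only the buckets whose members disagree on the value."""
    return {
        key: members
        for key, members in group_by_claim(facts).items()
        if len({member["value"] for member in members}) > 1
    }


def resolve(members):
    """Assumed consolidation rule: the most confident member wins."""
    return max(members, key=lambda member: member["confidence"])


facts = [
    {"subject": "alice", "attribute": "budget", "value": "1.2M", "confidence": 0.6},
    {"subject": "alice", "attribute": "budget", "value": "1.5M", "confidence": 0.8},
    {"subject": "alice", "attribute": "city", "value": "Manila", "confidence": 0.9},
]
groups = polarity_groups(facts)
winner = resolve(groups[("alice", "budget")])
```

Nothing is overwritten when the contradiction is detected; both budget values survive until `resolve` runs, so the losing fact remains available for audit.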

Adaptive consolidation cadence

Dormant users pay near-zero. Heavy users get more passes. Cost scales with engagement, not headcount.

Cross-tenant concept catalog

Cheap models inherit frontier-model quality via grounded retrieval. The largest economic lever in the stack.

Cite-and-verify pipeline

Every extracted fact is traceable to its source turn. Hallucinated facts are filtered before storage.

Multi-provider failover by priority

Automatic cascade on quota exhaustion. Single point of integration, zero single point of failure.
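A priority cascade of this kind is simple to sketch. The `QuotaExhausted` exception, the provider stubs, and `call_with_failover` are hypothetical names for illustration; real provider SDKs signal quota errors in their own ways, and a real router would also handle timeouts and streaming.

```python
class QuotaExhausted(Exception):
    """Raised by a provider stub when its quota is used up (hypothetical)."""


def call_with_failover(providers, prompt):
    """Try providers in priority order; fall through on quota exhaustion."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except QuotaExhausted as exc:
            errors.append((name, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")


def exhausted(prompt):
    raise QuotaExhausted("daily quota reached")


def healthy(prompt):
    return f"answer to: {prompt}"


# Priority list: the first entry is preferred, later entries are fallbacks.
providers = [("primary", exhausted), ("secondary", healthy)]
used, answer = call_with_failover(providers, "hello")
```

Only `QuotaExhausted` triggers the cascade here; other exceptions propagate, since silently retrying a malformed request on every provider would multiply cost without fixing the cause.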

Per-user model weights, hot-loaded

Each user's agent becomes a different model over time. Shadow rollout, promotion gates, rollback all managed.

SOTOPIA-gated releases

6-dim behavioral scoring — Believability, Relationships, Knowledge, Social Rules, EQ, Goal Completion — on every release.

Workbench = production, accelerated

What you evaluate in minutes of simulated time is exactly what runs in production. Same code path.

The Mind Layer

Give any LLM a mind.

One SDK. Five integration patterns. The same Mind Layer underneath whether you adopt it as a memory sidecar, a session runtime, a hosted agent, or a plugin in Hermes or OpenClaw.
