After every session, Sonzai runs an async pipeline (fact extraction, personality evolution, diary, consolidation) plus continuous online learning, reinforcement learning, multi-armed bandits, and hyperparameter auto-tuning that adapt how memory is processed per (agent, user) pair — automatically, with no training infrastructure to maintain.

When a session ends, Sonzai kicks off a multi-stage async pipeline against everything that was said. It extracts and verifies new facts, consolidates duplicates, updates personality scores and mood baselines, writes a reflective diary entry, scores retrieval quality, and feeds that score back into per-pair retrieval weights. By the time a user returns, the agent already knows what happened last time — and its retrieval has been re-tuned for that specific (agent, user) pair.

Underneath the pipeline, the Sonzai relationship layer runs continuous machine-learning model training against live traffic: per-pair stochastic gradient descent, multi-armed bandits over memory clusters, a shadow-mode policy-gradient learner with automatic regression rollback, per-pair hyperparameter auto-tuning, and an OPRO-style prompt optimiser. All of it ships behind a stable SDK. You don't run the training loop — you keep ending sessions, and the per-pair memory layer keeps getting sharper.

Fully automatic

Self-improvement is triggered by sessions.End(). Everything on this page happens as a result of that one call. The next time you read memory, personality, or insights, the new state is already there.

 Roll your own memory + learning stack          With Sonzai
 -------------------------------------          --------------------

    vector store + retrieval                |
    dedup + conflict resolution             |
    personality + mood engine               |       sessions.End()
    reward signal + eval harness            |             |
    training + evaluation pipeline          |             v
    shadow rollout + auto-revert            |
    drift monitoring                        |       all of it,
    per-user tuning loops                   |       automatic
    prompt sweeps + regression tests        |
    on-call for runaway behaviour           |

 -------------------------------------          --------------------
 ~ 12 months of platform work                   one afternoon

The bottom line for developers

You wire up sessions.End() once. Sonzai does the rest:

No training infrastructure. No fine-tuning runs, no eval harness to maintain, no per-user model artefacts to ship. The online-learning, RL, and auto-tuning loops are operated by Sonzai's applied-research team and ride behind a stable SDK.
Per-user personalisation, automatic. Every (agent_id, user_id) pair gets its own retrieval predictor weights, cluster-sampling posterior, traversal graph, learning-rate schedule, and value function. Two users on the same agent see different memory layers within a handful of sessions — no per-user code, no profile training, no embeddings pipeline to operate.
It actually compounds. Each session's reward is observed from fact reuse, re-retrieval, engagement, and explicit feedback, then fed back into the weights, the bandit posteriors, the critic, and the prompt optimiser. The next session is measurably better than the last, and the gap widens as the relationship deepens.
Safe by default. New policies run in shadow until a per-pair promoter confirms a sustained advantage over the baseline; regressions auto-revert. Production memory never gets dragged off a good optimum by a noisy day.
Predictable cost. Post-processing runs on a cheaper model than chat, and the tuning loop trains on signals you're already producing — not extra LLM calls per turn. The smarter your agent gets, the more efficient retrieval becomes.

For most teams this is the difference between "we'll get to memory next quarter" and "our agents already remember every user, and the memory layer keeps getting smarter every week." Rolling your own (vector store, dedup, per-user fine-tuning, RL eval harness, prompt sweeps, and safe-rollout machinery) is a 12-month detour. With Sonzai it's one SDK call.

What you can build with it

Personality drift over time — the agent evolves character and relationship stance through repeated use, with no manual tuning
Diary generation per session — the agent writes reflective summaries in its own voice, available as future context
Automatic fact consolidation — duplicate and contradictory facts are merged or superseded; memory stays compact
Breakthrough detection — milestone moments fire on completed sessions and land in the evolution history for narrative use
Relationship tracking updates — stance, love score, and per-user personality overlays all update after each session
Per-(agent, user) retrieval that sharpens with use — online and RL loops adapt the predictor's dimension weights, cluster sampling, and traversal edges per pair, so a returning user gets retrieval that fits their pattern, not the cohort average

Quickstart

There is no direct API for the self-improvement pipeline. It is triggered exclusively by ending a session. Set Wait: true during development if you need to query memory or personality immediately after the call; in production, leave it false and let the pipeline run async.

// End the session — this triggers the post-processing pipeline.
_, err := client.Sessions.End(ctx, agentID, sonzai.SessionEndOptions{
    UserID:          "user-123",
    SessionID:       "sess-abc",
    TotalMessages:   12,
    DurationSeconds: 340,
    Messages:        messages,
    // Wait: true  // dev/test only — blocks until pipeline completes
})
if err != nil {
    return err
}

// On the next turn (or after Wait returns), the updated state is readable.
personality, err := client.Personality.Get(ctx, agentID, nil)
if err != nil {
    return err
}
memory, err := client.Memory.List(ctx, agentID, &sonzai.MemoryListOptions{UserID: "user-123"})
if err != nil {
    return err
}

Core concepts

Triggered by SessionEnd — automatically. Every call to sessions.End() enqueues the pipeline. You do not need to call anything else.

Async by default. In production the call returns immediately and the pipeline runs in the background. Results are visible on the next read of memory, personality, or insights. Use Wait: true in tests or benchmarks when you need to assert on the new state in the same process.

Pipeline components. A single session end runs: fact extraction with source-anchoring verification, deduplication and conflict resolution, cluster reconciliation, personality drift application, mood baseline update, diary generation, next-session prediction, and session quality scoring.

Daily and weekly jobs layer on top. Immediate post-processing handles per-session work. Longer-horizon jobs (memory tree pruning, narrative arc compression, association decay, learning-pace checks) run on daily and weekly cadences. The workbench's Advance Time triggers these same jobs against simulated time.

Post-processing model. The pipeline uses a cheaper model than the chat model to keep costs low. The resolver cascade checks agent → project → account → system default. You can inspect or override the resolved model without running any inference.

// Check which model will run post-processing for this agent; the last argument is the agent's chat model.
effective, err := client.Agents.EffectivePostProcessingModel(ctx, agentID, "gemini-2.0-pro")

// Pin a specific model at the agent level.
err = client.Agents.UpdatePostProcessingModel(ctx, agentID, "gemini", "gemini-2.0-flash-lite")

// Remove the agent-level pin (falls back to project/account/system).
err = client.Agents.ClearPostProcessingModel(ctx, agentID)
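
The resolver cascade described above can be pictured as a first-non-empty lookup across scopes. The sketch below is only an illustration of that idea: the `modelPins` type, the `resolve` function, and the placeholder model names are assumptions made for this example and are not part of the Sonzai SDK.

```go
package main

import "fmt"

// modelPins holds the optional post-processing model pinned at each scope.
// An empty string means "no pin at this level". Hypothetical type, for
// illustration only.
type modelPins struct {
	Agent, Project, Account, SystemDefault string
}

// resolve walks agent -> project -> account -> system default and returns
// the first non-empty pin together with the scope that supplied it.
func resolve(p modelPins) (model, scope string) {
	levels := []struct{ scope, model string }{
		{"agent", p.Agent},
		{"project", p.Project},
		{"account", p.Account},
	}
	for _, l := range levels {
		if l.model != "" {
			return l.model, l.scope
		}
	}
	return p.SystemDefault, "system"
}

func main() {
	// No agent-level pin, so the project pin is the one that applies.
	pins := modelPins{Project: "gemini-2.0-flash-lite", SystemDefault: "system-default-model"}
	model, scope := resolve(pins)
	fmt.Printf("%s (from %s)\n", model, scope)
}
```

Clearing the agent-level pin corresponds to emptying the `Agent` field here, after which the lookup falls through to the next scope that has a value.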

Continuous learning, per (agent, user) pair

The post-session pipeline runs every session. Underneath it, the runtime is continuously training how memory is processed for each (agent_id, user_id) pair — and Sonzai's applied-research team operates the online-learning, reinforcement-learning, bandit, and auto-tuning loops that govern it. Two pairs running the same agent end up with different predictor weights, different clusters surfaced, different traversal edges, and different schedules.

 Day 1     |  ###...........................   ready out of the box
           |  verified extraction, dedup, clustering, and behavioural
           |  updates running from the first turn

 Week 1    |  #######.........................   responsive, adapting
           |  confidence has moved on the facts the user really cares
           |  about; mood is responding; patterns forming

 Month 1   |  ##############...................   personalised
           |  per-user retrieval converged; personality overlay has
           |  diverged; story arcs forming; this user is visibly
           |  remembered differently to the one before

 Year 1    |  #########################.........   long-term partner
           |  compact, navigable memory; milestones earned; reflective
           |  diary; recurring-event awareness; retrieval sharper than
           |  day one
           |
           |  Zero training code. Zero per-user logic. You called
           |  sessions.End() and went home.

Reward signal, compiled per session. A reward compiler turns each session's observable signals — what the LLM actually used, how the user engaged, and explicit feedback when present — into a single bounded scalar. Every loop below trains against this reward; nothing on your side has to be instrumented or labelled.

Per-pair retrieval predictor, tuned by stochastic gradient descent. Every session, an SGD update with momentum adjusts the dimensions the predictor weighs, using the LLM's actual fact reuse as the gradient signal. Asymmetric learn / forget rates (aggressive on confirmed positives, slow to discard) prevent weight collapse on a single noisy session.

Hyperparameter auto-tuning per pair. Learning rates aren't a fixed constant — a per-pair scheduler watches divergence and plateau signals across recent sessions and adapts each pair's learning rate independently. Healthy pairs get nudged up to keep adapting; unstable pairs are damped down so a bad day can't drag a good optimum off course. No knobs to tune on your side.

TD(0) critic + A2C policy gradient, in shadow with auto-revert. A per-pair linear value function estimates V(state) from observable features (sessions to date, recent F1, learning rate, relationship stage). An A2C actor consumes V(s) as its baseline with an entropy bonus to keep exploration broad. The A2C trajectory runs in shadow alongside production; a per-pair promoter compares it to the SGD baseline over a rolling window, and only confirmed sustained improvements graduate to production. On regression, the prior weights are restored automatically. Production never sees a half-trained policy.
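
For readers unfamiliar with TD(0), the critic half of this loop can be sketched in a few lines. This is the textbook linear TD(0) update, not Sonzai's implementation; the step size, discount, and feature scaling are assumptions, and the feature order simply follows the list in the paragraph above.

```go
package main

import "fmt"

// critic is a linear value function V(s) = W . phi(s).
type critic struct {
	W     []float64
	Alpha float64 // critic step size (assumed value)
	Gamma float64 // discount factor (assumed value)
}

// Value returns the current estimate for a feature vector.
// phi must have the same length as W.
func (c *critic) Value(phi []float64) float64 {
	v := 0.0
	for i, w := range c.W {
		v += w * phi[i]
	}
	return v
}

// Update performs one TD(0) step and returns the TD error
// delta = r + gamma*V(s') - V(s).
func (c *critic) Update(phi []float64, reward float64, phiNext []float64) float64 {
	delta := reward + c.Gamma*c.Value(phiNext) - c.Value(phi)
	for i := range c.W {
		c.W[i] += c.Alpha * delta * phi[i]
	}
	return delta
}

func main() {
	// Features, each scaled to [0, 1]: sessions to date, recent F1,
	// learning rate, relationship stage.
	c := &critic{W: make([]float64, 4), Alpha: 0.1, Gamma: 0.9}
	before := []float64{0.10, 0.60, 0.20, 0.25}
	after := []float64{0.11, 0.65, 0.20, 0.25}
	delta := c.Update(before, 0.8, after)
	fmt.Printf("td error=%.3f V(before)=%.3f\n", delta, c.Value(before))
}
```

In a standard one-step actor-critic setup the returned TD error also serves as the advantage estimate for the actor, which is one way an A2C learner can use V(s) as its baseline.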

Cluster bandit (Thompson sampling, Beta posterior). Every retrieved fact carries a cluster identity. Each session's reward is attributed back across the contributing clusters and used to update a Beta-distributed posterior per cluster — a multi-armed bandit. Useful clusters get sampled more often next session; cold ones get probed less. Posteriors are lineage-aware: when the self-organiser splits, merges, or retires a cluster, its evidence flows to its successors instead of being thrown away.

Hebbian edges across partitions. Co-accessed memory nodes grow associative edges between them, weighted by repeated co-occurrence. Edges cross the per-user and per-agent-wisdom partitions, so user-specific traversal patterns can pull in the agent's broader world knowledge — and the more the pair runs, the denser and more selective the personal traversal graph becomes.
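
A minimal way to model "edges weighted by repeated co-occurrence" is a saturating Hebbian rule, sketched below. The `graph` type, the learning rate, and the partition-prefixed node IDs are assumptions made for this example only.

```go
package main

import "fmt"

// edge is an undirected pair of node IDs, stored in sorted order.
type edge struct{ A, B string }

func edgeBetween(a, b string) edge {
	if a > b {
		a, b = b, a
	}
	return edge{a, b}
}

// graph keeps one weight in [0, 1) per co-accessed pair.
type graph struct {
	Rate    float64
	Weights map[edge]float64
}

// CoAccess strengthens every pair among the nodes read together in one
// retrieval: w <- w + rate*(1 - w), so weights grow with repetition but
// saturate below 1.
func (g *graph) CoAccess(nodes []string) {
	for i := 0; i < len(nodes); i++ {
		for j := i + 1; j < len(nodes); j++ {
			if nodes[i] == nodes[j] {
				continue
			}
			e := edgeBetween(nodes[i], nodes[j])
			w := g.Weights[e]
			g.Weights[e] = w + g.Rate*(1-w)
		}
	}
}

func main() {
	g := &graph{Rate: 0.2, Weights: map[edge]float64{}}
	// Node IDs carry a partition prefix, so one edge can link a user
	// memory to an agent-wisdom node.
	g.CoAccess([]string{"user:marathon-training", "wisdom:injury-recovery"})
	g.CoAccess([]string{"wisdom:injury-recovery", "user:marathon-training"})
	fmt.Printf("%.2f\n", g.Weights[edgeBetween("user:marathon-training", "wisdom:injury-recovery")])
}
```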

Memory tree self-organisation. A self-organiser rebalances the per-pair memory tree from access statistics: hot nodes get promoted, oversized branches split, sparse siblings merge, and stale parent descriptions are regenerated by a bounded LLM pass so summarisation tracks what's actually being read.

Ebbinghaus-style retention. Long-horizon retention follows a spaced-repetition decay curve. Frequently-recalled facts strengthen and outlive their original importance score; cold facts decay and eventually drop out of hot retrieval — but high-importance facts floor at a retention threshold so the agent never forgets the things that matter.

OPRO-style prompt optimisation. Sonzai's team runs an OPRO-style optimiser over the post-processing prompts: claim-level F1 scoring against curated fixture sets, a stronger meta-LLM proposing targeted revisions for the worst failure modes, and the strongest variant surviving. The pipeline picks up the new prompt — no deployment on your end.
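
Claim-level F1 is the harmonic mean of precision and recall over extracted claims. The sketch below shows the arithmetic using exact string matching between claims, which is a simplifying assumption; how Sonzai's fixtures actually match claims is not specified here.

```go
package main

import "fmt"

// claimF1 scores predicted claims against a gold fixture set.
// Claims are compared by exact string equality (an assumption of this
// sketch) and duplicates are counted once.
func claimF1(predicted, gold []string) float64 {
	goldSet := make(map[string]bool, len(gold))
	for _, g := range gold {
		goldSet[g] = true
	}
	seen := make(map[string]bool, len(predicted))
	truePositives := 0
	for _, p := range predicted {
		if seen[p] {
			continue
		}
		seen[p] = true
		if goldSet[p] {
			truePositives++
		}
	}
	if truePositives == 0 {
		return 0
	}
	precision := float64(truePositives) / float64(len(seen))
	recall := float64(truePositives) / float64(len(goldSet))
	return 2 * precision * recall / (precision + recall)
}

func main() {
	gold := []string{"lives in Lisbon", "has a dog named Miso", "works as a nurse"}
	variantA := []string{"lives in Lisbon", "works as a nurse", "likes jazz"}
	variantB := []string{"lives in Lisbon"}
	fmt.Printf("A=%.3f B=%.3f\n", claimF1(variantA, gold), claimF1(variantB, gold))
}
```

An optimiser loop would score each candidate prompt's output this way across the fixture set and keep the variant with the highest score.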

Grounding verification. Every extracted fact must cite a source message index and a verbatim source quote from the user's turn. A mechanical verifier rejects facts that fail substring or attribution checks, and rejected facts feed back as a self-correcting hint on retry. Hallucinated memory doesn't reach the store — and this layer costs no extra inference per turn.
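
The substring and attribution checks described above are mechanical, so they are easy to sketch. The types, field names, and error messages below are hypothetical and chosen for illustration; only the checks themselves (a valid source index, a user turn, and a verbatim quote) come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// message is one conversation turn.
type message struct {
	Role string // "user" or "assistant"
	Text string
}

// extractedFact is a candidate fact with its claimed provenance.
type extractedFact struct {
	Claim       string
	SourceIndex int    // index into the session's messages
	Quote       string // verbatim text the fact is based on
}

// verify returns nil when the fact is grounded, or an error that can be
// passed back to the extractor as a hint on retry.
func verify(f extractedFact, msgs []message) error {
	if f.SourceIndex < 0 || f.SourceIndex >= len(msgs) {
		return errors.New("source index out of range")
	}
	src := msgs[f.SourceIndex]
	if src.Role != "user" {
		return errors.New("source message is not a user turn")
	}
	if strings.TrimSpace(f.Quote) == "" {
		return errors.New("empty source quote")
	}
	if !strings.Contains(src.Text, f.Quote) {
		return errors.New("quote not found verbatim in source message")
	}
	return nil
}

func main() {
	msgs := []message{
		{Role: "user", Text: "I just moved to Lisbon for a new job."},
		{Role: "assistant", Text: "Congratulations on the move!"},
	}
	grounded := extractedFact{Claim: "User lives in Lisbon", SourceIndex: 0, Quote: "moved to Lisbon"}
	invented := extractedFact{Claim: "User owns a boat", SourceIndex: 0, Quote: "bought a boat"}
	fmt.Println(verify(grounded, msgs), verify(invented, msgs))
}
```

No model call is involved in any of these checks, which is why a verifier of this shape adds no inference cost.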

The longer an (agent, user) pair runs, the more its memory layer reflects how that user actually thinks — which transitions matter, which clusters carry signal, which dimensions to trust, which schedule it learns on. The agent doesn't just remember more for a returning user; it remembers differently per user, with no tuning required on your side.

          Same agent. Same prompt. Two different users.
          =============================================

 +--- user_A pair ------------+    +--- user_B pair ------------+
 |                            |    |                            |
 |  Remembers what matters    |    |  Remembers what matters    |
 |  to user_A                 |    |  to user_B                 |
 |                            |    |                            |
 |  > the work narrative      |    |  > the music narrative     |
 |  > formal tone             |    |  > playful banter          |
 |  > morning rhythm          |    |  > late-night rhythm       |
 |  > returns on Mondays      |    |  > returns on Fridays      |
 |                            |    |                            |
 |  Mood baseline: calm       |    |  Mood baseline: bright     |
 |  Relationship: familiar    |    |  Relationship: close       |
 |                            |    |                            |
 +----------------------------+    +----------------------------+

 Two memory layers, diverged purely from each user's own patterns.
 No per-user code. No per-user prompt. No tuning required.

Multiplayer: agents that learn together

Per-pair learning is one layer. On top of it, agents read from, write to, and learn from a shared knowledge base, and a single agent can carry attributed memory across the users it serves. The same compounding curve you saw above happens at the team level too.

Inter-agent — closed-loop company brain. Agents on the same project autonomously write verified facts back into the Knowledge Base (with knowledgeBaseWrite on). Anything agent A learns with user X is grounded data agent B retrieves the next time the same topic comes up — even with a different user. The whole project gets sharper every session, not just one pair.
Intra-agent — shared memory across users. A single agent serving a team carries memory across users via Wisdom & shared memory. wisdom (de-attributed cross-user generalisation) is on by default; sharedMemory (attributed cross-user context, for groups and teams) is one capability flip away — the agent informs user A with the context it gathered while talking to user B.
Organisation scope. Org-wide KB sits above projects: tenant-wide policies, lore, brand, and reference catalogs every project agent reads alongside its own. The cascade mode is recommended — project wins on collisions, org fills in defaults.
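
The cascade rule (project wins on collisions, org fills in defaults) behaves like a layered map merge. The sketch below models knowledge-base entries as plain key/value strings, which is a simplification assumed for this example; the `cascade` function is hypothetical and not an SDK call.

```go
package main

import "fmt"

// cascade merges org-level and project-level entries. Org entries act as
// defaults; a project entry with the same key replaces the org one.
func cascade(org, project map[string]string) map[string]string {
	merged := make(map[string]string, len(org)+len(project))
	for k, v := range org {
		merged[k] = v
	}
	for k, v := range project {
		merged[k] = v // project wins on collisions
	}
	return merged
}

func main() {
	org := map[string]string{"tone": "formal", "refund-policy": "30 days"}
	project := map[string]string{"tone": "playful"}
	kb := cascade(org, project)
	fmt.Println(kb["tone"], kb["refund-policy"])
}
```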

Just like a new hire benefits from every senior employee's notes, every new agent and every new conversation benefits from everything the team has already learned. The per-pair tuning loops keep getting sharper for that user; the multiplayer layer keeps getting smarter for the whole company.

Full API

There is no SelfImprovement resource. The pipeline is an internal implementation detail of SessionEnd. The table below shows the SDK methods that are either inputs to or outputs of the pipeline.

Method	Returns	Description
`sessions.End(ctx, agentID, opts)`	`*SessionResponse`	Ends a session and triggers the post-processing pipeline
`personality.Get(ctx, agentID, opts)`	`*PersonalityResponse`	Reads current Big Five scores and evolution history — updated after each pipeline run
`personality.GetRecentShifts(ctx, agentID)`	`*RecentShiftsResponse`	Lists recent personality drift events with timestamps and magnitudes
`personality.GetSignificantMoments(ctx, agentID, limit)`	`*SignificantMomentsResponse`	Returns milestone / breakthrough events written by the pipeline
`memory.List(ctx, agentID, opts)`	`*MemoryResponse`	Reads the memory tree — consolidated facts appear here after processing
`memory.ListFacts(ctx, agentID, opts)`	`*FactListResponse`	Lists atomic facts; newly extracted and deduplicated facts are visible here
`agents.EffectivePostProcessingModel(ctx, agentID, chatModel)`	`*EffectivePostProcessingModel`	Resolves which model the pipeline would use for this agent without running inference
`agents.UpdatePostProcessingModel(ctx, agentID, provider, model)`	`error`	Pins a specific post-processing model at the agent level
`agents.ClearPostProcessingModel(ctx, agentID)`	`error`	Removes agent-level pin; resolver cascade falls through to project/account/system

Combines with other features

With Sessions — SessionEnd triggers the pipeline

Ending a session is the only way to trigger post-processing. The Messages field carries the full conversation; the pipeline reads it to extract facts and compute session quality.

// End the session with the full message history.
_, err := client.Sessions.End(ctx, agentID, sonzai.SessionEndOptions{
    UserID:          "user-123",
    SessionID:       "sess-abc",
    TotalMessages:   8,
    DurationSeconds: 210,
    Messages:        conversationMessages,
    Wait:            true, // block until pipeline finishes (dev only)
})

// Pipeline has run. New facts, updated personality, and diary entry are ready.
facts, _ := client.Memory.ListFacts(ctx, agentID, &sonzai.FactListOptions{UserID: "user-123"})
fmt.Printf("facts after session: %d\n", len(facts.Facts))

With Personality — evolution writes to personality

Every session end applies Big Five drift, updates the mood baseline, and can fire milestone events. Fetch personality before and after to see the delta.

before, _ := client.Personality.Get(ctx, agentID, nil)

// ... run a session and end it (Wait: true for this demo) ...

after, _ := client.Personality.Get(ctx, agentID, nil)

shifts, _ := client.Personality.GetRecentShifts(ctx, agentID)
moments, _ := client.Personality.GetSignificantMoments(ctx, agentID, 5)

fmt.Printf("openness before: %.3f, after: %.3f\n",
    before.Personality.Openness,
    after.Personality.Openness,
)
fmt.Printf("recent shifts: %d, milestones: %d\n",
    len(shifts.Shifts), len(moments.Moments),
)

With Memory — facts get consolidated

The pipeline extracts new facts, deduplicates against existing memory, resolves conflicts, and updates importance and confidence scores. List memory after session end to see the new state.

// Before session end.
before, _ := client.Memory.ListFacts(ctx, agentID, &sonzai.FactListOptions{
    UserID: "user-123",
})

// ... run a session with substantive content, then end it (Wait: true) ...

// After session end — new facts extracted, duplicates merged.
after, _ := client.Memory.ListFacts(ctx, agentID, &sonzai.FactListOptions{
    UserID: "user-123",
})
fmt.Printf("facts before: %d, after: %d\n", len(before.Facts), len(after.Facts))

// Browse the full memory tree for cluster-level changes.
tree, _ := client.Memory.List(ctx, agentID, &sonzai.MemoryListOptions{
    UserID:          "user-123",
    IncludeContents: true,
})

With Advance Time — simulated time triggers the pipeline

In the workbench, advancing the clock by 24 hours runs the same daily jobs that production runs overnight: memory decay, tree pruning, diary generation, cluster reconciliation, and mood drift back to baseline. This is the fastest way to verify that long-horizon evolution is working correctly before shipping.

// Advance 24 simulated hours — triggers daily pipeline jobs.
result, err := client.Workbench.AdvanceTime(ctx, map[string]any{
    "agent_id": agentID,
    "user_id":  "user-123",
    "hours":    24,
})

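// If the advance takes longer than your HTTP timeout, run it async.
asyncResult, err := client.Workbench.AdvanceTime(ctx, map[string]any{
    "agent_id": agentID,
    "user_id":  "user-123",
    "hours":    168, // 1 week
    "async":    true,
})
if err != nil {
    return err
}
// Use the comma-ok form so a missing or non-string job_id does not panic.
jobID, ok := asyncResult["job_id"].(string)
if !ok {
    return fmt.Errorf("advance time: response has no job_id")
}

// Poll until the job reaches a terminal state; stop on the first error.
for {
    job, err := client.Workbench.GetAdvanceTimeJob(ctx, jobID)
    if err != nil {
        return err
    }
    if job["status"] == "succeeded" || job["status"] == "failed" {
        break
    }
    time.Sleep(2 * time.Second)
}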

// Read memory and personality to see the result of 1 week of background jobs.
personality, _ := client.Personality.Get(ctx, agentID, nil)
memory, _ := client.Memory.List(ctx, agentID, &sonzai.MemoryListOptions{UserID: "user-123"})

Tutorials

Tutorial: Memory — walks through a full session-to-memory extraction flow
Evaluation — use the workbench to score how well the pipeline is running for your agent

Next steps

Personality — read and configure the Big Five profile the pipeline evolves into
Memory — explore the fact store and memory tree the pipeline writes to
Sessions — the triggering surface for everything on this page
Advance Time — simulate days and weeks of pipeline runs in seconds

Self-Improvement (Post-Processing)