How to give an AI agent persistent memory
Short answer: add a hosted memory layer between your app and the LLM. Before each LLM call, fetch context from the memory layer and inject it into the system prompt. After each call, send the new turn back to the memory layer so it can update facts, episodes, and summaries. With a service like Sonzai you can do this in three API calls.
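The fetch → inject → write-back loop can be sketched in a few lines. This is a minimal sketch, not Sonzai-specific: `memory`, `llm`, and their methods (`fetch_context`, `complete`, `store_turn`) are hypothetical stand-ins for your memory layer and LLM clients.

```python
def chat_turn(memory, llm, user_id: str, user_message: str) -> str:
    # 1. Fetch what the memory layer knows about this user.
    context = memory.fetch_context(user_id)

    # 2. Inject it into the system prompt before calling the LLM.
    system_prompt = f"{context}\n\nStay in character and use the facts above."
    reply = llm.complete(
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}],
    )

    # 3. Write the new turn back so facts, episodes, and summaries update.
    memory.store_turn(user_id, user_message, reply)
    return reply
```

With a hosted layer like Sonzai, steps 1 and 3 collapse into the chat call itself, as the steps below show.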
Step 1: Create an agent
An "agent" in a memory layer is a per-product persona that has its own memory tree. Create one with a bio that defines who the agent is. The bio drives the personality system and gives the LLM enough to stay in character on cold-start conversations.
POST https://api.sonz.ai/api/v1/agents
{
"name": "Luna",
"bio": "Luna is a warm, creative dreamer who speaks poetically and remembers small details about the people she talks to."
}

Step 2: Chat — with memory injected automatically
Pass the user_id on every chat call. The memory layer looks up everything it knows about that user, compiles a context from it, and returns a compiled_system_prompt that you pass to your LLM (or it can stream the response directly).
POST https://api.sonz.ai/api/v1/agents/{agentId}/chat
{
"user_id": "user-123",
"messages": [
{ "role": "user", "content": "Hey, did you remember what I told you about my dog?" }
]
}

Sonzai handles retrieval, ranking, and prompt assembly. After the response streams, it asynchronously updates the memory tree with new facts from the turn — no extra call on your side.
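Steps 1 and 2 translate directly into HTTP calls. A minimal Python sketch using only the standard library is below; note that the `Authorization: Bearer` header is an assumption (check the Sonzai docs for the actual auth scheme), while the URLs and request bodies match the examples above.

```python
import json
import urllib.request

BASE = "https://api.sonz.ai/api/v1"

def chat_payload(user_id: str, content: str) -> dict:
    """Request body for the chat endpoint, matching the example above."""
    return {
        "user_id": user_id,
        "messages": [{"role": "user", "content": content}],
    }

def _post(api_key: str, url: str, body: dict) -> dict:
    # Bearer auth is an assumption -- substitute the real scheme.
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def create_agent(api_key: str, name: str, bio: str) -> dict:
    # Step 1: create the agent with its bio.
    return _post(api_key, f"{BASE}/agents", {"name": name, "bio": bio})

def chat(api_key: str, agent_id: str, user_id: str, content: str) -> dict:
    # Step 2: chat; the response includes a compiled_system_prompt
    # you can forward to your own LLM.
    return _post(api_key, f"{BASE}/agents/{agent_id}/chat",
                 chat_payload(user_id, content))
```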
Step 3: Verify memory is being stored
You can inspect the memory tree at any time. Useful for debugging, building admin tooling, or showing users what the agent remembers.
GET https://api.sonz.ai/api/v1/agents/{agentId}/memory?user_id=user-123

What you don't have to build
- Embedding pipelines, vector storage, and ranking — handled.
- Summarization, deduplication, and decay — handled.
- Hierarchical memory (facts vs episodes vs summaries) — handled.
- Per-user state isolation, encryption, and scaling — handled.
- Mood and relationship tracking — included for free.
Build it yourself instead?
You can. The minimum-viable version is: a Postgres table of conversation turns, an embedding column, a similarity query before each LLM call, and a background job that summarizes old turns. This gets you 30% of the value.
The other 70% — hierarchical structure, decay, deduplication, personality stability, mood dynamics, sub-200ms retrieval at scale — is months of work. If you're shipping an agent product, that work is rarely the differentiator.
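To make the 30% concrete, here is a toy version of the minimum-viable approach. A bag-of-words counter stands in for a real embedding model, and a Python list stands in for the Postgres table; both are illustrative stand-ins, not production choices.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: bag-of-words term counts.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class NaiveMemory:
    """Stands in for the Postgres table plus the similarity query."""

    def __init__(self):
        self.turns = []  # rows of (text, embedding)

    def store(self, text: str) -> None:
        self.turns.append((text, embed(text)))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # The "similarity query before each LLM call".
        q = embed(query)
        ranked = sorted(self.turns, key=lambda row: cosine(q, row[1]),
                        reverse=True)
        return [text for text, _ in ranked[:k]]
```

A real version swaps `embed` for a model, moves the rows into Postgres (e.g. with a vector column), and adds the background summarization job. Decay, deduplication, hierarchy, and personality stability are still entirely absent — which is the 70%.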
Languages and SDKs
Sonzai has SDKs for TypeScript, Python, and Go, plus a REST API and a native MCP server. The examples above use raw REST; the SDKs give you typed clients.
# TypeScript
npm install @sonzai-labs/agents
# Python
pip install sonzai
# Go
go get github.com/sonz-ai/sonzai-go

Try Sonzai
Add persistent memory to your AI agent in minutes. Free tier. SDKs for TypeScript, Python, and Go.