What is a memory layer for AI agents?
A memory layer is infrastructure that gives AI agents persistent state across conversations. It stores facts, episodes, summaries, and relationships about each user, then serves that context back to the LLM at inference time so the agent can remember who you are, what you care about, and what you've talked about before.
Why agents need a memory layer
Out of the box, an LLM is stateless. You send a prompt, you get a response, the context disappears. The next time the same user shows up, they're a stranger. This works fine for one-shot completions but breaks for any product where users come back: AI companions, tutors, customer-facing agents, voice assistants, game NPCs.
A memory layer fixes this by sitting between your app and the LLM. When the user sends a message, the memory layer retrieves relevant context (facts about them, past conversations, relationship state) and injects it into the prompt. After the response, the memory layer updates itself with whatever new facts came up.
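That retrieve-inject-update loop can be sketched in a few lines. Everything here is a stand-in (a plain dict for the store, a stubbed LLM call, naive "fact extraction" that just appends the message), not a real implementation:

```python
def handle_message(store: dict, user_id: str, message: str) -> str:
    # 1. Retrieve relevant per-user context from the memory store.
    context = store.get(user_id, [])
    prompt = "Known about user:\n" + "\n".join(context) + f"\n\nUser: {message}"
    # 2. Call the LLM with the context injected. Stubbed here; a real
    #    system would send `prompt` to its model of choice.
    response = f"[reply using {len(context)} memories]"
    # 3. Update memory with whatever came up (naive append; a real layer
    #    would extract facts, dedupe, and summarize).
    store.setdefault(user_id, []).append(f"said: {message}")
    return response
```

The point of the sketch is the shape of the loop, not the storage: step 1 happens before every LLM call, step 3 after, so the agent accumulates state without the app managing it.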
What's in a memory layer
A good memory layer has a few distinct stores:
- Facts — atomic, structured statements like "user lives in Singapore" or "user prefers tea." Cheap to store, easy to retrieve.
- Episodes — narrative summaries of past conversations or events. "User had a tough day at work on Tuesday and we talked about it for an hour."
- Summaries — rolled-up, decay-aware higher-order themes. "User has been increasingly stressed about job search over the past two weeks."
- Relationships — who the user knows, how the agent feels about them, shared history.
Together these form a hierarchical memory tree. Flat vector stores can do facts but struggle with episodes and summaries because retrieval doesn't preserve narrative structure.
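One way to picture the hierarchy: summaries roll up episodes, which in turn reference facts. The node kinds and field names below are illustrative, not any particular product's schema:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    kind: str  # "fact" | "episode" | "summary" | "relationship"
    text: str
    children: list = field(default_factory=list)

# A summary node rolls up an episode, which references a fact.
tree = MemoryNode("summary", "User stressed about job search", children=[
    MemoryNode("episode", "Tough day at work on Tuesday", children=[
        MemoryNode("fact", "user lives in Singapore"),
    ]),
])
```

A flat vector store keeps only the leaves; the parent links are what let retrieval walk from a theme down to the conversations that support it.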
Memory layer vs RAG
RAG (retrieval-augmented generation) is about retrieving documents to answer questions. A memory layer is about retrieving per-user state to keep an agent in character and aware of context. They overlap in implementation (both use embeddings and retrieval) but solve different problems. RAG = "what does the knowledge base say about X." Memory = "what do I know about this user."
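The overlap and the difference can be made concrete: both use the same similarity search, but RAG queries a shared document index while a memory layer queries an index scoped to one user. Toy two-dimensional embeddings and hypothetical data below:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(index, query_vec, top_k=1):
    # Rank (embedding, text) pairs by similarity to the query.
    return sorted(index, key=lambda item: -cosine(item[0], query_vec))[:top_k]

doc_index = [([1.0, 0.0], "Pricing doc: plans start at $10")]  # RAG: shared
memory_index = {"u1": [([0.0, 1.0], "user prefers tea")]}      # memory: per-user

rag_hit = retrieve(doc_index, [1.0, 0.1])            # "what does the KB say"
mem_hit = retrieve(memory_index["u1"], [0.1, 1.0])   # "what do I know about u1"
```

Same mechanics, different corpus and different question.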
Build it or buy it
Building a memory layer from scratch takes a team months. You'll need hierarchical storage, embedding pipelines, summarization, decay algorithms, deduplication, retrieval ranking, and an inference path fast enough for real-time chat. Most teams shouldn't build this — the differentiation is in your product, not in your memory infra.
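To make one of those pieces concrete, decay-aware ranking can be as simple as exponential time decay applied to a relevance score. The half-life and names here are illustrative, and a production system layers deduplication and re-ranking on top:

```python
def decayed_score(relevance: float, age_days: float,
                  half_life_days: float = 14.0) -> float:
    # Halve a memory's effective relevance every `half_life_days`.
    return relevance * 0.5 ** (age_days / half_life_days)

# A fresh memory outranks an equally relevant stale one.
fresh = decayed_score(0.9, age_days=1)
stale = decayed_score(0.9, age_days=28)
```

Each listed component is individually simple like this; the months go into making them all work together at chat latency.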
Hosted options include Sonzai (memory + personality + mood + relationships in one API), Mem0 (memory only), Zep (memory + summarization), and Letta (an agent framework with self-editing memory).
How Sonzai's memory layer works
Sonzai provides a hosted memory layer for AI agents with sub-200ms p95 retrieval. You create an agent with a bio, send chat events, and Sonzai builds and maintains the memory tree. At inference time, you get a compiled context back to inject into your LLM call.
Sonzai goes further than memory: it also tracks Big Five personality, 4D mood, relationships, and a knowledge base — so the agent isn't just informed, it's in character.
Try Sonzai's memory layer
Hosted memory + personality + mood + relationships in one API. SDKs for TypeScript, Python, and Go.