Knowledge Base

Store and search structured facts, documents, and entity graphs that your AI agents query during conversations. Agents can also write back into the same store, forming a closed-loop multiplayer memory shared across every agent in the project.

The Knowledge Base gives your agents a live, searchable store of facts and documents — so they answer from real data instead of guessing. You push data in (via file upload or API), the platform builds a knowledge graph, and agents query it in real time. Schemas are the bridge to the Inventory primitive: the same entity types you define here back every per-user inventory item, letting a single schema serve both global knowledge and user-specific state.

It is also multiplayer. Agents can autonomously write what they learn during conversations back into the project KB, where every other agent on the project reads it on the next session — a closed-loop company brain that compounds the way human institutional memory does. And a single agent serving a team can carry attributed memory across users, so it can inform user A with the context it gathered while talking to user B. See Multiplayer memory below.

How knowledge gets into the KB

There are two ways to populate the knowledge base, plus one optional capability you toggle on top of either of them:

1. Manual upload. Drop in a PDF, DOCX, Markdown, or plain text file via the SDK or the dashboard. The platform extracts entities and relationships automatically and writes them to the graph. Use this for static documents you control — handbooks, policies, product manuals, lore. One-shot, or re-uploaded whenever the source changes. → Upload a document

2. ETL job that pushes on delta changes. Define an entity schema once; have your job call insertFacts or bulkUpdate on a schedule, queue, or change-data-capture stream. Use this for live upstream sources of truth — databases, price feeds, CMSes, scrapers — so the KB stays in sync as the source changes. Upserts are idempotent; pushing the same label twice merges properties and increments the version, so the same job is safe to re-run on any cadence. → Define a schema, then push facts

+ Autonomous agent editing (optional toggle — enable or disable per agent or project-wide). Flip the knowledgeBaseWrite capability on and agents get knowledge_create / knowledge_update / knowledge_delete tools. During conversations they record verified facts themselves, with a full audit trail (each write is stamped source = "agent:<agent-id>") and compare-and-swap update semantics so concurrent admin edits never get clobbered. Use this when the source of truth IS the conversation — support agents recording verified incident details, customer-success agents capturing renewal context, scribe agents writing meeting notes. → Agents writing to the knowledge base

The two ingestion paths are independent — pick either, both, or neither. Autonomous editing is a per-agent toggle (or a project-wide default via default_agent_kb_write) that sits on top of whichever ingestion paths you're already running. You stay in control: every agent write is server-side validated against your schema, capped by quotas, scoped to the agent's own project, and reversible — soft-delete only, hard delete stays admin-only.

Manual upload          ETL on delta changes      Agent in conversation
 (PDF / DOCX / MD)      (insertFacts / bulkUpdate) (knowledge_create / update / delete)
      |                          |                          |
      |                          |                          | requires
      |                          |                          | knowledgeBaseWrite: true
      v                          v                          v
 +----------------------------------------------------------------+
 |                  Project Knowledge Graph                       |
 |    entities + relationships + version history + audit trail   |
 +----------------------------------------------------------------+
                            |
                            v
              Agents read via knowledge_search
              during every conversation

What you can build with it

  • Real-time product Q&A — push a live product catalog and let agents answer "what's in stock under $50?" with current prices and availability
  • Medication or supplement advisor — store drug and dosage facts; the agent surfaces the right information when a user asks about interactions or timing
  • Collectibles price tracker — scrape market prices hourly, push via bulkUpdate, and let agents answer "what's trending up this week?" with real data
  • Internal knowledge assistant — upload employee handbooks, policy docs, and product manuals; agents ground answers in authoritative sources instead of hallucinating
  • Personalized recommender — define recommendation rules on entity fields (set, rarity, budget) and surface the top matches for each user at conversation time

Quickstart

Upload a document

Upload a PDF, DOCX, Markdown, or plain text file. The platform extracts entities and relationships automatically.

import fs from "fs";
import { Sonzai } from "@sonzai-labs/agents";

const client = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const projectId = "proj_abc"; // replace with your own project ID

const doc = await client.knowledge.uploadDocument(projectId, {
  file:     fs.createReadStream("product_catalog.pdf"),
  fileName: "product_catalog.pdf",
});

console.log(doc.documentId, doc.status); // "processing"

Define a schema, then push facts

Define typed fields for an entity type, then insert structured facts. Good for scrapers, inventory feeds, and price trackers.

// 1. Create a schema
await client.knowledge.createSchema(projectId, {
  entity_type: "pokemon_card",
  display_name: "Pokémon Card",
  description: "Collectible trading cards",
  fields: [
    { name: "price",     type: "number", required: true },
    { name: "condition", type: "enum",   enum_values: ["PSA 10", "PSA 9", "Raw"] },
    { name: "set",       type: "string", required: true },
    { name: "rarity",    type: "enum",   enum_values: ["Common", "Uncommon", "Rare", "Ultra Rare"] },
    { name: "tags",      type: "array" },
    { name: "internal_sku", type: "string", indexed: false }, // stored but not BM25-searchable
  ],
  similarity_config: {
    match_fields: ["set", "rarity"],
    threshold: 0.7,
  },
});

// 2. Insert facts
await client.knowledge.insertFacts(projectId, {
  source: "price_sync",
  facts: [
    {
      entity_type: "pokemon_card",
      label: "Charizard Base Set",
      properties: { price: 450, condition: "PSA 10", set: "Base Set", rarity: "Rare" },
    },
  ],
  relationships: [
    { from_label: "Charizard Base Set", to_label: "Fire Pokemon", edge_type: "is_type" },
  ],
});
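
The schema above declares required fields and enum values, and this page states that writes are validated server-side against the schema. As a rough local model of what such a check could look like, here is a minimal sketch. The function name validateProps, the SchemaField type, and the error strings are hypothetical and are not part of the SDK; the platform's real validation rules may differ.

```typescript
// Hypothetical local model of schema validation; not the platform's implementation.
interface SchemaField {
  name: string;
  type: "number" | "string" | "enum" | "array";
  required?: boolean;
  enum_values?: string[];
}

function validateProps(fields: SchemaField[], props: Record<string, unknown>): string[] {
  const errors: string[] = [];
  for (const field of fields) {
    const value = props[field.name];
    if (value === undefined || value === null) {
      if (field.required) errors.push(field.name + " is required");
      continue; // nothing more to check for an absent optional field
    }
    if (field.type === "number" && typeof value !== "number") {
      errors.push(field.name + " must be a number");
    }
    if (field.type === "string" && typeof value !== "string") {
      errors.push(field.name + " must be a string");
    }
    if (field.type === "array" && !Array.isArray(value)) {
      errors.push(field.name + " must be an array");
    }
    if (field.type === "enum") {
      const allowed = field.enum_values || [];
      if (allowed.indexOf(String(value)) === -1) {
        errors.push(field.name + " must be one of: " + allowed.join(", "));
      }
    }
  }
  return errors;
}

const cardFields: SchemaField[] = [
  { name: "price",     type: "number", required: true },
  { name: "condition", type: "enum",   enum_values: ["PSA 10", "PSA 9", "Raw"] },
  { name: "set",       type: "string", required: true },
];

const okErrors = validateProps(cardFields, { price: 450, condition: "PSA 10", set: "Base Set" });
const badErrors = validateProps(cardFields, { price: "450", condition: "Mint" });
console.log(okErrors);  // no errors for the well-formed fact
console.log(badErrors); // wrong type, unknown enum value, missing required field
```

Running a check like this in an ETL job before calling insertFacts can catch malformed rows early, though the server remains the authority on what is accepted.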

Core concepts

Knowledge graph

Entities are nodes; relationships are typed edges. Nodes deduplicate by normalized label + type — pushing the same label twice merges properties and increments the version. Every change is recorded in version history with source and timestamp, giving you a full audit trail. The graph is completely domain-agnostic: you define entity types and relationship types; the platform stores and indexes them.

Similarity edges

When a schema has a similarity_config, the platform automatically creates similar_to edges between entities whose match_fields values are close enough to exceed the threshold. This turns structured fields into graph topology without any extra work — and powers the recommendation engine.
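
To make the threshold idea concrete, here is a minimal sketch of how similar_to edges could be derived from match_fields. The similarity metric is an assumption: this page does not say how closeness is computed, so the sketch uses the share of match_fields with identical values. The names similarity and similarEdges are hypothetical.

```typescript
type Props = Record<string, string | number>;
interface CardNode { id: string; properties: Props; }
interface SimilarityConfig { match_fields: string[]; threshold: number; }
interface SimilarEdge { from: string; to: string; edge_type: "similar_to"; score: number; }

// Assumed metric: fraction of match_fields whose values are identical (0..1).
function similarity(x: Props, y: Props, fields: string[]): number {
  if (fields.length === 0) return 0;
  let same = 0;
  for (const f of fields) {
    if (x[f] !== undefined && x[f] === y[f]) same++;
  }
  return same / fields.length;
}

// Compare every unordered pair once and keep pairs that exceed the threshold.
function similarEdges(nodes: CardNode[], cfg: SimilarityConfig): SimilarEdge[] {
  const edges: SimilarEdge[] = [];
  nodes.forEach((a, i) => {
    nodes.slice(i + 1).forEach((b) => {
      const score = similarity(a.properties, b.properties, cfg.match_fields);
      if (score > cfg.threshold) {
        edges.push({ from: a.id, to: b.id, edge_type: "similar_to", score });
      }
    });
  });
  return edges;
}

const cards: CardNode[] = [
  { id: "charizard", properties: { set: "Base Set", rarity: "Rare" } },
  { id: "blastoise", properties: { set: "Base Set", rarity: "Rare" } },
  { id: "pikachu",   properties: { set: "Base Set", rarity: "Common" } },
];
const simEdges = similarEdges(cards, { match_fields: ["set", "rarity"], threshold: 0.7 });
console.log(simEdges); // one edge: charizard -> blastoise (score 1)
```

Under this assumed metric, a 0.7 threshold with two match fields only links entities that agree on both fields, since a single match scores 0.5.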

Entity type naming: entity_type vs display_name

entity_type is the machine-readable slug used everywhere in the API (e.g. "pokemon_card"). It is how inventory writes and KB lookups reference the schema. display_name is the optional human-friendly label shown in the dashboard and agent tool descriptions (e.g. "Pokémon Card"). If display_name is omitted, the dashboard falls back to a title-cased version of entity_type.

Controlling BM25 indexing per field

By default every field value is included in the BM25 full-text index so agents can find nodes by searching field contents. Set indexed: false on a field to exclude it from the search index — the value is still stored and returned in reads, but it will not match keyword queries. Use this for fields that should be readable but not searchable, for example:

  • Internal identifiers (sku, barcode, external_id) that should never surface in agent search results
  • High-cardinality numeric values like dosage amounts on a medication schema, where token matching produces noise rather than signal
  • Raw HTML or markdown blobs that you render in UI but do not want polluting search

fields: [
  { name: "name",    type: "string", required: true },           // searchable
  { name: "dosage",  type: "string", indexed: false },           // stored-only
  { name: "sku",     type: "string", indexed: false },           // stored-only
]
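
As an illustration of the effect of indexed: false, the sketch below models which property values would feed a full-text index. It does not implement BM25 itself; the helper searchableText and the sample SKU value are hypothetical.

```typescript
interface FieldDef { name: string; type: string; indexed?: boolean; }

// Fields are treated as searchable unless indexed is explicitly false.
function searchableText(fields: FieldDef[], properties: Record<string, unknown>): string {
  const parts: string[] = [];
  for (const field of fields) {
    if (field.indexed === false) continue; // stored, but kept out of the index
    const value = properties[field.name];
    if (value !== undefined && value !== null) parts.push(String(value));
  }
  return parts.join(" ");
}

const medicationFields: FieldDef[] = [
  { name: "name",   type: "string" },
  { name: "dosage", type: "string", indexed: false },
  { name: "sku",    type: "string", indexed: false },
];
const indexedText = searchableText(medicationFields, {
  name: "ibuprofen",
  dosage: "500mg",
  sku: "MED-0042", // made-up value for illustration
});
console.log(indexedText); // "ibuprofen"
```

A keyword query for "500mg" would find nothing in this text, while the dosage value would still be returned when the node is read.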

Upsert semantics

insertFacts and bulkUpdate default to upsert mode (upsert: true): if a node with the same label + type exists, its properties are merged and the version is incremented; if it does not exist, it is created. This makes idempotent syncs safe to run on any schedule.

Set upsert: false for strict update-only semantics: nodes that do not already exist are skipped rather than created, and their IDs appear in the response not_found list. Use this when you want to ensure you are only patching existing data and never accidentally inserting stale or erroneous entries from an upstream feed.
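
The two modes can be modelled with a small in-memory store. This is a sketch under stated assumptions, not the platform's implementation: the key normalization (trim and lowercase), the starting version of 1, and the created/updated fields on the result are all hypothetical. Only the merge, version increment, and not_found behaviour come from the description above.

```typescript
interface Fact { entity_type: string; label: string; properties: Record<string, unknown>; }
interface StoredNode { properties: Record<string, unknown>; version: number; }
interface PushResult { created: string[]; updated: string[]; not_found: string[]; }

// In-memory stand-in for the graph, keyed by "type::normalized label".
const store: Record<string, StoredNode> = {};

function pushFacts(facts: Fact[], upsert: boolean = true): PushResult {
  const result: PushResult = { created: [], updated: [], not_found: [] };
  for (const fact of facts) {
    const key = fact.entity_type + "::" + fact.label.trim().toLowerCase();
    const existing = store[key];
    if (existing) {
      // Merge: incoming properties win, untouched properties are kept.
      existing.properties = { ...existing.properties, ...fact.properties };
      existing.version += 1;
      result.updated.push(key);
    } else if (upsert) {
      store[key] = { properties: { ...fact.properties }, version: 1 };
      result.created.push(key);
    } else {
      result.not_found.push(key); // strict mode: report and skip, never create
    }
  }
  return result;
}

const charizardRef = { entity_type: "pokemon_card", label: "Charizard Base Set" };
pushFacts([{ ...charizardRef, properties: { price: 450, set: "Base Set" } }]); // creates
pushFacts([{ ...charizardRef, properties: { price: 475 } }]);                  // merges
const strictResult = pushFacts(
  [{ entity_type: "pokemon_card", label: "Unknown Card", properties: { price: 1 } }],
  false,
);
console.log(store["pokemon_card::charizard base set"]); // price 475, set kept, version 2
console.log(strictResult.not_found);                    // the unknown card, not created
```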

How agents use the graph

During conversations, agents have access to a knowledge_search tool that queries your graph. Instead of hallucinating facts, the agent calls this tool and returns grounded answers. The search result includes the entity's properties, relevance score, and any related nodes reachable via one-hop traversal.
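
One-hop traversal means only direct neighbours of a matched node are attached to the result. A minimal sketch of that idea follows. Whether the platform follows edges in both directions is an assumption here, and the oneHop helper plus the extra sample edges are hypothetical.

```typescript
interface KbEdge { from: string; to: string; edge_type: string; }
interface Related { id: string; edge_type: string; direction: "out" | "in"; }

// Collect nodes exactly one edge away from nodeId, in either direction.
function oneHop(nodeId: string, edges: KbEdge[]): Related[] {
  const related: Related[] = [];
  for (const e of edges) {
    if (e.from === nodeId) {
      related.push({ id: e.to, edge_type: e.edge_type, direction: "out" });
    } else if (e.to === nodeId) {
      related.push({ id: e.from, edge_type: e.edge_type, direction: "in" });
    }
  }
  return related;
}

const kbEdges: KbEdge[] = [
  { from: "Charizard Base Set", to: "Fire Pokemon",       edge_type: "is_type" },
  { from: "Charizard Base Set", to: "Blastoise Base Set", edge_type: "similar_to" },
  { from: "Fire Pokemon",       to: "Pokemon Types",      edge_type: "member_of" },
];
const neighbours = oneHop("Charizard Base Set", kbEdges);
console.log(neighbours.map((n) => n.id)); // "Pokemon Types" is two hops away, so it is excluded
```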

Full API

All SDK methods are on client.knowledge.* (TS/Python) or client.Knowledge (Go).

Documents

Method | Returns | Description
uploadDocument(projectId, opts) | Document | Upload a file for automatic entity extraction
listDocuments(projectId) | Document[] | List documents and their processing status
deleteDocument(projectId, docId) | void | Delete a document and its extracted entities

Schemas

Method | Returns | Description
createSchema(projectId, opts) | Schema | Define a typed entity schema with optional similarity config
listSchemas(projectId) | Schema[] | List all schemas for the project
updateSchema(projectId, schemaId, opts) | Schema | Update fields or similarity config

Facts and graph

Method | Returns | Description
insertFacts(projectId, opts) | InsertResult | Upsert entities and relationships
bulkUpdate(projectId, opts) | void | Patch properties on many nodes at once; only changed fields are written. Pass upsert: false for strict update-only semantics (missing nodes are returned as not_found instead of being created).
listNodes(projectId, opts?) | Node[] | List nodes, optionally filtered by entity type
getNode(projectId, nodeId) | Node | Fetch a single node with its edges and version history
search(projectId, opts) | SearchResult | Full-text search with type filter, property filters, and graph depth

Analytics

Method | Returns | Description
createAnalyticsRule(projectId, opts) | AnalyticsRule | Create a recommendation or trend rule
listAnalyticsRules(projectId) | AnalyticsRule[] | List all rules
runAnalyticsRule(projectId, ruleId) | void | Trigger a rule run immediately
getRecommendations(projectId, opts) | RecommendationResult | Fetch pre-computed recommendations for a source node
getTrendRankings(projectId, ruleId, type, window, limit) | TrendRankings | Top gainers, losers, most volatile, or highest average
recordFeedback(projectId, opts) | void | Record whether a recommendation converted

Combines with other features

With Inventory — shared schemas for per-user items

Inventory items are knowledge graph nodes scoped to a specific user. The same entity_type you define in a KB schema can back both global knowledge entries and per-user inventory items, so the agent reasons across both surfaces with a single mental model. When you call inventory.update with action: "add", the platform creates a node in the graph and returns a fact_id — the same identifier you use in KB lookups.

// Add a per-user inventory item that lives in the knowledge graph
const item = await client.agents.inventory.update("agent_abc", "user_123", {
  action:      "add",
  item_type:   "medication",
  description: "Ibuprofen 500mg",
  project_id:  "proj_abc",
  properties: {
    medication_name: "ibuprofen",
    dosage:          "500mg",
    frequency:       "twice daily",
  },
});

// item.fact_id is a knowledge graph node ID — use it for KB lookups or schedule linkage
console.log(item.fact_id);

With Scheduled Reminders — live data injection at fire time

A schedule can reference an inventory_item_id (a fact_id from the graph). At every fire the platform reads the item's current properties from the knowledge graph and injects them into the agent's wakeup block. This means a dosage change or property update flows through to the next reminder with no schedule edit required — the graph is the single source of truth for what the reminder is about.

// 1. Create the inventory item (returns fact_id)
const item = await client.agents.inventory.update("agent_abc", "user_123", {
  action:      "add",
  item_type:   "medication",
  description: "Ibuprofen",
  project_id:  "proj_abc",
  properties: { medication_name: "ibuprofen", dosage: "500mg" },
});

// 2. Link the schedule — at every fire the graph is re-read for live properties
await client.schedules.create("agent_abc", "user_123", {
  cadence: {
    simple: { frequency: "daily", times: ["08:00", "20:00"] },
    timezone: "Asia/Singapore",
  },
  intent:             "remind the user to take their ibuprofen at the correct dose",
  check_type:         "reminder",
  inventory_item_id:  item.fact_id,
});
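
The reason a property change reaches the next reminder without a schedule edit is that the schedule holds only the item ID, and the properties are looked up at each fire. The sketch below models that with an in-memory map. The wakeup text format, the buildWakeup helper, and the fact_1 ID are hypothetical.

```typescript
interface Schedule { intent: string; inventory_item_id: string; }
type ItemProps = Record<string, string>;

// Stand-in for the knowledge graph: fact_id -> current properties.
const graph: Record<string, ItemProps> = {
  fact_1: { medication_name: "ibuprofen", dosage: "500mg" },
};

// The schedule stores only the ID; properties are read fresh on every fire.
function buildWakeup(schedule: Schedule): string {
  const props = graph[schedule.inventory_item_id];
  const details = props ? JSON.stringify(props) : "(item not found)";
  return schedule.intent + " | item: " + details;
}

const reminder: Schedule = {
  intent: "remind the user to take their ibuprofen at the correct dose",
  inventory_item_id: "fact_1",
};

const firstFire = buildWakeup(reminder);
// The dosage changes in the graph; the schedule itself is not touched.
graph["fact_1"] = { medication_name: "ibuprofen", dosage: "200mg" };
const secondFire = buildWakeup(reminder);

console.log(firstFire);  // carries dosage 500mg
console.log(secondFire); // carries dosage 200mg
```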

With Knowledge Analytics — graph becomes a recommender

Define analytics rules on your entity graph to surface recommendations and trend rankings. Rules match source entities to target entities by field similarity, price range, or other numeric proximity. Conversion feedback flows back into the rule to improve rankings over time. The same graph you use for search becomes a live recommender with no extra data store.

// Create a recommendation rule matching cards by set and rarity
const rule = await client.knowledge.createAnalyticsRule(projectId, {
  rule_type: "recommendation",
  name:      "Similar cards",
  config:    { match_fields: ["set", "rarity"], limit: 5 },
  enabled:   true,
});

// Fetch pre-computed recommendations for a source node
const recs = await client.knowledge.getRecommendations(projectId, {
  rule_id:   rule.rule_id,
  source_id: sourceNodeId,
  limit:     5,
});
for (const rec of recs.recommendations) {
  console.log(rec.target_id, rec.score);
}

// Record conversion feedback — improves future rankings
await client.knowledge.recordFeedback(projectId, {
  rule_id:        rule.rule_id,
  source_node_id: sourceNodeId,
  target_node_id: recs.recommendations[0].target_id,
  converted:      true,
  score_at_time:  recs.recommendations[0].score,
});

Multiplayer memory: a company brain

Sonzai's knowledge layer is not a static store you hand-curate and agents read from once. It is a closed-loop system your agents read, write to, and learn from collaboratively — the way a real team builds shared institutional memory. Three capabilities stack on top of each other:

Layer | What it does | Default | Where it lives
Read | Every agent grounds its replies in the project KB and (optionally) the org-scope KB. | On for any agent with knowledgeBase: true. | Per-project + organization-wide.
Write (autonomous) | Agents create, update, and soft-delete project KB entries themselves during conversations. Audit trail stamps which agent made which change. | Off until knowledgeBaseWrite: true. | Per-project; capability-gated.
Share across users | A single agent serving a team carries attributed memory across users: wisdom (de-attributed, on by default) plus sharedMemory (attributed, opt-in). | wisdom on; sharedMemory off. | Per-agent; capability-gated.

The result is the same compounding effect human teams get from institutional knowledge: an agent doesn't just remember what it did with one user — it picks up what the team did, and a new agent joining the project benefits from everything every previous agent already wrote down.

Project (your tenant)
                            |
 +--------------------------+--------------------------+
 |                          |                          |
 v                          v                          v
agent A                    agent B                    agent C
 |                          |                          |
 |--- writes verified ------+                          |
 |    incident detail       |                          |
 |                          |                          |
 |                  reads + grounds reply              |
 |                          |                          |
 |                          +--- updates the entry --->|
 |                                                     |
 |                                            reads enriched fact
 v                          v                          v
user X                     user Y                     user Z

 Inter-agent: closed loop. Anything one agent learns is
 instantly available to every other agent on the project.

 Intra-agent: a single agent can also share memory across
 the users it serves -- attributed (sharedMemory) or
 de-attributed (wisdom). Same agent, multiple users,
 shared context.

Real-world shapes this enables:

  • Customer-success scribes. Agent A captures verified renewal context with user X; agent B picks it up on a follow-up call with the same account.
  • Support that learns from itself. Each verified incident detail an agent records is grounded data for every other agent the next time the same product issue surfaces.
  • Team coordinators. One agent serves the whole project team — "Alice owns the migration, Bob is on incident response" — and informs each teammate with the context it gathered with the others.
  • Group / party planning. "Carol brings dessert, Dave does setup." Everyone joining the agent already knows who's doing what.
  • Cross-product company brain. Organization-scope KB sits above projects: tenant-wide policies, lore, brand, and reference catalogs every project agent reads alongside its own.

The detailed mechanics of each layer are below.

Agents writing to the knowledge base

By default the KB is admin-curated: you push data in via document upload or the insertFacts and bulkUpdate APIs, and agents only read. You can opt agents into autonomous editing so they create, update, and soft-delete entries themselves during conversations. This is useful when the source of truth is the conversation (e.g. a customer-success agent capturing renewal context, or a support agent recording verified incident details).

Three tools the agent gets when this is on

Tool | What the agent can do
knowledge_create | Insert a new node into the project KB with typed properties.
knowledge_update | Patch existing properties using compare-and-swap: the agent first reads, then submits the version it saw, so concurrent admin edits never get clobbered silently.
knowledge_delete | Soft-delete a node (is_active = false). Soft only; hard delete stays admin-only.

Every write is stamped with source = "agent:<agent-id>" on each PropertySource, so the KB audit trail shows exactly which agent made which change. Schema validation, write quotas, and the project-tenant scope check all run server-side — capability-on agents can only touch their own project.
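
Compare-and-swap, as used by knowledge_update, can be sketched in a few lines: a write succeeds only if the version the writer saw is still the current one. The casUpdate function and its result shape are hypothetical illustrations of the technique, not the platform's API.

```typescript
interface KbNode { id: string; version: number; properties: Record<string, unknown>; }
interface CasResult { ok: boolean; version: number; } // version = node's current version after the call

function casUpdate(
  node: KbNode,
  expectedVersion: number,
  patch: Record<string, unknown>,
): CasResult {
  if (node.version !== expectedVersion) {
    // Someone else wrote in between: reject so the caller can re-read and retry.
    return { ok: false, version: node.version };
  }
  node.properties = { ...node.properties, ...patch };
  node.version += 1;
  return { ok: true, version: node.version };
}

const incident: KbNode = { id: "incident_42", version: 3, properties: { status: "open" } };

const seenByAgent = incident.version;                      // agent reads version 3
casUpdate(incident, 3, { owner: "alice" });                // admin edit lands first -> version 4
const stale = casUpdate(incident, seenByAgent, { status: "resolved" });      // rejected
const retry = casUpdate(incident, incident.version, { status: "resolved" }); // re-read, succeeds

console.log(stale); // { ok: false, version: 4 }
console.log(retry); // { ok: true, version: 5 }
```

The admin's owner field survives because the stale write is rejected rather than applied over it; the agent's retry merges on top of the newer state.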

Two ways to turn it on

Per-agent — set knowledgeBaseWrite: true on the agent's capabilities. Most useful when only specific agents in a project should be allowed to edit (e.g. a "scribe" agent vs. a customer-facing one).

await client.agents.updateCapabilities("agent_abc", {
  knowledgeBase:      true,  // required prerequisite — agent must be able to read first
  knowledgeBaseWrite: true,
});

Project default — flip the project's default_agent_kb_write toggle. Every agent in that project with knowledgeBase: true gets the write capability automatically. Available in the dashboard at /dashboard/knowledge (the toggle next to the project selector) and via the API:

await client.projects.update(projectId, {
  default_agent_kb_write: true,
});

The platform combines the two flags with OR semantics: an agent can write if its own knowledgeBaseWrite flag is on, and when that flag is off the project default decides. So you can turn the default on for the whole project without touching each agent.

Read first, then write

knowledgeBaseWrite requires knowledgeBase: true to also be on — an agent that can't read the KB can't intelligently edit it. The platform refuses to register the write tools when only write is enabled and logs a warning.
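
Putting the OR semantics and the read prerequisite together, the effective write permission can be expressed as one boolean. This is a sketch of the rules as described on this page; the canWriteKb helper is hypothetical, and the platform's actual resolution code is not shown here.

```typescript
interface AgentCaps { knowledgeBase?: boolean; knowledgeBaseWrite?: boolean; }
interface ProjectSettings { default_agent_kb_write?: boolean; }

// Write tools are registered only when the agent can read AND
// either the agent flag or the project default grants write.
function canWriteKb(agent: AgentCaps, project: ProjectSettings): boolean {
  const canRead = agent.knowledgeBase === true;
  const writeGranted =
    agent.knowledgeBaseWrite === true || project.default_agent_kb_write === true;
  return canRead && writeGranted;
}

const perAgent   = canWriteKb({ knowledgeBase: true, knowledgeBaseWrite: true }, {});
const viaDefault = canWriteKb({ knowledgeBase: true }, { default_agent_kb_write: true });
const writeOnly  = canWriteKb({ knowledgeBaseWrite: true }, {});
const readOnly   = canWriteKb({ knowledgeBase: true }, {});

console.log(perAgent, viaDefault, writeOnly, readOnly); // true true false false
```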

Wisdom & shared memory

Dedicated page

Shared memory has its own full documentation page — see Shared Memory for when to use it, how to enable and disable it, what tools the agent gets, how to verify it's working with live API probes, and the full privacy-control story. The summary below is here so KB readers see the multiplayer-memory hook in context.

Beyond static documents, agents that talk to many users develop patterns — recurring behaviours, common goals, stable preferences. Sonzai surfaces this cross-user generalization through two complementary tiers: wisdom (de-attributed, on by default) and shared memory (attributed, opt-in).

Wisdom (de-attributed, on by default)

When the wisdom capability is on — which it is for every new agent — the platform runs a daily promotion job that pulls patterns from per-user fact histories, k-anonymizes them, and rewrites the result through an LLM into de-attributed knowledge. No individual user is identifiable. Every agent benefits from "what tends to work / what tends to come up" without ever leaking who said what.

This is your free generalization layer. There's nothing for agents to call — wisdom shows up alongside facts in the agent's context automatically when the capability is on.
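
One common ingredient of k-anonymization is a minimum-distinct-users threshold: a pattern is only eligible for promotion if enough different users exhibit it. The sketch below shows that single step. The value of k, the Observation shape, and the sample patterns are assumptions for illustration; the platform's promotion job and its LLM rewrite step are not modelled.

```typescript
interface Observation { userId: string; pattern: string; }

// Keep only patterns observed for at least k distinct users.
function kAnonymousPatterns(observations: Observation[], k: number): string[] {
  const usersByPattern: Record<string, Record<string, true>> = {};
  for (const o of observations) {
    const users = usersByPattern[o.pattern] || {};
    users[o.userId] = true; // repeated observations by one user count once
    usersByPattern[o.pattern] = users;
  }
  const kept: string[] = [];
  for (const pattern of Object.keys(usersByPattern)) {
    if (Object.keys(usersByPattern[pattern] || {}).length >= k) kept.push(pattern);
  }
  return kept;
}

const observations: Observation[] = [
  { userId: "u1", pattern: "asks about refund policy after a failed payment" },
  { userId: "u2", pattern: "asks about refund policy after a failed payment" },
  { userId: "u3", pattern: "asks about refund policy after a failed payment" },
  { userId: "u1", pattern: "mentions a specific hospital appointment" },
];
const promotable = kAnonymousPatterns(observations, 3);
console.log(promotable); // only the pattern shared by three distinct users
```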

// Wisdom is on by default for every new agent. To opt out for a specific
// agent (e.g. a single-user companion product where cross-user generalization
// isn't appropriate), pass false at create time or via updateCapabilities:
await client.agents.updateCapabilities("agent_abc", { wisdom: false });

Default-on, opt-out

Wisdom is enabled for all agents, including ones created before the default-on cutover. The capability is tri-state: true, false, or unset (treated as on). Pass wisdom: false only when you want to disable it; omitting the field keeps the agent on the platform default.
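
The tri-state rule reduces to a one-line resolution. A minimal sketch, with wisdomEnabled as a hypothetical helper name:

```typescript
type TriState = boolean | undefined;

// Unset follows the platform default, which this page states is "on".
function wisdomEnabled(flag: TriState): boolean {
  return flag === undefined ? true : flag;
}

const wisdomStates = [wisdomEnabled(undefined), wisdomEnabled(true), wisdomEnabled(false)];
console.log(wisdomStates); // [true, true, false]
```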

Shared memory (attributed, opt-in)

Some businesses want the opposite of de-attribution — they want users working with the same agent to see who is doing what. A team-collaboration agent might surface "Alice owns the migration, Bob is on incident response." A party-coordinator agent might track "Carol brings dessert, Dave does setup." That's what the sharedMemory capability gates.

When this capability is on, the agent records person/entity-attributed facts (roles, expertise, business context, relationship edges) and exposes them to other users sharing the agent. Three things change:

  1. Tools. The agent gets wisdom_create, wisdom_update, wisdom_delete, and relation edges, plus admin-side CSV import.
  2. Context. Other users' attributed facts surface in the agent's per-turn context with attribution.
  3. Privacy floor. Every write is validated against a privacy blocklist (compensation, health, politics) using a dedicated semantic validator before persistence — so the agent can't share something that shouldn't cross the user boundary even if a user asks it to.

Shared memory is OFF by default. Enable it explicitly when the agent serves a group, team, or party that benefits from cross-user visibility.

// Wisdom is the precondition (default ON for new agents — only pass it
// explicitly when overriding the default).
await client.agents.updateCapabilities("agent_abc", {
  wisdom:       true,
  sharedMemory: true,
});

You can also enable it at agent creation time:

const agent = await client.agents.create({
  name:       "Team Coordinator",
  project_id: "proj_abc",
  tool_capabilities: {
    wisdom:        true,
    shared_memory: true,
  },
});

Wisdom vs. shared memory — pick deliberately

wisdom is the generalization layer (safe, de-attributed, on by default). sharedMemory is the attribution layer (sensitive, per-person, off by default). Both can coexist — but turn on shared memory only when the use case genuinely needs cross-user visibility (groups, teams, parties, shared business context). Single-user companion products should leave it off.

Tutorials

  • Inventory tutorial — model per-user items as typed KB entities and read them back during conversations
  • Medication Reminders — end-to-end flow combining Knowledge Base, Inventory, and Scheduled Reminders for a medication adherence product

Next steps

  • Inventory: per-user structured items backed by the same knowledge graph
  • Knowledge Analytics: recommendation rules, trend rankings, and conversion tracking built on your entity graph
  • Organization Knowledge Base: organization-level shared knowledge that sits above projects and is visible to agents in every project across a tenant
