Tool Integration for BYO-LLM

When using standalone memory mode, your LLM handles chat generation but may need to search knowledge and memory on demand. Sonzai exposes tool schemas compatible with OpenAI function calling, so you can wire them into any agent framework.

Two Approaches to Enrichment

There are two complementary ways your agent can access Sonzai knowledge and memory:

Automatic (Recommended)

Call GET /context with a query param. The endpoint automatically searches the knowledge base and injects recalled memories. The deferred learning loop primes the next context call with KB results that the agent missed. No tool calling needed.

Explicit Tool Calling

Register Sonzai tools with your LLM so it can search on demand mid-conversation. This is for agent frameworks (LangChain, Vercel AI SDK, CrewAI) where the LLM decides when to search. You fetch tool schemas from Sonzai and wire them into your framework.

When to use which?

Start with automatic enrichment — it covers most cases with zero configuration. Add explicit tool calling when your agent needs to search mid-conversation (e.g., the user asks a question not covered by the initial context fetch) or when your framework expects tool definitions.

Discovering Available Tools

Fetch the tool catalog for an agent. This returns JSON schemas in OpenAI function-calling format that you can pass directly to your LLM's tool configuration.

const tools = await client.agents.getTools("agent-id");

// tools.tools = [
//   {
//     name: "knowledge_search",
//     description: "Search the agent's knowledge base...",
//     endpoint: "POST /api/v1/agents/{agentId}/tools/kb-search",
//     parameters: {
//       type: "object",
//       required: ["query"],
//       properties: {
//         query: { type: "string", description: "Search query" },
//         limit: { type: "integer", description: "Max results (default 10)" }
//       }
//     }
//   },
//   {
//     name: "memory_search",
//     description: "Search the agent's memory for previously learned facts...",
//     endpoint: "GET /api/v1/agents/{agentId}/memory/search?q={query}&userId={userId}",
//     parameters: {
//       type: "object",
//       required: ["query"],
//       properties: {
//         query: { type: "string", description: "Search query" },
//         user_id: { type: "string", description: "User ID to scope search" },
//         limit: { type: "integer", description: "Max results (default 20)" }
//       }
//     }
//   }
// ]

Knowledge Search Tool

Search the agent's knowledge base for relevant documents and facts. Uses hybrid search (BM25 + semantic) when embeddings are available, and falls back to plain BM25 full-text search when they are not.

Endpoint

POST /api/v1/agents/{agentId}/tools/kb-search
GET  /api/v1/agents/{agentId}/tools/kb-search?q={query}&limit={limit}

Request

{
  "query": "refund policy",
  "limit": 5
}

Response

{
  "query": "refund policy",
  "results": [
    {
      "content": "Customers can request a full refund within 30 days of purchase...",
      "label": "Refund Policy",
      "type": "policy",
      "source": "policies.pdf",
      "score": 0.92
    },
    {
      "content": "For digital products, refunds are processed within 5 business days...",
      "label": "Digital Refund Process",
      "type": "process",
      "source": "policies.pdf",
      "score": 0.78
    }
  ]
}

SDK Usage

const results = await client.agents.knowledgeSearch("agent-id", {
  query: "refund policy",
  limit: 5,
});

for (const result of results.results) {
  console.log(`[${result.score.toFixed(2)}] ${result.label}: ${result.content}`);
}

Memory Search Tool

Search the agent's memory for previously extracted facts about a user. This is a synchronous BM25 full-text search that returns immediately — no deferred processing.

Endpoint

GET /api/v1/agents/{agentId}/memory/search?q={query}&userId={userId}&limit={limit}
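
If you call the endpoint directly rather than through the SDK, URLSearchParams takes care of encoding the query string. A minimal sketch — the host below is a placeholder, and the path is the one shown above:

```typescript
// Build the memory-search URL; special characters in the query are
// URL-encoded automatically by URLSearchParams.
const base = "https://api.example.com"; // placeholder — use your Sonzai API host
const agentId = "agent-123";

const params = new URLSearchParams({
  q: "hiking boots & trails",
  userId: "user-123",
  limit: "10",
});
const url = `${base}/api/v1/agents/${agentId}/memory/search?${params}`;
// → .../memory/search?q=hiking+boots+%26+trails&userId=user-123&limit=10
```

Pass the result to fetch with your API key header, per your client setup.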

Response

{
  "results": [
    {
      "fact_id": "f_abc123",
      "content": "User enjoys hiking on weekends",
      "fact_type": "preference",
      "score": 4.82
    },
    {
      "fact_id": "f_def456",
      "content": "User adopted a dog named Luna in March",
      "fact_type": "event",
      "score": 3.15
    }
  ]
}

SDK Usage

const results = await client.agents.memory.search("agent-id", {
  query: "hiking",
  userId: "user-123",
  limit: 10,
});

for (const fact of results.results) {
  console.log(`[${fact.fact_type}] ${fact.content}`);
}

Memory search is always synchronous

Unlike KB enrichment (which has a deferred path), memory search returns immediately from BM25 indexes. There is no async component. The /context endpoint already includes the most relevant memories automatically — this tool is for cases where the LLM needs to search for additional facts mid-conversation.

Wiring Tools into Agent Frameworks

The tool schemas from GET /tools/schemas follow the OpenAI function-calling format. Here is how to wire them into popular agent frameworks.

Vercel AI SDK

import { generateText, tool } from "ai";
import { google } from "@ai-sdk/google";
import { Sonzai } from "@sonzai-labs/agents";
import { z } from "zod";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const agentId = "agent-id";
const userId = "user-123";

// Define Sonzai tools for the Vercel AI SDK
const sonzaiTools = {
  knowledge_search: tool({
    description: "Search the agent's knowledge base for relevant documents",
    parameters: z.object({
      query: z.string().describe("Search query"),
      limit: z.number().optional().describe("Max results"),
    }),
    execute: async ({ query, limit }) => {
      const results = await sonzai.agents.knowledgeSearch(agentId, {
        query,
        limit: limit ?? 5,
      });
      return results.results.map((r) => ({
        content: r.content,
        label: r.label,
        score: r.score,
      }));
    },
  }),
  memory_search: tool({
    description: "Search agent memory for facts about the user",
    parameters: z.object({
      query: z.string().describe("Search query"),
    }),
    execute: async ({ query }) => {
      const results = await sonzai.agents.memory.search(agentId, {
        query,
        userId,
      });
      return results.results.map((f) => ({
        content: f.content,
        type: f.fact_type,
      }));
    },
  }),
};

// Get enriched context first
const ctx = await sonzai.agents.getContext(agentId, {
  userId,
  sessionId: "session-abc",
  query: userMessage,
});

const { text } = await generateText({
  model: google("gemini-3.1-flash-lite-preview"),
  system: buildSystemPrompt(ctx),
  prompt: userMessage,
  tools: sonzaiTools,
  maxSteps: 3, // allow up to 3 tool calls per turn
});

Google Gemini Function Calling

import { GoogleGenAI, Type } from "@google/genai";
import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });
const gemini = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

const agentId = "agent-id";

// Define tools in Gemini format
const tools = [{
  functionDeclarations: [
    {
      name: "knowledge_search",
      description: "Search the agent's knowledge base for relevant documents",
      parameters: {
        type: Type.OBJECT,
        properties: {
          query: { type: Type.STRING, description: "Search query" },
          limit: { type: Type.INTEGER, description: "Max results" },
        },
        required: ["query"],
      },
    },
    {
      name: "memory_search",
      description: "Search agent memory for facts about the user",
      parameters: {
        type: Type.OBJECT,
        properties: {
          query: { type: Type.STRING, description: "Search query" },
        },
        required: ["query"],
      },
    },
  ],
}];

// Chat with tool calling
const response = await gemini.models.generateContent({
  model: "gemini-3.1-flash-lite-preview",
  contents: [{ role: "user", parts: [{ text: systemPrompt + "\n\n" + userMessage }] }],
  config: { tools },
});

// Handle tool calls
for (const part of response.candidates?.[0]?.content?.parts ?? []) {
  if (part.functionCall) {
    const { name, args } = part.functionCall;

    let result;
    if (name === "knowledge_search") {
      result = await sonzai.agents.knowledgeSearch(agentId, {
        query: args.query as string,
        limit: (args.limit as number) ?? 5,
      });
    } else if (name === "memory_search") {
      result = await sonzai.agents.memory.search(agentId, {
        query: args.query as string,
        userId: "user-123",
      });
    }

    // Send tool result back to Gemini for the final response
    // (see Gemini function calling docs for the full loop)
  }
}

LangChain (Python)

from langchain_core.tools import tool
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent
from sonzai import Sonzai

sonzai_client = Sonzai(api_key="sk_your_api_key")
agent_id = "agent-id"
user_id = "user-123"


@tool
def knowledge_search(query: str, limit: int = 5) -> list[dict]:
    """Search the agent's knowledge base for relevant documents and facts.
    Use when the user asks about topics that may be in uploaded documents."""
    results = sonzai_client.agents.knowledge_search(agent_id, query=query, limit=limit)
    return [{"content": r.content, "label": r.label, "score": r.score} for r in results.results]


@tool
def memory_search(query: str) -> list[dict]:
    """Search agent memory for previously learned facts about the user.
    Use when the conversation references past interactions or personal details."""
    results = sonzai_client.agents.memory.search(agent_id, query=query, user_id=user_id)
    return [{"content": f.content, "type": f.fact_type} for f in results.results]


# Get enriched context
ctx = sonzai_client.agents.get_context(
    agent_id, user_id=user_id, session_id="session-abc", query=user_message
)

llm = ChatGoogleGenerativeAI(model="gemini-3.1-flash-lite-preview")
agent = create_react_agent(llm, [knowledge_search, memory_search])

result = agent.invoke({
    "messages": [
        {"role": "system", "content": build_system_prompt(ctx)},
        {"role": "user", "content": user_message},
    ]
})

OpenAI-Compatible (Generic)

Any framework that accepts OpenAI function-calling format can use the schemas directly:

import { Sonzai } from "@sonzai-labs/agents";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

// Fetch schemas and convert to OpenAI format
const { tools: sonzaiSchemas } = await sonzai.agents.getTools("agent-id");

const openaiTools = sonzaiSchemas.map((t) => ({
  type: "function" as const,
  function: {
    name: t.name,
    description: t.description,
    parameters: t.parameters,
  },
}));

// Pass to any OpenAI-compatible provider
const response = await openai.chat.completions.create({
  model: "your-model",
  messages: [...],
  tools: openaiTools,
});

// Handle tool calls in the response
for (const call of response.choices[0].message.tool_calls ?? []) {
  const args = JSON.parse(call.function.arguments);

  if (call.function.name === "knowledge_search") {
    const result = await sonzai.agents.knowledgeSearch("agent-id", {
      query: args.query,
      limit: args.limit,
    });
    // Feed result back to the LLM as a tool response
  }

  if (call.function.name === "memory_search") {
    const result = await sonzai.agents.memory.search("agent-id", {
      query: args.query,
      userId: "user-123",
    });
    // Feed result back to the LLM as a tool response
  }
}

Understanding Deferred Enrichment

The most powerful aspect of standalone mode is the self-improving learning loop. Even without explicit tool calls, the agent gets smarter each turn because /process detects knowledge gaps and primes the next /context call.

How It Works

┌──────────────────────────────────────────────────────────────────┐
│  Turn N                                                          │
│                                                                  │
│  1. GET /context?query="hiking boots"                            │
│     → Returns enriched context + any KB matches for "hiking"     │
│     → Also returns deferred results from Turn N-1 (if any)      │
│                                                                  │
│  2. Chat with your LLM (using enriched context)                  │
│                                                                  │
│  3. POST /process (send transcript)                              │
│     → Extracts facts: "user needs waterproof hiking boots"       │
│     → Extracts entities: "hiking boots", "waterproof"            │
│     → Searches KB with extracted topics (async, after response)  │
│     → Finds: "Hiking Gear Guide", "Waterproof Materials FAQ"    │
│     → Stores as deferred signals (Redis, 1-hour TTL)            │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│  Turn N+1                                                        │
│                                                                  │
│  1. GET /context?query="which brand do you recommend?"           │
│     → Direct search: matches for "brand recommend"              │
│     → Deferred results: "Hiking Gear Guide" + "Waterproof FAQ"  │
│     → Both merged into response (deduplicated)                  │
│     → Deferred signals consumed (one-shot, not repeated)        │
│                                                                  │
│  2. Chat with your LLM                                          │
│     → Now has hiking gear knowledge it didn't have before!      │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

Key Properties

One-shot signals: Deferred KB results are consumed when /context reads them. They appear exactly once, preventing stale or repeated information.

TTL-based expiry: Deferred signals expire after 1 hour. If the user doesn't continue the conversation, stale signals are automatically cleaned up.

Deduplication: If the direct /context query matches the same KB document as a deferred signal, the duplicate is removed. You never get the same result twice.

Capped searches: /process runs at most 5 KB queries per call and stores at most 10 deferred results, preventing resource explosion on topic-heavy conversations.
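
The merge-and-dedup step can be pictured with a small sketch. This illustrates the documented behavior, not the server implementation — `mergeContextResults` and dedup-by-label are assumptions made for the example:

```typescript
interface KbResult {
  label: string;
  content: string;
  score: number;
}

// Merge direct /context search results with deferred signals from the
// previous /process call. Direct results win; deferred results are
// appended once, with duplicates dropped by document label.
function mergeContextResults(direct: KbResult[], deferred: KbResult[]): KbResult[] {
  const seen = new Set(direct.map((r) => r.label));
  const merged = [...direct];
  for (const r of deferred) {
    if (!seen.has(r.label)) { // dedup: skip docs already matched directly
      seen.add(r.label);
      merged.push(r);
    }
  }
  return merged;
}

const direct = [{ label: "Hiking Gear Guide", content: "...", score: 0.9 }];
const deferred = [
  { label: "Hiking Gear Guide", content: "...", score: 0.7 }, // duplicate — dropped
  { label: "Waterproof FAQ", content: "...", score: 0.6 },
];
mergeContextResults(direct, deferred);
// → two results: "Hiking Gear Guide" (direct) and "Waterproof FAQ"
```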

Memory Search Is Always Synchronous

Unlike KB enrichment, memory search has no deferred/async path. When /context is called, it recalls the most relevant memories immediately using the hierarchical memory tree and BM25 indexes. When you call GET /memory/search explicitly, results return immediately.

The deferred behavior only applies to knowledge base content, where /process proactively discovers KB documents the agent should have known about. Memory facts are always available synchronously because they are indexed at write time (during /process).

Recommended Integration Pattern

For most applications, combine automatic enrichment with explicit tool calling for the best results:

import { generateText, tool } from "ai";
import { google } from "@ai-sdk/google";
import { Sonzai } from "@sonzai-labs/agents";
import { z } from "zod";

const sonzai = new Sonzai({ apiKey: process.env.SONZAI_API_KEY! });

async function chat(agentId: string, userId: string, sessionId: string, message: string) {
  // Step 1: Automatic enrichment — context includes KB + memories
  const ctx = await sonzai.agents.getContext(agentId, {
    userId,
    sessionId,
    query: message,
  });

  // Step 2: Chat with tools for on-demand search
  const { text, steps } = await generateText({
    model: google("gemini-3.1-flash-lite-preview"),
    system: buildSystemPrompt(ctx),
    prompt: message,
    tools: {
      knowledge_search: tool({
        description: "Search knowledge base for additional documents",
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          const r = await sonzai.agents.knowledgeSearch(agentId, { query, limit: 5 });
          return r.results.map((d) => ({ content: d.content, label: d.label }));
        },
      }),
      memory_search: tool({
        description: "Search memory for additional facts about the user",
        parameters: z.object({ query: z.string() }),
        execute: async ({ query }) => {
          const r = await sonzai.agents.memory.search(agentId, { query, userId });
          return r.results.map((f) => ({ content: f.content, type: f.fact_type }));
        },
      }),
    },
    maxSteps: 3,
  });

  // Step 3: Process — extracts memories + primes next context with KB gaps
  await sonzai.agents.process(agentId, {
    userId,
    sessionId,
    messages: [
      { role: "user", content: message },
      { role: "assistant", content: text },
    ],
    provider: "gemini",
  });

  return text;
}

The end-to-end flow:

GET /context
                │
   ┌────────────┴────────────┐
   │                         │
   ▼                         ▼
Recalled              KB Search
Memories              Results
   │                    │
   └────────┬───────────┘
            │
            ▼
     System Prompt ──────► Your LLM
            │                  │
            │          ┌───────┴──────────────┐
            │          │ Tool call?            │
            │          │ knowledge_search()    │
            │          │ memory_search()       │
            │          └───────┬──────────────┘
            │                  │
            │                  ▼
            │             Response
            │                  │
            ▼                  ▼
        POST /process
            │
   ┌────────┴────────┐
   │                 │
   ▼                 ▼
Extract         Detect KB
Facts           Gaps (deferred)
   │                 │
   ▼                 ▼
Store in        Store in Redis
Memory Tree     (for next /context)

Frequently Asked Questions

Do I need tool calling if I already use /context?

Not necessarily. /context automatically includes KB results and recalled memories. Tool calling is useful when the LLM needs to search for something specific mid-conversation that wasn't covered by the initial context fetch, or when your framework expects tool definitions.

Is memory search async like KB enrichment?

No. Memory search is always synchronous. When you call GET /memory/search, results return immediately from BM25 indexes. The deferred/async flow only applies to knowledge base enrichment via the /process learning loop.

What happens if /process finds KB content but the user never calls /context again?

The deferred signals expire after 1 hour (TTL-based cleanup). No stale data persists. If the user resumes the conversation later, they get fresh results from the next /context call.

Can I use my own tools alongside Sonzai tools?

Absolutely. The Sonzai tool schemas are standard OpenAI function definitions. Mix them with your own tools in whatever framework you use. The LLM decides which tool to call based on the conversation.
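
In practice that is just map composition. A sketch with plain objects standing in for real tool definitions — `create_ticket` is a hypothetical custom tool of your own:

```typescript
// Sonzai tools and your own tools merge into one tools map;
// the LLM picks a tool by name at call time.
const sonzaiTools = {
  knowledge_search: { /* tool definition */ },
  memory_search: { /* tool definition */ },
};
const myTools = {
  create_ticket: { /* your own tool definition */ },
};

const allTools = { ...sonzaiTools, ...myTools };
Object.keys(allTools);
// → ["knowledge_search", "memory_search", "create_ticket"]
```

The same pattern works for array-based tool lists (e.g., the OpenAI `tools` array): concatenate the Sonzai schemas with your own definitions.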

How do custom tools defined in the dashboard relate to these?

Custom tools (created via POST /agents/{agentId}/tools or the dashboard) are for agent-side tool calling in Sonzai's managed chat mode. The tool schemas described here (/tools/schemas) are for BYO-LLM mode, where your LLM calls Sonzai endpoints.