From custom JSON

The catch-all migration guide for homegrown chat stores, proprietary databases, Responses API logs, and anything else that has users and messages. Anything with (user_id, messages[]) fits.

When to use this guide

Use this guide if your data doesn't match one of the named sources and you already have it in some shape you control — a Postgres table, a pile of JSON files, the OpenAI Responses API store, a custom fine-tuning dataset, a backup from a deprecated product.

The target shape is simple: for each user you want to migrate, you need a stable user_id and some amount of content about them. Everything else is optional.

The canonical shape

Get your source data into this shape first. Everything else in this guide operates on it.

type SourceUser = {
  user_id:       string;
  display_name?: string;
  email?:        string;
  custom?:       Record<string, string>;
  transcripts?:  string[];   // role-tagged, e.g. "User: hi\nAgent: hey"
  facts?:        string[];   // one assertion per string
  notes?:        string[];   // free-form paragraphs
};
  • transcripts — one string per distinct conversation. Keep them role-tagged (User: ... / Agent: ...) so the extractor can distinguish turns.
  • facts — if your source has pre-extracted user assertions, pass them here; each becomes a text block.
  • notes — anything else: CRM notes, support ticket summaries, onboarding questionnaires.

All three are optional. A user with only display_name and email is still a valid import — Sonzai will generate facts from metadata alone.
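
For example, as Python dicts matching the shape above (values are illustrative):

minimal = {"user_id": "u_42", "display_name": "Ada", "email": "ada@example.com"}

full = {
    "user_id": "u_43",
    "transcripts": ["User: my flight moved to Tuesday\nAgent: got it, updating your itinerary"],
    "facts": ["Prefers aisle seats"],
    "notes": ["Asked about the annual plan during onboarding"],
}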

Example: migrating from a Postgres chat table

Suppose you have two tables:

users(id, display_name, email, created_at)
messages(user_id, role, content, created_at, conversation_id)

The script below loads both tables into the canonical shape, converts each user, and imports them in batches:

import os
import psycopg
from collections import defaultdict
from sonzai import Sonzai

sonzai = Sonzai(api_key=os.environ["SONZAI_API_KEY"])
AGENT_ID = "agent_abc"

def load_source():
    conn = psycopg.connect(os.environ["DATABASE_URL"])
    users = {u[0]: {"user_id": u[0], "display_name": u[1], "email": u[2]}
             for u in conn.execute("SELECT id, display_name, email FROM users")}

    # Group messages into role-tagged lines per (user, conversation).
    convos = defaultdict(lambda: defaultdict(list))
    rows = conn.execute(
        "SELECT user_id, conversation_id, role, content "
        "FROM messages ORDER BY user_id, conversation_id, created_at"
    )
    for user_id, convo_id, role, content in rows:
        r = "User" if role == "user" else "Agent"
        convos[user_id][convo_id].append(f"{r}: {content}")

    # One transcript string per conversation, per the canonical shape.
    for user_id, u in users.items():
        u["transcripts"] = [
            f"[conversation {cid}]\n" + "\n".join(lines)
            for cid, lines in convos[user_id].items()
        ]
    conn.close()
    return list(users.values())

def to_sonzai_user(src):
    content = [
        {"type": "chat_transcript", "body": t} for t in src.get("transcripts", [])
    ] + [
        {"type": "text", "body": f} for f in src.get("facts", [])
    ] + [
        {"type": "note", "body": n} for n in src.get("notes", [])
    ]
    return {
        "user_id":      src["user_id"],
        "display_name": src.get("display_name"),
        "metadata": {
            "email":  src.get("email"),
            "custom": src.get("custom") or {},
        },
        "content": content,
    }

def migrate():
    source = load_source()
    # Chunk in batches of 200 to keep individual requests small
    for i in range(0, len(source), 200):
        chunk = [to_sonzai_user(s) for s in source[i : i + 200]]
        job = sonzai.agents.priming.batch_import(
            AGENT_ID, source="custom_json", users=chunk,
        )
        print(f"batch {i//200}: job_id={job.job_id} users={job.total_users}")

Example: OpenAI Responses API store

The Responses API persists conversations as previous_response_id chains. Walk each chain back to its root, collect the input and output items, and treat the whole chain as one transcript.

from openai import OpenAI
openai = OpenAI()

def walk_response_chain(response_id: str) -> list[tuple[str, str]]:
    """Returns an oldest-first list of (role, text) tuples."""
    items = []
    current = response_id
    while current:
        r = openai.responses.retrieve(current)
        turns = []
        # User turns are recorded as input items on each response. Depending on
        # how the chain was built, input items may repeat earlier turns; dedupe
        # if you see duplicates.
        for item in openai.responses.input_items.list(current).data:
            if getattr(item, "type", None) == "message" and getattr(item, "role", None) == "user":
                for c in item.content or []:
                    if getattr(c, "type", None) == "input_text":
                        turns.append(("User", c.text))
        # Agent turns are message items in r.output with output_text content.
        for item in r.output or []:
            if getattr(item, "type", None) == "message":
                for c in item.content or []:
                    if getattr(c, "type", None) == "output_text":
                        turns.append(("Agent", c.text))
        items = turns + items  # walking newest -> oldest, so prepend
        current = getattr(r, "previous_response_id", None)
    return items

Pass the resulting transcript as a chat_transcript block per the canonical shape above.
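
For instance, a small helper along these lines (chain_to_block is our name, not part of either SDK):

def chain_to_block(response_id: str) -> dict:
    # Join the (role, text) tuples into one role-tagged transcript string.
    turns = walk_response_chain(response_id)
    body = "\n".join(f"{role}: {text}" for role, text in turns)
    return {"type": "chat_transcript", "body": body}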

Verify

curl -s https://api.sonz.ai/api/v1/agents/agent_abc/users/import/$JOB_ID \
  -H "Authorization: Bearer $SONZAI_API_KEY" | jq '{status,total_users,facts_stored,errors}'

List users to spot-check:

curl -s "https://api.sonz.ai/api/v1/agents/agent_abc/users?limit=5&sort_by=created_at&sort_order=desc" \
  -H "Authorization: Bearer $SONZAI_API_KEY"

Tips

  • Keep user_id stable across systems. The same key in your app database and in Sonzai means you don't need a second lookup table.
  • Batch size. 100–300 users per batch_import request is the sweet spot. Larger batches raise the odds of an HTTP timeout even though processing is async. Pace your requests to avoid rate limits on metadata-fact extraction, which runs synchronously.
  • Progress reporting to your users. You can show an "importing..." UI by polling get_import_status and computing processed_users / total_users; the count ticks up as the async worker processes each user. See the sketch after this list.
  • Incremental migration. If you want to migrate live without downtime, run the import in the background and keep writing new chats to your old system until the backfill catches up. Once Sonzai has the history, cut over to Sonzai for all new chats; old data stays in place but the agent has everything it needs.
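
A minimal polling sketch for that progress UI, assuming get_import_status lives alongside batch_import and takes (agent_id, job_id); check your SDK for the exact signature:

import time

def wait_for_import(job_id: str, poll_seconds: int = 5):
    while True:
        job = sonzai.agents.priming.get_import_status(AGENT_ID, job_id)
        # processed_users ticks up as the async worker handles each user.
        print(f"importing... {job.processed_users}/{job.total_users}")
        # Terminal states assumed to be "completed" / "failed".
        if job.status in ("completed", "failed"):
            return job
        time.sleep(poll_seconds)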

What's next

  • CRM / CSV — if your data is tabular rather than conversational.
  • Knowledge base — if you also have documents to migrate.
  • Memory — how the extracted facts get organised.
