From custom JSON
The catch-all migration guide for homegrown chat stores, proprietary databases, Responses API logs, and anything else that has users and messages. Anything with (user_id, messages[]) fits.
When to use this guide
Use this guide if your data doesn't match one of the named sources and you already have it in some shape you control — a Postgres table, a pile of JSON files, the OpenAI Responses API store, a custom fine-tuning dataset, a backup from a deprecated product.
The target shape is simple: for each user you want to migrate, you need a stable user_id and some amount of content about them. Everything else is optional.
The canonical shape
Get your source data into this shape first. Everything else in this guide operates on it.
```typescript
type SourceUser = {
  user_id: string;
  display_name?: string;
  email?: string;
  custom?: Record<string, string>;
  transcripts?: string[]; // role-tagged, e.g. "User: hi\nAgent: hey"
  facts?: string[];       // one assertion per string
  notes?: string[];       // free-form paragraphs
};
```

- `transcripts` — one string per distinct conversation. Keep them role-tagged (`User: ...` / `Agent: ...`) so the extractor can distinguish turns.
- `facts` — if your source has pre-extracted user assertions, pass them here; each becomes a `text` block.
- `notes` — anything else: CRM notes, support ticket summaries, onboarding questionnaires.
All three are optional. A user with only `display_name` and `email` is still a valid import — Sonzai will generate facts from metadata alone.
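For instance, both of the following are valid source records (the ids and values here are made up for illustration):

```python
# Metadata-only user: no transcripts, facts, or notes at all.
minimal_user = {
    "user_id": "u_101",  # hypothetical id
    "display_name": "Ada",
    "email": "ada@example.com",
}

# A fuller record mixing all three optional content kinds.
full_user = {
    "user_id": "u_102",
    "transcripts": ["User: my dog is called Rex\nAgent: nice, tell me about Rex!"],
    "facts": ["Has a dog named Rex"],
    "notes": ["Signed up through the onboarding questionnaire."],
}
```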
Example: migrating from a Postgres chat table
Suppose you have two tables:
- `users(id, display_name, email, created_at)`
- `messages(user_id, role, content, created_at, conversation_id)`

```python
import os
from collections import defaultdict

import psycopg
from sonzai import Sonzai

sonzai = Sonzai(api_key=os.environ["SONZAI_API_KEY"])
AGENT_ID = "agent_abc"

def load_source():
    conn = psycopg.connect(os.environ["DATABASE_URL"])
    users = {u[0]: {"user_id": u[0], "display_name": u[1], "email": u[2]}
             for u in conn.execute("SELECT id, display_name, email FROM users")}
    convos = defaultdict(lambda: defaultdict(list))
    rows = conn.execute(
        "SELECT user_id, conversation_id, role, content "
        "FROM messages ORDER BY user_id, conversation_id, created_at"
    )
    for user_id, convo_id, role, content in rows:
        r = "User" if role == "user" else "Agent"
        convos[user_id][convo_id].append(f"{r}: {content}")
    for user_id, u in users.items():
        u["transcripts"] = [
            f"[conversation {cid}]\n" + "\n".join(lines)
            for cid, lines in convos[user_id].items()
        ]
    return list(users.values())

def to_sonzai_user(src):
    content = [
        {"type": "chat_transcript", "body": t} for t in src.get("transcripts", [])
    ] + [
        {"type": "text", "body": f} for f in src.get("facts", [])
    ] + [
        {"type": "note", "body": n} for n in src.get("notes", [])
    ]
    return {
        "user_id": src["user_id"],
        "display_name": src.get("display_name"),
        "metadata": {
            "email": src.get("email"),
            "custom": src.get("custom") or {},
        },
        "content": content,
    }

def migrate():
    source = load_source()
    # Chunk in batches of 200 to keep individual requests small
    for i in range(0, len(source), 200):
        chunk = [to_sonzai_user(s) for s in source[i : i + 200]]
        job = sonzai.agents.priming.batch_import(
            AGENT_ID, source="custom_json", users=chunk,
        )
        print(f"batch {i//200}: job_id={job.job_id} users={job.total_users}")
```

Example: OpenAI Responses API store
The Responses API persists the conversation via `previous_response_id` chains. Walk each chain back to the root, collect input/output items, and treat the whole chain as one transcript.
```python
from openai import OpenAI

openai = OpenAI()

def walk_response_chain(response_id: str) -> list:
    """Returns oldest-first list of (role, text) tuples."""
    items = []
    current = response_id
    while current:
        r = openai.responses.retrieve(current)
        # r.input and r.output both contain content items; flatten text
        for item in (r.output or []):
            if getattr(item, "type", None) == "message":
            for c in item.content or []:
                    if getattr(c, "type", None) == "output_text":
                        items.append(("Agent", c.text))
        # The user turn is in r.input when the chain started on a user message
        current = getattr(r, "previous_response_id", None)
    return list(reversed(items))
```

Pass the resulting transcript as a `chat_transcript` block per the canonical shape above.
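To go from those `(role, text)` tuples to the canonical shape, join each chain into one role-tagged string. A minimal sketch — the user id and chain items below are made up; in practice the items come from `walk_response_chain`:

```python
def chain_to_transcript(items):
    """Render (role, text) tuples as one role-tagged transcript string
    matching the canonical SourceUser shape."""
    return "\n".join(f"{role}: {text}" for role, text in items)

# Hand-built example chain; normally this is walk_response_chain(...) output.
items = [("User", "hi"), ("Agent", "hey, how can I help?")]
source_user = {
    "user_id": "u_103",  # hypothetical id
    "transcripts": [chain_to_transcript(items)],
}
```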
Verify

```shell
curl -s https://api.sonz.ai/api/v1/agents/agent_abc/users/import/$JOB_ID \
  -H "Authorization: Bearer $SONZAI_API_KEY" | jq '{status,total_users,facts_stored,errors}'
```

List users to spot-check:

```shell
curl -s "https://api.sonz.ai/api/v1/agents/agent_abc/users?limit=5&sort_by=created_at&sort_order=desc" \
  -H "Authorization: Bearer $SONZAI_API_KEY"
```

Tips
- Keep `user_id` stable across systems. The same key in your app database and in Sonzai means you don't need a second lookup table.
- Batch size. 100–300 users per `batch_import` request is the sweet spot. Larger batches increase the odds of a timeout on the HTTP side even though processing is async. Pace yourself to avoid rate limits on metadata-fact extraction, which runs synchronously.
- Progress reporting to your users. You can show an "importing..." UI by polling `get_import_status` and computing `processed_users / total_users`. It ticks up as the async worker processes each user.
- Incremental migration. If you want to migrate live without downtime, run the import in the background and keep writing new chats to your old system until the backfill catches up. Once Sonzai has the history, cut over to Sonzai for all new chats; old data stays in place but the agent has everything it needs.
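The progress-reporting tip can be sketched as a small polling loop. This is an assumption-laden sketch: it guesses that `get_import_status` lives on the same `agents.priming` namespace as `batch_import` and returns `status`, `processed_users`, and `total_users` fields; adjust names to match your SDK version.

```python
import time

def wait_for_import(sonzai, agent_id, job_id, poll_seconds=5):
    """Poll an import job until it reaches a terminal state.

    The method path and field names here are assumed from the tips
    above, not confirmed against a specific SDK release.
    """
    while True:
        status = sonzai.agents.priming.get_import_status(agent_id, job_id)
        print(f"{status.processed_users}/{status.total_users} users ({status.status})")
        if status.status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
```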
What's next
- CRM / CSV — if your data is tabular rather than conversational.
- Knowledge base — if you also have documents to migrate.
- Memory — how the extracted facts get organised.
From Character.AI / Replika
Migrate companion character chat exports from Character.AI, Replika, Chai, and similar companion apps into Sonzai. One character becomes one Sonzai agent; one user's conversation history becomes one Sonzai user.
From CRM / CSV
Migrate structured tabular data — Salesforce and HubSpot contacts, product ownership tables, subscription rosters — into Sonzai. Two patterns, one for contact rosters, one for inventory-style facts.