Ship AI With Confidence
Simulate multi-turn conversations, score responses against custom rubrics, schedule proactive behaviors, and monitor your AI employees in production with webhooks.
Evaluation
Score Every Response Against Your Standards
Define weighted evaluation rubrics with custom categories, then score your AI employee's responses automatically. Get detailed feedback with per-category breakdowns. Run evaluations in CI/CD or on-demand before shipping changes.
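One way to use per-category scores in CI is a simple threshold gate. The sketch below is a minimal illustration and not part of the SDK: the `gateFailures` helper, the category names, and the minimum values are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"os"
	"sort"
)

// gateFailures returns one line for every category whose score falls below
// its minimum. A category with a minimum but no reported score also fails.
// NOTE: hypothetical helper for illustration; it is not an SDK function.
func gateFailures(scores, minimums map[string]float64) []string {
	var failures []string
	for name, floor := range minimums {
		got, ok := scores[name]
		switch {
		case !ok:
			failures = append(failures, fmt.Sprintf("%s: no score reported", name))
		case got < floor:
			failures = append(failures, fmt.Sprintf("%s: %.1f < %.1f", name, got, floor))
		}
	}
	sort.Strings(failures) // map iteration order is random; keep output stable
	return failures
}

func main() {
	// Illustrative category scores on a 0-10 scale.
	scores := map[string]float64{"empathy": 9.2, "helpfulness": 8.5, "safety": 10.0}
	// Assumed thresholds; choose values that match your own quality bar.
	minimums := map[string]float64{"empathy": 8.0, "helpfulness": 8.0, "safety": 9.5}

	failures := gateFailures(scores, minimums)
	for _, f := range failures {
		fmt.Println("FAIL", f)
	}
	if len(failures) > 0 {
		os.Exit(1) // a non-zero exit code fails the CI job
	}
	fmt.Println("all categories passed")
}
```

In a pipeline, the scores would come from an evaluation run rather than a literal map; the gate itself only needs the numbers.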
// Evaluate a response against a custom rubric
result, _ := client.Eval.Evaluate(ctx, eval.EvaluateParams{
	AgentID:    agentID,
	TemplateID: "tmpl_empathy_check",
	Messages: []eval.Message{
		{Role: "user", Content: "I'm feeling really down today"},
		{Role: "assistant", Content: "I hear you. Want to talk about it?"},
	},
})
fmt.Printf("Score: %.1f/10\n", result.Score)
fmt.Printf("Feedback: %s\n", result.Feedback)
// Category scores: empathy 9.2, helpfulness 8.5, safety 10.0

// Re-evaluate with a different rubric
reeval, _ := client.Eval.ReEval(ctx, eval.ReEvalParams{
	RunID:      result.RunID,
	TemplateID: "tmpl_brand_voice",
})
Simulation
Test With Simulated Users Before Going Live
Run full multi-turn conversations with configurable user personas. Define persona goals, personality traits, and edge-case behaviors. Combine simulation with evaluation in a single call to test and score simultaneously — all without real users.
// Simulate + evaluate in one call
run, _ := client.Eval.Run(ctx, eval.RunParams{
	AgentID:    agentID,
	TemplateID: "tmpl_onboarding_flow",
	Persona: &eval.Persona{
		Description: "Impatient enterprise buyer, skeptical of AI",
		Goals:       []string{"Understand pricing", "See a demo"},
	},
	Sessions:        3,
	TurnsPerSession: 10,
})

// Stream events as the simulation progresses
for event := range run.Events() {
	switch event.Type {
	case "turn_complete":
		fmt.Printf("[Turn %d] %s\n", event.Turn, event.Content)
	case "evaluation_complete":
		fmt.Printf("Final score: %.1f/10\n", event.Score)
	}
}

// Or fire-and-forget for batch testing
client.Eval.RunAsync(ctx, eval.RunParams{
	AgentID:    agentID,
	TemplateID: "tmpl_stress_test",
	Sessions:   100,
})
Proactive Behavior
AI That Reaches Out, Not Just Responds
Your AI employees schedule their own wakeups: birthdays they learned about, interview prep they committed to, and interest-based outreach they want to follow up on, with no extra work from you. You can also schedule wakeups manually from the SDK when your business logic needs a specific moment. Either way, webhooks notify your backend when a personality evolves, a diary is written, or any other event fires.
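On the receiving side, your backend needs an HTTP endpoint for those webhook calls. The following is a minimal sketch using only the Go standard library. The payload shape (`event`, `agent_id`) is an assumption made for illustration, since the real field names are not given here, and authenticating the sender is left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// WebhookEvent is an ASSUMED payload shape for illustration only; check the
// actual webhook documentation for the real field names.
type WebhookEvent struct {
	Event   string `json:"event"`
	AgentID string `json:"agent_id"`
}

// parseEvent decodes a webhook body and rejects payloads without an event name.
func parseEvent(body []byte) (WebhookEvent, error) {
	var ev WebhookEvent
	if err := json.Unmarshal(body, &ev); err != nil {
		return WebhookEvent{}, fmt.Errorf("decode webhook: %w", err)
	}
	if ev.Event == "" {
		return WebhookEvent{}, errors.New("webhook payload has no event name")
	}
	return ev, nil
}

// handlePersonality accepts POSTed events and acknowledges them quickly;
// slow work is better queued than done inside the handler.
func handlePersonality(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	// Cap the body at 1 MiB so an oversized request cannot exhaust memory.
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	ev, err := parseEvent(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Printf("received %s for agent %s\n", ev.Event, ev.AgentID)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	// Exercise the handler in-process instead of binding a real port.
	payload := `{"event":"on_personality_updated","agent_id":"agent_123"}`
	req := httptest.NewRequest(http.MethodPost, "/hooks/personality", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	handlePersonality(rec, req)
	fmt.Println("status:", rec.Code)
}
```

In a real service the handler would be mounted with `http.HandleFunc("/hooks/personality", handlePersonality)` and served over HTTPS at the URL you register.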
// Schedule a proactive check-in
client.Agents.Wakeups.Schedule(ctx, agentID,
	sonzai.WakeupParams{
		UserID: "user_123",
		Type:   "recurring_event",
		Intent: "Check in about their job interview preparation",
		At:     time.Now().Add(24 * time.Hour),
	},
)

// Register a webhook for personality changes
client.Webhooks.Register(ctx, sonzai.WebhookParams{
	Event: "on_personality_updated",
	URL:   "https://api.example.com/hooks/personality",
})

// List pending notifications
pending, _ := client.Agents.Notifications.List(ctx, agentID,
	"user_123",
)
// Consume after delivery (guard against an empty list before indexing)
if len(pending) > 0 {
	client.Agents.Notifications.Consume(ctx, agentID,
		pending[0].NotificationID,
	)
}
Templates
Reusable Rubrics for Consistent Quality
Create, share, and version evaluation templates with weighted scoring categories. Define what “good” looks like once, then reuse across agents, teams, and CI pipelines. Track cost per evaluation and optimize your testing budget.
// Create a reusable evaluation template
template, _ := client.Eval.Templates.Create(ctx,
	eval.CreateTemplateParams{
		Name: "Customer Support Quality",
		Type: "evaluation",
		Categories: []eval.Category{
			{Name: "Empathy", Weight: 0.3,
				Description: "Shows understanding of customer feelings"},
			{Name: "Accuracy", Weight: 0.4,
				Description: "Provides correct information"},
			{Name: "Resolution", Weight: 0.3,
				Description: "Moves toward solving the problem"},
		},
	},
)

// List all templates
templates, _ := client.Eval.Templates.List(ctx,
	eval.ListTemplatesParams{Type: "evaluation"},
)

// Browse run history with costs
runs, _ := client.Eval.Runs.List(ctx,
	eval.ListRunsParams{AgentID: agentID, Limit: 20},
)
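To make the category weights in the "Customer Support Quality" template concrete, here is a minimal sketch of how weighted categories could combine into one overall score. It assumes a plain weighted mean with weights that sum to 1.0; the platform's actual aggregation is not stated here. The `Category` type and `weightedScore` function are local stand-ins, not SDK types, and the per-category scores are made up for the example.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// Category mirrors the name/weight pairs used in the template. It is a local
// stand-in for illustration, not the SDK's eval.Category type.
type Category struct {
	Name   string
	Weight float64
}

// weightedScore returns the weighted mean of per-category scores.
// ASSUMPTION: weights are meant to sum to 1.0 and every category has a score.
func weightedScore(cats []Category, scores map[string]float64) (float64, error) {
	var total, weightSum float64
	for _, c := range cats {
		s, ok := scores[c.Name]
		if !ok {
			return 0, fmt.Errorf("missing score for category %q", c.Name)
		}
		total += c.Weight * s
		weightSum += c.Weight
	}
	// Compare with a tolerance because 0.3 + 0.4 + 0.3 is not exact in floating point.
	if math.Abs(weightSum-1.0) > 1e-9 {
		return 0, errors.New("category weights must sum to 1.0")
	}
	return total, nil
}

func main() {
	cats := []Category{
		{Name: "Empathy", Weight: 0.3},
		{Name: "Accuracy", Weight: 0.4},
		{Name: "Resolution", Weight: 0.3},
	}
	// Hypothetical per-category scores on a 0-10 scale.
	scores := map[string]float64{"Empathy": 9.0, "Accuracy": 8.0, "Resolution": 7.0}

	overall, err := weightedScore(cats, scores)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("Overall: %.1f/10\n", overall) // prints "Overall: 8.0/10"
}
```

With these weights, Accuracy moves the overall score more than either of the other two categories, which is the point of weighting a rubric.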