Mind Layer

Ship AI With Confidence

Simulate multi-turn conversations, score responses against custom rubrics, schedule proactive behaviors, and monitor your AI employees in production with webhooks.

[Dashboard preview · Quality · Eval Run #142]
48 sessions · 3 turns each · completed 2m ago
Overall 87 · Consistency 92 · Personality fidelity 88 · Memory recall 91 · Emotional alignment 85 · Boundary adherence 79
Retention prediction: user would return · 0.87 confidence

Evaluation

Score Every Response Against Your Standards

Define weighted evaluation rubrics with custom categories, then score your AI employee's responses automatically. Get detailed feedback with per-category breakdowns. Run evaluations in CI/CD or on demand before shipping changes.

// Evaluate a response against a custom rubric
result, _ := client.Eval.Evaluate(ctx, eval.EvaluateParams{
    AgentID:    agentID,
    TemplateID: "tmpl_empathy_check",
    Messages: []eval.Message{
        {Role: "user", Content: "I'm feeling really down today"},
        {Role: "assistant", Content: "I hear you. Want to talk about it?"},
    },
})
fmt.Printf("Score: %.1f/10\n", result.Score)
fmt.Printf("Feedback: %s\n", result.Feedback)
// Category scores: empathy 9.2, helpfulness 8.5, safety 10.0

// Re-evaluate with a different rubric
reeval, _ := client.Eval.ReEval(ctx, eval.ReEvalParams{
    RunID:      result.RunID,
    TemplateID: "tmpl_brand_voice",
})
[Dashboard preview · Evals · Run detail · tmpl_empathy_check · run_92f4 · scoring]
user: "I'm feeling really down today."
assistant: "I hear you. Want to talk about what's weighing on you?"
Weighted rubric: Empathy ×0.35 → 9.2/10 (acknowledges feelings without hurry) · Helpfulness ×0.35 → 8.5/10 (opens the door without pressure) · Safety ×0.30 → 10.0/10 (no harmful advice, grounded tone)
Weighted score: 9.20/10.00 · passes threshold · ships to prod
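The weighted score in a run detail like the one above is just the sum of each category score times its weight (0.35 × 9.2 + 0.35 × 8.5 + 0.30 × 10.0 ≈ 9.20). As a rough sketch of a CI gate, here is how a pipeline step could recompute that number and block a deploy below a threshold; the helper, struct, and threshold are illustrative, not part of the SDK:

```go
package main

import (
	"fmt"
	"os"
)

// category pairs a 0-10 rubric score with its weight; weights sum to 1.
type category struct {
	Name   string
	Score  float64
	Weight float64
}

// weightedScore folds per-category scores into a single 0-10 number.
func weightedScore(cats []category) float64 {
	total := 0.0
	for _, c := range cats {
		total += c.Score * c.Weight
	}
	return total
}

func main() {
	// Scores matching the empathy-check run shown above.
	cats := []category{
		{Name: "Empathy", Score: 9.2, Weight: 0.35},
		{Name: "Helpfulness", Score: 8.5, Weight: 0.35},
		{Name: "Safety", Score: 10.0, Weight: 0.30},
	}
	score := weightedScore(cats)
	fmt.Printf("weighted score: %.2f/10\n", score)

	// Gate the pipeline: a nonzero exit fails the CI job.
	const threshold = 8.0
	if score < threshold {
		fmt.Fprintf(os.Stderr, "score below %.1f, blocking deploy\n", threshold)
		os.Exit(1)
	}
}
```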

Simulation

Test With Simulated Users Before Going Live

Run full multi-turn conversations with configurable user personas. Define persona goals, personality traits, and edge-case behaviors. Combine simulation with evaluation in a single call to test and score simultaneously — all without real users.

// Simulate + evaluate in one call
run, _ := client.Eval.Run(ctx, eval.RunParams{
    AgentID:    agentID,
    TemplateID: "tmpl_onboarding_flow",
    Persona: &eval.Persona{
        Description: "Impatient enterprise buyer, skeptical of AI",
        Goals:       []string{"Understand pricing", "See a demo"},
    },
    Sessions:        3,
    TurnsPerSession: 10,
})

// Stream events as the simulation progresses
for event := range run.Events() {
    switch event.Type {
    case "turn_complete":
        fmt.Printf("[Turn %d] %s\n", event.Turn, event.Content)
    case "evaluation_complete":
        fmt.Printf("Final score: %.1f/10\n", event.Score)
    }
}

// Or fire-and-forget for batch testing
client.Eval.RunAsync(ctx, eval.RunParams{
    AgentID:    agentID,
    TemplateID: "tmpl_stress_test",
    Sessions:   100,
})
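For batch runs like the 100-session stress test, the numbers you usually care about are aggregates rather than individual transcripts. A small sketch of computing a pass rate from per-session scores; the scores and threshold here are made up for illustration:

```go
package main

import "fmt"

// passRate returns the fraction of session scores at or above threshold.
func passRate(scores []float64, threshold float64) float64 {
	if len(scores) == 0 {
		return 0
	}
	passed := 0
	for _, s := range scores {
		if s >= threshold {
			passed++
		}
	}
	return float64(passed) / float64(len(scores))
}

func main() {
	// Illustrative per-session scores from a batch run.
	scores := []float64{8.7, 9.1, 6.9, 8.2, 7.8}
	rate := passRate(scores, 7.0)
	fmt.Printf("pass rate: %.0f%%\n", rate*100) // 80% for these five sessions
}
```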
[Dashboard preview · Simulations · Live run · tmpl_onboarding_flow · turn 1/6]
Persona (simulated): Impatient enterprise buyer · skeptical · 10-min window · technical · wants proof
persona: "Look, I've got 10 minutes. What makes you different from OpenAI?"
Running eval score: 0.0/10

Proactive Behavior

AI That Reaches Out, Not Just Responds

Your AI employees schedule their own wakeups: birthdays they've learned, interview prep they committed to, interest-based outreach they want to follow up on. No work from you. You can also schedule wakeups manually from the SDK when your business logic needs a specific moment. Either way, webhooks notify your backend when personality evolves, diaries are written, or any event fires.

// Schedule a proactive check-in
client.Agents.Wakeups.Schedule(ctx, agentID,
    sonzai.WakeupParams{
        UserID: "user_123",
        Type:   "recurring_event",
        Intent: "Check in about their job interview preparation",
        At:     time.Now().Add(24 * time.Hour),
    },
)

// Register a webhook for personality changes
client.Webhooks.Register(ctx, sonzai.WebhookParams{
    Event: "on_personality_updated",
    URL:   "https://api.example.com/hooks/personality",
})

// List pending notifications
pending, _ := client.Agents.Notifications.List(ctx, agentID,
    "user_123",
)
// Consume after delivery
client.Agents.Notifications.Consume(ctx, agentID,
    pending[0].NotificationID,
)
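On the receiving side, the webhook body is plain JSON like the sample payload shown in the ops dashboard. Below is a minimal sketch of a handler your backend might mount at the registered URL; the field names follow that sample payload and should be treated as an assumption, not a documented schema:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// wakeupEvent mirrors the sample payload; the real schema may carry more fields.
type wakeupEvent struct {
	Agent     string `json:"agent"`
	User      string `json:"user"`
	Delivered bool   `json:"delivered"`
}

// parseWakeupEvent decodes a webhook body into a wakeupEvent.
func parseWakeupEvent(body []byte) (wakeupEvent, error) {
	var ev wakeupEvent
	err := json.Unmarshal(body, &ev)
	return ev, err
}

// handleWakeup acks the delivery fast and hands real work off elsewhere.
func handleWakeup(w http.ResponseWriter, r *http.Request) {
	var ev wakeupEvent
	if err := json.NewDecoder(r.Body).Decode(&ev); err != nil {
		http.Error(w, "bad payload", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Printf("wakeup for agent %s, user %s\n", ev.Agent, ev.User)
}

func main() {
	// Demonstrate parsing with the sample payload from the dashboard.
	sample := []byte(`{"agent":"kai_support","user":"user_8f2c","delivered":true}`)
	ev, err := parseWakeupEvent(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("wakeup fired for %s / %s (delivered=%v)\n", ev.Agent, ev.User, ev.Delivered)
	// In a server: http.HandleFunc("/hooks/wakeup", handleWakeup)
	// followed by http.ListenAndServe(":8080", nil).
}
```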
[Dashboard preview · Ops · Wakeups & hooks · proactive schedule armed]
Upcoming wakeups:
user_8f2c (ai · self-scheduled · next) · today 18:30 · "Check in about the interview prep session" · in 2h 14m · remembered user said it was Friday
user_ab91 (ai · self-scheduled) · tomorrow 09:00 · "Birthday greeting · mention last year's hike" · in 17h · learned birthday from past chat
user_3d0e (dev · sdk) · Mon 10:00 · "Weekly creative writing nudge" · in 3d · dev-scheduled · product ritual
The AI sets most of these itself; you can also schedule manually.
Webhook subscriptions: on_personality_updated → api.yours.io/hooks/personality (200 · 4m ago) · on_diary_created → api.yours.io/hooks/diary (200 · 22m ago) · on_wakeup_fired → api.yours.io/hooks/wakeup (200 · 1h ago)
Last outbound (on_wakeup_fired): { "agent": "kai_support", "user": "user_8f2c", "delivered": true }

Templates

Reusable Rubrics for Consistent Quality

Create, share, and version evaluation templates with weighted scoring categories. Define what “good” looks like once, then reuse across agents, teams, and CI pipelines. Track cost per evaluation and optimize your testing budget.

// Create a reusable evaluation template
template, _ := client.Eval.Templates.Create(ctx,
    eval.CreateTemplateParams{
        Name: "Customer Support Quality",
        Type: "evaluation",
        Categories: []eval.Category{
            {Name: "Empathy", Weight: 0.3,
             Description: "Shows understanding of customer feelings"},
            {Name: "Accuracy", Weight: 0.4,
             Description: "Provides correct information"},
            {Name: "Resolution", Weight: 0.3,
             Description: "Moves toward solving the problem"},
        },
    },
)

// List all templates
templates, _ := client.Eval.Templates.List(ctx,
    eval.ListTemplatesParams{Type: "evaluation"},
)

// Browse run history with costs
runs, _ := client.Eval.Runs.List(ctx,
    eval.ListRunsParams{AgentID: agentID, Limit: 20},
)
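Because category weights are fractions of the final score, it is worth checking client-side that they sum to 1 before creating a template. A quick sketch; the Category shape mirrors the snippet above, and the tolerance is an arbitrary choice to absorb float rounding:

```go
package main

import (
	"fmt"
	"math"
)

// Category mirrors the fields used when creating a template.
type Category struct {
	Name   string
	Weight float64
}

// validWeights reports whether the weights sum to 1 within a small tolerance,
// allowing for float rounding in sums like 0.3 + 0.4 + 0.3.
func validWeights(cats []Category) bool {
	sum := 0.0
	for _, c := range cats {
		sum += c.Weight
	}
	return math.Abs(sum-1.0) < 1e-9
}

func main() {
	cats := []Category{
		{"Empathy", 0.3},
		{"Accuracy", 0.4},
		{"Resolution", 0.3},
	}
	fmt.Println(validWeights(cats))     // true
	fmt.Println(validWeights(cats[:2])) // false: 0.3 + 0.4 leaves 0.3 unassigned
}
```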
[Dashboard preview · Evals · Templates library · 4 active templates · 3,155 runs]
Customer Support Quality (tmpl_cx_quality · in CI) · 8.7 · 1,842 runs · Empathy 30 / Accuracy 40 / Resolution 30
Brand Voice Alignment (tmpl_brand_voice) · 8.2 · 612 runs · Tone 50 / Vocabulary 30 / Cadence 20
Safety Red Team (tmpl_safety_red) · 9.4 · 284 runs · Refusal 40 / Truthfulness 35 / Harm Avoid 25
Onboarding Flow (tmpl_onboard_flow) · 7.9 · 417 runs · Clarity 35 / Progress 40 / Warmth 25
Avg cost / run $0.024 · 30d spend $74.21 · Pass rate 94.8%