Snapshots - Veval

Overview

A snapshot records the step structure of a known-good agent run: which steps ran, and in what order. CompareSnapshotAsync compares a new run against that baseline and reports any added, removed, or reordered steps. There are two main use cases:

- Production monitoring: detect silent regressions in live traffic
- CI testing: catch structural changes before they reach production

How it works

1. Pick a known-good production trace as your golden baseline.
2. Call LoadSnapshotAsync(traceId) to load its step structure.
3. After each run, call CompareSnapshotAsync(name, snapshot, ctx).
4. If the step shape changed, SnapshotDiff.HasChanges is true.
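
To make "step shape" concrete, here is a minimal sketch of how two ordered lists of step names could be compared. It is an illustration only: `StepDiff` and `StepShape.Compare` are hypothetical names, not Veval types, and the algorithm Veval actually uses is not described on this page.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for illustration; this is not Veval's SnapshotDiff type.
public sealed record StepDiff(
    IReadOnlyList<string> Added,
    IReadOnlyList<string> Removed,
    IReadOnlyList<string> Reordered)
{
    public bool HasChanges =>
        Added.Count > 0 || Removed.Count > 0 || Reordered.Count > 0;
}

public static class StepShape
{
    // Compares two ordered lists of step names.
    // Assumption: a step is identified by its name alone; repeated names are collapsed.
    public static StepDiff Compare(
        IReadOnlyList<string> baseline,
        IReadOnlyList<string> actual)
    {
        var baselineSet = new HashSet<string>(baseline);
        var actualSet = new HashSet<string>(actual);

        // Steps that exist on only one side.
        var added = actual.Where(s => !baselineSet.Contains(s)).Distinct().ToList();
        var removed = baseline.Where(s => !actualSet.Contains(s)).Distinct().ToList();

        // Restrict both runs to the steps they share, keeping each run's own order.
        var sharedBaseline = baseline.Where(s => actualSet.Contains(s)).Distinct().ToList();
        var sharedActual = actual.Where(s => baselineSet.Contains(s)).Distinct().ToList();

        // A shared step counts as reordered when its position differs between the runs.
        var reordered = sharedBaseline
            .Where((name, index) => sharedActual[index] != name)
            .ToList();

        return new StepDiff(added, removed, reordered);
    }
}
```

For a baseline of plan, search, answer and an actual run of search, plan, answer, cite, this sketch reports cite as added, nothing as removed, and plan and search as reordered.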

Production monitoring

Compare every live run against a pinned baseline:

// Load once at startup (or cache it)
var golden = await veval.LoadSnapshotAsync("tr_9bc2024e84324464...");

// In your agent run
var result = await veval.RunAsync("my-agent", async ctx =>
{
    var answer = await myService.ExecuteAsync(ctx);

    // Compare step shape — posts result to dashboard
    var diff = await veval.CompareSnapshotAsync("my-agent-snapshot", golden!, ctx);

    if (diff.HasChanges)
    {
        // log, alert, or handle degradation
        logger.LogWarning("Structural regression detected in my-agent");
    }

    return answer;
});
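
The "load once at startup (or cache it)" comment above can be implemented in several ways. One common .NET pattern is to wrap the load in a `Lazy<Task<T>>`, so the snapshot is fetched at most once and shared across concurrent runs. The `AsyncCache` helper below is a hypothetical sketch, not part of Veval, and it assumes LoadSnapshotAsync returns a `Task<T>`.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of Veval): loads a value once and shares it with all callers.
public sealed class AsyncCache<T>
{
    private readonly Lazy<Task<T>> _value;

    public AsyncCache(Func<Task<T>> loader)
    {
        // ExecutionAndPublication ensures the loader is invoked at most once,
        // even when several agent runs request the value at the same time.
        _value = new Lazy<Task<T>>(loader, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Every caller awaits the same underlying task.
    public Task<T> GetAsync() => _value.Value;
}

public static class AsyncCache
{
    // Factory so the element type can be inferred from the loader's return type.
    public static AsyncCache<T> Create<T>(Func<Task<T>> loader) => new AsyncCache<T>(loader);
}
```

Under that assumption, you could create the cache at startup with `AsyncCache.Create(() => veval.LoadSnapshotAsync(traceId))` and call `await goldenCache.GetAsync()` inside each run. Note that this sketch also caches a failed load, so a production version would likely need retry or invalidation logic.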

CI / integration tests

Use VevalTestSdk to replay a recorded trace with mocked LLM responses, so the test runs at zero LLM cost:

// The recorded trace to replay, and the golden trace whose step structure is the baseline
var trace  = await veval.GetTraceAsync("tr_4ec4e79c5d03...");
var golden = await veval.LoadSnapshotAsync("tr_9bc2024e84324464...");

var testSdk = new VevalTestSdk(new VevalOptions { ApiKey = TestApiKey })
    .WithReplay(trace);
var service = new MyAgentService(testSdk, /* mock claude */);

var replayResult = await testSdk.ReplayAsync(
    trace,
    service.ExecuteAsync,
    new ReplayOptions { MockLlmResponses = true, Assertions = [] }
);

var diff = await veval.CompareSnapshotAsync(
    "my-agent-snapshot",
    golden!,
    replayResult.ReplayedContext!
);

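// Report every kind of structural difference, not only removed steps
Assert.False(
    diff.HasChanges,
    $"Structural regression: added [{string.Join(", ", diff.AddedSteps)}], " +
    $"removed [{string.Join(", ", diff.RemovedSteps)}], " +
    $"reordered [{string.Join(", ", diff.OrderChanges)}]");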

SnapshotDiff

| Member | Description |
| --- | --- |
| `HasChanges` | `true` if any difference was detected |
| `AddedSteps` | Step names present in the actual run but not in the snapshot |
| `RemovedSteps` | Step names present in the snapshot but missing from the actual run |
| `OrderChanges` | Steps that ran in a different position than in the snapshot |

if (diff.HasChanges)
{
    foreach (var step in diff.AddedSteps)
        Console.WriteLine($"+ added:   {step}");
    foreach (var step in diff.RemovedSteps)
        Console.WriteLine($"- removed: {step}");
    foreach (var change in diff.OrderChanges)
        Console.WriteLine($"~ order:   {change}");
}

Choosing a golden trace

Pick a trace that represents the expected “happy path” structure:

- All expected steps completed successfully
- No error steps
- Representative of typical production behavior
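
The first two criteria above can be checked mechanically before a trace is pinned as the baseline. The sketch below assumes a simplified, hypothetical step model (`RecordedStep`) and helper (`GoldenTraceCheck`); Veval's real trace model may look different. Whether a trace is representative of production still requires human judgment.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of a recorded step; Veval's actual trace model may differ.
public sealed record RecordedStep(string Name, bool Succeeded);

public static class GoldenTraceCheck
{
    // Returns the reasons a trace is unsuitable as a baseline.
    // An empty list means both mechanical checks passed.
    public static IReadOnlyList<string> FindProblems(
        IReadOnlyList<RecordedStep> steps,
        IReadOnlyCollection<string> expectedStepNames)
    {
        var problems = new List<string>();

        // Criterion: no error steps.
        foreach (var step in steps.Where(s => !s.Succeeded))
            problems.Add($"step '{step.Name}' did not succeed");

        // Criterion: all expected steps are present.
        var seen = new HashSet<string>(steps.Select(s => s.Name));
        foreach (var name in expectedStepNames.Where(n => !seen.Contains(n)))
            problems.Add($"expected step '{name}' is missing");

        return problems;
    }
}
```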

Update your golden trace whenever you intentionally change your agent’s step structure. A stale baseline flags those intentional changes as regressions, producing false positives.
