
Overview

A snapshot records the step structure of a known-good agent run — which steps ran, in what order. CompareSnapshotAsync compares a new run against that baseline and reports any differences: added steps, removed steps, or reordering. Two use cases:
  • Production monitoring — detect silent regressions in live traffic
  • CI testing — catch structural changes before they reach production

How it works

  1. Pick a known-good production trace as your golden baseline.
  2. Call LoadSnapshotAsync(traceId) to load its step structure.
  3. After each run, call CompareSnapshotAsync(name, snapshot, ctx).
  4. Inspect the returned SnapshotDiff: HasChanges is true if the step shape changed.
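
To make the comparison in the steps above concrete, here is a minimal sketch of how a step-structure diff could be computed from two ordered lists of step names. It is an illustration only, not the SDK's implementation: StepDiff and StepShape are hypothetical names, only the member names mirror this page, and the definition of "reordered" used here is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the SDK's diff type (illustration only).
public sealed record StepDiff(
    IReadOnlyList<string> AddedSteps,
    IReadOnlyList<string> RemovedSteps,
    IReadOnlyList<string> OrderChanges)
{
    public bool HasChanges =>
        AddedSteps.Count > 0 || RemovedSteps.Count > 0 || OrderChanges.Count > 0;
}

public static class StepShape
{
    // Compares two ordered lists of step names.
    // Repeated step names are collapsed to their first occurrence.
    public static StepDiff Compare(IReadOnlyList<string> baseline, IReadOnlyList<string> actual)
    {
        var baselineSet = new HashSet<string>(baseline);
        var actualSet = new HashSet<string>(actual);

        // Steps that appear on only one side.
        var added = actual.Where(s => !baselineSet.Contains(s)).Distinct().ToList();
        var removed = baseline.Where(s => !actualSet.Contains(s)).Distinct().ToList();

        // Steps present in both runs, each list in its own run's order.
        var sharedInBaseline = baseline.Where(s => actualSet.Contains(s)).Distinct().ToList();
        var sharedInActual = actual.Where(s => baselineSet.Contains(s)).Distinct().ToList();

        // A shared step counts as reordered when its position among the shared steps differs.
        var reordered = sharedInBaseline
            .Where((step, i) => sharedInActual[i] != step)
            .ToList();

        return new StepDiff(added, removed, reordered);
    }
}
```

With a baseline of retrieve, rank, answer and an actual run of rank, retrieve, answer, cite, this sketch reports cite as added, nothing as removed, and retrieve and rank as reordered.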

Production monitoring

Compare every live run against a pinned baseline:
// Load once at startup (or cache it)
var golden = await veval.LoadSnapshotAsync("tr_9bc2024e84324464...");

// In your agent run
var result = await veval.RunAsync("my-agent", async ctx =>
{
    var answer = await myService.ExecuteAsync(ctx);

    // Compare step shape — posts result to dashboard
    var diff = await veval.CompareSnapshotAsync("my-agent-snapshot", golden!, ctx);

    if (diff.HasChanges)
    {
        // log, alert, or handle degradation
        logger.LogWarning("Structural regression detected in my-agent");
    }

    return answer;
});
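
The "load once at startup (or cache it)" comment above can be handled in several ways. One possible approach, sketched here with standard .NET types only, is to wrap the async load in a Lazy<Task<T>> so that concurrent runs share a single load. AsyncOnce is a hypothetical helper name and is not part of the SDK.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of the SDK): runs an async loader at most once
// and hands the same task to every caller.
public sealed class AsyncOnce<T>
{
    private readonly Lazy<Task<T>> _value;

    public AsyncOnce(Func<Task<T>> loader) =>
        // ExecutionAndPublication ensures only one thread invokes the loader.
        _value = new Lazy<Task<T>>(loader, LazyThreadSafetyMode.ExecutionAndPublication);

    // Every call returns the same cached task; awaiting it repeatedly is safe.
    public Task<T> GetAsync() => _value.Value;
}
```

To use it, pass the LoadSnapshotAsync call from the example above as the loader delegate and await GetAsync() inside each run. One trade-off to be aware of: if the load fails, the faulted task stays cached, so a production version would likely need a retry or reset path.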

CI / integration tests

Use VevalTestSdk to replay a recorded trace with mocked LLM responses, so the comparison runs at zero LLM cost:
var trace  = await veval.GetTraceAsync("tr_4ec4e79c5d03...");
var golden = await veval.LoadSnapshotAsync("tr_9bc2024e84324464...");

var testSdk = new VevalTestSdk(new VevalOptions { ApiKey = TestApiKey })
    .WithReplay(trace);
// mockClaude is a placeholder name: pass your own mocked Claude client here
var service = new MyAgentService(testSdk, mockClaude);

var replayResult = await testSdk.ReplayAsync(
    trace,
    service.ExecuteAsync,
    new ReplayOptions { MockLlmResponses = true, Assertions = [] }
);

var diff = await veval.CompareSnapshotAsync(
    "my-agent-snapshot",
    golden!,
    replayResult.ReplayedContext!
);

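// Report all three kinds of change, not only removed steps
Assert.False(
    diff.HasChanges,
    $"Structural regression. Added: [{string.Join(", ", diff.AddedSteps)}]; " +
    $"removed: [{string.Join(", ", diff.RemovedSteps)}]; " +
    $"reordered: [{string.Join(", ", diff.OrderChanges)}]");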

SnapshotDiff

Member        Description
HasChanges    true if any difference was detected
AddedSteps    Step names present in actual but not in the snapshot
RemovedSteps  Step names present in snapshot but missing from actual
OrderChanges  Steps that ran in a different position
if (diff.HasChanges)
{
    foreach (var step in diff.AddedSteps)
        Console.WriteLine($"+ added:   {step}");
    foreach (var step in diff.RemovedSteps)
        Console.WriteLine($"- removed: {step}");
    foreach (var change in diff.OrderChanges)
        Console.WriteLine($"~ order:   {change}");
}

Choosing a golden trace

Pick a trace that represents the expected “happy path” structure:
  • All expected steps completed successfully
  • No error steps
  • Representative of typical production behavior
Update your golden trace whenever you intentionally change your agent’s step structure. Otherwise the stale baseline will flag every run with the new structure as a regression, which is a false positive.