Overview
A snapshot records the step structure of a known-good agent run — which steps ran, in what order. CompareSnapshotAsync compares a new run against that baseline and reports any differences: added steps, removed steps, or reordering.
Two use cases:
- Production monitoring — detect silent regressions in live traffic
- CI testing — catch structural changes before they reach production
How it works
- Pick a known-good production trace as your golden baseline.
- Call LoadSnapshotAsync(traceId) to load its step structure.
- After each run, call CompareSnapshotAsync(name, snapshot, ctx).
- If the step shape changed, SnapshotDiff.HasChanges is true.
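To make "step shape" concrete, here is a minimal sketch of a structural diff over two ordered lists of step names. It is not the SDK's implementation: StepDiff and StepShape.Compare are hypothetical names, and the sketch assumes step names are unique within a run. How CompareSnapshotAsync treats repeated steps or defines an order change is not specified on this page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type for illustration; not the SDK's SnapshotDiff.
public sealed record StepDiff(
    IReadOnlyList<string> Added,
    IReadOnlyList<string> Removed,
    IReadOnlyList<string> Reordered)
{
    public bool HasChanges =>
        Added.Count > 0 || Removed.Count > 0 || Reordered.Count > 0;
}

public static class StepShape
{
    // Compares two ordered lists of step names.
    // Assumption: each step name appears at most once per run.
    public static StepDiff Compare(
        IReadOnlyList<string> baseline,
        IReadOnlyList<string> actual)
    {
        var baselineSet = new HashSet<string>(baseline);
        var actualSet = new HashSet<string>(actual);

        // Steps that exist on only one side.
        var added = actual.Where(s => !baselineSet.Contains(s)).ToList();
        var removed = baseline.Where(s => !actualSet.Contains(s)).ToList();

        // Restrict both runs to their shared steps, then report every
        // baseline step whose position among the shared steps differs.
        var sharedBaseline = baseline.Where(s => actualSet.Contains(s)).ToList();
        var sharedActual = actual.Where(s => baselineSet.Contains(s)).ToList();
        var reordered = sharedBaseline
            .Where((step, i) => i >= sharedActual.Count || sharedActual[i] != step)
            .ToList();

        return new StepDiff(added, removed, reordered);
    }
}
```

For a baseline of retrieve, rank, answer and an actual run of rank, retrieve, answer, cite, this sketch reports cite as added, nothing as removed, and retrieve and rank as reordered.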
Production monitoring
Compare every live run against a pinned baseline:
// Load once at startup (or cache it)
var golden = await veval.LoadSnapshotAsync("tr_9bc2024e84324464...");

// In your agent run
var result = await veval.RunAsync("my-agent", async ctx =>
{
    var answer = await myService.ExecuteAsync(ctx);

    // Compare step shape; the result is posted to the dashboard
    var diff = await veval.CompareSnapshotAsync("my-agent-snapshot", golden!, ctx);
    if (diff.HasChanges)
    {
        // Log, alert, or handle degradation
        logger.LogWarning("Structural regression detected in my-agent");
    }

    return answer;
});
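The "load once at startup (or cache it)" comment can be handled with a small async cache. The sketch below is one possible approach built on the standard Lazy<Task<T>> idiom. CachedBaseline is a hypothetical helper, not part of the SDK, and the loader delegate stands in for a call such as veval.LoadSnapshotAsync.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: runs an async loader at most once and
// hands the same task to every caller.
public sealed class CachedBaseline<T>
{
    private readonly Lazy<Task<T>> _value;

    public CachedBaseline(Func<Task<T>> loader)
    {
        // Lazy<T> defaults to ExecutionAndPublication, so concurrent
        // callers share a single invocation of the loader.
        _value = new Lazy<Task<T>>(loader);
    }

    // Returns the cached task; the first call triggers the load.
    public Task<T> GetAsync() => _value.Value;
}
```

Inside the agent run you would then await GetAsync() instead of reloading the snapshot. One design caveat: a failed load is cached as a faulted task, so a production version may want to retry or rebuild the cache on failure.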
CI / integration tests
Use VevalTestSdk to replay a recorded trace with mocked LLM responses, so the test runs at zero LLM cost:
// Recorded trace to replay, and the golden snapshot to compare against
var trace = await veval.GetTraceAsync("tr_4ec4e79c5d03...");
var golden = await veval.LoadSnapshotAsync("tr_9bc2024e84324464...");

var testSdk = new VevalTestSdk(new VevalOptions { ApiKey = TestApiKey })
    .WithReplay(trace);

// mockClaude is a placeholder for your own mocked LLM client
var service = new MyAgentService(testSdk, mockClaude);

// Replay the trace through the agent with LLM responses mocked
var replayResult = await testSdk.ReplayAsync(
    trace,
    service.ExecuteAsync,
    new ReplayOptions { MockLlmResponses = true, Assertions = [] }
);

// Compare the replayed run's step shape against the golden snapshot
var diff = await veval.CompareSnapshotAsync(
    "my-agent-snapshot",
    golden!,
    replayResult.ReplayedContext!
);

Assert.False(diff.HasChanges, $"Structural regression; removed steps: {string.Join(", ", diff.RemovedSteps)}");
SnapshotDiff
| Member | Description |
|---|---|
| HasChanges | true if any difference was detected |
| AddedSteps | Step names present in the actual run but not in the snapshot |
| RemovedSteps | Step names present in the snapshot but missing from the actual run |
| OrderChanges | Steps that ran in a different position |
if (diff.HasChanges)
{
    foreach (var step in diff.AddedSteps)
        Console.WriteLine($"+ added: {step}");
    foreach (var step in diff.RemovedSteps)
        Console.WriteLine($"- removed: {step}");
    foreach (var change in diff.OrderChanges)
        Console.WriteLine($"~ order: {change}");
}
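For log lines or assertion messages, it can help to fold all three lists into a single string. The helper below is a sketch, not part of the SDK: DiffSummary is a hypothetical name, and it assumes AddedSteps and RemovedSteps enumerate strings. The element type of OrderChanges is not documented here, so the helper keeps it generic and formats each item through string interpolation.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the SDK.
public static class DiffSummary
{
    // Builds a one-line summary of the non-empty categories,
    // or "no structural changes" when all three are empty.
    public static string Summarize<TChange>(
        IEnumerable<string> added,
        IEnumerable<string> removed,
        IEnumerable<TChange> orderChanges)
    {
        var parts = new List<string>();
        AddPart(parts, "added", added);
        AddPart(parts, "removed", removed);
        AddPart(parts, "order", orderChanges.Select(c => $"{c}"));
        return parts.Count == 0
            ? "no structural changes"
            : string.Join("; ", parts);
    }

    // Appends "label: [a, b]" only when the category has items.
    private static void AddPart(
        List<string> parts, string label, IEnumerable<string> items)
    {
        var list = items.ToList();
        if (list.Count > 0)
            parts.Add($"{label}: [{string.Join(", ", list)}]");
    }
}
```

Under those assumptions, a test could pass DiffSummary.Summarize(diff.AddedSteps, diff.RemovedSteps, diff.OrderChanges) as the assertion message so that added and reordered steps are reported alongside removed ones.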
Choosing a golden trace
Pick a trace that represents the expected “happy path” structure:
- All expected steps completed successfully
- No error steps
- Representative of typical production behavior
Update your golden trace whenever you intentionally change your agent’s step structure. Leaving a stale baseline will cause false positives.