What Veval does
Veval wraps your agent runs and LLM calls to record traces: inputs, outputs, step timing, token counts, and cost. Traces appear in the dashboard. You run scenarios and assertions against them in CI to catch regressions before they reach production. Three SDKs, same concept: C# (Veval.Sdk), Python (veval), Node (@veval/sdk).
Install
Initialize
C#
Python
Node
Core API
RunAsync — wrap a complete agent run
C#
Python
Node
The ctx object is a VevalExecutionContext.
TrackStepAsync — record one LLM call or sub-operation
Simple overload (no metadata):
C#
Python
Node
Overload with metadata:
C#
Python
Node
| Key | Type | Effect |
|---|---|---|
| model | string | Stored on step.model |
| tokens_in | int | Stored on step.tokens_in |
| tokens_out | int | Stored on step.tokens_out |
| cost_usd | float | Stored on step.cost_usd |
| type | string | Use "tool" for tool-call steps |
| (any other key) | any | Stored in step.metadata dict |
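The routing in the table can be pictured with a small stand-in. This is not the SDK's code: the `Step` dataclass and `route_step_meta` are hypothetical names, used only to illustrate how the listed keys land on dedicated step fields while any other key falls through to the metadata dict.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Keys the table lists as having a dedicated field on the step.
KNOWN_KEYS = ("model", "tokens_in", "tokens_out", "cost_usd", "type")


@dataclass
class Step:
    """Hypothetical stand-in for a recorded step (not the SDK's type)."""
    name: str
    model: Optional[str] = None
    tokens_in: Optional[int] = None
    tokens_out: Optional[int] = None
    cost_usd: Optional[float] = None
    type: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


def route_step_meta(step: Step, meta: Dict[str, Any]) -> Step:
    """Copy known keys onto step fields; keep everything else in step.metadata."""
    for key, value in meta.items():
        if key in KNOWN_KEYS:
            setattr(step, key, value)
        else:
            step.metadata[key] = value
    return step


# "example-model" and "prompt_version" are made-up illustration values.
step = route_step_meta(
    Step(name="summarize"),
    {"model": "example-model", "tokens_in": 120, "tokens_out": 45,
     "cost_usd": 0.0012, "prompt_version": "v3"},
)
print(step.model, step.tokens_in, step.metadata)
```

Here `prompt_version` is not one of the listed keys, so it ends up in `step.metadata` rather than on a dedicated field.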
Nested steps
Pass the parent handle’s context to nest steps in the dashboard tree:
C#
Python
Node
Trace-level metadata
C#
Python
Node
Replay / Test SDK
VevalTestSdk is a drop-in test double. It mocks LLM step outputs from a recorded trace, so no real API calls are made. It throws if a step name isn’t found in the trace (strict mode).
C#
Python
Node
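The strict replay behavior of the test double can be sketched as a lookup keyed by step name. The class below is a hypothetical stand-in, not `VevalTestSdk` itself: `StrictReplay`, the trace shape (a list of dicts with `name` and `output`), and the exception type are assumptions made for illustration.

```python
from typing import Any, Dict, List


class StepNotRecordedError(KeyError):
    """Raised when a step name is missing from the recorded trace (assumed type)."""


class StrictReplay:
    """Hypothetical stand-in: serves recorded outputs instead of calling an LLM."""

    def __init__(self, recorded_steps: List[Dict[str, Any]]) -> None:
        # Index recorded outputs by step name for direct lookup.
        self._outputs = {s["name"]: s["output"] for s in recorded_steps}

    def output_for(self, step_name: str) -> Any:
        # Strict mode: an unknown step name is an error, never a silent fallback.
        if step_name not in self._outputs:
            raise StepNotRecordedError(step_name)
        return self._outputs[step_name]


replay = StrictReplay([
    {"name": "plan", "output": "1. search 2. summarize"},
    {"name": "summarize", "output": "Short summary."},
])
print(replay.output_for("plan"))

try:
    replay.output_for("translate")  # not in the recorded trace
except StepNotRecordedError as exc:
    print("missing step:", exc)
```

Failing loudly on an unrecorded step means a renamed or newly added step shows up as a test failure instead of quietly triggering a real API call.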
ReplayAsync (lower-level)
C#
Python
Node
Assertions
All assertions implement ITraceAssertion. They return null on pass, or a failure string otherwise.
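The contract is small enough to sketch with a custom assertion. The Python below is a hypothetical illustration, not the SDK's definition: the `TraceAssertion` protocol, the `check` method name, and the dict-shaped trace are assumptions. Only the return convention (null on pass, a failure string otherwise) comes from the text above.

```python
from typing import Any, Dict, Optional, Protocol


class TraceAssertion(Protocol):
    """Assumed shape of the interface: one method that returns None on pass."""

    def check(self, trace: Dict[str, Any]) -> Optional[str]: ...


class MinOutputLength:
    """Custom assertion: fail if the final step's output is too short."""

    def __init__(self, min_chars: int) -> None:
        self.min_chars = min_chars

    def check(self, trace: Dict[str, Any]) -> Optional[str]:
        steps = trace.get("steps", [])
        if not steps:
            return "trace has no steps"
        output = str(steps[-1].get("output", ""))
        if len(output) < self.min_chars:
            return f"final output has {len(output)} chars, expected >= {self.min_chars}"
        return None  # None signals a pass


# Hypothetical trace shape: a dict holding an ordered list of step dicts.
trace = {"steps": [{"name": "answer", "output": "Paris"}]}
print(MinOutputLength(3).check(trace))
print(MinOutputLength(10).check(trace))
```

Returning a message instead of raising lets a runner evaluate every assertion and report all failures for a trace at once.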
Built-in factory — all three languages:
| C# | Python | Node | Description |
|---|---|---|---|
| TraceAssert.NoErrors() | TraceAssert.no_errors() | TraceAssert.noErrors() | Fail if any step has error status |
| TraceAssert.MaxSteps(n) | TraceAssert.max_steps(n) | TraceAssert.maxSteps(n) | Fail if total step count > n |
| TraceAssert.MaxCost(n) | TraceAssert.max_cost(n) | TraceAssert.maxCost(n) | Fail if total cost_usd > n |
| TraceAssert.MaxDuration(ms) | TraceAssert.max_duration(ms) | TraceAssert.maxDuration(ms) | Fail if total duration > ms |
| TraceAssert.StepExists("name") | TraceAssert.step_exists("name") | TraceAssert.stepExists("name") | Fail if named step not found |
| TraceAssert.OutputContains("s") | TraceAssert.output_contains("s") | TraceAssert.outputContains("s") | Fail if no step output contains s |
| TraceAssert.ToolCalled("name") | TraceAssert.tool_called("name") | TraceAssert.toolCalled("name") | Fail if no tool step with that name |
C#
Python
Node
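To make the pass/fail conditions of the built-in assertions concrete, here is a sketch that re-implements three of them as plain functions over a dict-shaped trace and collects the failure messages. These are stand-ins, not the SDK's `TraceAssert` factory: the trace fields (`steps`, `status`, `type`, `name`) and the `run_checks` helper are assumptions.

```python
from typing import Any, Callable, Dict, List, Optional

Trace = Dict[str, Any]
Check = Callable[[Trace], Optional[str]]  # None on pass, message on failure


def no_errors() -> Check:
    def check(trace: Trace) -> Optional[str]:
        bad = [s["name"] for s in trace["steps"] if s.get("status") == "error"]
        return f"steps with error status: {bad}" if bad else None
    return check


def max_steps(n: int) -> Check:
    def check(trace: Trace) -> Optional[str]:
        count = len(trace["steps"])
        return f"step count {count} > {n}" if count > n else None
    return check


def tool_called(name: str) -> Check:
    def check(trace: Trace) -> Optional[str]:
        hit = any(s.get("type") == "tool" and s["name"] == name
                  for s in trace["steps"])
        return None if hit else f"no tool step named {name!r}"
    return check


def run_checks(trace: Trace, checks: List[Check]) -> List[str]:
    """Return every failure message; an empty list means all checks passed."""
    return [msg for msg in (c(trace) for c in checks) if msg is not None]


# Hypothetical trace with three steps, one of them a tool call.
trace = {"steps": [
    {"name": "plan", "status": "ok"},
    {"name": "web_search", "type": "tool", "status": "ok"},
    {"name": "answer", "status": "ok"},
]}
failures = run_checks(trace, [no_errors(), max_steps(2), tool_called("web_search")])
print(failures)
```

In this sample only the step-count limit is exceeded, so `failures` holds a single message.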
Scenarios
Run your agent against multiple inputs and post pass/fail to the dashboard.
C#
Python
Node
ScenarioItem fields: name (string), input (any), trace_id (string), assertions (array). Provide either input (live LLM) or trace_id (mocked replay), not both. assertions is always an array (can be empty).
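The input/trace_id rule can be expressed as a validation check. The dataclass below is a hypothetical stand-in for the SDK's ScenarioItem, and `trace-123` is a placeholder ID. Rejecting an item that sets neither field is an assumption; the text above only rules out setting both.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class ScenarioItem:
    """Hypothetical stand-in mirroring the fields listed above."""
    name: str
    input: Optional[Any] = None      # live run against the LLM
    trace_id: Optional[str] = None   # mocked replay of a recorded trace
    assertions: List[Any] = field(default_factory=list)  # may be empty

    def __post_init__(self) -> None:
        has_input = self.input is not None
        has_trace = self.trace_id is not None
        # Assumption: exactly one of the two must be set.
        if has_input == has_trace:
            raise ValueError(
                f"{self.name}: provide exactly one of input or trace_id"
            )


live = ScenarioItem(name="refund-request", input={"message": "I want a refund"})
replayed = ScenarioItem(name="refund-replay", trace_id="trace-123")
print(live.assertions, replayed.trace_id)

try:
    ScenarioItem(name="ambiguous", input="hi", trace_id="trace-123")
except ValueError as exc:
    print(exc)
```

Validating at construction time surfaces a malformed item before any scenario run starts.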
Pass items: null, or omit items entirely, to fetch them from the dashboard by scenarioName.
Snapshots
Detect structural regressions by comparing step shape against a pinned golden trace.
C#
Python
Node
SnapshotDiff fields (same names in all SDKs): has_changes (bool), added_steps (string[]), removed_steps (string[]), order_changes (string[]).
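One way to picture how such a diff could be computed from two ordered lists of step names is sketched below. This is a guess at the semantics, not the SDK's algorithm: `diff_step_shape` is a hypothetical helper, step names are assumed unique, and `order_changes` is assumed to list shared steps whose relative position differs from the golden trace.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SnapshotDiff:
    """Stand-in using the field names listed above."""
    has_changes: bool = False
    added_steps: List[str] = field(default_factory=list)
    removed_steps: List[str] = field(default_factory=list)
    order_changes: List[str] = field(default_factory=list)


def diff_step_shape(golden: List[str], current: List[str]) -> SnapshotDiff:
    """Compare two ordered lists of step names (assumes names are unique)."""
    golden_set, current_set = set(golden), set(current)
    added = [s for s in current if s not in golden_set]
    removed = [s for s in golden if s not in current_set]
    # Restrict both runs to the shared steps, then flag positions that disagree.
    shared_golden = [s for s in golden if s in current_set]
    shared_current = [s for s in current if s in golden_set]
    moved = [g for g, c in zip(shared_golden, shared_current) if g != c]
    return SnapshotDiff(
        has_changes=bool(added or removed or moved),
        added_steps=added,
        removed_steps=removed,
        order_changes=moved,
    )


diff = diff_step_shape(
    golden=["plan", "search", "summarize", "answer"],
    current=["plan", "summarize", "search", "rerank", "answer"],
)
print(diff)
```

In this sample `rerank` is reported as added, nothing is removed, and `search` and `summarize` are reported as reordered because they swapped places.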
Naming convention cheat sheet
| Concept | C# | Python | Node |
|---|---|---|---|
| SDK class | VevalSdk | VevalSdk | VevalSdk |
| Test SDK | VevalTestSdk | VevalTestSdk | VevalTestSdk |
| Options | VevalOptions { ApiKey } | VevalOptions(api_key=) | { apiKey: } |
| Run agent | RunAsync | run_async | runAsync |
| Record step | TrackStepAsync | track_step_async | trackStepAsync |
| Trace metadata | SetMetadata | set_metadata | setMetadata |
| Step metadata | handle.SetMeta | handle.set_meta | handle.setMeta |
| Load replay | WithReplay | with_replay | withReplay |
| Run replay | ReplayAsync | replay_async | replayAsync |
| Run scenario | RunScenarioAsync | run_scenario_async | runScenarioAsync |
| Load snapshot | LoadSnapshotAsync | load_snapshot_async | loadSnapshotAsync |
| Compare snapshot | CompareSnapshotAsync | compare_snapshot_async | compareSnapshotAsync |
| Assertion factory | TraceAssert.NoErrors() | TraceAssert.no_errors() | TraceAssert.noErrors() |
| Context input | ctx.Input | ctx.input | ctx.input |
| Context trace ID | ctx.TraceId | ctx.trace_id | ctx.traceId |
| Last run status | testSdk.LastStatus | test_sdk.last_status | testSdk.lastStatus |