What is a scenario?

A scenario is a named set of test cases for your agent. Each run posts pass/fail results to the dashboard so you can track quality over time. Scenarios have two item types:
Type          How it works                                     Cost
Synthetic     Live LLM, fresh input you define                 Real API cost
Trace-backed  Mocked LLM, replays a recorded production trace  Zero

RunScenarioAsync

var result = await veval.RunScenarioAsync(
    scenarioName:       "my-scenario",
    agent:              service.ExecuteAsync,
    scenarioAssertions: [
        TraceAssert.NoErrors(),
        TraceAssert.MaxCost(0.10m),
        TraceAssert.StepExists("classify"),
    ],
    items: [
        new ScenarioItem { Name = "question 1", Input = "What is prompt caching?" },
        new ScenarioItem { Name = "question 2", Input = "What is tool use?" },
    ]
);
Parameter           Description
scenarioName        Identifies the scenario in the dashboard
agent               Your agent, with the same signature as RunAsync
scenarioAssertions  Assertions that apply to every item
items               List of ScenarioItem, inline or fetched from the dashboard

Synthetic items

Use these for new inputs you want to test with a live LLM.
new ScenarioItem
{
    Name  = "prompt caching question",
    Input = "Explain prompt caching in one sentence.",
}

Trace-backed items

Use these to replay a recorded production trace with mocked LLM responses — no API cost.
new ScenarioItem
{
    Name       = "production trace replay",
    TraceId    = "tr_4ec4e79c5d03...",
    Assertions = [TraceAssert.MaxCost(0.05m)],  // per-item assertion
}
The trace must have recorded steps. If it has none, Veval throws rather than silently calling the live LLM.
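The fail-fast rule above can be pictured with a small guard. This is only a sketch of the idea, not Veval's code: `RecordedTrace`, `Replay.EnsureReplayable`, the `tr_example` id, and the choice of `InvalidOperationException` are all hypothetical, since the documentation does not say which exception type is thrown.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// A trace with no recorded steps cannot be replayed with mocked responses.
var empty = new RecordedTrace("tr_example", new List<string>());
var rejected = false;
try
{
    Replay.EnsureReplayable(empty);
}
catch (InvalidOperationException ex)
{
    rejected = true;
    Console.WriteLine(ex.Message);
}

// Hypothetical stand-in for a recorded production trace.
public record RecordedTrace(string TraceId, IReadOnlyList<string> Steps);

public static class Replay
{
    // Throws instead of quietly falling back to a live (billable) LLM call.
    public static void EnsureReplayable(RecordedTrace trace)
    {
        if (trace.Steps.Count == 0)
            throw new InvalidOperationException(
                $"Trace {trace.TraceId} has no recorded steps; refusing to call the live LLM.");
    }
}
```

Throwing here keeps a "zero cost" replay from silently turning into a paid run.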

Per-item assertions

Each ScenarioItem can carry its own assertions on top of the scenario-level ones:
new ScenarioItem
{
    Name       = "expensive question",
    Input      = "Write me a novel.",
    Assertions = [TraceAssert.MaxCost(0.50m)],  // only for this item
}
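One way to read "on top of" is plain concatenation: the scenario-level list runs first, then the item's own list, with no overriding. The sketch below makes that assumption explicit. The `Assertion` delegate and the `Checks` helper are hypothetical stand-ins, not the Veval `TraceAssert` API.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Scenario-level checks apply to every item; per-item checks are added to them.
var scenarioLevel = new List<Assertion> { Checks.MaxCost(0.10m) };
var perItem = new List<Assertion> { Checks.MaxCost(0.50m) };

// A run costing 0.30 passes the per-item limit but still fails the scenario-level one.
var failures = Checks.Evaluate(0.30m, scenarioLevel, perItem);
foreach (var failure in failures)
    Console.WriteLine(failure);

// Hypothetical shape: an assertion returns null on success or a failure message.
public delegate string? Assertion(decimal cost);

public static class Checks
{
    public static Assertion MaxCost(decimal limit) =>
        cost => cost <= limit ? null : $"cost {cost} exceeds limit {limit}";

    // Runs scenario-level assertions followed by the item's own, collecting failures.
    public static List<string> Evaluate(
        decimal cost,
        IEnumerable<Assertion> scenarioLevel,
        IEnumerable<Assertion>? itemLevel)
    {
        return scenarioLevel
            .Concat(itemLevel ?? Enumerable.Empty<Assertion>())
            .Select(assertion => assertion(cost))
            .Where(message => message is not null)
            .Select(message => message!)
            .ToList();
    }
}
```

Under this reading, a looser per-item limit does not relax a stricter scenario-level one, so it is worth checking how the two interact before mixing limits of the same kind.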

Fetching items from the dashboard

If items is null or omitted, Veval automatically fetches the scenario's items from the API. This lets you manage test cases in the dashboard without redeploying code.
// items: null → fetched from dashboard by scenarioName
var result = await veval.RunScenarioAsync(
    scenarioName:       "my-scenario",
    agent:              service.ExecuteAsync,
    scenarioAssertions: [TraceAssert.NoErrors()]
);
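The fallback described above amounts to "use inline items if given, otherwise fetch by scenario name". A minimal sketch of that pattern follows. `ItemSource.ResolveAsync` and the `fetchFromDashboard` delegate are hypothetical, and items are reduced to plain strings so the example runs without the Veval package.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical fetcher standing in for the dashboard API call.
Func<string, Task<List<string>>> fetchFromDashboard =
    name => Task.FromResult(new List<string> { $"{name}: dashboard item" });

var inline = await ItemSource.ResolveAsync(
    "my-scenario", new List<string> { "inline item" }, fetchFromDashboard);
var fetched = await ItemSource.ResolveAsync("my-scenario", null, fetchFromDashboard);

Console.WriteLine(inline[0]);   // prints "inline item"
Console.WriteLine(fetched[0]);  // prints "my-scenario: dashboard item"

public static class ItemSource
{
    // Inline items win; the fetcher is only awaited when none were supplied.
    public static async Task<List<string>> ResolveAsync(
        string scenarioName,
        List<string>? items,
        Func<string, Task<List<string>>> fetch)
        => items ?? await fetch(scenarioName);
}
```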

ScenarioRunResult

Console.WriteLine($"{result.PassCount}/{result.Results.Count} passed");

foreach (var item in result.Results)
{
    Console.WriteLine($"{(item.Passed ? "✓" : "✗")} {item.Item.Name}");
    foreach (var failure in item.Failures)
        Console.WriteLine($"  → {failure}");
}
Member     Description
Passed     True if all items passed
PassCount  Number of passing items
FailCount  Number of failing items
Results    List of ItemRunResult
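A common use of these members is gating a CI job on the overall outcome. The sketch below uses simplified stand-in records that mirror the documented members (`Passed`, `PassCount`, `FailCount`, `Results`) so it runs without the Veval package. The stand-in `ItemRunResult` flattens `Item.Name` to `Name`, and `Gate.ExitCodeFor` is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in result; in real code this would come from RunScenarioAsync.
var result = new ScenarioRunResult(new List<ItemRunResult>
{
    new ItemRunResult("question 1", true, new List<string>()),
    new ItemRunResult("question 2", false, new List<string> { "cost exceeded" }),
});

int exitCode = Gate.ExitCodeFor(result);
Console.WriteLine($"{result.PassCount} passed, {result.FailCount} failed, exit code {exitCode}");
// prints "1 passed, 1 failed, exit code 1"
// In a CI job you might finish with: Environment.Exit(exitCode);

// Simplified stand-ins mirroring the documented members.
public record ItemRunResult(string Name, bool Passed, List<string> Failures);

public record ScenarioRunResult(List<ItemRunResult> Results)
{
    public int PassCount => Results.Count(r => r.Passed);
    public int FailCount => Results.Count - PassCount;
    public bool Passed => FailCount == 0;
}

public static class Gate
{
    // Zero signals success to most CI systems; any non-zero value fails the step.
    public static int ExitCodeFor(ScenarioRunResult result) => result.Passed ? 0 : 1;
}
```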