
Recipe evidence loop

The recipe loop turns agentic work from “the diff looks plausible” into “the behavior was exercised and the proof is reviewable.”

Loop

Why this matters

Without recipes, agents repeatedly explore the same UI, and humans still have to guess whether the validation was meaningful.

With recipes, validation becomes a reusable asset:

a bug reproduction recipe can prove the fix;
an acceptance-criteria recipe can accompany a PR;
a regression recipe can run later against adjacent changes;
evidence can be attached to review instead of described in prose.

Evidence package

A compatible runner should produce a typed artifact package containing at least:

summary.json — high-level run result;
trace.json — ordered action trace;
artifact-manifest.json — typed index of screenshots, videos, logs, and other outputs;
resolved recipe/workflow copy when practical.
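
As a minimal sketch of how a reviewer or CI step might check that a run produced the package described above, the Python below verifies that the three named files exist and parse as JSON. Only the file names come from the list; the function name `check_evidence_package`, the directory layout (all files at the top level of one folder), and the wording of the problem messages are assumptions made for illustration. The resolved recipe/workflow copy is not checked because it is only expected when practical.

```python
import json
from pathlib import Path

# File names are taken from the list above. Placing them all at the top
# level of a single directory is an assumption, not a documented layout.
REQUIRED_FILES = ("summary.json", "trace.json", "artifact-manifest.json")


def check_evidence_package(package_dir):
    """Return a list of problems found in an evidence package directory.

    An empty list means every required file exists and parses as JSON.
    The contents of each file are not validated here, because their
    schemas are not specified in this section.
    """
    problems = []
    root = Path(package_dir)
    for name in REQUIRED_FILES:
        path = root / name
        if not path.is_file():
            problems.append(f"missing: {name}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {name}: {exc.msg}")
    return problems
```

A caller could fail a review gate whenever the returned list is non-empty, and attach the list itself to the review so that the gap is visible instead of described in prose.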
