Definition

What is Continuous Evals?

Agent SLA telematics

Continuous evals run agent evaluations on a schedule against live production traffic so a one-time score never goes stale. Trinitite binds an Eval Monitor with a cron and a pass-rate floor, judges tagged traffic daily, and rolls each window into a signed, externally anchored Eval Attestation with PSI drift detection and regression webhooks.

The moment a provider updates the model behind your endpoint, a score from last quarter is fiction. Continuous evals are telematics for the agent you operate: tag production traffic with an X-Trinitite-Eval-Run header and each window folds into one signed daily Eval Attestation carrying pass_rate, pass/fail counts, PSI, and the window receipt id.
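To make the daily roll-up concrete, here is a minimal sketch of folding one window of judge verdicts into the pass/fail counts and pass_rate that an attestation carries. It is not Trinitite's implementation. The `WindowSummary` type, the `fold_window` function and the receipt id string are hypothetical names for illustration. Signing, external anchoring and PSI are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowSummary:
    # Hypothetical shape: fields loosely mirror the attestation contents
    # named above (window receipt id, pass/fail counts, pass_rate).
    window_receipt_id: str
    passed: int
    failed: int
    pass_rate: float


def fold_window(receipt_id: str, verdicts: list[bool]) -> WindowSummary:
    """Fold one window of judge verdicts (True = pass) into a summary."""
    if not verdicts:
        # An empty window has no meaningful pass rate; refuse rather than guess.
        raise ValueError("window contains no judged traffic")
    passed = sum(1 for v in verdicts if v)
    failed = len(verdicts) - passed
    return WindowSummary(receipt_id, passed, failed, passed / len(verdicts))
```

For example, `fold_window("win-example", [True, True, False, True])` yields 3 passes, 1 failure and a pass_rate of 0.75.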

Population Stability Index (PSI) drift detection fires an evals.monitor.drift_detected webhook the day the model, prompt, or traffic mix shifts, and a regression webhook fires when the pass rate drops below your floor. It is the outward-facing inverse of Continuous Assurance: that surface streams Trinitite's own Guardian, while this one streams your agent and scores it with our judge.
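The two triggers can be sketched with the standard PSI formula, PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ). Here eᵢ and aᵢ are the baseline and current share of observations in bin i. This is a minimal sketch under stated assumptions, not Trinitite's code. The page does not say which distribution is binned, so the bins here are assumed to be bucketed judge outcomes or scores. The 0.25 threshold is a common rule of thumb rather than a documented Trinitite default. "regression" is a placeholder event name, because the page does not name the regression webhook.

```python
import math


def psi(baseline_counts: list[int], current_counts: list[int], eps: float = 1e-6) -> float:
    """Population Stability Index between two histograms over the same bins."""
    if len(baseline_counts) != len(current_counts):
        raise ValueError("bin counts must align")
    b_total, c_total = sum(baseline_counts), sum(current_counts)
    if b_total == 0 or c_total == 0:
        raise ValueError("each histogram needs at least one observation")
    total = 0.0
    for b, c in zip(baseline_counts, current_counts):
        # Floor empty bins at eps so the logarithm stays finite.
        e = max(b / b_total, eps)
        a = max(c / c_total, eps)
        total += (a - e) * math.log(a / e)
    return total


def check_window(pass_rate: float, floor: float, psi_value: float,
                 psi_threshold: float = 0.25) -> list[str]:
    """Return the events a window would trigger (assumed thresholds)."""
    events = []
    if pass_rate < floor:
        events.append("regression")  # placeholder name, not from the page
    if psi_value > psi_threshold:
        events.append("evals.monitor.drift_detected")
    return events
```

Identical histograms give a PSI of 0. A shift from a 50/50 split to 90/10 gives roughly 0.88, which clears the assumed threshold.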

See Continuous Evals in action.

Run the free 1,000-log pre-audit and get a signed, reproducible report you can verify in a browser — no NDA.