What is Continuous Evals?

Agent SLA telematics

Continuous evals run agent evaluations on a schedule against live production traffic so a one-time score never goes stale. Trinitite binds an Eval Monitor to a cron schedule and a pass-rate floor, judges tagged traffic daily, and rolls each window into a signed, externally anchored Eval Attestation with Population Stability Index (PSI) drift detection and regression webhooks.
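Below is a minimal sketch of what an Eval Monitor binding might hold. It assumes a daily cron that judges the previous full UTC day. The `EvalMonitor` class, its field names, and the `window_for` helper are hypothetical illustrations, not Trinitite's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class EvalMonitor:
    # Hypothetical fields: a five-field cron expression and a pass-rate floor.
    name: str
    cron: str = "0 0 * * *"  # once a day at 00:00 UTC
    pass_rate_floor: float = 0.9

    def __post_init__(self) -> None:
        if len(self.cron.split()) != 5:
            raise ValueError("cron must have five fields")
        if not 0.0 <= self.pass_rate_floor <= 1.0:
            raise ValueError("pass_rate_floor must be in [0, 1]")


def window_for(run_at: datetime) -> tuple[datetime, datetime]:
    """Return the previous full UTC day that a daily run at `run_at` would judge."""
    if run_at.tzinfo is None:
        raise ValueError("run_at must be timezone-aware")
    run_at = run_at.astimezone(timezone.utc)
    end = run_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=1), end
```

With this sketch, a run at 00:05 UTC on 2 May judges traffic from 1 May 00:00 to 2 May 00:00 UTC. Judging closed, non-overlapping days means each attestation covers exactly one window.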

The moment a provider updates the model behind your endpoint, a score from last quarter is fiction. Continuous evals are telematics for the agent you operate: tag production traffic with an X-Trinitite-Eval-Run header and each window folds into one signed daily Eval Attestation carrying pass_rate, pass/fail counts, PSI, and the window receipt id.
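The sketch below covers both halves of that flow using only the Python standard library: tagging a request, then folding a window's judge verdicts into a signed record. Several details are assumptions for illustration. These include the endpoint URL, the header value format, the field names `pass_count`, `fail_count`, and `window_receipt_id`, and the use of HMAC-SHA256 for signing. Trinitite's actual attestation format, signing key, and external anchoring are not described here.

```python
import hashlib
import hmac
import json
import urllib.request

EVAL_RUN_HEADER = "X-Trinitite-Eval-Run"


def tag_request(url: str, body: bytes, eval_run: str) -> urllib.request.Request:
    """Build (but do not send) a production request tagged for a continuous-eval run."""
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header(EVAL_RUN_HEADER, eval_run)  # hypothetical value format, e.g. a run name
    return req


def build_attestation(verdicts: list[bool], psi: float, receipt_id: str, key: bytes) -> dict:
    """Fold one window's judge verdicts into a signed attestation record.

    HMAC-SHA256 is a stand-in signature; the real scheme is an assumption here.
    """
    if not verdicts:
        raise ValueError("window has no judged traffic")
    passed = sum(verdicts)
    payload = {
        "pass_rate": passed / len(verdicts),
        "pass_count": passed,
        "fail_count": len(verdicts) - passed,
        "psi": psi,
        "window_receipt_id": receipt_id,
    }
    # Canonical JSON (sorted keys, no whitespace) so a verifier can recompute the bytes.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    payload["signature"] = hmac.new(key, canonical, hashlib.sha256).hexdigest()
    return payload
```

Signing a canonical encoding rather than an arbitrary serialization lets anyone holding the verification key recompute the signature from the published fields alone.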

Population Stability Index drift detection fires an evals.monitor.drift_detected webhook the day the model, prompt, or traffic mix shifts, and a regression webhook fires when the pass rate drops below your floor. Continuous evals are the outward-facing inverse of Continuous Assurance: that surface streams Trinitite's own Guardian, while this one streams your agent and scores it with our judge.
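The sketch below shows the two checks. It assumes traffic is summarized as counts per bin (for example, per intent category) and uses the standard PSI formula. That formula sums, over all bins, (actual - expected) * ln(actual / expected). The 0.2 threshold is a common rule of thumb rather than a documented Trinitite setting. The regression event name `evals.monitor.regression` is hypothetical; only `evals.monitor.drift_detected` comes from the text above.

```python
import math

PSI_THRESHOLD = 0.2  # common rule of thumb, not a documented Trinitite default


def proportions(counts: list[int]) -> list[float]:
    """Convert per-bin counts into per-bin proportions."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty distribution")
    return [c / total for c in counts]


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions."""
    if len(expected) != len(actual):
        raise ValueError("bin counts must match")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins so log() stays finite
        total += (a - e) * math.log(a / e)
    return total


def webhook_events(
    baseline_counts: list[int],
    window_counts: list[int],
    pass_rate: float,
    pass_rate_floor: float,
    psi_threshold: float = PSI_THRESHOLD,
) -> list[str]:
    """Return the webhook event names one window would trigger."""
    events = []
    drift = psi(proportions(baseline_counts), proportions(window_counts))
    if drift > psi_threshold:
        events.append("evals.monitor.drift_detected")
    if pass_rate < pass_rate_floor:
        events.append("evals.monitor.regression")  # hypothetical event name
    return events
```

Each PSI term is non-negative because (a - e) and ln(a / e) always share a sign. PSI therefore only grows as a window's mix moves away from the baseline. A 50/50 baseline that shifts to 90/10, for instance, scores roughly 0.88.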

Related terms

Agent Evals

Continuous Assurance

Eval Harness