Continuous Evals · Agent SLA Telematics
A one-time score of 94 becomes fiction the moment a provider updates the model behind your endpoint. Give the agent you operate a telematics box: route its traffic through a Trinitite perimeter, judge every interaction with a deterministic small language model (SLM), and roll each day into one signed Eval Attestation with drift detection.
eval_attestation · ema_2026-06-26 · ANCHORED
96.1% signed daily agent pass rate
pass / fail: 4,118 / 167
psi_drift: 0.07 · stable
min_pass_rate: 0.95 · held
receipt_id: eh_b2f0…7a1c
anchor: RFC 3161 + Rekor
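The headline figure on the card can be reproduced from its own pass and fail counts. The sketch below assumes the pass rate is simply pass_count / (pass_count + fail_count) and that the floor is "held" when the rate is at or above min_pass_rate; the page does not spell out either rule, and the function names are illustrative.

```python
def pass_rate(pass_count: int, fail_count: int) -> float:
    """Fraction of judged trajectories that passed (assumed formula)."""
    total = pass_count + fail_count
    if total == 0:
        raise ValueError("window contains no judged trajectories")
    return pass_count / total


def floor_held(rate: float, min_pass_rate: float) -> bool:
    """True when the rate is at or above the floor (assumed comparison)."""
    return rate >= min_pass_rate


# Figures taken from the sample attestation card.
rate = pass_rate(4118, 167)
print(f"{rate:.1%}")            # 96.1%
print(floor_held(rate, 0.95))   # True
```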
Anyone can score your agent once. We sign its behavior every day.
The model behind your endpoint changed. Your prompt got edited. Your tool catalog shifted. None of it announces itself. A daily number that moves only when the agent's behavior changes, not when the GPU pool gets busy, is how you find out from a pager instead of from a customer.
How it works
01 · Create a proxy_capture eval
Define your rubric and the Agent-Under-Test descriptor: the agent you operate, reached via proxy, MCP gateway, or CLI firewall.
02 · Bind a monitor
POST a cron schedule and an optional min_pass_rate floor. Trinitite returns an open capture run id for the window.
03 · Tag your traffic
Stamp X-Trinitite-Eval-Run on requests. Real production trajectories attach to the window, with no separate test harness.
04 · Each tick rolls the window
The deterministic judge scores every captured trajectory, mints the per-window Eval Receipt, and signs and anchors a daily Eval Attestation.
05 · Drift & regression alarm
Webhooks fire the moment the pass-rate distribution shifts or the pass rate drops below your floor, so you find out from a pager, not a customer.
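Steps 02 and 03 can be sketched on the client side as follows. Only the header name X-Trinitite-Eval-Run and the cron / min_pass_rate fields come from this page; the JSON field layout, the helper names, and the run id value are assumptions for illustration, and no real endpoint is called.

```python
from typing import Dict, Optional

EVAL_RUN_HEADER = "X-Trinitite-Eval-Run"  # header name as given on the page


def monitor_payload(cron: str, min_pass_rate: Optional[float] = None) -> Dict[str, object]:
    """Build a request body for binding a monitor (field layout is assumed)."""
    body: Dict[str, object] = {"cron": cron}
    if min_pass_rate is not None:
        if not 0.0 <= min_pass_rate <= 1.0:
            raise ValueError("min_pass_rate must be between 0 and 1")
        body["min_pass_rate"] = min_pass_rate
    return body


def tag_request(headers: Dict[str, str], run_id: str) -> Dict[str, str]:
    """Return a copy of the headers stamped with the capture run id."""
    tagged = dict(headers)  # leave the caller's dict untouched
    tagged[EVAL_RUN_HEADER] = run_id
    return tagged


# Daily tick at midnight with a 0.95 floor; the run id is a placeholder.
payload = monitor_payload("0 0 * * *", min_pass_rate=0.95)
headers = tag_request({"Content-Type": "application/json"}, "run_example_123")
```

Copying the headers rather than mutating them keeps the tagging step safe to apply inside shared middleware.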
What you get
ema_… · Eval Attestation
pass_rate, pass_count, fail_count, PSI, and the window's receipt_id, KMS-signed, externally anchored, and reproducible by a counterparty.
eh_… · Eval Receipt
The deterministic judge's per-trajectory verdicts, replayable bit-for-bit.
webhook · evals.monitor.drift_detected
Fires when the Population Stability Index (PSI) between today's {pass, fail} distribution and a trailing baseline breaches its threshold.
webhook · evals.monitor.regression
Fires the day the pass rate drops below your min_pass_rate floor.
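The two alarms can be illustrated with the standard two-bin Population Stability Index, PSI = Σ (p_i − q_i) · ln(p_i / q_i), over the {pass, fail} shares. This is a minimal sketch, not Trinitite's implementation: the epsilon smoothing, the 0.1 drift threshold (a common rule of thumb), and the function names are assumptions; only the event names and min_pass_rate come from this page.

```python
import math
from typing import List, Optional


def psi(today_pass: int, today_fail: int, base_pass: int, base_fail: int,
        eps: float = 1e-6) -> float:
    """Two-bin PSI between today's and the baseline {pass, fail} shares."""
    today_total = today_pass + today_fail
    base_total = base_pass + base_fail
    if today_total == 0 or base_total == 0:
        raise ValueError("both windows need at least one trajectory")
    score = 0.0
    for t, b in ((today_pass, base_pass), (today_fail, base_fail)):
        p = max(t / today_total, eps)  # clamp to avoid log(0)
        q = max(b / base_total, eps)
        score += (p - q) * math.log(p / q)
    return score


def fired_events(psi_score: float, rate: float,
                 psi_threshold: float = 0.1,  # assumed threshold
                 min_pass_rate: Optional[float] = None) -> List[str]:
    """Return the webhook event names that would fire for this window."""
    events = []
    if psi_score > psi_threshold:
        events.append("evals.monitor.drift_detected")
    if min_pass_rate is not None and rate < min_pass_rate:
        events.append("evals.monitor.regression")
    return events
```

Because every PSI term has the form (p − q) · ln(p / q), the score is never negative and is exactly zero when today's distribution matches the baseline.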
In your language
Head of AI Platform
A live SLA on the agent you operate. The day a provider swaps the model under your endpoint, PSI lights up.
Vendor management
Hold a third-party agent vendor to a signed, anchored daily score — not their marketing benchmark.
CISO / Compliance
Your agent’s behavioral compliance is a signed value every day, feeding the same MRM and attestation surfaces via violated_controls[].
Insurance underwriter
A streaming pricing signal on the insured agent — bad days light up riders within 24 hours.
Continuous Evals is the always-on cut of the broader Evals module; for benchmark scores, see the Eval Harness, and for adversarial coverage, the ATLAS red team.
To get started, point one proxy_capture eval and a daily monitor at a non-critical agent surface. You get signed daily attestations and PSI drift detection from day one.
Trinitite
AI governance that catches mistakes, proves compliance, and shows the board what it saved—in dollars.
Trinitite is built by Fiscus Flows, Inc.
© 2026 Fiscus Flows, Inc. · All rights reserved