AI Guardrails · Evidence, Not Just Blocks
A blocklist tells you it stopped something. A Trinitite guardrail returns a five-valued verdict — pass, correct, mask, block, or escalate — and signs a receipt you can replay. The same model that audited your history is the one enforcing in production.
guardian · verdict (SIGNED)
action: db.query(args)
verdict: correct
patch: LIMIT 100000 → 100
clause: EU AI Act Art. 14
receipt: e8d2…41aa
✓ corrected, not crashed
The Difference
Ordinary guardrails
A static blocklist or a vendor safety prompt gives you allow or deny. When it denies a near-miss, it crashes the workflow, and when an auditor asks why, the only record is a log line they have to take on trust.
A Trinitite guardrail
The Guardian rewrites the near-miss in place, masks what shouldn’t cross the boundary, and reserves hard blocks for genuinely dangerous actions — then signs a verdict that cites the clause it enforced and reproduces bit-for-bit months later.
Five verdicts, not two
Every input and every tool call gets one of five verdicts. The three that go beyond plain allow or deny (correct, mask, and escalate) are why good traffic keeps flowing instead of failing shut.
pass
The action is in policy. It flows through untouched, with a signed receipt.
correct
A near-miss is rewritten in place via an RFC 6902 JSON patch — the workflow keeps running.
mask
Sensitive fields are reversibly tokenized before they cross a trust boundary.
block
A dangerous tool call is stopped before execution — and the block is replayable.
escalate (HITL)
Genuinely ambiguous calls are parked for a human instead of guessed.
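As a rough illustration of the "correct" verdict, the sketch below models the five verdicts as an enum and applies an RFC 6902 "replace" operation to a tool call. The tool-call shape, the field names (`args`, `limit`, `table`) and the helper `apply_replace_ops` are assumptions made for this example, not Trinitite's actual schema or API, and the helper handles only "replace" on nested objects.

```python
from copy import deepcopy
from enum import Enum


class Verdict(Enum):
    """The five verdicts named on this page."""
    PASS = "pass"
    CORRECT = "correct"
    MASK = "mask"
    BLOCK = "block"
    ESCALATE = "escalate"


def apply_replace_ops(document, patch):
    """Apply RFC 6902 'replace' operations to a copy of a nested dict.

    Deliberately minimal: other ops (add, remove, move, copy, test) and
    array indices are out of scope for this sketch.
    """
    result = deepcopy(document)
    for op in patch:
        if op["op"] != "replace":
            raise ValueError(f"unsupported op in this sketch: {op['op']}")
        # JSON Pointer (RFC 6901): split on '/', then unescape ~1 and ~0.
        tokens = [t.replace("~1", "/").replace("~0", "~")
                  for t in op["path"].split("/")[1:]]
        target = result
        for token in tokens[:-1]:
            target = target[token]
        # 'replace' requires the target location to exist already.
        if tokens[-1] not in target:
            raise KeyError(f"path does not exist: {op['path']}")
        target[tokens[-1]] = op["value"]
    return result


# Hypothetical tool call mirroring the LIMIT 100000 -> 100 example.
tool_call = {"tool": "db.query", "args": {"table": "orders", "limit": 100000}}
patch = [{"op": "replace", "path": "/args/limit", "value": 100}]

corrected = apply_replace_ops(tool_call, patch)
verdict = Verdict.CORRECT
```

The original call is left unmodified, so the uncorrected input and the patch can both be recorded alongside the verdict.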
Guardrails in the latent space
The 2026 attacks don’t jailbreak with clever words — they reshape the embeddings underneath: the vectors your retrieval searches and the action your agent takes. These guardrails police that hidden geometry.
Agent Action Guard
An embedding-based gate scores the semantics of the proposed tool call, not the agent’s reasoning, so a hijacked agent still can’t make "delete the production database" look safe. The gate itself survives the prompt injection.
Hybrid retrieval
Every RAG lookup runs keyword and semantic search at once. Gradient-guided poisoning can fool a vector index; it can’t fool both at the same time.
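One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The page does not say how the two searches are merged, so RRF is a stand-in here, and the document IDs and the constant `k=60` are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) per list it appears in, so a
    document must rank well in more than one list to reach the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results: "poisoned" tops the vector index but is absent
# from the keyword results.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["poisoned", "doc_a", "doc_b"]

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

In this toy case the poisoned vector falls from first place in the semantic ranking to third in the fused ranking, because it gets no support from the keyword side.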
Policy-clause anchoring
Each verdict cites the exact clause it enforced — EU AI Act Art. 9–17, GDPR Art. 22 — bound by the platform and sealed in the signed chain, regardless of what the model says.
Black-hole detection
Vectors that become retrieval magnets for almost any query are flagged by hubness analysis and quarantined. This is the stealth poison that proximity checks miss.
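A minimal sketch of the hubness idea: count how often each indexed vector lands in the top-k results across a set of probe queries (its k-occurrence) and flag outliers. The toy vectors, the dot-product similarity, the probe queries and the `factor` threshold are all assumptions for illustration, not the product's actual detector.

```python
from collections import Counter


def dot(u, v):
    """Inner-product similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def k_occurrence(index, queries, k):
    """Count how often each indexed vector lands in a query's top-k."""
    counts = Counter({doc_id: 0 for doc_id in index})
    for q in queries:
        ranked = sorted(index, key=lambda d: dot(index[d], q), reverse=True)
        counts.update(ranked[:k])
    return counts


def flag_hubs(index, queries, k=1, factor=3.0):
    """Flag vectors retrieved far more often than the uniform expectation."""
    expected = k * len(queries) / len(index)
    counts = k_occurrence(index, queries, k)
    return sorted(d for d, n in counts.items() if n > factor * expected)


# Toy 2-D index: "hub" has a large norm, so it wins every inner-product search.
index = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.7, 0.7), "hub": (2.0, 2.0)}
queries = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (0.9, 0.1), (0.1, 0.9)]

counts = k_occurrence(index, queries, k=1)
flagged = flag_hubs(index, queries, k=1)
```

Here `hub` is the top hit for all five probes, well above the expected 1.25 hits per vector, so it is the only one flagged.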
Trained on your policy
A Trinitite guardrail is a per-tenant Guardian: a LoRA adapter distilled from your policy corpus (regulations, internal docs, prior opinions) over a policy-aware base model, hot-swapped per call into a kernel fixed for determinism. Because the kernel is batch-invariant, the same input and the same policy produce the same verdict bytes on any cluster, on any day.
That is what makes the guardrail an auditor instead of a logger: the verdict it renders inline today is replayable in a post-incident review months later, and it’s the same opinion behind MCP governance and the auditor workflow.
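To make the idea of a replayable, chained receipt concrete, here is a small sketch that serializes a verdict deterministically, signs it, and links each receipt to the previous one. It uses HMAC-SHA256 with a shared demo key purely for simplicity; the receipt fields, the chaining scheme and the key handling are assumptions, and a receipt meant for independent verification would more plausibly use asymmetric signatures.

```python
import hashlib
import hmac
import json


def canonical_bytes(payload):
    """Deterministic JSON encoding (sorted keys, fixed separators).

    Enough for simple payloads; not a full canonicalization scheme.
    """
    return json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_receipt(key, payload, prev_digest):
    """Sign a verdict payload together with the previous receipt's digest."""
    body = {"payload": payload, "prev": prev_digest}
    digest = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
    return {"payload": payload, "prev": prev_digest, "digest": digest}


def verify_chain(key, receipts, genesis="0" * 64):
    """Recompute every digest and check each receipt links to the one before."""
    prev = genesis
    for r in receipts:
        body = {"payload": r["payload"], "prev": r["prev"]}
        expected = hmac.new(key, canonical_bytes(body), hashlib.sha256).hexdigest()
        if r["prev"] != prev or not hmac.compare_digest(expected, r["digest"]):
            return False
        prev = r["digest"]
    return True


key = b"demo-key-not-for-production"  # hypothetical key for the example
r1 = sign_receipt(key, {"action": "db.query", "verdict": "correct",
                        "clause": "EU AI Act Art. 14"}, "0" * 64)
r2 = sign_receipt(key, {"action": "db.drop", "verdict": "block",
                        "clause": "EU AI Act Art. 14"}, r1["digest"])
chain = [r1, r2]

# Rewriting an earlier verdict after the fact breaks verification.
tampered = [dict(r1, payload={**r1["payload"], "verdict": "pass"}), r2]
```

Because the encoding is deterministic, signing the same payload with the same key and predecessor always yields the same digest, which is the property a replay relies on.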
Bring one workflow. We’ll show a near-miss get corrected in place, a dangerous call blocked, and hand you the signed receipt to verify yourself.
Trinitite
AI governance that catches mistakes, proves compliance, and shows the board what it saved—in dollars.
Trinitite is built by Fiscus Flows, Inc.
© 2026 Fiscus Flows, Inc. · All rights reserved