What is AI Guardrails?

AI guardrails are controls that keep a model’s inputs and outputs inside policy. Beyond a two-valued allow/block filter, Trinitite’s guardrail returns a five-valued verdict — pass, correct, mask, block, or escalate — and signs a replayable receipt for every decision, trained on your policy rather than a generic classifier.

Most guardrails fail a near-miss shut, crashing the workflow. The three middle verdicts keep good traffic flowing: correct rewrites a near-miss in place via an RFC 6902 patch, mask reversibly tokenizes sensitive fields, and escalate parks a genuinely ambiguous call for a human.

Because the guardrail is a per-tenant model distilled from your policy and run on a determinism-fixed kernel, the verdict it renders inline is the same opinion your auditor signed off on — replayable in a post-incident review.

Related terms

Prompt Injection Defense

What is Prompt Injection Defense? →

Reversible Masking

What is Reversible Masking? →

MCP Governance

What is MCP Governance? →