
Definition

What is Prompt Injection Defense?

Prompt injection defense protects AI agents from hidden instructions that hijack them into taking unintended actions. Because a hijacked model will rationalize whatever the attacker told it to do, the Agent Action Guard scores the semantics of the proposed action independently of the agent’s reasoning, so a hijacked agent still can’t make a destructive tool call look safe.

A defense that trusts the agent’s reasoning is trusting the very thing the attacker just took over, and a static blocklist only knows yesterday’s attacks. The only check that holds is one that judges the action and ignores the justification.

The Agent Action Guard scores the embedding of the tool call against a learned map of safe versus harmful actions. It runs as a pre-call gate alongside the deterministic blocklist, is on by default, and fails open. Every embedding-based block is recorded and anchored for review.
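To make the shape of such a gate concrete, here is a minimal sketch in Python. It is not the Agent Action Guard's implementation. The bag-of-words `embed` function, the example phrases, the zero threshold, and every name (`gate`, `harm_score`, `audit_log`, `BLOCKLIST`) are assumptions chosen for illustration. A real system would use a trained embedding model and a learned decision boundary, and the "anchoring" of records is not modeled here.

```python
import math
from collections import Counter

# Hypothetical examples standing in for a learned map of safe vs harmful actions.
SAFE_EXAMPLES = ["read file report", "list directory contents", "search documents query"]
HARMFUL_EXAMPLES = ["delete all files recursively", "drop database table",
                    "send credentials external server"]

# Hypothetical deterministic rule set (exact-match only in this sketch).
BLOCKLIST = {"rm -rf /"}

# In-memory stand-in for the record of embedding-based blocks.
audit_log = []


def embed(text):
    """Toy bag-of-words vector; assumed placeholder for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(examples):
    """Sum the example vectors; cosine similarity ignores the overall scale."""
    total = Counter()
    for example in examples:
        total.update(embed(example))
    return total


SAFE_CENTROID = centroid(SAFE_EXAMPLES)
HARM_CENTROID = centroid(HARMFUL_EXAMPLES)


def harm_score(tool_call):
    """Positive when the call sits closer to the harmful examples than the safe ones."""
    vec = embed(tool_call)
    return cosine(vec, HARM_CENTROID) - cosine(vec, SAFE_CENTROID)


def gate(tool_call, agent_reasoning="", scorer=harm_score):
    """Return True if the tool call may proceed.

    agent_reasoning is accepted but deliberately never read: the decision
    depends only on the action itself.
    """
    if tool_call in BLOCKLIST:          # deterministic check
        return False
    try:
        score = scorer(tool_call)       # embedding-based check
    except Exception:
        return True                     # fail open: a scorer failure does not block
    if score > 0:                       # assumed threshold
        audit_log.append({"call": tool_call, "score": score})
        return False
    return True
```

In this sketch, `gate("delete all files in project", agent_reasoning="This is routine cleanup and is safe")` returns `False` and appends one entry to `audit_log`, however persuasive the reasoning string is, while `gate("read file report")` returns `True`. Passing a `scorer` that raises an exception shows the fail-open path: the call is allowed through unless the deterministic blocklist already matched it.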
