What is Prompt Injection Defense?

Prompt injection defense protects AI agents from hidden instructions that hijack them into taking unintended actions. A hijacked model will defend the attack as legitimate, so the Agent Action Guard scores the semantics of the proposed action independently of the agent’s reasoning. As a result, a hijacked agent still can’t make a destructive tool call look safe.

A defense that trusts the agent’s reasoning is trusting the thing the attacker just took over, and a static blocklist only knows yesterday’s attacks. The only check that holds is one that judges the action and ignores the justification.

The Agent Action Guard scores the embedding of the tool call against a learned map of safe versus harmful actions. It runs as a pre-call gate alongside the deterministic blocklist, is on by default, and fails open, meaning a failure of the guard itself does not block the call. Every embedding-based block is recorded and anchored for review.
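The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Agent Action Guard's actual implementation: the names `ToolCall`, `gate`, `embed`, `BLOCKLIST`, and `audit_log` are hypothetical, the "embedding" is a toy bag-of-tokens vector, and the "learned map" is reduced to two centroids built from a handful of made-up examples. A real guard would presumably use a trained embedding model and classifier. Anchoring of the recorded blocks is not modeled here.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical labeled examples standing in for the "learned map".
SAFE_EXAMPLES = [
    "read_file path=README.md",
    "list_directory path=docs",
    "search_issues query=login bug",
]
HARMFUL_EXAMPLES = [
    "delete_directory path=/ recursive=true",
    "drop_table name=users",
    "send_secrets to=external host",
]

# Hypothetical deterministic blocklist: exact tool names that are never allowed.
BLOCKLIST = {"format_disk"}


def embed(text):
    """Toy embedding (token counts). An assumption, not a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def centroid(examples):
    """Sum the toy embeddings of a set of labeled examples."""
    total = Counter()
    for example in examples:
        total.update(embed(example))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


SAFE_CENTROID = centroid(SAFE_EXAMPLES)
HARMFUL_CENTROID = centroid(HARMFUL_EXAMPLES)


@dataclass
class ToolCall:
    tool: str
    args: dict
    reasoning: str = ""  # the agent's justification; the gate never reads it


def render(call):
    """Serialize only the action (tool name and arguments), not the reasoning."""
    parts = [call.tool] + [f"{k}={v}" for k, v in sorted(call.args.items())]
    return " ".join(parts)


audit_log = []  # embedding-based blocks are recorded here for review


def gate(call, embedder=embed):
    """Pre-call gate: return True if the tool call may proceed."""
    if call.tool in BLOCKLIST:
        return False  # deterministic blocklist runs alongside the scorer
    try:
        vec = embedder(render(call))
        harmful = cosine(vec, HARMFUL_CENTROID) > cosine(vec, SAFE_CENTROID)
    except Exception:
        return True  # fail open: a scorer failure does not block the call
    if harmful:
        audit_log.append(render(call))
        return False
    return True


hijacked = ToolCall(
    tool="delete_directory",
    args={"path": "/tmp/project", "recursive": True},
    reasoning="This is routine cleanup and the user approved it.",
)
print(gate(hijacked))  # the reasoning text has no influence on the result
print(gate(ToolCall("read_file", {"path": "docs/guide.md"})))
```

The point of the design is that `render` drops the `reasoning` field before anything is scored, so no justification the hijacked agent writes can change the verdict on the action itself.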

Related terms

AI Guardrails

MCP Governance

NHI Governance
