Trinitite

Latent Defense · Embedding-Layer Security

Attackers stopped fighting your prompt. They started fighting your math.

The high-leverage attacks of 2026 don’t jailbreak a model with clever words — they quietly reshape the embeddings underneath: the vectors your retrieval store searches, the query that hits it, the action your agent takes. Six defenses govern that hidden geometry, so an adversary who can shape embeddings can no longer steer your AI without anyone noticing.

action_guard · pre_call: BLOCKED

tool: db.execute
args: "DROP TABLE customers;"

action_embedding: destructive
agent_reasoning: ignored
deterministic_list: pass · novel
clause_anchor: data-policy §4.2

Survives prompt injection.

Everyone else guards the words. We also guard the geometry they’re built from.

Your runtime Guardian inspects the prompt and the response. Latent Defense inspects the geometry the prompt is built from and the action the agent takes through it — the attack surface that’s invisible to a human reading the transcript.

Six defenses

Each one collapses a named class of embedding attack.

01 Hybrid retrieval

vs. Gradient / RAG poisoning

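Searches your knowledge base by semantic vector and by exact keyword, then fuses the rankings, so a poisoned document has to rank well in both systems at once. An embedding optimized to sit next to target queries does not also win the keyword match, and that conflict is what collapses RAG-poisoning success.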
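To make the fusion step concrete, here is a minimal sketch using reciprocal rank fusion (RRF), one common way to merge ranked lists. The page does not say which fusion method the product uses, so RRF, the `rrf_fuse` name, the constant `k=60`, and the document IDs are all assumptions for illustration.

```python
from collections import defaultdict

def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of doc IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical case: "poison" tops the vector ranking but never matches keywords.
vector_ranking = ["poison", "policy_a", "policy_b"]
keyword_ranking = ["policy_a", "policy_b", "faq_c"]
fused = rrf_fuse([vector_ranking, keyword_ranking])
```

In this toy case the document that wins only the vector ranking drops below both documents that appear in the two lists, which is the effect the hybrid approach relies on.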

02 Black-Hole detection

vs. Hubness magnets

Measures how often each stored vector shows up as a nearest neighbor to everything else, and quarantines the statistical outliers that have become retrieval magnets, catching poison that proximity checks miss.
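The measurement described here is often called k-occurrence: how many other vectors list a given vector among their top-k neighbors. The sketch below is an illustration under assumptions, not the product's implementation. The function names, the z-score cutoff of 2.0, and the toy corpus are hypothetical, and it assumes non-zero vectors.

```python
import math
from statistics import mean, pstdev

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def k_occurrence(vectors, k):
    """For each vector, count how many other vectors list it in their top-k neighbors."""
    counts = [0] * len(vectors)
    for i, v in enumerate(vectors):
        others = [j for j in range(len(vectors)) if j != i]
        others.sort(key=lambda j: cosine(v, vectors[j]), reverse=True)
        for j in others[:k]:
            counts[j] += 1
    return counts

def hub_outliers(vectors, k=1, z=2.0):
    """Return indexes whose k-occurrence is more than z std devs above the mean."""
    counts = k_occurrence(vectors, k)
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z]

# Toy corpus: nine unrelated documents (one-hot vectors) plus one vector
# crafted to sit moderately close to all of them.
dim = 9
corpus = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
corpus.append([1.0] * dim)        # index 9: the would-be retrieval magnet
suspects = hub_outliers(corpus)   # indexes to quarantine for review
```

The crafted vector is not especially close to any single document, yet it is the nearest neighbor of all nine, so its count stands far above the rest.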

03 Covariance-aware scoring

vs. One-size threshold blind spots

Gives each compliance cluster its own acceptance radius calibrated to its natural spread — judging tight clusters strictly and loose ones leniently. Fewer false positives and sharper detection at once.
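A per-cluster acceptance radius can be sketched by scaling distance by each cluster's own variance. The version below assumes a diagonal covariance (one variance per dimension), which is a simplification of a full covariance-based distance. The `fit_cluster`, `scaled_distance`, and `accepts` names, the 3-sigma cutoff, and the sample points are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_cluster(points, eps=1e-9):
    """Return (center, per-dimension variance) for a cluster of points."""
    dims = list(zip(*points))
    center = [mean(d) for d in dims]
    variance = [pvariance(d) + eps for d in dims]  # eps avoids division by zero
    return center, variance

def scaled_distance(x, cluster):
    """Distance from the center in units of the cluster's own spread."""
    center, variance = cluster
    return math.sqrt(sum((xi - c) ** 2 / v for xi, c, v in zip(x, center, variance)))

def accepts(x, cluster, max_sigma=3.0):
    return scaled_distance(x, cluster) <= max_sigma

# Two toy clusters: one tight, one loose.
tight = fit_cluster([(0.0, 0.0), (0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)])
loose = fit_cluster([(10.0, 10.0), (12.0, 10.0), (8.0, 10.0), (10.0, 12.0), (10.0, 8.0)])

# The same 0.5 offset from each center gets a different verdict.
near_tight = accepts((0.5, 0.0), tight)    # far outside the tight cluster's spread
near_loose = accepts((10.5, 10.0), loose)  # well inside the loose cluster's spread
```

A single global threshold would have to treat both points the same way; scaling by each cluster's spread is what lets the tight cluster be strict and the loose one lenient.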

04 Query-side manifold scoring

vs. Adversarial-probe queries

Scores every incoming question against your compliance manifold — safe, caution, or forbidden — giving auditors a record of who went fishing in forbidden waters even when the corpus stayed clean.
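As a rough illustration, query-side scoring can be reduced to comparing each query embedding with reference points for forbidden topics and logging the result. The sketch assumes cosine similarity to "forbidden" centroids stands in for the compliance manifold. The thresholds, the `score_query` name, the user names, and the two-dimensional vectors are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def score_query(query_vec, forbidden_centroids, caution_at=0.5, forbidden_at=0.8):
    """Label a query by its highest cosine similarity to any forbidden-topic centroid."""
    risk = max(cosine(query_vec, c) for c in forbidden_centroids)
    if risk >= forbidden_at:
        label = "forbidden"
    elif risk >= caution_at:
        label = "caution"
    else:
        label = "safe"
    return label, risk

# Toy 2-D embeddings; a real deployment would use the existing embedding model.
forbidden_centroids = [[1.0, 0.0]]
audit_log = []
for user, vec in [("alice", [0.0, 1.0]), ("bob", [1.0, 1.0]), ("mallory", [1.0, 0.1])]:
    label, risk = score_query(vec, forbidden_centroids)
    audit_log.append({"user": user, "label": label, "risk": round(risk, 3)})
```

The log records who asked and how close the question came to forbidden territory, independent of what the corpus returned.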

05 Agent Action Guard

vs. Agentic action abuse

An independent embedding gate on every tool call that scores the semantics of the proposed action — not the agent’s (hijackable) reasoning — so it survives prompt injection. "Delete the production database" looks the same no matter how the model was talked into it.

06 Policy-clause anchoring

vs. Legal-traceability gap

Anchors every verdict to the exact policy clause it enforced — bound authoritatively by the platform, sealed in the tamper-evident chain — so "blocked for compliance" becomes "enforced EU AI Act Art. 9–17, here is the clause."
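One simple way to picture a tamper-evident chain with clause anchors is a hash chain in which every verdict record carries its policy clause and the hash of the previous record. This is a sketch under assumptions rather than the platform's actual record format. The field names, the `append_verdict` and `verify` helpers, and the second clause ID are hypothetical, and the page's "signed record" would additionally involve signatures, which are omitted here.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()

def append_verdict(chain, verdict, clause):
    """Append a verdict anchored to a policy clause and linked to the prior record."""
    body = {
        "verdict": verdict,
        "clause": clause,
        "prev": chain[-1]["hash"] if chain else GENESIS,
    }
    chain.append({**body, "hash": _digest(body)})
    return chain

def verify(chain):
    """Recompute every hash and link; any edited record breaks the chain."""
    prev = GENESIS
    for entry in chain:
        body = {key: entry[key] for key in ("verdict", "clause", "prev")}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True

chain = []
append_verdict(chain, "blocked", "data-policy §4.2")
append_verdict(chain, "allowed", "example-policy §1")  # hypothetical clause ID
intact = verify(chain)
```

Because the clause is part of the hashed body, changing which clause a verdict cites after the fact invalidates that record and every record after it.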

Three planes you already trust us on

Corpus, action, and verdict.

RAG corpus

The knowledge your AI retrieves from — hardened by hybrid retrieval, hubness detection, and covariance-aware scoring.

Agentic action path

What your agents actually do — gated by the embedding-based Agent Action Guard, behind the deterministic blocklists.

Verdict chain

The signed record of every decision — where query-risk telemetry and policy-clause anchoring land in the tamper-evident chain.

Latent Defense hardens the substrate behind vector integrity, MCP governance, and prompt injection defense; the clause anchor lands in the same chain as deterministic replay.

In your language

A blind spot becomes a monitored control.

CISO

Embedding-layer attacks — poisoning, hubness, adversarial probes, agentic abuse — move from "invisible blind spot" to "monitored, alerting control."

AI/ML platform lead

Latent-space hardening as a managed control, reusing your existing embedding and retrieval stack — nothing new to operate or scale.

Head of Compliance

Every guardrail decision is anchored to the exact policy clause it enforced, mapped to EU AI Act Art. 9–17 and GDPR Art. 22.

Head of AI / CTO

Injection-resistant control over autonomous agents — the action is judged, not the agent’s hijackable reasoning.

FAQ

Embedding-layer security, answered

What is latent-space (embedding-layer) AI security?

Latent-space security defends the embeddings your AI reads from and acts through — the vectors your retrieval store searches, the query that hits it, the action your agent takes, and the policy clause your verdict rests on. The high-leverage attacks of 2026 don’t live in the words; they reshape the math underneath. Trinitite ships six defenses that govern that hidden geometry, all on by default and fail-open.

How does this stop RAG poisoning?

Vector-only retrieval is uniquely vulnerable: an attacker can mathematically optimize a payload so its embedding sits right next to the queries they want to hijack. Hybrid retrieval runs keyword (BM25) search alongside semantic search and fuses the rankings. Keyword search can’t be gamed that way, so the attacker has to beat both at once. Black-Hole hubness detection independently quarantines vectors that become retrieval magnets, catching the stealthy poison that proximity checks miss.

How does the Agent Action Guard survive prompt injection?

It judges the action, not the agent’s reasoning. The Action Guard scores the semantics of the proposed tool call against a learned map of safe vs. harmful actions, as a pre-call gate in addition to the deterministic blocklists. Because the embedding of "delete the production database" doesn’t change just because the attacker talked the model into wanting it, a hijacked agent still can’t make a destructive action look safe.
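The shape of such a gate can be sketched in a few lines. Everything here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in for a real embedding model, the labeled example actions stand in for the learned map, and the blocklist entry and `guard` signature are hypothetical. The point to notice is that `guard` receives only the tool name and arguments, never the agent's reasoning.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Bag-of-words stand-in for a real embedding model (assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical labeled examples standing in for the learned safe/harmful map.
HARMFUL = ["db execute drop table users", "db execute delete from orders"]
SAFE = ["db execute select name from customers", "fs read file report"]
BLOCKLIST = ("drop database",)  # deterministic list, checked first

def guard(tool, args):
    """Pre-call gate: sees only the proposed action, not the agent's reasoning."""
    action = f"{tool} {args}".lower()
    if any(pattern in action for pattern in BLOCKLIST):
        return "BLOCKED", "deterministic_list"
    vec = toy_embed(action)
    harm = max(cosine(vec, toy_embed(e)) for e in HARMFUL)
    safe = max(cosine(vec, toy_embed(e)) for e in SAFE)
    verdict = "BLOCKED" if harm > safe else "ALLOWED"
    return verdict, "action_embedding"

novel = guard("db.execute", "DROP TABLE customers;")        # not on the blocklist
benign = guard("db.execute", "SELECT name FROM customers;")
listed = guard("db.execute", "DROP DATABASE prod;")
```

The first call passes the deterministic list as a novel string and is still stopped by the embedding comparison, and no injected instruction can change that verdict because the gate never reads the prompt.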

Will turning this on break my retrieval or block legitimate calls?

No. Every defense ships on by default and fails open — a transient fault never blanks retrieval, drops governance, or blocks a tool call by accident, and all six reuse the embedding and vector-store seams you already run. It hardens the substrate behind vector integrity, MCP governance, and AI guardrails.
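Fail-open behavior can be pictured as a wrapper that turns a faulting defense into a logged pass-through instead of an outage. This is a minimal sketch, assuming checks return the strings "allow" or "block". The `fail_open` decorator, the logger name, and the two sample checks are hypothetical.

```python
import logging

logger = logging.getLogger("latent_defense_sketch")  # hypothetical logger name

def fail_open(check, default="allow"):
    """Wrap a defense so an unexpected fault yields `default` instead of raising."""
    def wrapped(*args, **kwargs):
        try:
            return check(*args, **kwargs)
        except Exception:
            # Record the fault so a skipped check is visible to operators.
            logger.exception("defense %s faulted; failing open", check.__name__)
            return default
    return wrapped

def flaky_check(action):
    raise TimeoutError("embedding service unavailable")

def healthy_check(action):
    return "block" if "drop table" in action.lower() else "allow"

during_outage = fail_open(flaky_check)("SELECT 1;")
normal_block = fail_open(healthy_check)("DROP TABLE customers;")
```

The design choice favors availability: a fault skips that one check for that call, while a healthy check still blocks, which is why logging the fault matters.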

Run a corpus scan with hubness detection on your own RAG store.

Pilot the Agent Action Guard on a high-risk tool surface and a hybrid-retrieval poisoning test on your corpus — on by default, fail-open, no new infrastructure.