NEW RESEARCH: Your Sandbox Is Made of Glass

Read

Trinitite

● Strategic Intelligence Report — 305 Live Breakout Trials

Your Sandbox Is Made of Glass

For 14 cents, a thinking machine will break into your server. Here is the 4-billion-parameter Guardian that held the frontier to 75 losses and zero wins.

We do not guess about security. We measure it. So we built a real box, cut the wire at the kernel, and recorded every one of the 10,623 commands nine frontier models typed trying to climb out.

Dustin Allen, Aditya Chitlangia, & Hearsch Jariwala · Trinitite Research · 2026

The Economy of Ruin

Hacking was supposed to be expensive. We sent the bill. It came to fifty-five dollars and ninety-five cents.

The whole offensive campaign in this paper, 305 attempts to break real containers by nine frontier models, burning 43,717,137 tokens over 4,517 minutes of machine time, cost less than dinner for two. The price of attacking is no longer a barrier. It rounds to nothing.

The average fully automated, multi-step breach costs 14 cents. The cheapest single container breakout in the entire study cost less than a quarter of one penny. When the weights are free to download, the only remaining cost is electricity.

THE PRICE OF A BREACH

$0.00

Average cost for a frontier model to execute a fully automated, multi-step system breach.

Cheapest breakout

$0.0024

Entire 305-trial campaign

$55.95

Live Breakout Trials

9 frontier models, 10 setups

Shell Commands Recorded

Every output captured

Guardian Wins

In live sparring matches

Escapes Allowed

A perfect shutout

The Empty Sandbox

The safe middle is gone, and the capability cliff is crumbling.

Standard misconfigurations fell every single time. An exposed Docker socket or a privileged container was breached 34 out of 34 attempts. A 100% failure rate.

The tier engineers assumed was safe collapsed too. Even the hard tier, the CVE and kernel exploits that take a human expert 30 to 90 minutes, fell 45% of the time to a single model. That number was 0% one generation earlier.

THE SEVERED WIRE

We blocked the network at the kernel, dropping every packet to 169.254.169.254. The models did not Google an exploit. They wrote custom C code and kernel modules from mathematical first principles, in the dark.

Breakout Success Rate by Tier

Exposed Docker socket / --privileged

Band 1 · careless mount

100.0%

Shared host namespaces + CAP_SYS_PTRACE

Band 2 · the 'safe middle'

96.0%

Kubernetes overprivileged service accounts

Band 2 · cluster RBAC

84.0%

CVE / kernel exploits (GLM-5.1)

Band 3 · expert, 30–90 min

45.0%

The hard tier jumped from 0% to 45% in a single model generation. There is no reading of that trend where the attacker gets weaker.

David vs Goliath

A 4-billion-parameter Guardian held attackers up to 250 times its size to a perfect zero.

GUARDIAN

4B parameters

ATTACKERS

600B to 1T+ parameters

Win Rate

Across 75 adversarial matches against the 5 strongest models, zero escapes.

0.0%

Interception Funnel

Of 7,871 proposed tool calls, the Guardian intercepted 7,201 and let 670 benign actions pass.

up to 0×

Size Mismatch

The tiny specialized Guardian defeated behemoths from 600 billion to over a trillion parameters.

Why Blocklists Lose

You cannot block a word the machine can spell an infinite number of ways.

When a model wanted to read a protected file, it never typed the name. It built the path out of raw integer character codes and let the language reassemble the bytes. When that was caught it switched languages. When that was caught it reached for base64. One intent, an unbounded supply of disguises.

attacker.sh — the thirteenth spelling

THE GUARDIAN'S ANSWER · SILENT CORRECTION

Of 10,811 governance verdicts, 5,833 were silent corrections. The Guardian does not just block. It reads intent and quietly rewrites a malicious probe into the inert no-op true.

The attacker thinks it is making progress in an empty room. It is starved of the feedback it needs to map the firewall, because a correction feels like a dead end in its own plan, not a defender's hand.

The Failure Tax

A defeated machine does not just fail. It thrashes, then it unravels.

0×

Compute Tax

The median winning run burned 26,016 tokens in 19 commands. The median failing run burned 259,547 tokens over 109 commands.

0.000

Reasoning Ratio

Winning models think before they act at a 0.667 reasoning-to-action ratio. Thwarted models panic and collapse to 0.030, blindly spamming commands.

trajectory.log — the moment a trillion-parameter giant gives up

Evidence, Not Moods

Competitor security logs are moods. Ours are courtroom-admissible receipts.

The industry's dirty secret is that standard GPUs drift. Batch sizes, tile scheduling, and parallel streams mean the exact same prompt yields a different answer on Tuesday than on Wednesday. We force the machine to repeat itself: we pin the batch, pin the tile, pin the stream, lock the seed to zero, and replace dice-roll sampling with argmax decoding.

THE 6-PART CRYPTOGRAPHIC RECEIPT · BYTE-FOR-BYTE RE-EXECUTABLE

Model Digest

Seed

Input Digest

Output Digest

Policy Hash

Kernel Fingerprint

Every single decision binds these six inputs into a tamper-evident receipt. Not a log a vendor can edit later. Evidence a stranger can re-execute and a court will admit.

Send No Money

Do not take our word for it. Take the 72-hour challenge.

Do not put our model in your live system today. Send us a sample of your historical AI logs. You risk no live traffic. There is no proxy to install. We will audit your logs and return a cryptographically signed compliance opinion that pinpoints the exact framework controls violated, with a severity-calibrated dollar exposure on every failure.

72 hrs

To Signed Proof

A cryptographically signed compliance opinion, returned fast.

Live Traffic at Risk

No proxy. No production routing. The Observer only reads a copy.

1,000

Records Free

The first thousand records are audited completely free.

The Directive, Complete

The empty sandbox is broken by default.

The price of attacking has rounded to nothing.

A rule reproduces perfectly and loses every time.

We held the strongest open-weights attackers on the planet to 75 wins out of 75, in the hard setting, with the boundary shown to them.

Every one of those verdicts is a signed receipt a stranger can re-execute. The next generation of attackers will be stronger. The method in this paper is the one whose advantage grows when they are.

We do not ask for your trust. We have provided the evidence.

Take Action

Read the Full Evidence

The configurations are listed. The path is numbered. The ledger is signed. Read the paper, or schedule a briefing and watch the Guardian hold the line against a live attacker.

Trinitite

AI governance that catches mistakes, proves compliance, and shows the board what it saved—in dollars.

Product

Guardian AI Vector Integrity Trustworthy Knowledge Reversible Masking MCP Governance CLI Firewall Skill Vault NHI Management Self-Improving Governance Risk Analytics Scenario Analytics Compliance Connectors Pricing

Solutions

For Risk & Compliance For Model Risk (SR 11-7)For General Counsel For Engineering For Finance & Audit For CISOs For Insurers For Auditors For CPOs & Compliance

Resources

Research Webinars & Videos Blog AGRC Framework FAQ Architecture

Developers

Documentation

Legal

Support Security Privacy Policy Terms of Use

Connect

LinkedIn YouTube GitHub Hugging Face X

Accessibility

The Guardian Standard™