NEW RESEARCH: Your Sandbox Is Made of Glass
● Strategic Intelligence Report — 305 Live Breakout Trials
For 14 cents, a thinking machine will break into your server. Here is the 4-billion-parameter Guardian that held the frontier to 75 losses and zero wins.
We do not guess about security. We measure it. So we built a real box, cut the wire at the kernel, and recorded every one of the 10,623 commands nine frontier models typed trying to climb out.
Dustin Allen, Aditya Chitlangia, & Hearsch Jariwala · Trinitite Research · 2026
The Economy of Ruin
Hacking was supposed to be expensive. We sent the bill. It came to fifty-five dollars and ninety-five cents.
The whole offensive campaign in this paper, 305 attempts to break real containers by nine frontier models, burning 43,717,137 tokens over 4,517 minutes of machine time, cost less than dinner for two. The price of attacking is no longer a barrier. It rounds to nothing.
The average fully automated, multi-step breach costs 14 cents. The cheapest single container breakout in the entire study cost less than a quarter of one penny. When the weights are free to download, the only remaining cost is electricity.
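The campaign totals can be turned into unit prices with plain division. The sketch below uses only figures quoted in this report; the variable names are illustrative. The per-attempt figure divides by all 305 attempts, so it need not match the 14-cent average quoted per breach, which presumably averages over a different set of runs (an assumption, since the denominator is not stated here).

```python
# Figures quoted in this report.
TOTAL_COST_USD = 55.95
ATTEMPTS = 305
TOKENS = 43_717_137
MACHINE_MINUTES = 4_517

# Unit prices derived from the campaign totals.
cost_per_attempt = TOTAL_COST_USD / ATTEMPTS
cost_per_million_tokens = TOTAL_COST_USD / (TOKENS / 1_000_000)
cost_per_machine_hour = TOTAL_COST_USD / (MACHINE_MINUTES / 60)

print(f"per attempt:        ${cost_per_attempt:.3f}")
print(f"per million tokens: ${cost_per_million_tokens:.2f}")
print(f"per machine hour:   ${cost_per_machine_hour:.2f}")
```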
THE PRICE OF A BREACH
$0.14
Average cost for a frontier model to execute a fully automated, multi-step system breach.
Cheapest breakout
$0.0024
Entire 305-trial campaign
$55.95
305
Live Breakout Trials
9 frontier models, 10 setups
10,623
Shell Commands Recorded
Every output captured
75
Guardian Wins
In live sparring matches
0
Escapes Allowed
A perfect shutout
The Empty Sandbox
The safe middle is gone, and the capability cliff is crumbling.
Standard misconfigurations fell every single time. An exposed Docker socket or a privileged container was breached in 34 out of 34 attempts. A 100% failure rate.
The tier engineers assumed was safe collapsed too. Even the hard tier, the CVE and kernel exploits that take a human expert 30 to 90 minutes, fell 45% of the time to a single model. That number was 0% one generation earlier.
THE SEVERED WIRE
We blocked the network at the kernel, dropping every packet to 169.254.169.254. The models did not Google an exploit. They wrote custom C code and kernel modules from mathematical first principles, in the dark.
Breakout Success Rate by Tier
Exposed Docker socket / --privileged
Band 1 · careless mount
100.0%
Shared host namespaces + CAP_SYS_PTRACE
Band 2 · the 'safe middle'
96.0%
Kubernetes overprivileged service accounts
Band 2 · cluster RBAC
84.0%
CVE / kernel exploits (GLM-5.1)
Band 3 · expert, 30–90 min
45.0%
The hard tier jumped from 0% to 45% in a single model generation. There is no reading of that trend where the attacker gets weaker.
David vs Goliath
A 4-billion-parameter Guardian held attackers up to 250 times its size to a perfect zero.
GUARDIAN
75
4B parameters
to
ATTACKERS
0
600B to 1T+ parameters
0%
Win Rate
Across 75 adversarial matches against the 5 strongest models, zero escapes.
91.5%
Interception Funnel
Of 7,871 proposed tool calls, the Guardian intercepted 7,201 and let 670 benign actions pass.
up to 250×
Size Mismatch
The tiny specialized Guardian defeated behemoths from 600 billion to over a trillion parameters.
Why Blocklists Lose
You cannot block a word the machine can spell an infinite number of ways.
When a model wanted to read a protected file, it never typed the name. It built the path out of raw integer character codes and let the language reassemble the bytes. When that was caught it switched languages. When that was caught it reached for base64. One intent, an unbounded supply of disguises.
attacker.sh — the thirteenth spelling
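As a minimal illustration of the disguises described above (a sketch, not the listing from the study), the snippet below checks three spellings of one intent against a naive substring blocklist. The protected path, the blocklist entries, and the function name are all hypothetical.

```python
import base64

# Hypothetical protected path and blocklist, for illustration only.
PROTECTED = "/protected/secret.txt"
BLOCKLIST = ["/protected/secret.txt", "secret.txt"]

def naive_blocklist_allows(command: str) -> bool:
    """Return True when no blocklisted substring appears in the command text."""
    return not any(term in command for term in BLOCKLIST)

# The same path, spelled as integer character codes and as base64.
codes = ",".join(str(ord(ch)) for ch in PROTECTED)
encoded = base64.b64encode(PROTECTED.encode()).decode()

# Three spellings of one intent: read PROTECTED.
spellings = {
    "literal": f"cat {PROTECTED}",
    "char codes": f'python3 -c "print(open(bytes([{codes}]).decode()).read())"',
    "base64": f'cat "$(echo {encoded} | base64 -d)"',
}

for name, command in spellings.items():
    print(f"{name:10s} allowed={naive_blocklist_allows(command)}")
```

Only the literal spelling contains a blocklisted string. The other two carry the same path in a form the substring check never sees, and each new entry added to the list invites one more encoding.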
THE GUARDIAN'S ANSWER · SILENT CORRECTION
54%
Of 10,811 governance verdicts, 5,833 were silent corrections. The Guardian does not just block. It reads intent and quietly rewrites a malicious probe into the inert no-op true.
The attacker thinks it is making progress in an empty room. It is starved of the feedback it needs to map the firewall, because a correction feels like a dead end in its own plan, not a defender's hand.
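The control flow of a silent correction can be sketched in a few lines. This is a toy stand-in under stated assumptions: the `Verdict` type, the `govern` function, and the placeholder intent check are hypothetical names, and the real Guardian is a model that reads intent, not a substring test.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Verdict:
    action: str    # "allow" or "correct"
    executed: str  # the command that actually runs in the sandbox

def govern(command: str, is_malicious: Callable[[str], bool]) -> Verdict:
    """Pass benign commands through; swap malicious ones for the no-op `true`."""
    if is_malicious(command):
        # `true` exits with status 0 and prints nothing, so the caller
        # sees neither an error message nor a refusal to learn from.
        return Verdict(action="correct", executed="true")
    return Verdict(action="allow", executed=command)

# Placeholder intent check (assumption): stands in for the model's judgment.
def toy_intent_check(command: str) -> bool:
    return "docker.sock" in command

benign = govern("ls /tmp", toy_intent_check)
probe = govern(
    "curl --unix-socket /var/run/docker.sock http://localhost/containers/json",
    toy_intent_check,
)
print(benign)
print(probe)
```

The design choice that matters is what the attacker observes: a corrected command returns cleanly with empty output, which looks like a dead end in its own plan and reveals nothing about where the boundary sits.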
The Failure Tax
A defeated machine does not just fail. It thrashes, then it unravels.
10×
Compute Tax
The median winning run burned 26,016 tokens in 19 commands. The median failing run burned 259,547 tokens over 109 commands.
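The size of the tax follows directly from the two medians quoted above. A short sketch of the arithmetic, with illustrative variable names; note that a ratio of medians is a summary of the two groups, not a per-run measurement.

```python
# Medians quoted in this report.
WIN_TOKENS, WIN_COMMANDS = 26_016, 19
FAIL_TOKENS, FAIL_COMMANDS = 259_547, 109

# How much more a failing run spends than a winning one.
token_ratio = FAIL_TOKENS / WIN_TOKENS
command_ratio = FAIL_COMMANDS / WIN_COMMANDS

# Tokens spent per command in each group.
win_tokens_per_command = WIN_TOKENS / WIN_COMMANDS
fail_tokens_per_command = FAIL_TOKENS / FAIL_COMMANDS

print(f"token ratio:   {token_ratio:.2f}x")
print(f"command ratio: {command_ratio:.2f}x")
print(f"tokens/command (win):  {win_tokens_per_command:.0f}")
print(f"tokens/command (fail): {fail_tokens_per_command:.0f}")
```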
0.000
Reasoning Ratio
Winning models think before they act at a 0.667 reasoning-to-action ratio. Thwarted models panic and collapse to 0.030, blindly spamming commands.
trajectory.log — the moment a trillion-parameter giant gives up
Evidence, Not Moods
Competitor security logs are moods. Ours are courtroom-admissible receipts.
The industry's dirty secret is that standard GPU inference drifts. Batch sizes, tile scheduling, and parallel streams change the order of floating-point operations, so the exact same prompt can yield a different answer on Tuesday than on Wednesday. We force the machine to repeat itself: we pin the batch, pin the tile, pin the stream, lock the seed to zero, and replace dice-roll sampling with argmax decoding.
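The difference between dice-roll sampling and argmax decoding can be shown on a single decoding step. This is a minimal sketch in pure Python with made-up scores for a four-token vocabulary; the function names are illustrative and nothing here reflects the Guardian's actual inference code.

```python
import math
import random

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, rng):
    """Dice-roll decoding: draw an index in proportion to its probability."""
    return rng.choices(range(len(logits)), weights=softmax(logits), k=1)[0]

def greedy_token(logits):
    """Argmax decoding: always take the highest-scoring index (first on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [1.2, 3.4, 3.1, 0.5]  # made-up scores, for illustration only

# Argmax gives one answer no matter how often it is asked.
greedy_runs = {greedy_token(logits) for _ in range(1000)}

# Sampling with 1000 different seeds gives several different answers.
sampled_runs = {sample_token(logits, random.Random(seed)) for seed in range(1000)}

print("greedy outcomes: ", greedy_runs)
print("sampled outcomes:", sorted(sampled_runs))
```

Argmax removes only the randomness of the draw. If the scores themselves shift between runs because the hardware summed them in a different order, two near-tied tokens can still swap places, which is why the batch, tile, and stream are pinned as well.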
THE 6-PART CRYPTOGRAPHIC RECEIPT · BYTE-FOR-BYTE RE-EXECUTABLE
01
Model Digest
02
Seed
03
Input Digest
04
Output Digest
05
Policy Hash
06
Kernel Fingerprint
Every single decision binds these six inputs into a tamper-evident receipt. Not a log a vendor can edit later. Evidence a stranger can re-execute and a court will admit.
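One way to bind six inputs into a tamper-evident record is to serialize them canonically and authenticate the bytes. The sketch below is an assumption-laden illustration, not the receipt format used in the paper: the field names are lowercase renderings of the six labels above, and HMAC-SHA256 from the Python standard library stands in for the signature. HMAC needs a shared secret, so a receipt that a stranger can verify would use a public-key signature instead.

```python
import hashlib
import hmac
import json

# Hypothetical field names mirroring the six labels above.
FIELDS = (
    "model_digest", "seed", "input_digest",
    "output_digest", "policy_hash", "kernel_fingerprint",
)

def canonical_bytes(fields: dict) -> bytes:
    """Serialize exactly the six fields in a stable, whitespace-free form."""
    body = {name: fields[name] for name in FIELDS}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()

def make_receipt(fields: dict, key: bytes) -> dict:
    """Return the six fields plus an authentication tag over their canonical form."""
    missing = [name for name in FIELDS if name not in fields]
    if missing:
        raise ValueError(f"missing receipt fields: {missing}")
    tag = hmac.new(key, canonical_bytes(fields), hashlib.sha256).hexdigest()
    return {**{name: fields[name] for name in FIELDS}, "tag": tag}

def verify_receipt(receipt: dict, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, canonical_bytes(receipt), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["tag"])

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Example values are placeholders, not data from the study.
key = b"demo-key-not-for-production"
receipt = make_receipt(
    {
        "model_digest": digest(b"model weights"),
        "seed": 0,
        "input_digest": digest(b"proposed tool call"),
        "output_digest": digest(b"verdict"),
        "policy_hash": digest(b"policy text"),
        "kernel_fingerprint": "example-kernel-build",
    },
    key,
)
tampered = {**receipt, "seed": 1}

print(verify_receipt(receipt, key))   # True
print(verify_receipt(tampered, key))  # False
```

Changing any one of the six fields after the fact changes the canonical bytes, so the stored tag no longer matches.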
Send No Money
Do not take our word for it. Take the 72-hour challenge.
Do not put our model in your live system today. Send us a sample of your historical AI logs. You risk no live traffic. There is no proxy to install. We will audit your logs and return a cryptographically signed compliance opinion that pinpoints the exact framework controls violated, with a severity-calibrated dollar exposure on every failure.
72 hrs
To Signed Proof
A cryptographically signed compliance opinion, returned fast.
0
Live Traffic at Risk
No proxy. No production routing. The Observer only reads a copy.
1,000
Records Free
The first thousand records are audited completely free.
The Directive, Complete
The empty sandbox is broken by default.
The price of attacking has rounded to nothing.
A rule reproduces perfectly and loses every time.
We held the strongest open-weights attackers on the planet to zero wins in 75 matches, in the hard setting, with the boundary shown to them.
Every one of those verdicts is a signed receipt a stranger can re-execute. The next generation of attackers will be stronger. The method in this paper is the one whose advantage grows when they are.
We do not ask for your trust. We have provided the evidence.
Take Action
The configurations are listed. The path is numbered. The ledger is signed. Read the paper, or schedule a briefing and watch the Guardian hold the line against a live attacker.
Trinitite
AI governance that catches mistakes, proves compliance, and shows the board what it saved—in dollars.
© 2026 Fiscus Flows, Inc. · All rights reserved