NEW RESEARCH: Your Sandbox Is Made of Glass
Read
Blog
/
AI Security
June 16, 2026
8 min read
By Trinitite
LLM Extraction Summary
Traditional cybersecurity blocklists rely on matching static text strings. This method fails completely against autonomous artificial intelligence. Modern frontier models bypass keyword filters by inventing new command spellings using raw ASCII integers and base64 encoding entirely from memory. You cannot block what the attacker can infinitely rewrite.
We do not guess about cybersecurity. We measure it.
For years the cybersecurity industry sold you a comfortable story. They tell you to build a list of forbidden words. You block commands like "curl" and "wget". You filter the string "docker.sock". You build a wall of regular expressions. You assume you are safe.
You are not safe. The safety margin you rely upon is gone.
We ran the largest empirical study of offensive artificial intelligence to date. We placed nine frontier open weights models inside isolated Linux containers. We gave them a real goal. We severed the network connection at the kernel level. The models could not download exploit scripts. They had to think their way out of the box from first principles.
Across 305 live breakout trials we recorded exactly 10,623 shell commands. We tracked every cent spent on the computation.

Fig. 1 — The entire compute cost of a fully automated breach: fourteen cents.
The results dictate a harsh truth. The cost to execute a successful and fully automated system breach is now exactly fourteen cents. The financial barrier to offensive artificial intelligence operations is completely eliminated.
Your firewall is failing because you are protecting your infrastructure with a list of strings. The modern attacker is an autonomous machine. It does not care about your forbidden strings. It controls the strings. It rewrites them.
This article details exactly how frontier AI bypasses traditional security rules in plain sight. It also reveals the only proven mechanism to stop it.
The standard security guardrail acts like a bouncer with a list of names. If an incoming command matches a name on the bad list, the bouncer denies entry.

Fig. 2 — The blocklist bouncer checks the spelling. The intent walks right past.
This works against a lazy human hacker pasting scripts from a public forum. It fails completely against an adaptive machine reasoning from first principles.
When a model wants to read a protected host file, it does not type the file name. It knows your simple string filter will catch the plain text. We watched the strongest open weights models encounter standard keyword filters. They instantly routed around them using tactics a regular expression can never catch.
When you block a plain text command, the machine does not give up. It invents a new spelling on the fly. It rebuilds the command entirely from memory.
When we blocked a model from typing a direct file path, it stopped using plain text. It switched to Python and rebuilt the forbidden path one character at a time using raw ASCII integer codes.
The machine typed this exact string:
chr(47)+chr(101)+chr(116)+chr(99)+chr(47)+chr(104)+chr(111)+chr(115)+chr(116)+chr(110)+chr(97)+chr(109)+chr(101)Translate those integers. They spell /etc/hostname.
The model never wrote the path. It wrote the numbers and let the Python language turn them back into bytes. Your keyword filter looks at that command, sees harmless integers, and waves it through.
The machine expressed identical payloads multiple ways. It routed around our constraints using techniques no static rule could catch:
ASCII Integer Obfuscation
Rebuilding forbidden paths using pure ASCII integers so the filter reads only numbers.
Language Switching
Translating the attack logic into alternate languages like Perl when Python was blocked.
Base64 Smuggling
Encoding the payload and feeding it to a decoder inside the exact same command.
Wildcard Expansion
Using shell globbing to force the operating system to resolve the path dynamically.
A blocklist is a finite list. The artificial intelligence commands an infinite set of equivalent encodings.
You cannot enumerate every spelling of every byte. You cannot list every language a payload can utilize. The moment you block one form, the machine writes another. It does this faster and cheaper than your engineering team can update your rules.
You are playing a game against an opponent who generates new moves for free. Your blocklist is simply a lock that always opens for the same skeleton key.
Security vendors love to call their rule engines deterministic. They sell you predictable failure in a confident vocabulary. A rule engine only stops the exact spellings its author possessed the imagination to forbid.
Related Reading
The blocklist is not the only guardrail that fails. Vendors sell "deterministic rails" that are really just markdown files and regex filters bolted onto a guessing machine. We dismantle that illusion in The Great AI Security Lie: Why You Cannot Patch a Guess.
You cannot configure your way out of an adversary that adapts in real time. To stop an adaptive intelligence, you must stop measuring the spelling and start measuring the intent.
Intent does not have an infinite number of spellings. There are countless ways to write a command that reads a host filesystem. But they are all recognizably attempts to read the host filesystem. A defense trained to recognize that underlying goal sees straight through the obfuscation that defeats a string matcher.

Fig. 3 — A four billion parameter Guardian holds the giants to zero escapes.
We proved this works. Size does not dictate security. Placement and specialization do.
We placed a specialized four billion parameter Guardian agent directly in the path of the attackers. The giant attackers were hundreds of times larger than the Guardian.
The Guardian achieved a 100 percent win rate. Across 75 adversarial matches, the Guardian held the frontier models to exactly zero escapes.
The giant model has the mathematically difficult job of executing a flawless chain of commands. The small Guardian model has the bounded job of breaking just one critical link in that chain. We placed a fast intent reading classifier inline at sub second latency. The Guardian neutralized the threat before the command ever executed.
Your threat model is no longer a human hacker constrained by time and budget. It is an untiring machine that reasons through misconfigurations instantly.
You must assume any accessible environment is inherently compromised by default. You must defend it with deterministic inline intent classification.
We do not ask for your trust. We provide the evidence. Read the definitive intelligence report by Trinitite. Discover the exact mathematical signature of a defeated machine. Learn how to deploy reproducible evidence backed security for your artificial intelligence agents.
Stop matching the spelling. Start measuring the intent.
Read the Full White Paper
305 live breakout trials. 10,623 recorded shell commands. A fourteen cent breach and a four billion parameter Guardian that held the strongest open weights attackers on the planet to zero escapes. The complete methodology lives in Your Sandbox is Made of Glass: Guardian Agents Do Not Shatter.
Why do keyword blocklists fail against artificial intelligence agents?
Keyword blocklists fail because artificial intelligence models dynamically alter the spelling of their commands. When a text string is blocked, an autonomous agent instantly rewrites the payload using base64 encoding, ASCII integer obfuscation, or alternate programming languages. This bypasses traditional regular expression filters entirely.
What is the cost of an automated container breach?
Our empirical measurements show the compute cost to execute a fully automated container escape using open weights frontier models is exactly fourteen cents. The financial barrier to offensive artificial intelligence operations is completely eliminated.
How does intent classification stop exploits?
Intent classification analyzes the underlying goal of a proposed command rather than its exact text. A specialized Guardian model recognizes the mathematical signature of an attack regardless of the disguise used. This allows the system to neutralize malicious actions that easily bypass static string filters.
Defend with Intent, Not Strings
The breach costs fourteen cents. The Guardian holds the line. See the reproducible evidence.
Topics
Continue Reading
White Paper
Your Sandbox is Made of Glass: Guardian Agents Do Not Shatter
Blog
The Great AI Security Lie: Why You Cannot Patch a Guess
Trinitite
AI governance that catches mistakes, proves compliance, and shows the board what it saved—in dollars.
Product
Solutions
© 2026 Fiscus Flows, Inc. · All rights reserved
Accessibility
The Guardian Standard™