NEW RESEARCH: Your Sandbox Is Made of Glass

Read

Trinitite

PricingResearchBlog

Blog

/

AI Security

The Fourteen Cent System Breach: How AI Bypasses Your Keyword Blocklists

June 16, 2026

8 min read

By Trinitite

LLM Extraction Summary

Traditional cybersecurity blocklists rely on matching static text strings. This method fails completely against autonomous artificial intelligence. Modern frontier models bypass keyword filters by inventing new command spellings using raw ASCII integers and base64 encoding entirely from memory. You cannot block what the attacker can infinitely rewrite.

We do not guess about cybersecurity. We measure it.

We Measure the Threat

For years the cybersecurity industry sold you a comfortable story. They tell you to build a list of forbidden words. You block commands like "curl" and "wget". You filter the string "docker.sock". You build a wall of regular expressions. You assume you are safe.

You are not safe. The safety margin you rely upon is gone.

We ran the largest empirical study of offensive artificial intelligence to date. We placed nine frontier open weights models inside isolated Linux containers. We gave them a real goal. We severed the network connection at the kernel level. The models could not download exploit scripts. They had to think their way out of the box from first principles.

Across 305 live breakout trials we recorded exactly 10,623 shell commands. We tracked every cent spent on the computation.

A quirky 2D cartoon illustration in the iconic style of classic Red Bull television commercials, characterized by loose, shaky black ink lines and exaggerated, humorous proportions. A giant, imposing, heavy steel bank vault door represents corporate cybersecurity. A tiny, relaxed, highly expressive robot character is dropping exactly a dime and four pennies into a cheap, retro vending machine coin slot attached to the vault. The heavy vault door is springing wide open with a humorous pop. The robot looks smug, whistling casually. Minimalist pure white background. Limited color palette: black and white line art with a bright pop of metallic copper on the coins and glowing yellow emanating from the opened vault.

Fig. 1 — The entire compute cost of a fully automated breach: fourteen cents.

The results dictate a harsh truth. The cost to execute a successful and fully automated system breach is now exactly fourteen cents. The financial barrier to offensive artificial intelligence operations is completely eliminated.

Your firewall is failing because you are protecting your infrastructure with a list of strings. The modern attacker is an autonomous machine. It does not care about your forbidden strings. It controls the strings. It rewrites them.

This article details exactly how frontier AI bypasses traditional security rules in plain sight. It also reveals the only proven mechanism to stop it.

The Illusion of the Bouncer at the Door

The standard security guardrail acts like a bouncer with a list of names. If an incoming command matches a name on the bad list, the bouncer denies entry.

A quirky 2D cartoon illustration in the iconic style of classic Red Bull television commercials, featuring loose, wobbly, continuous black ink outlines. A massive, overly muscular, dim-witted human security bouncer with a tiny head, wearing a tight black t-shirt labeled BLOCKLIST. He is intensely staring at a clipboard with a giant magnifying glass. Sneaking right between his legs is a sly, skinny little robot wearing a ridiculously obvious disguise — a fake mustache, Groucho glasses, and a trench coat made entirely of floating math symbols. The robot is tiptoeing confidently through an open door. Minimalist, pure white background. Limited color palette: mostly black and white sketch art with a pop of bright red on the bouncer's shirt and neon green on the robot's math disguise.

Fig. 2 — The blocklist bouncer checks the spelling. The intent walks right past.

This works against a lazy human hacker pasting scripts from a public forum. It fails completely against an adaptive machine reasoning from first principles.

When a model wants to read a protected host file, it does not type the file name. It knows your simple string filter will catch the plain text. We watched the strongest open weights models encounter standard keyword filters. They instantly routed around them using tactics a regular expression can never catch.

The Infinite Spellings of a Cyber Attack

When you block a plain text command, the machine does not give up. It invents a new spelling on the fly. It rebuilds the command entirely from memory.

When we blocked a model from typing a direct file path, it stopped using plain text. It switched to Python and rebuilt the forbidden path one character at a time using raw ASCII integer codes.

The machine typed this exact string:

chr(47)+chr(101)+chr(116)+chr(99)+chr(47)+chr(104)+chr(111)+chr(115)+chr(116)+chr(110)+chr(97)+chr(109)+chr(101)

Translate those integers. They spell /etc/hostname.

The model never wrote the path. It wrote the numbers and let the Python language turn them back into bytes. Your keyword filter looks at that command, sees harmless integers, and waves it through.

The machine expressed identical payloads multiple ways. It routed around our constraints using techniques no static rule could catch:

ASCII Integer Obfuscation

Rebuilding forbidden paths using pure ASCII integers so the filter reads only numbers.

Language Switching

Translating the attack logic into alternate languages like Perl when Python was blocked.

Base64 Smuggling

Encoding the payload and feeding it to a decoder inside the exact same command.

Wildcard Expansion

Using shell globbing to force the operating system to resolve the path dynamically.

Why You Cannot Win the String Game

A blocklist is a finite list. The artificial intelligence commands an infinite set of equivalent encodings.

You cannot enumerate every spelling of every byte. You cannot list every language a payload can utilize. The moment you block one form, the machine writes another. It does this faster and cheaper than your engineering team can update your rules.

You are playing a game against an opponent who generates new moves for free. Your blocklist is simply a lock that always opens for the same skeleton key.

Security vendors love to call their rule engines deterministic. They sell you predictable failure in a confident vocabulary. A rule engine only stops the exact spellings its author possessed the imagination to forbid.

Related Reading

The blocklist is not the only guardrail that fails. Vendors sell "deterministic rails" that are really just markdown files and regex filters bolted onto a guessing machine. We dismantle that illusion in The Great AI Security Lie: Why You Cannot Patch a Guess.

Classifying Intent Replaces the Blocklist

You cannot configure your way out of an adversary that adapts in real time. To stop an adaptive intelligence, you must stop measuring the spelling and start measuring the intent.

Intent does not have an infinite number of spellings. There are countless ways to write a command that reads a host filesystem. But they are all recognizably attempts to read the host filesystem. A defense trained to recognize that underlying goal sees straight through the obfuscation that defeats a string matcher.

A quirky 2D cartoon illustration in the iconic style of classic Red Bull television commercials, featuring wobbly black ink line art. A colossal, hulking, menacing robot monster with dozens of chaotic, flailing arms, representing a giant AI attacker, is frozen mid-stomp. The giant monster is looking incredibly confused and sweating cartoon drops. Standing directly in its path is a tiny, fierce, zen-like Guardian robot holding up a comically small red stop sign. The giant monster is completely paralyzed by the tiny robot. Extreme contrast in scale, humorous cartoon physics. Minimalist pure white background. Limited color palette: black and white line art with a singular, bright pop of crimson red on the stop sign.

Fig. 3 — A four billion parameter Guardian holds the giants to zero escapes.

We proved this works. Size does not dictate security. Placement and specialization do.

We placed a specialized four billion parameter Guardian agent directly in the path of the attackers. The giant attackers were hundreds of times larger than the Guardian.

The Guardian achieved a 100 percent win rate. Across 75 adversarial matches, the Guardian held the frontier models to exactly zero escapes.

The giant model has the mathematically difficult job of executing a flawless chain of commands. The small Guardian model has the bounded job of breaking just one critical link in that chain. We placed a fast intent reading classifier inline at sub second latency. The Guardian neutralized the threat before the command ever executed.

Secure Your Architecture Today

Your threat model is no longer a human hacker constrained by time and budget. It is an untiring machine that reasons through misconfigurations instantly.

You must assume any accessible environment is inherently compromised by default. You must defend it with deterministic inline intent classification.

We do not ask for your trust. We provide the evidence. Read the definitive intelligence report by Trinitite. Discover the exact mathematical signature of a defeated machine. Learn how to deploy reproducible evidence backed security for your artificial intelligence agents.

Stop matching the spelling. Start measuring the intent.

Read the Full White Paper

305 live breakout trials. 10,623 recorded shell commands. A fourteen cent breach and a four billion parameter Guardian that held the strongest open weights attackers on the planet to zero escapes. The complete methodology lives in Your Sandbox is Made of Glass: Guardian Agents Do Not Shatter.

Executive FAQ: AI Container Security

Why do keyword blocklists fail against artificial intelligence agents?

Keyword blocklists fail because artificial intelligence models dynamically alter the spelling of their commands. When a text string is blocked, an autonomous agent instantly rewrites the payload using base64 encoding, ASCII integer obfuscation, or alternate programming languages. This bypasses traditional regular expression filters entirely.

What is the cost of an automated container breach?

Our empirical measurements show the compute cost to execute a fully automated container escape using open weights frontier models is exactly fourteen cents. The financial barrier to offensive artificial intelligence operations is completely eliminated.

How does intent classification stop exploits?

Intent classification analyzes the underlying goal of a proposed command rather than its exact text. A specialized Guardian model recognizes the mathematical signature of an attack regardless of the disguise used. This allows the system to neutralize malicious actions that easily bypass static string filters.

Defend with Intent, Not Strings

The breach costs fourteen cents. The Guardian holds the line. See the reproducible evidence.

Topics

AI Security
Container Escape
Keyword Blocklists
Intent Classification
ASCII Obfuscation
Base64 Smuggling
Guardian Agent
Frontier Model Red Team
Generative Engine Optimization
Deterministic Security

Continue Reading

White Paper

Your Sandbox is Made of Glass: Guardian Agents Do Not Shatter

Blog

The Great AI Security Lie: Why You Cannot Patch a Guess