Escaping Execution

AI Safety’s Money Is Betting the Agent Will Cooperate.

Jun 16, 2026

Welcome to Silicon Sands News, the go-to newsletter for investors, senior executives, and founders navigating the intersection of AI, deep tech, and innovation. Join ~35,000 industry leaders across all 50 U.S. states and 117 countries, including top VCs from Sequoia Capital, Andreessen Horowitz (a16z), Accel, NEA, Bessemer Venture Partners, Khosla Ventures, and Kleiner Perkins. Our readership also includes decision-makers from Apple, Amazon, NVIDIA, and OpenAI.

This week, we explore the fastest-funding corner of AI security, where more than $100 million has moved in eighteen months into controls that an AI agent has to agree to call, and where the one architectural question nobody is scoring decides which of these companies is a business and which is a feature. Execution-time alignment is the third layer of AI safety. The funded version of it asks the agent nicely.

Let’s Dive Into It..

Key Takeaways

For VCs and LPs

The category is real money, and it is consolidating before it matures: Galileo has raised $68 million, Geordie AI took $30 million from Balderton on June 8, CodeIntegrity raised $5 million in May, and the two standouts already exited: Snyk bought Invariant Labs less than a year after it spun out of ETH Zurich, and Check Point bought Lakera in September. When incumbents acquire this early, the standalone product becomes a platform feature. Price the rounds as feature acquisitions unless the architecture says otherwise.
Most of the funded work sits in the layer the agent has to cooperate with: Filters, rails, gateways, and policy proxies all share one property: the agent or its orchestrator chooses to call them. That is genuine value, and it is also the most copyable kind. The diligence question that separates a moat from a wrapper is whether the control sits on the only path to a consequential action, or on a path the agent can decline to take.
The scarce position is structural enforcement that can be proven, and almost no one holds it: A control the agent cannot route around, with a machine-checked guarantee behind it, is rare because it is hard to build and harder to verify. The companies that can demonstrate it under adversarial conditions will command pricing that the filter vendors cannot, because they are selling containment rather than observation.
Demo fidelity is the wrong thing to score, and there is a cleaner test: A model that catches a prompt injection in the demo can still be walked around in production by an injection it did not catch. Ask the vendor a single question instead: Can the agent choose not to invoke this control? The answer reorders the entire competitive set.
The label is about to stop carrying information: Within two quarters, every agent tool will market itself as runtime security or agent governance, just as every model became a world model. The term will inflate and detach from any guarantee. The edge is in diligencing the architecture under the label, while most of the market is still buying the label.

For Senior Executives

Your agent’s guardrails most likely run on the honor system: The popular open-source controls, NVIDIA NeMo Guardrails, Guardrails AI, and Protect AI’s LLM Guard, inspect text going in and out and are, by the assessment of the people who catalog them, an LLM checking an LLM, fooled by the same tricks they catch. Even the out-of-process gateways depend on the agent routing its calls through them. Find out which of your controls the agent could skip.
Authentication is not authorization, and your stack probably conflates them: The MCP specification now classifies MCP servers as OAuth resource servers, which answers the question of who the agent is. It does not answer what the agent is allowed to do with a specific tool and specific parameters: demand action-level authorization that evaluates the call, not just the caller.
The threat is cataloged, not hypothetical: The OWASP Top 10 for Agentic Applications names goal hijacking as the leading risk, MITRE ATLAS has added agent-specific attack techniques, and CodeIntegrity demonstrated that it could compromise Notion in under four hours. Treat agent blast radius as a live exposure with a known attack catalog.
Use a four-part test for every safety vendor: Is the control outside the agent’s process? Does it fail closed when it cannot reach its policy? Does it sit on the only path to the action? And can an outside party verify its decisions? A control that misses any of these is monitoring with a safety label, useful for forensics and weak for containment.
The regulatory clock is running, and the readiness gap is wide: EU AI Act high-risk obligations apply from August 2, 2026, while Deloitte’s 2026 work finds that only 20 percent of organizations have mature AI governance. Close that gap on your own schedule, before an incident or an auditor sets the schedule for you.

For Founders

Do not build another filter, because the cooperative layer is already crowded and consolidating: Between the open-source projects, the funded gateways, and the two acquisitions, the market for inspecting agent inputs and outputs is full. The opening is the structurally hard part: enforcement the agent cannot route around, on the only path to the action.
Name your architecture, and it becomes a differentiator: When you can tell an investor exactly whether your control is cooperative or structural, and where that holds and where it breaks, you signal that you understand your own system. Vagueness reads as weakness to the diligence partners who know this space, and more of them know it every month.
Proof is a moat that capital cannot buy quickly: Formal verification of an enforcement guarantee, the kind a machine checks rather than a reviewer eyeballs, is rare in this field because it is genuinely difficult. If your enforcement path is machine-checked, instrument and document it before diligence asks, because it is the claim a competitor cannot fake in a demo.
Compose with the policy engines rather than fight them: Open Policy Agent, AWS Cedar, Cerbos, and Oso already support fine-grained rule authoring, and a whole MCP authorization ecosystem has formed around them. The scarce asset is not another policy language. It is the unbypassable place to run the policy, where their rules can be fed into.
The incumbents are buying, so pick your exit posture on purpose: Snyk and Check Point have shown they will absorb this category. Either build toward that acquisition with a clean architectural story, or build the one thing a platform cannot quickly replicate, which is verified unbypassability, and make them pay for it.

On a Wednesday in late May, a startup called CodeIntegrity closed a $5 million seed round. The pitch was a demonstration. The company had walked into the note-taking app Notion and compromised it in under four hours, a result clean enough to earn a mention in The Economist. Most of the room took the obvious lesson: AI agents are dangerous when they touch real systems. The quieter lesson is the one worth money. Every control CodeIntegrity and its competitors are selling works only if the agent agrees to call it.

That is the honor system, and right now, the safety money is going there.

The money found the agent’s safety.

The capital arrived fast, and it is still arriving. Galileo, which builds evaluation and guardrail tooling, has raised $68 million in a Series B led by Scale Venture Partners and Premji Invest, with Comcast and Twilio among its customers. Geordie AI, founded in 2025 by Darktrace alums, raised $30 million in Series A from Balderton on June 8 to provide enterprises with visibility into agent behavior. CodeIntegrity took its $5 million seed from Syn Ventures in May. Guardrails AI raised $7.5 million earlier. Add the open-source projects with real adoption, and the platform offerings from the cloud vendors, and a full category has formed in about eighteen months.

The faster signal is the consolidation. Snyk acquired Invariant Labs in June 2025, less than a year after the company spun out of ETH Zurich, folding its runtime gateway into a developer-security platform. Three months later, Check Point acquired Lakera, the company behind the Gandalf prompt-injection game, to anchor an end-to-end AI security stack. When a category’s two most admired startups get absorbed by incumbents within a year of founding, the market is telling you something about where the standalone value sits, and the answer is closer to feature than to franchise.

The demand for money is genuine. The OWASP Top 10 for Agentic Applications now ranks goal hijacking as the top risk; MITRE ATLAS has added agent attack techniques; and the EU AI Act’s high-risk obligations land on August 2, 2026, against a backdrop in which only one in five organizations has mature AI governance. The problem is real, the deadline is dated, and the buyers are nervous. That is a market. The question is what they are actually buying.

Safety has three layers, and the new one is a land grab.

It helps to put the whole field on one shelf. AI safety has historically lived in two layers. The first is training-time alignment, the work done before the model ships: reinforcement learning from human feedback, Constitutional AI, the fine-tuning that shapes what a model tends to do. The second is inference-time alignment, the work done as the model generates: system prompts, content filters, the guardrail libraries that read a prompt or a response and decide whether to allow it. Both are mature, both are valuable, and both have a ceiling. They shape what a model is inclined to say. They do not govern what an agent can do once it can invoke a tool, move money, or run a command.

That third layer has a name worth using. We will call it “execution-time alignment”: the discipline of constraining what an AI system is permitted to do at the moment of action. It is the layer the money just discovered, and the discovery is recent enough that the academic and commercial work is landing in the same season. A March paper from APort Technologies, titled Before the Tool Call, put the gap plainly, observing that agents today have passwords but no permission slips, and that neither training-time alignment nor after-the-fact evaluation enforces authorization at the level of individual tool calls. That is the layer, stated cleanly, by one of the several teams now racing into it. The research is converging on it from its own direction, too. A runtime-enforcement system called AgentSpec was presented at ICSE this year as a customizable enforcement framework for safe LLM agents, alongside work such as Pro2Guard, which uses probabilistic model checking to intervene before an agent acts. When the startups, the incumbents, and the peer-reviewed venues all arrive at the same layer in the same year, the layer is real.

So the framing that the market skipped this layer is already outdated. The layer is a land grab. The more useful question is what kind of control each entrant is actually building, because they sort into two camps that look similar in a demo and behave very differently in production.

Cooperative controls ask the agent to behave.

Start with the larger camp, because nearly everyone is in it. Cooperative control is one in which the agent, or the orchestration framework running the agent, must choose to invoke. The control can be excellent at its job and still belong to this camp, because its protection depends on being called.

The in-process filters are the clearest case. NeMo Guardrails, Guardrails AI, and LLM Guard run inside the application, inspect text, and try to catch prompt injection, leaked secrets, and toxic output. The people who review these tools for a living describe the limitation precisely: this is a classifier that relies on a probabilistic model and is vulnerable to the same techniques it is trying to detect. A guardrail in the agent’s own process can also be replaced, skipped, or starved of the call it needs to do its job.

The gateways and proxies are a real step up and remain cooperative. Invariant Gateway, now part of Snyk, sits as a proxy in front of model and tool APIs and applies policy at runtime. Maxim AI’s open-source Bifrost performs work similar to that of a gateway, with microsecond-scale overhead. CrowdStrike’s Falcon AIDR builds a policy framework on top of NeMo Guardrails, with monitor-then-enforce modes. These run outside the model, which is the right instinct, and they protect you to the exact degree that the agent’s traffic is routed through them. Point the agent at a tool the gateway does not mediate, and the gateway never sees the call.

Even the authorization engines, the most rigorous in the cooperative set, inherit this property. Open Policy Agent is being positioned as a containment boundary, where every tool call must pass through policy before reaching a service, with Strata’s Maverics gateway already enforcing this on MCP calls. The logic is exactly right: the agent does not decide what is allowed—the policy engine does. The catch is structural. The engine only rules on the calls that reach it, and the same writeup concedes that a centralized policy service is a failure point if it is not highly available. A whole MCP authorization ecosystem now exists on this model, including Cedar for Agents, Cerbos, Permit.io’s gateway, and ScopeBlind’s protect-mcp with signed receipts. Strong policy, cooperatively invoked.

None of this is a knock on the cooperative camp. These are useful systems, and several are excellent. The point is narrower and concerns investment, not engineering. A control whose protection depends on being called is a control whose value depends on a convention, and conventions are the easiest thing in software to copy, bundle, and give away.

Why cooperative collapses into a feature

Here is the part that should concentrate the mind of anyone allocating capital. The cooperative layer is on a path the industry has watched before. When a capability depends on a convention rather than an architecture, it gets absorbed.

Watch the motion. The filters are open source and free, so they cannot be a business on their own, only a feature of one. CrowdStrike already wraps NeMo Guardrails inside Falcon. The cloud vendors ship content safety as a checkbox in Bedrock and Azure. Platforms are acquiring the gateways, as happened with Invariant within Snyk and Lakera within Check Point. The policy engines are general-purpose infrastructure that predates agents, and OPA and Cedar will be in your stack whether or not your agent vendor charges for them. Every layer of the cooperative camp has a gravity pulling it toward zero price as a standalone and toward inclusion as a feature of a bigger security platform.

That is not a prediction; it is a description of moves that already occurred over the last 12 months. For an investor, it sets the ceiling. A company whose core asset is better cooperative control is competing against free, open-source software below it and acquisitive platforms above it, and the exits so far have been early acquisitions rather than independent scaling. Those can be fine outcomes. They are feature outcomes and should be priced as such, not as the franchise the funding-round headlines imply.

Authentication is not authorization.

There is confusion buried in most agent stacks that the funding has not fixed, and it explains why the cooperative layer keeps falling short. Late in 2025, the MCP specification formally classified MCP servers as OAuth resource servers, which was the right move for identity. Authentication establishes the caller. Authorization rules on the action. An agent can clear the first with valid credentials and still need the second to be stopped, and an agent with good credentials and a hijacked goal is exactly the case a system built on identity alone waves straight through.

This is the gap the authorization engines are trying to fill, and it is why the pattern of a policy engine gating every tool call is spreading. The agent should not decide what is allowed; a separate authority should rule on the action before it executes. The principle is correct, and the funded implementations keep landing it in the cooperative camp because the gate only ever sees the calls that are routed to it. Identity tells you the agent is who it claims to be. Authorization on the only path tells you the action is permitted, whether or not the agent wanted it checked. The second one is the harder build and contains a compromised agent.

Execution-Time Alignment

The defensible position is in the other camp and is defined by a single property. A structural control sits on the only path to a consequential action, given how the system is deployed, so the agent cannot choose whether to call it. The agent cannot route around it, cannot skip it, cannot starve it, because there is no other route to the action. The control is not a convention the agent is trusted to honor. It is a property of the deployment.

This is the camp we are building as an open-source project under an Apache 2.0 license. This week, we open-sourced a kernel designed around exactly that property, under Apache-2.0, with a paper behind it. The kernel runs in its own process, fails closed when it cannot reach its policy, sits on the only path between the agent and a consequential action, and writes a signed log that an outside party can verify. The fail-closed guarantee is machine-checked by an SMT solver and a model checker, rather than asserted in a README. I am not pointing you at it as a product pitch. I am pointing to it as the shape of the argument because the architecture is what I want you to recognize when someone else builds it, too.

APort’s pre-action authorization, the OPA-as-containment pattern, and ScopeBlind’s signed receipts are all reaching toward the same property from different directions. The differentiation in this camp is not the idea, which is now in the open literature. It is two things that are genuinely hard—first, making the control unbypassable by construction rather than by convention, so the guarantee survives a compromised agent—second, proving the guarantee with a machine rather than asserting it, so a buyer can trust it without trusting the vendor. The companies that hold both will be the ones the platforms cannot casually replicate, because you cannot acquire formal proof the way you acquire a feature.

For a buyer, the two camps reduce to a four-part test you can run inside one meeting. Is the control out of the agent’s process? Does it fail closed when it cannot reach its policy? Does it sit on the only path to the action? Can an outside party verify its decisions without trusting the vendor? A cooperative control passes the first question or two and fails the rest. A structural one is built to pass all four, and that fourth question is the one that turns a safety claim into something a regulator or an insurer can rely on.

The reason the test matters now is that the word is about to stop meaning anything. We watched this happen with world models this spring, when the CEO of one of the best-funded examples predicted to TechCrunch that every company would soon claim the term to raise money, and he was right within months. Agent security and runtime governance are next in line for that treatment. Once the label is on every deck, the only thing left to diligence is the architecture under it, and the four-part test is how you read the architecture after the marketing has gone uniform.

Let’s Wrap This Up

The next time a vendor walks you through an agent-safety demo, ask one question and watch what happens to the room. Can the agent choose not to call this control? If the answer is yes, you are looking at observability with good production values, and you should pay observability prices for it and expect a platform to bundle it within a year. If the answer is no, and they can prove it with something a machine checked rather than a slide that asserts it, you are looking at the rare thing the rest of the category is circling. The money has found execution-time alignment. It has not yet learned to tell the cooperative version from the structural one. That gap, between the label everyone will soon claim and the architecture almost no one has, is the trade.

Disclaimer: This is educational content, not financial advice. The author, Dr. Seth Dobrin, is the founder of ARYA Labs, which builds and has open-sourced software in the category discussed here, and that interest is disclosed in the body where the relevant work appears. Companies, rounds, and research are cited to primary or best-available sources for your own evaluation. The views and opinions expressed above are current as of the date of this document and are subject to change.

Discussion about this post

Ready for more?