Escaping Prototype Purgatory: Where is AWS for AI Agents?
This question has been rattling around my brain for a while, driven by two factors. First, building robust, production-ready enterprise agents that can handle scale, complexity and security is genuinely hard. Second, what if we could abstract away all of that complexity, in the way that AWS so successfully did for infrastructure?
The pitch sounds compelling: a managed platform that handles the gnarly infrastructure problems of deploying AI agents at enterprise scale. Security is baked in. Compliance, no problemo. Best practices are all there by default. Just bring your agent logic and go wild in the aisles!
I turned this into a sort of thought experiment, but the more I’ve considered the question, the more I think the AWS analogy breaks down in interesting ways. The hyperscalers are absolutely building toward this vision (AWS Bedrock AgentCore became generally available in October 2025, and Microsoft’s Azure AI Foundry is maturing rapidly), but what they’re creating is fundamentally different from the “neutral substrate” that made AWS transformative in cloud computing.
But first, the problem…
Building Enterprise Agents is a Mess
Before we get to the platform question, it’s worth understanding just how painful it is to ship production agents today, for those fortunate enough not to have had to do so. To be clear, we’re not talking about demo agents or “look what I built this weekend” prototypes. These are agents that handle sensitive data, integrate with business-critical systems, and need to satisfy compliance teams. The ones that, if you’re not losing sleep over them, you’re not doing it right.
The Security Problem Nobody Wants to Own
Every agent that can take actions is an attack surface. Prompt injection isn’t theoretical anymore; Lakera’s Q4 2025 data shows indirect prompt injection has become easier and more effective than direct techniques [1]. An agent that reads emails, queries databases, or browses websites is ingesting untrusted content that can manipulate its behaviour.
So you need input sanitisation. You need output filtering. Trust boundaries between different data sources are essential. You’ll probably want a separate security layer that operates outside the LLM’s reasoning loop entirely, because you can’t rely on the model to police itself. Unfortunately, most teams realise this after they’ve already built the “happy path”, only to then discover that retrofitting security is particularly brutal.
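To make the “outside the reasoning loop” idea concrete, here’s a minimal Python sketch of a deterministic guard layer. Everything in it (the injection patterns, the tool allowlist) is hypothetical, not any real platform’s rules; the point is that these checks run before retrieved content reaches the model and before tool calls execute, so the model can’t be talked into bypassing them.

```python
import re

# Hypothetical deterministic guard, running outside the model's reasoning loop.
# Patterns and tool names are illustrative, not from any real product.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
    re.compile(r"disregard your system prompt", re.IGNORECASE),
]

ALLOWED_TOOLS = {"search_docs", "read_calendar"}  # explicit allowlist

def screen_untrusted_content(text: str) -> bool:
    """Return True if retrieved content looks safe to pass to the model."""
    return not any(p.search(text) for p in INJECTION_PATTERNS)

def authorise_tool_call(tool_name: str) -> bool:
    """Hard limit enforced outside the model, regardless of what it 'decides'."""
    return tool_name in ALLOWED_TOOLS
```

Pattern matching alone is a weak defence (real guard layers combine classifiers, trust labels on sources, and argument-level policy), but the architectural point stands: the verdict is computed deterministically, not by asking the model to police itself.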
Identity and Authorisation
Your agent needs to act on behalf of users. That means OAuth flows, token management, scope limitations, and credential vaulting. It needs to access Salesforce “as Sarah”, but only read the accounts she’s allowed to see. It needs to query your data warehouse, but not the tables containing Personally Identifiable Information. This isn’t a solved problem, even for traditional applications. For agents that dynamically decide which tools to call based on user requests, it’s significantly harder.
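A hedged sketch of what “acting as Sarah” might look like at the authorisation layer. The token model and scope strings here are illustrative (a real system would sit on OAuth access tokens obtained through a delegation flow): the tool call succeeds only if the user’s delegated token carries the scope that tool requires.

```python
from dataclasses import dataclass, field

# Illustrative delegated-token model; scope names are made up for the example.
@dataclass
class UserToken:
    user_id: str
    scopes: set = field(default_factory=set)

# Which scope each tool demands (hypothetical mapping).
TOOL_SCOPES = {
    "read_accounts": "crm:accounts:read",
    "query_warehouse": "dwh:read:non_pii",
}

def authorise(token: UserToken, tool: str) -> bool:
    """Permit a tool call only if the user's token holds the required scope.
    Unknown tools are denied by default."""
    required = TOOL_SCOPES.get(tool)
    return required is not None and required in token.scopes

# Sarah may read CRM accounts, but has no warehouse access.
sarah = UserToken("sarah", {"crm:accounts:read"})
```

The hard parts this skips (token refresh, credential vaulting, row-level filtering so Sarah only sees *her* accounts) are exactly where the engineering effort goes.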
Memory That Actually Works
Agents without memory are stateless assistants. Agents with memory need infrastructure to store it, retrieve it, scope it appropriately, and eventually forget it. Episodic memory (what happened in the conversation), semantic memory (facts about the user), and procedural memory (learned patterns) all require different storage and retrieval patterns. Build this yourself, and you’re suddenly maintaining a bespoke memory system alongside everything else.
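The three memory types map onto different data structures and lifecycles, which is why they need different storage and retrieval patterns. A toy sketch, deliberately skipping the hard parts (persistent backends, per-user scoping, retention policy):

```python
from collections import defaultdict

class AgentMemory:
    """Toy illustration of the three memory types. Real systems back each
    with different storage (e.g. event logs, key-value or vector stores)."""

    def __init__(self):
        self.episodic = defaultdict(list)   # per-session event log
        self.semantic = {}                  # durable facts about the user
        self.procedural = defaultdict(int)  # counts of patterns that worked

    def record_turn(self, session_id: str, event: str) -> None:
        self.episodic[session_id].append(event)

    def remember_fact(self, key: str, value: str) -> None:
        self.semantic[key] = value

    def reinforce(self, pattern: str) -> None:
        self.procedural[pattern] += 1

    def forget_session(self, session_id: str) -> None:
        # "eventually forget it": retention has to be designed in, not bolted on
        self.episodic.pop(session_id, None)
```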
Observability When You Can’t Predict Behaviour
Traditional application monitoring assumes you know what the system should do. Agent observability has to handle emergent behaviour, such as the agent deciding to try four different approaches before succeeding, or going down a rabbit hole that burned tokens for no good reason, or using a tool in a way you didn’t anticipate.
You need trace visibility at every step, cost tracking, and debugging tools that make sense of non-deterministic execution paths. Off-the-shelf Application Performance Monitoring tools don’t cut it.
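A minimal sketch of what agent-specific tracing might capture per step, so cost can be attributed to a run even when the execution path was non-deterministic. The pricing constant is a placeholder, not any provider’s real rate.

```python
import time

class AgentTracer:
    """Minimal trace collector: every step (model call or tool call) is
    logged with token counts so cost can be attributed per run."""

    def __init__(self, cost_per_1k_tokens: float = 0.01):  # placeholder rate
        self.cost_per_1k = cost_per_1k_tokens
        self.steps = []

    def record(self, kind: str, name: str, tokens: int = 0) -> None:
        self.steps.append({"ts": time.time(), "kind": kind,
                           "name": name, "tokens": tokens})

    def total_cost(self) -> float:
        return sum(s["tokens"] for s in self.steps) / 1000 * self.cost_per_1k
```

The trace answers the questions traditional APM can’t: which steps the agent actually took, in what order, and which of them burned the tokens.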
Multi-Agent Orchestration
Single agents hit capability ceilings rather quickly. The current direction is toward multiple specialised agents coordinating themselves (a supervisor agent breaking down tasks, specialist agents handling specific domains, and handoffs between them). Gartner predicts that a third of agentic AI implementations will combine agents with different skills by 2027 [2], and to me, that seems conservative.
But orchestrating multiple agents means managing communication protocols, shared context, failure handling when one agent breaks, and preventing infinite loops when agents delegate to each other. More agents = More Complexity and Pain.
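Loop prevention is one of the few orchestration problems with a simple illustrative shape: cap the delegation depth and reject cycles. A sketch (the limit and agent names are arbitrary):

```python
class DelegationGuard:
    """Guards against runaway agent-to-agent delegation: a hard cap on
    handoff depth plus detection of A->B->A cycles. Purely illustrative."""

    def __init__(self, max_depth: int = 3):
        self.max_depth = max_depth

    def may_delegate(self, chain: list, next_agent: str) -> bool:
        if len(chain) >= self.max_depth:
            return False  # too deep: force resolution rather than more handoffs
        if next_agent in chain:
            return False  # cycle: an agent in the chain would be re-entered
        return True
```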
Compliance and Audit Requirements
In regulated industries, “the AI did something” isn’t an acceptable audit trail. You need to prove what data the agent accessed, what decisions it made, what actions it took, and that it operated within defined boundaries. This has to be tamper-evident and queryable.
Oh, and for bonus points, if you operate internationally, each jurisdiction will likely have its own requirements. For example, California’s new AI regulations took effect in January 2026, with enforcement shifting from policy to live production behaviour [3].
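One common way to make an audit trail tamper-evident is a hash chain, where each entry’s hash covers its predecessor’s, so an edit anywhere in history breaks verification. A minimal sketch, not any particular platform’s implementation:

```python
import hashlib
import json

class AuditLog:
    """Tamper-evident audit trail: each entry's hash covers the previous
    entry's hash, so modifying any record invalidates the chain."""

    def __init__(self):
        self.entries = []

    def append(self, record: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        payload = json.dumps(record, sort_keys=True)  # deterministic encoding
        h = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": h})
        return h

    def verify(self) -> bool:
        prev = "genesis"
        for e in self.entries:
            payload = json.dumps(e["record"], sort_keys=True)
            if hashlib.sha256((prev + payload).encode()).hexdigest() != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

Queryability, retention, and anchoring the chain somewhere an attacker can’t rewrite are the parts a production system still has to solve.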
The point isn’t that any single problem described above is insurmountable. It’s that solving all of them simultaneously, whilst also building the actual agent functionality your business needs, is a massive undertaking. Most teams get stuck in what I’d call “prototype purgatory”: impressive demos that never make it to production because the operational complexity is too high.
This is the gap that managed platforms are trying to fill. The mythical “AWS for AI Agents.”
Who’s Actually Building This?
The hyperscalers have moved aggressively into this space, as you’d expect. A few offerings stand out:
AWS Bedrock AgentCore

Amazon’s entry is the most developed. AgentCore is pitched as “an agentic platform for building, deploying, and operating effective agents securely at scale—no infrastructure management needed” [4].
The service suite covers most of the pain points I listed above:
- AgentCore Runtime: Serverless execution with session isolation using Firecracker microVMs. Each agent session runs in its own protected environment to prevent data leakage between users.
- AgentCore Gateway: Transforms existing APIs and Lambda functions into agent-compatible tools, with native MCP (Model Context Protocol) support. Handles the plumbing of connecting agents to enterprise systems.
- AgentCore Memory: Persistent memory management, including the recently added episodic memory, so agents can learn from interactions over time.
- AgentCore Identity: OAuth-based authentication for tool access, with support for custom claims in multi-tenant environments.
- AgentCore Observability: Step-by-step trace visualisation, cost tracking, debugging filters.
- AgentCore Policy: This is the interesting one. Natural language policy definitions that compile to Cedar (AWS’s open-source policy language) and execute deterministically at the gateway layer, i.e., outside the LLM reasoning loop [5].
That last point really matters. Policy enforcement that operates outside the model means constraints are hard limits, not suggestions. It doesn’t matter how cleverly a prompt injection tries to reason around a restriction; the gateway blocks it before execution. For compliance teams, this is the difference between “we hope the AI behaves” and “we can prove it can’t misbehave.”
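As an illustration of why out-of-loop enforcement works, here’s a toy gateway intercept in Python. This is not Cedar syntax or the AgentCore API, just the shape of the idea: the policy evaluates deterministically on the tool call’s arguments, after the model has decided to call the tool but before anything executes.

```python
# Hypothetical gateway-layer policy table, loosely in the spirit of a
# compiled policy. Tool names, thresholds and effects are all made up.
POLICIES = [
    # (tool, predicate over the call's arguments, effect if predicate holds)
    ("transfer_funds", lambda args: args.get("amount", 0) <= 1000, "permit"),
]

def gateway_intercept(tool: str, args: dict) -> str:
    """Evaluate the call against the policy table. The model's reasoning
    cannot override the verdict: this runs outside the LLM entirely."""
    for policy_tool, predicate, effect in POLICIES:
        if tool == policy_tool:
            return effect if predicate(args) else "deny"
    return "deny"  # default-deny for tools with no matching policy
```

However a prompt injection frames the request, a call like `transfer_funds(amount=50_000)` never reaches execution, because the deny decision is computed from the arguments, not from anything the model says.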
Microsoft Azure AI Foundry
Microsoft’s approach is similarly ambitious but more tightly integrated with its existing stack. The headline feature is that over 1,400 business systems (SAP, Salesforce, ServiceNow, Workday, etc.) are available as MCP tools through Logic Apps connectors [6]. If your enterprise already runs on Microsoft, this level of built-in integration is compelling.
Their AI Gateway API Management handles policy enforcement, model access controls, and token optimisation. The positioning is less “build from scratch” and more “extend what you already have with agent capabilities.”
Google Vertex AI
Vertex AI Agent Builder is a genuine competitor to AgentCore. The platform follows the same “build, scale, govern” structure as AWS. The Agent Development Kit (ADK) is Google's open-source framework that has been downloaded over 7 million times and is used internally by Google for its own agents [9]. Agent Engine provides the managed runtime with sessions, a memory bank, and code execution. Agent Garden offers pre-built agents and tools to accelerate development.
Security and compliance capabilities are mature through VPC Service Controls, customer-managed encryption keys, HIPAA compliance, agent identity via IAM, and threat detection via the Security Command Center. Sessions and Memory Bank are now generally available, and the platform is explicitly model-agnostic; you can use Gemini, as well as third-party and open-source models from their Model Garden.
Where Google really differentiates itself is ecosystem integration. They offer more than 100 enterprise connectors via Apigee for ERP, procurement, and HR systems. Grounding with Google Maps gives agents access to location data on 250 million places. If you're already running BigQuery, Cloud Storage, and Google Workspace, these integrations may be compelling.
Salesforce Agentforce
Agentforce is worth mentioning because it represents the most opinionated end of the spectrum. It’s not trying to be a general-purpose agent platform. It’s saying “agents exist to automate Salesforce workflows, and that’s it.”
Agentforce 2.0 embeds autonomous agents directly into Salesforce to manage end-to-end workflows, from qualifying leads to generating contracts. The agents have self-healing capabilities (automatically recovering from errors) and native human handoffs when escalation is needed [11].
The tradeoff is stark. If you’re all-in on Salesforce, the integration depth is unmatched. The agents understand your CRM data model, your workflow rules, and your permission structures. No translation layer is required. But if Salesforce isn’t your system of record, Agentforce is largely irrelevant.
However, this creates a useful reference point for thinking about the spectrum of approaches. Salesforce Agentforce offers maximum lock-in and deep integration for a narrow use case. Amazon’s AgentCore offers moderate opinions with broader applicability. Framework-level tooling offers maximum flexibility but also a significant operational burden. There’s no objectively correct position on this spectrum; it all depends on what you’re building and what constraints you’re willing to accept.
The Consultants Have Joined The Call
It’s also worth mentioning PwC, which launched an “agent OS” that orchestrates agents across multiple cloud providers and enterprise systems [7]. They’re essentially packaging best practices and governance frameworks atop hyperscaler infrastructure. Accenture and others are doing similar things, as you’d expect.
This makes obvious sense. Enterprises often want a trusted advisor to de-risk adoption rather than building expertise in-house. The consultancies are betting they can capture value at the integration layer. IBM, for example, is trying to extend its success in helping clients with multi-cloud implementations into AI.
What About the Drag-and-Drop Builders?
There’s a whole category of platforms (Relevance AI, n8n, Lindy, various other low/no-code agent builders) that I’d put in a different bucket entirely. These are designed to let business users create lightweight automation without writing much or sometimes any code.
They can absolutely work for certain limited use cases. But they primarily exist for experimentation and getting an agent running quickly, not “last-mile embedding” into production systems with proper auth, governance, and compliance [8]. The enterprise infrastructure play is about taking agents that development teams have already built and making them safe to deploy at scale. This is a fundamentally different thing.
Why the AWS Analogy Breaks Down
Here’s where I keep coming back to AWS. For those old enough to remember, Amazon won by being radically neutral about what you ran on their infrastructure. They didn’t care if it was a modern microservices architecture or a legacy Perl script from 2003. The value was in the primitives (compute, storage, networking), being reliable, scalable, and pay-as-you-go. Everything else was your problem.
This created incredible growth because no technology choice was “wrong” for AWS. Migrations could be lifted and shifted without major re-architecture. They captured the long tail of weird enterprise workloads that nobody else wanted to support. The agent platforms being built today are fundamentally different. And a bit like your slightly racist aunt, they’re very opinionated.
AgentCore doesn’t just say, “here’s compute, run whatever agent framework you want.” It says, “here’s how memory should work, here’s how tools should integrate, here’s how policies should be enforced, here’s how observability should be structured.” The value proposition is in their specific abstractions, not neutral infrastructure. If you don’t use those abstractions, you’re basically just using EC2 with extra steps.
Why the Shift to Opinionated Platforms?
There are a few reasons:
Security requirements force it. With traditional compute, if your application gets compromised, that’s your problem within your “blast radius”. When agents have tool access and can take actions in external systems, the platform must ensure containment. You can’t offer “run whatever agent logic you want” without guardrails; the liability is simply too high.
The primitives aren’t settled. When AWS launched, everyone largely agreed on what “compute” and “storage” meant. Nobody yet agrees on what “agent memory” or “tool orchestration” should precisely look like. MCP is emerging as a standard for tool integration, but it’s still evolving quickly. Memory architectures vary wildly. Multi-agent coordination patterns are experimental, so platforms are making bets on specific patterns, hoping they become the standard. This is inherently opinionated.
Higher value capture. Neutral infrastructure commoditises quickly, becoming a race to the bottom on price. Opinionated platforms can charge more because they’re solving harder problems. If you’re just selling compute, you compete on price. If you’re selling “enterprise-ready agent deployment with compliance built in,” you capture more margin.
Lock-in by design. Once you’ve built around AgentCore’s memory service and gateway patterns, migration is expensive. Of course, as many enterprises have found, this is also true to an extent with AWS, particularly if you have exotic components in your enterprise architecture that aren’t widely supported elsewhere.
The Trust Problem This Creates
The “support anything” approach was what made AWS trustworthy as an infrastructure provider. Enterprises could adopt it knowing they weren’t betting on AWS’s opinions being correct, only on AWS's operational excellence.
The opinionated agent platform approach requires a different kind of trust. It requires the belief that AWS (or Microsoft, or Google) has figured out the right patterns for agent development, and a willingness to build around those patterns.
That’s a harder sell when:
- The patterns are still evolving rapidly
- Different use cases might genuinely need different architectures
- The hyperscalers have obvious incentives to push you toward their own models (Nova for AWS, Azure OpenAI for Microsoft)
Yes, AgentCore supports external models like OpenAI and Anthropic [9]. But the integration depth varies. The path of least resistance leads toward their ecosystem.
Could a Neutral Alternative Exist?
Theoretically, someone could build “EC2 for agents”, i.e., just isolated compute with no opinions. Run LangChain, CrewAI, AutoGen, your own custom framework, whatever. No prescribed patterns, just secure sandboxed execution.
The problem is that the hard aspects of agent deployment are exactly the things that require opinions:
- How do you enforce that an agent can’t exfiltrate data? You need a position on network egress controls, on what counts as sensitive data, and on whether the agent can write to external APIs.
- How do you audit what it did? This requires deciding what constitutes a step worth logging, how to capture tool calls, and what metadata matters.
- How do you manage credentials for tool access? OAuth flows, token refresh, and scope limitations all require specific patterns.
- How do you prevent prompt injection from untrusted sources? You need to decide where trust boundaries sit and how to sanitise retrieved content.
You can’t solve these without taking architectural positions. So the “neutral substrate” approach soon collapses into “you’re on your own”, which is exactly where most enterprises are today, and why some are struggling.
The Vercel Analogy Might Be Closer
A better comparison might be Vercel or Netlify, platforms that have taken a strong position on how web applications should be built and deployed. They didn’t try to be neutral infrastructure. They said “here’s the right way to do this” (JAMstack, serverless functions, edge rendering, etc.) and made that path the easy one.
Developers adopted them not because they supported everything, but because they made the opinionated approach feel effortless. Similarly, the winning agent platforms will probably be ones that make secure, observable, compliant agent deployment the path of least resistance, even if that constrains what you can do.
Where Value Will Accrue
So, following my thought experiment to its conclusion, here’s how this could play out:
Hyperscaler platforms will capture the majority of enterprise spend. Companies with real compliance requirements and limited appetite for infrastructure complexity will pay the premium and accept the lock-in. AgentCore and Azure AI Foundry are the obvious choices depending on existing cloud commitments.
Framework-level tooling (LangChain, CrewAI, Strands, custom implementations) will serve teams who want control and are willing to own operational complexity. So fintechs with strong engineering cultures, AI-native startups, and research teams. A smaller segment but more technically sophisticated.
The middleware layer (i.e., observability, security, evaluation) has room for independent players. These tools can be platform-agnostic in ways that the core runtime can’t. LangSmith for debugging, Arize for monitoring, the security layer that Lakera occupied before Check Point acquired them [10]. This might be where the interesting startups emerge.
Consulting and integration services will capture significant revenue, helping enterprises navigate the transition. The technology is complex enough that most companies will want guidance.
The Timing Risk
It is a particularly difficult time for large companies to judge how much to invest in AI agent infrastructure. Building on any of the current platforms now means betting on architectural patterns that might get superseded. MCP could evolve in ways that break existing integrations. Memory architectures might standardise around different approaches. Multi-agent orchestration patterns are still largely unproven at scale.
Enterprises adopting these platforms early (and, contrary to the hype train, it is still very early) may be building on foundations of sand. But there is also risk in waiting: staying stuck in “prototype purgatory” while competitors ship production agents and capture market position.
There is no obviously correct answer. Which is probably why this space feels so chaotic. And of course, chaos is inherently interesting.
Pass the popcorn.
—
References
[1]: Lakera Q4 2025 threat data showed indirect prompt injection becoming more effective than direct techniques, with attackers increasingly targeting the data ingestion surfaces of agentic systems.
[2]: Gartner predicts one-third of agentic AI implementations will combine agents with different skills by 2027, with 40% of enterprise applications featuring task-specific AI agents by the end of 2026. Source: Gartner Press Release, August 2025
[3]: California AI regulations took effect January 2026, shifting AI regulation from policy documents to live, in-production behaviour requirements.
[4]: Amazon Bedrock AgentCore product page. Source: AWS Bedrock AgentCore
[5]: AgentCore Policy integrates with AgentCore Gateway to intercept tool calls in real time. Policies defined in natural language automatically convert to Cedar and execute deterministically outside the LLM reasoning loop. Source: AWS What’s New, December 2025
[6]: Azure AI Foundry provides 1,400+ business systems as MCP tools through Logic Apps connectors, with AI Gateway in API Management for policy enforcement. Source: Microsoft Tech Community, November 2025
[7]: PwC’s agent OS is cloud-agnostic, enabling deployment across AWS, Google Cloud, Microsoft Azure, Oracle Cloud Infrastructure, and Salesforce, as well as on-premises data centers. Source: PwC Newsroom
[8]: Visual agent builder platforms are designed for first-mile acceleration—getting an agent running fast—not last-mile embedding inside production products with user-scoped auth and governance. Source: Adopt.ai analysis of agent builder categories
[9]: AgentCore works with models on Amazon Bedrock as well as external models like OpenAI and Gemini. Source: Ernest Chiang’s technical analysis
[10]: Check Point acquired Lakera in September 2025 to build a unified AI security stack, integrating runtime guardrails and continuous red teaming into their existing security platform. Source: CSO Online, September 2025
[11]: Agentforce 2.0 embeds autonomous agents directly into Salesforce with self-healing workflows that automatically recover from errors and transparent human handoffs when escalation is needed. Source: Beam AI analysis of production agent platforms
I am a partner in Better than Good. We help companies make sense of technology and build lasting improvements to their operations. Talk to us today: https://betterthangood.xyz/#contact