Chaos Agents

Why OpenAI & Google’s Push Into Autonomous AI Could Be Their Undoing

Oct 09, 2025

Forget passive chat - or even the Vibes, the latest wave in AI is all about “agents” — systems that don’t just respond, but take initiative: navigating the web, manipulating tools, automating entire workflows. OpenAI has just turned ChatGPT into an agent; Google is folding agents into its enterprise product suite. But here’s the catch: there’s zero revenue so far, and the practical, safety, and business challenges are enormous. This is not just hype — it may well be the make or break moment for the LLM vendors. Failure here may truly be the pin that bursts the AI bubble as Agents introduce whole new levels of risk, data theft, and system vulnerabilities at a speed and scale previously impossible. Fools rush in where angels fear to tread - and when it comes to AI, there’s no shortage of the former.
And with that, on with the show!

What Are “Agents,” Anyway?

Before diving in, a quick framing:

Traditional AI (e.g. GPT, Claude) responds to prompts. You ask, it answers.
Agents go further: they plan actions, chain tasks, interact with external systems (web, API, file systems), and attempt multi-step goals autonomously or semi-autonomously.
In practice, agent design requires tool invocation, memory / state retention, error correction, monitoring of side effects, and safety constraints.

So it’s not enough to be smart — you need to be trustworthy, safe, robust. That’s a much taller bar.

OpenAI’s Agent Moves: ChatGPT Agent & Agent Kit

OpenAI recently made its boldest agent play yet.

In July 2025, OpenAI launched ChatGPT Agent, which it describes as: “ChatGPT now thinks and acts, proactively choosing from a toolbox of agentic skills to complete tasks for you using its own computer.”
This agent can browse the web, manipulate web pages, run code, fill out forms, fetch data, integrate with user tools, and more.
OpenAI also rolled out an Agent Kit for developers and enterprises to build custom agents.
Earlier, OpenAI had experimented with Operator (a browser-based agent that could, for example, place orders, navigate websites) — though Operator is now deprecated in favor of the more capable ChatGPT Agent.
OpenAI’s own safety disclosures admit the new “agent mode” introduces new risk surfaces, such as malicious web prompts, agent hijacking, and abuse.

In short: OpenAI is telling us, “We’re transitioning from assistant to agent; ChatGPT will act, not just answer.” What could possibly go wrong?

Google’s Agent Ambitions: Gemini Enterprise & Browser Agents

OpenAI isn’t alone. Google is stepping into agent territory as well.

Google has launched Gemini Enterprise, a new AI platform aimed at businesses. One of its capabilities includes pre-built AI agents for tasks like deep research, analysis, and custom workflows.
Google describes its agents as part of unifying AI for business, including no-code / low-code tools to build them.
In another front, Google has deployed versions of its Gemini model that can “use the web like you do” — via a browser interface, clicking, submitting forms, navigating UI actions, etc. The Verge reports on a Gemini 2.5 model that can “carry out tasks like submitting forms and performing UI testing,” effectively acting as a mini agent in a browser environment.
Google also offers Jules, its AI coding agent, which now supports command-line tooling, APIs, deeper integration with development environments.

Thus, Google is attacking the agent layer from both enterprise workflow suites and browser-based task automation. Again, what could possibly go wrong?

What They’re Hoping to Achieve

Why push this hard into agents? Here’s what OpenAI and Google (and their backers) are chasing — and hoping.

Higher value / stickier products
Agents promise more than conversation: they promise action. In theory, that raises the value to endusers and makes the product harder to abandon.
New monetization levers
If an agent can complete tasks (booking, procurement, analysis, research), there’s the possibility of taking a slice—commission, API fees, tool fees, plugin monetization, or upsell to premium agent capabilities.
Workflow embedding
Agents embedded in enterprise systems, email flows, CRMs, ERPs, could become central to how work gets done, making AI deeply integral rather than a companion.
Competitive differentiation
In a crowded AI assistant space, “agentic” behavior becomes a feature differentiator. If your AI just answers, but mine acts, you have product marketing power.
Data feedback and control
Agents interacting directly with systems and web interfaces yield richer data signals (clicks, tool usage, heuristics) than passive chat logs. That can fuel model improvement, telemetry, personalization.
Lock-in & ecosystem control
If agents become your interface into your apps, cloud, and data, switching costs rise. The provider becomes more central not just for intelligence, but as operational infrastructure.

The Reality Check: Why Agents Are Likely to Fail (Or Fail to Scale)

The gap between vision and reality is vast. Many reasons suggest these efforts will hit the hard walls of LLM limitations.

1. Zero to minimal revenue so far

There is virtually no public evidence of substantial revenue or productivity directly attributable to agent usage or agent licenses. The announcements focus on capabilities, future monetization, and integration.
OpenAI’s current revenue base comes largely from API usage, subscription (ChatGPT for business/enterprise), and licensing—not yet from agent-commission or performance-based fees.

2. Fragility in real-world environments

Agents must deal with messy, unpredictable environments: UI changes, web quirks, rate limits, authentication, CAPTCHAs, multi-factor flows, API errors, edge cases. Many demos break under the slightest variance to conditions that they were not previously taught how to handle. Sam Altman’s failed demo was yet another illustration that what can go wrong will truly go wrong if all variables are not accounted for up front.

3. Safety, security, and trust

Allowing an AI to act carries big risks:

Prompt injection / adversarial web: Malicious websites can trick agents into harmful behavior.
Privilege escalation: If an agent has access to APIs, file systems, databases, any flaw or bug could be dangerous.
Data leakage & privacy risk: Agents handling sensitive contexts must guard against leaks.
Error propagation & cascading failure: One wrong click or mis-action could cascade upstream in process chains.

Taking a page from Anthropic’s policy OpenAI said the quiet part out loud - that there are new vulnerability surfaces exposed by agent mode and additional safeguards must be employed prior to implementation.

4. Cost & complexity

Agents are more expensive to build, maintain, monitor, and debug:

Tooling integration, sandboxing, orchestration, fallback logic, rollback, audit trails, monitoring — all add overhead.
Latency, compute, memory, storage for states, actions, “thinking” loops — cost escalates.
The more capable the agent, the more rare (and expensive) edge-case handling gets.

5. Low substitution elasticity

Many tasks that agents attempt are still better done by humans or by domain-specific automation. Agents must both outperform and be trustworthy. The margin is narrow.

6. Adoption inertia

Enterprises are risk averse. They won’t hand over control to agents blindly. Compliance, auditing, governance, regulatory scrutiny, audit trails — all will be barriers. And given the general results experienced by most Enterprise AI projects, the results will be suboptimal. When business needs to change to use a tool - rather than the tool changing to serve the business, it creates a huge barrier to adoption - especially given the resources, time, and cost required to implement the technology in the first place.

7. Hype cap vs realism threshold

Agents are the talk of the moment; many parts of AI follow the hype cycle. Demos get applause, but real production with edge consistency, reliability, and safety is another order of difficulty.

What the Public / Early Usage Signals Say

In IBM’s 2025 AI investing data (Google-commissioned “ROI of AI 2025”), many financial services firms report positive ROI and agent use in production; but that is still early, biased, and not necessarily robust. It is also based on one industry (Finance) and one use-case (Trading Desk automation).
Google Cloud reports that 88% of early AI agent adopters say they see value (in marketing or creative operations) — though these are self-reported survey metrics, not audited financials.
In business press, Google highlights that ~52% of executives say their orgs are using agents.

But using an agent experiment is one thing; making an agent reliably deployable, maintainable, and consistently profitable is vastly more difficult.

Conclusion & What to Watch

OpenAI and Google’s agent push is a logical evolution of the AI stack — from chat to action. But it is a moonshot-level leap in complexity, risk, and cost. Right now, there’s little proof that it produces steady profit, and every fault line (security, robustness, governance) is magnified.

If this wave fails, it won’t just a product flop but a reputational and financial drag. The AI narrative may go through a “agent winter” like we’ve seen other credibility crashes before (Petfood.com anyone..?).

What to Watch:

When (and if) agents start being billed as agent-as-a-service, commission models, or paid agent plugins and if pricing is realistic considering the cost of operation, power, and training of the underlying platforms.
Metrics on agent reliability, uptime, error rates, rollback stats.
Incidents of agent misuse, prompt-injection attacks, or agent-caused data leaks.
Enterprise adoption vs retention: how many initial agent pilots survive past POC stage?
Regulatory scrutiny: data privacy, audit logs, unintended consequences.

As I see it - this is just another iteration of stupid computer tricks - LLMs by their nature are not capable of the consistency and repeatability of their results, exposing these organizations to both process and security disruptions that are unnecessary - unless headcount reduction is the focus for the executive team. The Pandora’s box is being opened - let’s see what jumps out first.

Smash the Machines!