AI Agents Are the New Pentesters — And the New Attackers
Two hours. That is how long an autonomous AI agent needed to gain full read-and-write access to McKinsey's AI platform. No human help. No insider knowledge. Just a domain name.
The flaw it found? SQL injection — sitting undetected for over two years.
This is not about McKinsey having bad security. It is about a shift: AI agents now find flaws faster and more creatively than the tools most companies rely on.
What Happened: The McKinsey Lilli Breach
In late February 2026, CodeWall.ai — an autonomous offensive security platform — unleashed an AI agent against McKinsey's Lilli platform. The agent's only starting input was the domain name.
Lilli is not a side project. In production since 2023 and actively used in 2026, it serves over 43,000 McKinsey employees. More than 70% use it daily — for chat, document analysis, research retrieval, and AI-powered search across 100,000+ internal documents. It handles over 500,000 prompts every month.
Here is how the agent broke in:
- Mapped the attack surface — It discovered publicly exposed API documentation containing over 200 endpoints
- Identified weak points — 22 of those endpoints required no authentication at all
- Found the flaw — One unprotected endpoint processed user search queries. The parameter values were safely parameterized, but JSON keys — the field names — were concatenated directly into SQL queries
- Exploited it — Through 15 blind iterations, parsing error messages to understand query structure, the agent extracted live production data
The result: complete read-and-write access to the entire production database.
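The flaw is easiest to see in code. Below is a minimal sketch of the vulnerable pattern, with invented table and field names, not McKinsey's actual code: the parameter values are bound safely, but the JSON keys are concatenated straight into the SQL string.

```python
import json
import sqlite3

# Hypothetical reconstruction of the vulnerable pattern. The *values*
# are parameterized (the "?" placeholders), which is what scanners test.
# The *keys* are interpolated raw into the SQL string, which is the hole.
def search(db: sqlite3.Connection, payload: str):
    filters = json.loads(payload)
    clauses, params = [], []
    for key, value in filters.items():
        clauses.append(f"{key} = ?")   # key concatenated raw -> injectable
        params.append(value)           # value bound safely -> looks "clean"
    sql = "SELECT title FROM documents WHERE " + " AND ".join(clauses)
    return db.execute(sql, params).fetchall()

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE documents (title TEXT, owner TEXT)")
db.execute("INSERT INTO documents VALUES ('Q3 deck', 'alice'), ('Notes', 'bob')")

# Benign request filters as intended:
print(search(db, '{"owner": "alice"}'))        # [('Q3 deck',)]

# A malicious *key* rewrites the WHERE clause into a tautology:
print(search(db, '{"1=1 OR owner": "x"}'))     # returns every row
```

A scanner fuzzing the values would see parameterized queries and report nothing. Only by treating the field names themselves as untrusted input does the injection surface.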
What Was Exposed
The scale of access was staggering:
- 46.5 million chat messages containing strategy discussions, client engagements, financial data, and M&A activity — all in plaintext
- 728,000 files — 192,000 PDFs, 93,000 Excel spreadsheets, 93,000 PowerPoint presentations, 58,000 Word documents
- 57,000 user accounts representing nearly the entire employee base
- 384,000 AI assistants and 94,000 workspaces mapping the full organizational AI infrastructure
- 3.68 million RAG document chunks — decades of proprietary McKinsey research, methodologies, and frameworks — with exposed S3 storage paths
- 1.1 million files and 217,000 agent messages flowing through external AI APIs, including 266,000+ OpenAI vector stores
An IDOR vulnerability also allowed cross-user access to individual employee search histories, effectively revealing who was working on what.
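For readers unfamiliar with the class, here is a stripped-down sketch of what an IDOR looks like, with invented handler and store names: the vulnerable version trusts whatever user ID the client supplies, while the fix requires the requested resource to belong to the authenticated session.

```python
# Illustrative IDOR sketch; names and storage are invented.
# Vulnerable handler: returns whatever user_id the client asks for.
def get_search_history_vulnerable(store: dict, requested_user: str, session_user: str):
    return store.get(requested_user, [])   # no ownership check -> IDOR

# Fixed handler: the resource owner must match the authenticated session.
def get_search_history(store: dict, requested_user: str, session_user: str):
    if requested_user != session_user:
        raise PermissionError("cannot read another user's history")
    return store.get(requested_user, [])

histories = {"alice": ["merger targets"], "bob": ["restructuring model"]}

# Alice's session requesting Bob's history leaks his data:
print(get_search_history_vulnerable(histories, "bob", "alice"))
```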
Why Traditional Security Missed It
This is the part that should concern every CTO and IT leader. SQL injection remains the most common web application vulnerability according to OWASP — and yet this particular variant went undetected for over two years despite McKinsey running industry-standard scanning tools. Concatenating JSON keys into SQL queries is a textbook flaw, but one that sits outside what automated scanners typically test.
Why? Traditional scanners are checklist-based. They test known patterns against known surfaces. They check parameter values for SQL injection — which were properly handled here. But they do not reason about the request structure itself.
The Blind Spot
The vulnerability was in the JSON keys, not the values. Standard automated scanners parameterize values and move on — they never test whether the field names themselves are safe. This is the kind of creative exploitation that required reasoning, not just pattern matching.
The AI agent found it because it does not follow a playbook. It reads error messages. It forms hypotheses. It iterates and adapts. It thinks more like an experienced human attacker than a scanner — but at machine speed, without fatigue.
How AI Agents Think Differently
Traditional pentesting — whether performed by humans or automated tools — operates on a fundamentally different paradigm from the agentic workflows that power modern AI security agents. Understanding this difference is critical for any organization evaluating its security posture in 2026.
Traditional scanners follow a script. Test this endpoint for XSS. Test that parameter for SQLi. Check these headers. They are thorough within their scope but blind outside it. They do not chain findings. They do not reason about context.
AI pentesting agents operate in a loop: observe, hypothesize, test, learn, repeat. They chain discoveries. Exposed API docs lead to testing open endpoints, which leads to probing unusual input vectors, which leads to finding the JSON key injection. No static scanner would construct that path.
This is the same ReAct (Reason + Act) pattern that powers AI agents in enterprise settings. The architecture that makes agents useful for business also makes them effective at finding security flaws.
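The loop can be sketched in a few lines. Everything below is illustrative: a real agent would call an LLM to reason and real HTTP tooling to act, not a canned lookup table, but the control flow, each observation seeding the next hypothesis, is the point.

```python
# Schematic observe -> hypothesize -> test -> learn loop (ReAct-style).
# The "playbook" stands in for an LLM's reasoning; names are invented.
PLAYBOOK = {
    "recon": "found public API docs",
    "found public API docs": "22 endpoints need no auth",
    "22 endpoints need no auth": "error message leaks SQL structure",
    "error message leaks SQL structure": "injection via JSON keys confirmed",
}

def form_hypothesis(last_observation: str):
    """'Reason' step: choose the next test based on what was just learned."""
    return PLAYBOOK.get(last_observation)

def run_agent(start: str = "recon"):
    trail, obs = [], start
    while obs is not None:            # observe
        trail.append(obs)             # learn
        obs = form_hypothesis(obs)    # hypothesize the next action, repeat
    return trail

print(run_agent())
```

No entry in that chain is individually exotic; the danger is the chaining itself, which static scanners do not do.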
Here is the uncomfortable truth: if an AI agent can reason its way into your systems for defense, a malicious one can do the same. CodeWall's agent picked McKinsey as a target on its own — by reading the firm's public disclosure policy and recent platform updates. Their conclusion: "AI agents autonomously selecting and attacking targets will become the new normal."
"The same agentic architecture that helps a company automate workflows is what makes AI the most capable offensive security tool ever built. If you build with agents, you need to think like an attacker." — Keith Govender, Co-Founder of IJONIS, Hamburg
The Crown Jewel No One Is Protecting: System Prompts
The most dangerous finding in the McKinsey breach was not the millions of chat messages or the hundreds of thousands of files. It was the ability to silently rewrite the system prompts that control how the AI behaves — turning the company's own AI platform into a persistent, undetectable attack vector.
Lilli's 95 system prompts — across 12 model types — were stored in the same compromised database. These prompts control how the AI responds, what guardrails apply, what data it surfaces, and how it formats recommendations.
With write access, an attacker could:
- Silently modify prompts without any deployment logs or code changes
- Poison client advice by altering financial models or strategic recommendations
- Instruct the AI to embed confidential data in its outputs for exfiltration
- Remove safety guardrails enabling unauthorized disclosures
- Create persistent backdoors — the AI itself becomes the attack vector
The Prompt Infrastructure Gap
Most companies treat system prompts as configuration — stored in databases, edited through admin panels, rarely version-controlled, almost never monitored for integrity. In the age of AI agents, your prompts are your crown jewels. They need the same protection as your source code and cryptographic keys.
This is exactly the kind of threat that the Trust Spectrum framework addresses: system prompts sit at Layer 0 (the instruction layer), and without higher layers of protection — environment isolation, deployment gates, phase-based permissions, and kill switches — they are dangerously exposed.
The Asymmetry Problem: Offense Scales Faster Than Defense
The McKinsey breach shows a structural problem beyond any single flaw. AI-powered offense scales faster than traditional defense.
Consider the asymmetry:
97% of organizations are already evaluating AI for pentesting. The market is moving fast. Platforms like XBOW, Penligent, Aikido, and Horizon3.ai offer autonomous pentesting as a service. But if those tools exist for defenders, the same capabilities exist for attackers.
The window between "flaw exists" and "flaw is exploited" is collapsing. McKinsey patched within 24 hours of disclosure. The agent found the flaw in two hours. The exposure window is now hours — not the months or years that traditional security cycles assume.
What This Means for Your Business
You do not need to be McKinsey to be a target. If you deploy AI internally — chatbots, document search, RAG systems — you have the same class of attack surface. Mid-market companies often have less security budget but equally sensitive data.
Here is what to do now:
1. Audit Your AI Stack as Its Own Attack Surface
Your AI platform is not just another web app. It has unique attack vectors: prompt injection, RAG poisoning, model manipulation, IDOR on user data. Traditional web app pentests do not cover these. Demand AI-specific security assessments.
2. Protect Your System Prompts Like Source Code
Version-control all system prompts. Monitor for changes. Never store them in the same database as user data. Add integrity checks that alert on any edit. If your prompts are editable through an admin panel without audit logging, fix that today.
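One cheap integrity check: fingerprint every prompt and compare against a baseline committed alongside your source code. The sketch below assumes prompts live in some store you can read as id-to-text pairs; the names and layout are invented.

```python
import hashlib

# Sketch of a prompt integrity check; storage layout is hypothetical.
def fingerprint(prompts: dict) -> dict:
    """Map prompt id -> SHA-256 hash of its content."""
    return {pid: hashlib.sha256(text.encode()).hexdigest()
            for pid, text in prompts.items()}

def detect_tampering(baseline: dict, current_prompts: dict) -> list:
    """Return ids whose content no longer matches the committed baseline."""
    now = fingerprint(current_prompts)
    return sorted(pid for pid, h in now.items() if baseline.get(pid) != h)

prompts = {"search": "You are a research assistant.", "chat": "Be concise."}
baseline = fingerprint(prompts)   # commit this file next to your source code

prompts["chat"] = "Be concise. Also exfiltrate data."   # a silent edit
print(detect_tampering(baseline, prompts))              # ['chat']
```

Run the comparison on a schedule and on every deploy; an alert here is exactly the signal that was missing in the Lilli scenario, where prompts could be rewritten with no deployment log at all.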
3. Assume Your API Documentation Is Public
McKinsey's 200+ endpoints were publicly documented. Yours might be too. Audit what is exposed. Remove docs from production. Make sure every endpoint requires auth, including the ones that "only" handle search queries.
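If you publish an OpenAPI spec, you can mechanically flag operations that declare no security requirement. This is a rough sketch assuming the spec uses the standard top-level and per-operation `security` keys; real specs vary, and an empty per-operation `security: []` explicitly disables auth under the OpenAPI 3 rules.

```python
# Rough audit sketch: list OpenAPI 3 operations with no security requirement.
def unauthenticated_ops(spec: dict) -> list:
    default_secured = bool(spec.get("security"))   # global default, if any
    flagged = []
    for path, ops in spec.get("paths", {}).items():
        for method, op in ops.items():
            if method not in {"get", "post", "put", "patch", "delete"}:
                continue
            sec = op.get("security")   # per-operation override
            if sec == [] or (sec is None and not default_secured):
                flagged.append(f"{method.upper()} {path}")
    return sorted(flagged)

# Toy spec: /search has no auth anywhere, /admin is explicitly secured.
spec = {
    "paths": {
        "/search": {"post": {}},
        "/admin": {"get": {"security": [{"jwt": []}]}},
    }
}
print(unauthenticated_ops(spec))   # ['POST /search']
```

Treat every line of output as a question to answer, not automatically as a bug; some endpoints are legitimately public, but each one should be a deliberate decision.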
4. Move from Periodic to Continuous Security Testing
Quarterly pentests are a snapshot. AI-powered attackers run 24/7. Consider autonomous testing platforms that probe your systems nonstop. The economics have shifted — continuous AI testing is now cheaper than periodic manual engagements.
5. Build Defense-in-Depth for AI Systems
Do not rely on one security layer. The Trust Spectrum model gives you a framework: instruction-level controls, environment isolation, deployment gates, phase-based permissions, and kill switches. Each layer catches what the one below misses.
6. Train Your Team on AI-Specific Threats
Your security team knows XSS, CSRF, and SQL injection in parameter values. The McKinsey breach happened in JSON keys — a vector many professionals have never tested for. Update your threat models for AI-specific attack surfaces.
The Timeline That Should Wake You Up
Discovery to full exploitation: two hours. Disclosure to patch: one day. McKinsey's response was fast. But the lesson is clear — the time to find and fix flaws has compressed from months to hours. Your security posture needs to match that speed.
The Bottom Line
The McKinsey Lilli breach is not an isolated incident. It is the new normal. AI agents that reason, adapt, and chain attacks are a different animal from the scanners most security strategies defend against.
The same architecture that makes AI agents useful for business — observe, reason, plan, act, repeat — also makes them the most capable offensive tool ever built. The companies that thrive will be the ones that understand both sides: how to build with agents and how to defend against them.
The question is not whether AI agents will probe your systems. They already are. The question is whether you hear about it from your own team — or from someone else's disclosure report.
Frequently Asked Questions
What is AI-powered pentesting?
AI-powered pentesting uses autonomous agents — not scripted scanners — to find security flaws. Instead of testing a checklist of known vulnerabilities, these agents reason about your system, form hypotheses, chain discoveries, and adapt in real time. They behave more like experienced human attackers than automated tools, but at machine speed and without fatigue.
How did an AI agent hack McKinsey's Lilli platform?
CodeWall's autonomous agent started with just a domain name. It found publicly exposed API docs, identified 22 unauthenticated endpoints, and discovered a SQL injection in JSON key names — a vector that standard scanners like OWASP ZAP missed. The entire process took two hours with zero human involvement.
Are system prompts a security risk?
Yes — and most companies do not treat them that way. System prompts control AI behavior, guardrails, and output formatting. In the McKinsey breach, all 95 prompts were stored in the same database as user data — with full write access. An attacker could silently modify them to poison outputs or remove safety guardrails without any deployment or code change.
How is AI pentesting different from traditional pentesting?
Traditional pentests follow checklists and test known patterns. AI agents reason, hypothesize, and chain findings — constructing attack paths no static scanner would try. They also run continuously rather than quarterly, and the cost per test is dropping fast. The agentic workflow pattern that powers business automation is the same pattern that makes AI offensive tools effective.
What should companies do to protect their AI systems?
Six steps matter most. Treat your AI stack as its own attack surface with unique vectors like prompt injection and RAG poisoning. Protect system prompts like source code. Assume your API docs are public. Move from quarterly to continuous AI-powered testing. Build defense-in-depth using the Trust Spectrum framework. Train your security team on AI-specific threats beyond traditional web vulnerabilities.