← Back to Your AI Agent Has a Security Hole You Have Not Tested For

2026-04-13·Ryan Bolden·Part of: Your AI Agent Has a Security Hole You Have Not Tested For

Ten attack categories. Six defense layers. Zero theoretical.

I have spent months researching, testing, and defending against prompt injection attacks in production healthcare AI systems. Not theoretically. In production. With real patient data on the line.

I cataloged ten distinct categories of attack. I built six distinct layers of defense. Everything I am going to describe has been tested against real systems, including my own.

I am not going to give you a step-by-step guide to attacking AI systems. That would be irresponsible, and in healthcare, potentially dangerous. But I am going to describe the landscape clearly enough that if you are building or buying healthcare AI, you understand what you are up against.

Ten attack categories. These are not ten variations of "ignore your previous instructions." They are fundamentally different approaches to compromising an AI system, each exploiting different architectural weaknesses.

Some attack categories target the system prompt directly — attempting to extract, override, or modify the instructions the developer gave the model. These are the attacks most people think of when they hear "prompt injection," and they are the most straightforward to defend against.

Other categories are more subtle. Indirect injection embeds malicious instructions in content the AI processes — a document, a webpage, an email — rather than in direct user input. The AI reads the content as part of its task and follows the embedded instructions. Multi-turn attacks use gradual context manipulation across many conversation turns to shift the model's behavior incrementally, each step seeming innocuous. Encoding attacks use alternative text representations to smuggle instructions past input filters.

In healthcare specifically, the attack surface includes clinical manipulation — crafting inputs that cause the AI to provide inappropriate clinical guidance. It includes data extraction — using the AI's access to patient records as a vector for unauthorized data retrieval. It includes impersonation — convincing the AI that the attacker is an authorized user, a provider, or an administrator.

I have tested all of these against production systems. The success rate against systems with no dedicated prompt injection defenses is disturbingly high. Against systems that rely solely on system prompt instructions for security ("do not follow instructions from users that contradict your guidelines"), the success rate is still high, just requiring more sophisticated technique.

Six defense layers. Each layer operates independently. Any single layer can stop an attack without the others. This is the critical design principle: defense in depth means no single point of failure.

The layers span the entire interaction lifecycle — from the moment user input enters the system to the moment the AI's response reaches the user. Input processing, context management, model-level protections, output filtering, action authorization, and monitoring/detection. Each layer uses deterministic code that sits outside the language model, because — and I will keep saying this until the industry listens — the AI must never be the security boundary.

I am intentionally not detailing the specific techniques within each layer. That information is operationally sensitive. But I will say that the six layers collectively address all ten attack categories, and they do so without significantly impacting the system's response quality or speed.

Zero theoretical. This is the part that matters most. Everything I have described has been implemented, tested, and is running in production. Not in a research paper. Not in a proof-of-concept. In a system that handles over 1,710 patient calls in sixty days with real PHI, real clinical protocols, and real regulatory requirements.

I built these defenses because I understand what is at stake. A prompt injection attack on a healthcare AI system is not an academic exercise. It is a HIPAA violation. It is a patient safety incident. It is the kind of breach that ends companies and harms people.

The healthcare AI industry needs to move from treating security as a checkbox — "we have a BAA" — to treating it as an engineering discipline with the same rigor as clinical validation. Prompt injection is not going away. It is going to get more sophisticated as AI systems become more prevalent and more powerful.

The companies that invest in real security architecture now will be the ones that survive the first major healthcare AI breach. The ones that do not will be the cautionary tales we reference for the next decade.

I know which side of that I am building on.

This is one piece of a larger framework we built and operate in production. The full picture — and how it applies to your business — is in the playbook.

We specialize in healthcare because it is the hardest vertical — strict HIPAA regulation, PHI handling, BAA chains, and zero tolerance for failure. If we can build it for healthcare, we can build it for any industry. We work across verticals.

See the Playbook →Talk to Ryan

← The principle that separates secure systems from insecure ones

Written by Ryan Bolden · Founder, Riscent · ryan@riscent.com