Everyone’s Selling AI That Kills Pentesting. We Built One That Doesn’t.

By Derek Banks

Right now, it feels like everybody and their brother has a new agentic red team product that is going to kill all penetration testing… Close the doors. Fire the testers. The robots have it from here!

We have been hearing that one a lot. So when Melisa from our Business Capture team sat down with Brian Fehrman and me for this episode of AI Security Ops, she started with, “What is this thing you built, and is it the same hype everyone else is selling?”

Short answer: No.

What we built, Fusion AI, runs at about a third the cost of a traditional external pentest, a human tester still signs off on every finding, and it is not here to replace anybody.

Here’s the longer answer.

Where It Started

The genesis was a dinner back in January. Brian and I were in DC with John and Erica at a Chinese restaurant, and they basically threw down a challenge: go build our own AI-powered external penetration testing offering. We could already see which way the wind was blowing with traditional pentesting, and the data we look at internally was telling the same story. So the marching orders from John were clear, and we ran with it.

It helped that this is not new ground for either of us. I went and got a master’s in data science over the pandemic, and then Brian one-upped me and got a doctorate in data science and engineering. When John and Erica looked around for who should take this on, it made sense.

We started small, just testing the waters with Claude Code and Claude Code skills. It quickly grew into a custom-coded agentic platform that ingests external scan results and then spawns agents to go investigate them. That part, on the surface, is roughly what a lot of folks in the industry are doing. The difference is what we put inside it.

The Part That Actually Matters

A tool is not expertise. Just because you have a hammer does not make you a master carpenter. We saw this almost a decade ago, before this round of AI hype, when every EDR vendor slapped “Now with AI!” on the box and expected applause. Having the technology does not make the product good.

So we spent the time. We went through old reports, studied what works and what does not, and tried to capture the methodology that makes BHIS successful and keeps people coming back. We built our institutional knowledge into the platform. The question we kept asking was simple: How would a BHIS tester actually run this external, and can we get the AI to do that?

Here is a concrete example. John Strand has preached for years that you do not just look at the criticals and the highs in a scanner result. The interesting stuff is hiding in the mediums, the lows, and the informationals, and the real skill is chaining those together into something with real impact. So we built our agents to do exactly that. There is what we call chaining algebra built into the platform, where it goes and finds the lower-severity findings that combine into something that matters.
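To make the chaining idea concrete, here is a toy sketch in Python. It is not the Fusion AI implementation, which this post does not describe. It assumes the simplest possible model: each finding carries a short tag, and a chain fires when one host has every tag a rule requires. The `Finding` class, the tag names, the rules, and the resulting severities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    host: str
    tag: str       # hypothetical short label for the issue
    severity: str  # "info", "low", "medium", "high", or "critical"


# Hypothetical chain rules: (tags that must all be present on one host,
# name of the combined issue, severity of the combined issue).
CHAIN_RULES = [
    (frozenset({"directory-listing", "backup-file-exposed"}),
     "Source code disclosure via exposed backups", "high"),
    (frozenset({"user-enumeration", "no-account-lockout"}),
     "Password spraying against known-valid accounts", "high"),
]


def find_chains(findings):
    """Return (host, chain_name, severity) for every rule a host satisfies."""
    tags_by_host = {}
    for finding in findings:
        tags_by_host.setdefault(finding.host, set()).add(finding.tag)

    chains = []
    for host in sorted(tags_by_host):
        for required_tags, name, severity in CHAIN_RULES:
            # A chain fires only when the host has every required tag.
            if required_tags <= tags_by_host[host]:
                chains.append((host, name, severity))
    return chains


scan = [
    Finding("vpn.example.com", "user-enumeration", "low"),
    Finding("vpn.example.com", "no-account-lockout", "medium"),
    Finding("www.example.com", "directory-listing", "info"),
]
for host, name, severity in find_chains(scan):
    print(f"{host}: {name} ({severity})")
```

In this example the two findings on vpn.example.com combine into one chain, while the lone informational on www.example.com stays as it is. A real engagement needs far more context than tag matching, which is why the rules here should be read as placeholders.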

The other piece we cared about was transparency. AI has an interpretability problem. You get told “here is a vulnerability” and you are left wondering how it found it, how it validated it, how it confirmed it. With what we built, you see all of it. Every step, every command, everything you need to understand how it reached a conclusion and how you would reproduce it.
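One way to picture that kind of transparency is a finding record that keeps every command alongside the reason it was run. The sketch below is an illustration under assumptions, not the platform's data model: the `Step` and `FindingRecord` classes, their field names, and the sample target are hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Step:
    command: str         # exact command that was run
    purpose: str         # why it was run
    output_excerpt: str  # the part of the output that mattered


@dataclass
class FindingRecord:
    title: str
    target: str
    steps: list = field(default_factory=list)

    def log(self, command, purpose, output_excerpt):
        """Append one step to the evidence trail, in the order it happened."""
        self.steps.append(Step(command, purpose, output_excerpt))

    def reproduction_commands(self):
        """Commands a reviewer can rerun, in order, to confirm the finding."""
        return [step.command for step in self.steps]

    def to_json(self):
        """Serialize the whole trail so it can be attached to a report."""
        return json.dumps(asdict(self), indent=2)


record = FindingRecord("Missing HSTS header", "app.example.com")
record.log("nmap -sV -p 443 app.example.com",
           "Confirm the HTTPS service is reachable", "443/tcp open")
record.log("curl -sI https://app.example.com/",
           "Inspect the response headers", "no Strict-Transport-Security header")
print(record.to_json())
```

Keeping the purpose next to each command is the design choice that matters here: a reviewer can see not only what was run but why, and can rerun the list from `reproduction_commands()` in order.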

Why Now?

People ask why we finally took this on. Honestly, a lot of it was necessity.

Threat actors caught on to AI before most defenders did. In November of 2025, Anthropic published a report on a Chinese threat actor abusing their services to run real, successful hacking campaigns, with a 30,000-foot view of how they pulled it off. You can read that as a blueprint. The world changed. We have adversaries using AI to discover vulnerabilities, run testing, and automate their work, and BHIS has always tried to mimic what real threat actors do. We almost had to build this just to keep pace.

The customer side flipped too. About a year ago, companies were telling us they absolutely did not want AI anywhere near their engagements. Now they call and ask whether we are using AI, because they want to see it. Got any more of that AI stuff in there? It was a fast turnaround.

What Surprised Us

Brian and I were both a little apprehensive going in. The fear was simple: we did not want to ship something that puts out garbage, hallucinates findings that are not there, or misses things that should obviously be caught. So we iterated. We tested it against our own company, and against some of our continuous customers who were happy to pilot it, and we kept tuning until we could look at the output and say, yes, this looks good and it is actionable.

The thing that genuinely surprised us was coverage. Our CFO Erica wanted to know whether AI could save money and add efficiency on externals, and the honest answer is that it depends on how you measure it. But the coverage was a real shock.

Here is the story we keep telling. On a recent three-day external, the platform found a customer website that had been compromised. It found it because the threat actors had embedded links to shady gambling sites in the page’s HTML source. We suspect they did it to boost their own sites in the search rankings, since they were piggybacking on an otherwise reputable domain. The platform also flagged the exploit that probably led to the compromise, and it caught a critical that the human tester could have missed on such a short engagement.

I will be honest with you. On a three-day engagement for a decent-sized environment with a bunch of web services, there is no way I would have gone line by line through the HTML of every page looking for that. No human would. That is where this really augments traditional testing. There is only so much time, and the AI does not get tired of looking.
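For a sense of what that line-by-line check looks like when a machine does it, here is a small Python sketch using only the standard library. It is an assumption about one simple way to spot injected links, not a description of how the platform found them. The keyword list and the `suspicious_external_links` helper are hypothetical, and a keyword match is only a lead for a human to follow up.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical keyword list; a real check would need a much better signal.
SUSPICIOUS_KEYWORDS = ("casino", "slots", "betting", "poker")


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, including visually hidden ones."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def suspicious_external_links(html, site_domain):
    """Return off-site links whose URL contains a suspicious keyword."""
    collector = LinkCollector()
    collector.feed(html)

    flagged = []
    for link in collector.links:
        host = urlparse(link).hostname
        if host is None:
            continue  # relative link, stays on the site
        if host == site_domain or host.endswith("." + site_domain):
            continue  # the site itself or one of its subdomains
        if any(word in link.lower() for word in SUSPICIOUS_KEYWORDS):
            flagged.append(link)
    return flagged


page = (
    '<a href="/about">About</a>'
    '<a href="https://www.example.com/blog">Blog</a>'
    '<div style="display:none">'
    '<a href="https://best-casino-bonus.example.net/win">bonus</a>'
    '</div>'
)
print(suspicious_external_links(page, "example.com"))
```

The parser reads the raw source, so a link tucked inside a hidden `div` is collected just like a visible one. That is the point of the story above: the check is tedious for a person and cheap for a program.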

Where It Still Needs a Human

It’s not magic, and I’m not going to pretend it is.

The platform still confabulates sometimes. It has trouble fitting findings cleanly into our severity model, so it will rank something a high or a critical and we will look at it and say, no, not really. That is exactly why a human stays in the loop. The AI report does not go to the customer. It goes to one of our testers, who reviews and verifies everything. And the AI-generated report is built for that: it hands the tester every command needed to go confirm each finding, right there in line. The findings that survive contact with the BHIS security consultant then go into the final report deliverable for the customer.

This is also why it took us about six months. Brian and I both wanted an enterprise-class code base, something mature enough that if one of us got hit by a bus, someone else could pick it up. I recently had GPT 5.5 compare our code base against the state of the art in the agentic AI ecosystem as of Summer 2026. It called the result “solid, production-minded, React-style engineering, with deterministic workflow orchestration plus autonomous tool-using workers.” It also gave us a list of things to fix, because this is an ongoing effort, not a one-and-done product. More in-depth web app testing is probably next on our list to tackle for the Fusion AI initiative.

Who This Is For

This is the part that Melisa, sitting on the Business Capture side, was most excited about, and so am I.

The AI offering runs at about a third the cost of a normal penetration test, with less human-on-keyboard time. That puts it in reach for the mid-sized and smaller shops who want real security but have always been priced out. We never want to shut those customers out, and this fills a gap we hear about constantly.

The plan from here is to run this on essentially all of our externals unless a customer opts out, giving our testers a built-in second set of eyes asking “Did I miss anything?”, and to integrate it into our continuous pentesting offerings. That is also where the name comes from. Fusion AI is the fusion of automated AI and the human tester. It is not cruise control. On a recent rules-of-engagement call, you could watch the customer relax the moment they understood they still have a real BHIS tester available at any time. That is what they actually wanted.

Fusion AI is live on external engagements now. If your environment is on the smaller side and a full pentest has always felt just out of reach, this is the one to ask us about. Talk to your BHIS contact, or reach out at blackhillsinfosec.com, and we’ll walk you through it.

Want the full story? Catch the full episode “Introducing Fusion AI Pentest” on the AI Security Ops podcast. Watch it on YouTube at youtube.com/@AISecurityOps or wherever you get your podcasts. Melisa makes her podcast/webcast debut, and Brian tells the gambling-site story way better than I just did.

Keep on prompting.
