AI Bug Reports: The Real Vulnerability Is That We Weren't Looking Hard Enough

Sat, May 23, 2026 · 5 min read

On May 18, 2026, Linus Torvalds called the Linux kernel security mailing list “almost entirely unmanageable.” The reason: a flood of AI-generated bug reports. The reaction was predictable — ban AI, blame researchers, declare the tools aren’t ready.

I wrote about the maintenance crisis last week, and I think that reaction misses the deeper story. The problem is not that AI is generating too many reports. The problem is that the code was more broken than we thought, and for twenty years nobody had the tools to look at it properly.

The CVE numbers don’t lie

In 2025, the world published 48,175 CVEs — an average of 132 per day, up 21% year-over-year. That is not a gentle upward curve. That is a structural shift. The trend is not slowing: projections for 2026 suggest 48,000–52,000 CVEs, and AI-assisted discovery is explicitly cited as the driver.
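The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted above; the helper names are mine, and the previous-year total is merely what a 21% increase implies, not a figure taken from any report.

```python
def daily_average(total: int, days: int = 365) -> float:
    """Average number of CVEs published per day over a year."""
    return total / days


def implied_previous_total(current: int, growth: float) -> float:
    """Back out last year's total from this year's total and the growth rate."""
    return current / (1 + growth)


cves_2025 = 48_175   # quoted above
yoy_growth = 0.21    # "up 21% year-over-year"

print(f"per day in 2025: {daily_average(cves_2025):.0f}")
print(f"implied previous-year total: {implied_previous_total(cves_2025, yoy_growth):,.0f}")
```

Run as-is, this should report roughly 132 per day and an implied previous-year total just under 40,000.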

Not all of that is noise. In 2025, 2,130 AI-related CVEs were disclosed — a 34.6% increase over the previous year. AI vulnerability reports surged over 200% year-over-year. These are not inflated statistics from a marketing deck. They come from the Mondoo 2026 State of Vulnerabilities report and the Indusface vulnerability tracker.

The NIST backlog got so large in 2025 that 29,000 CVE submissions were moved to “not scheduled” status. The National Vulnerability Database could not keep pace with what was being filed. The bottleneck is not discovery anymore. It is human triage.

What AI is actually finding

The Register covered Google’s Sashiko — an agentic AI code review system developed by Roman Gushchin that actively identifies bugs in proposed Linux kernel patches. It is not a theoretical tool. In April 2026, Sashiko discovered CVE-2026-31652, a use-after-free in the DAMON subsystem that human review had missed.

DARPA ran an AI vulnerability discovery challenge and the winners found 83 vulnerabilities across the Linux kernel, Android, SQLite, and Redis, in less time than the six months a skilled team would normally need.

An AFuzz agentic fuzzing system discovered 40 bugs in V8 in one month, including two CVEs. Another AI tool found a 9-year-old Linux kernel vulnerability in under an hour that fuzzers and manual review had both missed. In the bug-bounty space, a Claude-assisted discovery surfaced CVE-2026-31402 — a heap overflow in the NFS replay cache that had sat there for 23 years.

Twenty-three years. That is not a subtle edge case. That is a foundational protocol implementation that nobody had the attention span to examine.

Google itself has reported a sharp increase in internally discovered Chrome vulnerabilities in 2026, attributed directly to AI tooling. The company that runs the most mature security program on the planet is finding more bugs because the tooling finally scaled to match the codebase.

The unpleasant truth: the bugs were always there

I have been in this long enough to know the rationalizations. “The code was audited.” “We have fuzzing.” “Our security posture is mature.” What that usually means is that the code was checked by humans with finite attention, who focused on high-value subsystems and missed everything else.

The DARPA result is the most honest summary: AI systems cover the surface area that humans skip. That is not because humans are incompetent, but because no human team can read every line of a 35-million-line kernel, or every commit to Chromium, or every dependency in a modern supply chain. The coverage was always partial. We just did not have a way to see how partial until something automated came along and started filling in the blanks.

The noise is real. I am not discounting that. Torvalds’ frustration with duplicate reports is justified — multiple people running the same tools on the same code and submitting the same already-fixed findings to the private security list is a coordination failure, not a research breakthrough.

But the signal is also real. And the ratio is improving fast. Kernel maintainer Greg Kroah-Hartman documented AI-assisted patches last March that were two-thirds correct after human cleanup. That is not junk. That is a pipeline that produces mergeable code faster than some junior contributors.

What this means for people who run infrastructure

I do not fuzz kernels for a living. Most of what I run is containerised, cloud-hosted, and patched on a schedule. The lesson I take from this is not about kernel development. It is about surface area.

Every container image I build has a dependency tree I do not fully read. Every Helm chart pulls in a transitive dependency chain that nobody audits. Every CI pipeline runs tools that were last meaningfully reviewed years ago, patched for known CVEs, and shipped. The assumption that “someone upstream is handling this” was always optimistic. It is now provably false, because the AI tools are showing us the gaps in real time.
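To make the surface-area point concrete, here is a minimal sketch of how a short list of direct dependencies expands once you follow the chain. The package names and the graph are invented for illustration; in practice the graph would come from a lockfile or an SBOM.

```python
from collections import deque


def transitive_dependencies(graph: dict, roots: list) -> set:
    """Return every package reachable from the roots (breadth-first walk)."""
    seen = set()
    queue = deque(roots)
    while queue:
        package = queue.popleft()
        for dependency in graph.get(package, ()):
            if dependency not in seen:
                seen.add(dependency)
                queue.append(dependency)
    return seen


# Hypothetical dependency graph: package -> packages it pulls in directly.
graph = {
    "my-service": ["web-framework", "yaml-parser"],
    "web-framework": ["http-lib", "template-engine"],
    "http-lib": ["tls-lib", "url-parser"],
    "template-engine": ["markup-escape"],
    "yaml-parser": [],
}

direct = set(graph["my-service"])
everything = transitive_dependencies(graph, ["my-service"])

print(f"direct dependencies: {len(direct)}")
print(f"total shipped:       {len(everything)}")
print(f"never chosen by me:  {sorted(everything - direct)}")
```

Even in this toy graph, two deliberate choices turn into seven shipped packages, five of which I never picked.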

The practical response is not to complain about report volume. It is to:

shrink the dependency surface you actually ship
treat SBOMs as living documents, not compliance checkboxes
accept that your scanner will now find more than it used to, and that this is a feature, not a bug
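One way to treat an SBOM as a living document is to diff it between builds, so that every new or changed component is a visible event rather than a silent one. The sketch below assumes a CycloneDX-style JSON layout (a top-level "components" array whose entries carry "name" and "version"); the component names and versions are illustrative, and the function names are my own.

```python
def component_versions(sbom: dict) -> dict:
    """Map component name -> version for a CycloneDX-style SBOM dict."""
    return {c["name"]: c.get("version", "") for c in sbom.get("components", [])}


def diff_sboms(old: dict, new: dict) -> dict:
    """Report components added, removed, or changed between two builds."""
    before, after = component_versions(old), component_versions(new)
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            name for name in before.keys() & after.keys()
            if before[name] != after[name]
        ),
    }


# Illustrative SBOM fragments for two consecutive image builds.
last_build = {"components": [
    {"name": "openssl", "version": "3.0.13"},
    {"name": "zlib", "version": "1.3"},
    {"name": "libxml2", "version": "2.12.5"},
]}
this_build = {"components": [
    {"name": "openssl", "version": "3.0.14"},
    {"name": "zlib", "version": "1.3"},
    {"name": "libcurl", "version": "8.7.1"},
]}

print(diff_sboms(last_build, this_build))
```

Run in CI, a non-empty "added" list is a prompt to ask whether the new component really needs to ship, which is the first item on the list above.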

Where the argument ends

There are two camps forming. One says AI is ruining open-source security with noise. The other says AI is the future of vulnerability research. Both are half-right.

The actual state is that AI is a microscope that finally works. For the first time, we can examine code at the scale it exists. The result is ugly. There are more bugs than we thought. Decades of legacy code — in kernels, in crypto libraries, in protocol stacks — was never as clean as the documentation claimed.

Torvalds is right that the reporting pipeline is broken. The fix is not to stop using the microscope. It is to build a better triage queue upstream. Require proof-of-exploit or reproducibility for security-list access. Pool reports from the same tooling so duplicates collapse before they reach a maintainer. Attach patches, not just claims. These are process problems. They are solvable.
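Those three process fixes can be sketched as a small pre-triage step. Everything here is a hypothetical illustration: the Report fields, the fingerprint scheme (file, function, and bug class, lowercased), and the file paths are my assumptions, not how any real security list works, and real deduplication would need fuzzier matching than an exact hash.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    reporter: str
    tool: str
    path: str
    function: str
    bug_class: str
    has_reproducer: bool
    has_patch: bool


def fingerprint(report: Report) -> str:
    """Same file, function, and bug class -> same fingerprint, whoever reported it."""
    key = "|".join(
        part.strip().lower()
        for part in (report.path, report.function, report.bug_class)
    )
    return hashlib.sha256(key.encode()).hexdigest()[:16]


def triage(reports: list) -> tuple:
    """Gate on reproducibility, pool duplicates, and put patched findings first."""
    rejected = [r for r in reports if not r.has_reproducer]
    pooled = {}
    for r in reports:
        if r.has_reproducer:
            pooled.setdefault(fingerprint(r), []).append(r)
    queue = sorted(
        pooled.values(),
        key=lambda group: (not any(r.has_patch for r in group), -len(group)),
    )
    return queue, rejected


reports = [
    Report("dave", "tool-b", "net/example/bar.c", "bar_parse", "heap-overflow", True, False),
    Report("alice", "tool-a", "drivers/example/foo.c", "foo_release", "use-after-free", True, True),
    Report("bob", "tool-a", "drivers/example/foo.c", "foo_release", "use-after-free", True, False),
    Report("carol", "tool-a", "fs/example/baz.c", "baz_read", "null-deref", False, False),
]

queue, rejected = triage(reports)
for group in queue:
    print(len(group), "report(s):", group[0].path, group[0].bug_class)
print("sent back for a reproducer:", [r.reporter for r in rejected])
```

In this example a maintainer sees two items instead of four: the duplicate pair arrives as one entry with a patch attached, and the claim without a reproducer never reaches the list.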

What no process change can fix, and what we need to stop pretending about, is the idea that the code was fine before AI showed up. It was not. We just did not have the tools to see how not-fine it was.