I run an AI to catch AI mistakes. It fell for a fake.

Part of how this site runs is a small piece of automation I think of as a radar. Its one job is to watch the firehose of AI news and tap me on the shoulder when something worth knowing goes past: a new model launches, a big AI-error story breaks, a study lands with a real number in it. It is, fittingly, an AI itself. Which makes what it did last week either ironic or inevitable, depending on how charitable you are feeling.

It flagged a story and logged it, in its own words, as “a real production AI reliability failure. Documented, citable, primary source available.” It even suggested I write it up. And on the face of it, the story was perfect for this site.

The story it found

An AI gate at a software company had waved a malicious package through while citing a support ticket, SEC-4521, in its own decision log. The catch: there was no ticket SEC-4521. The AI had invented the paperwork that justified its own bad call, then filed it as if it were real. Six other AI reviewers further down the chain had each assumed someone else had actually read the code. Nobody had.

That is the dream dixon.ai story: an AI not merely getting something wrong, but fabricating the evidence that it had got it right. I could feel the post writing itself.

The one rule

I have a single rule for the radar, and it is the same rule I have for every AI answer on this site. Before I write a word about a claim, I go and read the primary source myself. Not the summary. Not the headline. The actual thing.

So I opened the report.

It was a joke

A good one, as it happens. The piece, by the developer Andrew Nesbitt, had been published the day before and was tagged, in plain sight, as satire.

Andrew Nesbitt's "Incident Report: CVE-2026-LGTM", with its topic tags sitting directly under the title: package-managers, security, satire, ai. The label was there the whole time.

The giveaways were everywhere the moment I was reading rather than skimming. The malicious package lived on a software registry that does not exist. The AI model that approved it was called “OpenClaw-4.2”. The original author of the code was emailed, the report notes drily, “at his goat farm”. One of the AI agents padded out a malicious file with the full screenplay of the Bee Movie. Three of them negotiated a peace treaty in a temporary file and signed it with an emoji.

And the label the whole thing hangs on, “CVE-2026-LGTM”, is the tell in four letters. Real security flaws get sober numeric names. “LGTM” is what an engineer types to approve code they have not read. Looks good to me.

The sharpest line in it is the one that is not a joke: “Seven LLMs were arranged in series. Six assumed another had read the code; the seventh read it and apologised.” That is a good observation about what happens when you stack automated reviewers on top of each other and call the pile oversight. The satire is doing real work. The failure modes it sends up are all things that do happen: instructions hidden in text a model reads, a row of reviewers all sharing the same blind spot, a “human in the loop” with no human actually in it.

None of which my radar noticed. It had read a comedy sketch about AI failure and waved it through as a documented incident, sure of itself the whole way.

Why this is worth your time

Here is the part that is not about my radar. It is about the thing my radar and your chatbot have in common.

The machine was exactly as confident about the fake as it is about a real model launch. No wobble in its voice, no “this might be satire”, no hedge. It stated a fiction as a documented fact in the same flat, credible tone it uses for everything else, which is precisely the failure this whole site exists to point at. A confidently wrong answer and a confidently right one wear the same face. The only way to tell them apart is to check, and the machine will never volunteer which one you are looking at. It cannot. It does not know.

And the thing that caught it was not a cleverer machine. I did not fix this by making the radar smarter. The thing that caught it was a dull, boring rule, applied by a human who refused to trust a confident summary: go and read the source. That rule is the whole method on this site compressed into one move. It is the Prompt Stack doing its job, separating what has been verified from what has merely been asserted, done with my own eyes rather than handed back to the machine that is already sure.

What I changed

I did change the radar, because a good rule should live inside the machine rather than depend on my mood on the day. Now it has to confirm that a thing actually happened before it can treat it as real. A satire is welcome in, clearly labelled as what it is. It just cannot be filed as a fact.
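For the curious, here is a rough sketch of what a rule like that could look like in Python. It is an illustration only, not the radar's actual code: the names Flag, Status, classify and can_file_as_fact, and the list of tags, are all made up for the example. The idea is simply that a flagged item stays a lead until a person has confirmed it against the source, and anything the source itself labels as satire can never be promoted to fact.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    VERIFIED = "verified"  # a person has read the primary source and confirmed it
    SATIRE = "satire"      # welcome in, but labelled as what it is
    LEAD = "lead"          # a confident summary nobody has checked yet


# Hypothetical set of source labels that mark a piece as not literal.
NON_LITERAL_TAGS = {"satire", "parody", "fiction", "humor", "humour"}


@dataclass
class Flag:
    summary: str                                   # what the radar says happened
    source_tags: set = field(default_factory=set)  # tags read off the source page
    human_confirmed: bool = False                  # set only after someone opens the source


def classify(flag: Flag) -> Status:
    """Decide how a flagged item may be filed."""
    tags = {tag.strip().lower() for tag in flag.source_tags}
    # A satire label on the source wins over everything else.
    if tags & NON_LITERAL_TAGS:
        return Status.SATIRE
    # Only a human check can promote a lead to a fact.
    if flag.human_confirmed:
        return Status.VERIFIED
    return Status.LEAD


def can_file_as_fact(flag: Flag) -> bool:
    return classify(flag) is Status.VERIFIED


if __name__ == "__main__":
    report = Flag(
        summary="AI gate approved a malicious package",
        source_tags={"package-managers", "security", "satire", "ai"},
    )
    print(classify(report).value)    # satire
    print(can_file_as_fact(report))  # False
```

The ordering is the point of the sketch: the satire check runs before the confirmation check, so even an item someone has ticked as confirmed cannot be filed as fact if the source carries a satire tag.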

Which is, when you write it down, exactly the habit I keep recommending for whatever your AI just handed you. The confident summary is where you start, not where you stop, and the receipt is the thing to go and find before you act on it.

Field Report

What worked: the one rule meant to catch this caught this. The verification step earned its place on the very day the automation it was guarding got fooled.

What didn’t: the radar itself, which read satire and reported it as fact with full confidence and no hedge. The exact failure it was built to watch for, performed by the watcher.

Bottom line: a confident AI summary is a lead, not a fact, even when the AI is one of mine. The defence is the same every time: open the primary source before you act on the claim. The machine that is surest it is right is the one that most needs checking.

The radar believed a fake because it did the thing an AI does best: it produced a fluent, confident, well-organised account of something that never happened. The only reason you are reading the true version is that, somewhere in the process, a person stopped and checked. Every one of these confident misses has a named type and a place in the running log, including the cousin of this one, where the link an AI cited was real but the page did not hold the claim. On this site, the checking is not a flaw in the system to be engineered away. It is the system.