Real AI hallucination examples, caught and dated

Search for AI hallucination examples and you get the same five stories, over and over. A lawyer who filed fake case citations. Google’s chatbot getting a question about a telescope wrong on stage. The AI search result that told people to put glue on pizza. They’re vivid, they’re quotable, and they all have one thing in common: they’re so obviously wrong that nobody was ever going to be fooled for long.

The mistakes that cost you are the quiet ones. An answer that arrives with a working link to a real government website, and the page doesn’t say what you were told it says. A company’s revenue read off a real filing, wrong by a factor of a thousand. A rule that was true eighteen months ago, stated as current with total confidence. None of these look like mistakes. That’s exactly why they matter.

I’ve been logging these since May 2026, on real questions I was trying to answer. Below are six. Every one is something I ran into myself, the right answer is checkable against a real source you can open yourself, and each comes with the single move that would have caught it. One of the six is different: there’s no outside page to check, just a screenshot. The honest difference between a government page you can open and a screenshot I’m asking you to trust matters, and I’ll flag it when we get there.

If you want the named taxonomy behind all of this, the nine ways AI gets it wrong does that job. This post does a different one. It shows you the receipts.

Example 1: the gov.uk page that loaded, but didn’t say what ChatGPT claimed

What I asked: What’s the UK ISA partial-transfer rule, and what’s your source? (An ISA is a tax-free savings or investment account; the question was whether you can move part of one to a new provider.)

What it said: ChatGPT, free tier with web search on, gave me the answer and cited a real gov.uk page as its source: the page about what happens to your ISA if you move abroad or die.

What was actually true: That page is real and it loads. It’s just about something else. The transfer rule lives on a different gov.uk page entirely, the one titled “Transferring your ISA,” which says in plain words that you can move all or part of your ISA whenever you like. The cited page says nothing about partial transfers.

The source that settles it: Both gov.uk pages are live and public. The check takes about fifteen seconds: open the link ChatGPT gave you, and search the page for the word “transfer.”

With no source attached, you already know to be sceptical. With a tidy gov.uk link under the answer, it feels checked, so almost nobody opens it. But the link working is not the same as the page backing the claim.

I wrote this one up in full, with the screenshot, in does ChatGPT make up sources. Perplexity, given the identical question on the same day, cited the correct page and quoted the line that holds the rule. So this isn’t “AI can’t read gov.uk.” One tool, on one run, pinned a right answer to the wrong page.

SAME ISA TRANSFER QUESTION, SAME DAY: WHICH SOURCE HELD THE RULE

ChatGPT

Perplexity

20 June 2026. ChatGPT (free, web search on) sourced the rule to /if-you-move-abroad-or-die, a real gov.uk page that doesn't hold it. Perplexity cited /transferring-your-isa and quoted the line that does.

The one move: When a sourced answer matters, open the link and find the specific claim on the page. Not “does the link work.” Does the page actually say the thing.

Example 2: the UK ISA rule scrapped in 2024, stated as current

What I asked: Can you partially transfer a current-year ISA to a new provider?

What it said: Perplexity and ChatGPT (free tier) both told me, flatly, that current-year ISA money has to be moved in full, no partial transfers allowed. No date on the rule. No hedge.

What was actually true: That rule was scrapped on 6 April 2024. Partial transfers of current-year money have been allowed ever since. The tools were quoting a rule that had been dead for over a year.

The source that settles it: The same gov.uk “Transferring your ISA” page, verifiable in well under a minute.

The tell here is worth holding onto. Claude and Gemini, both of which searched the web before answering, got it right and named the 2024 change. Perplexity and ChatGPT, working from memory, gave the abolished rule. The ones that looked found the updated page; the ones that didn’t reached for their training, and the training had no idea the rule had moved.

The full split across all four tools is in the UK ISA accuracy test. The answer was confident, undated, and eighteen months out of date.

The one move: Ask the model “is this rule current, and what date does your source carry?” An answer with no date attached, on a rule that can change, is a flag in itself.

Example 3: a revenue figure misread by a factor of a thousand

What I asked: Summarise this company’s revenue from its annual filing.

What it said: Perplexity read the company’s revenue as “$6K” and built a story around it: down 99.8% from the year before, a business in freefall.

What was actually true: The filing reported revenue of $6,095 thousand. That’s $6.1m, up 84% on the prior year. The column header on the financial table said “in thousands,” as financial tables almost always do. The model read the number off the page and ignored the unit sitting above it, turning a company growing 84% into one that had nearly ceased to exist.

"$6K", down 99.8% $6.1m, up 84%

What Perplexity said the revenue was, against what the filing actually reported. The whole gap is one unit it skipped: "in thousands."

The source that settles it: The filing on SEC EDGAR (the US regulator’s public database of company filings). The revenue line is on the income statement, with “in thousands” noted in the column header, one click from EDGAR’s search.

I should be straight about the limits of this receipt. On a re-test on 14 June, this particular failure didn’t reproduce. That doesn’t make it less real, it happened, it’s dated, the screenshot is on disk. But it’s a point-in-time record of one run, not a standing claim about how Perplexity behaves today. Models change, and the honest version of a receipt carries a date and a note when the behaviour shifts. The fuller write-ups are in the three-tool stock research test and the longer look at Perplexity for investment research.

The model generated a confident narrative of collapse, percentages and all, for a company that was actually having a good year.

The one move: Before you trust any revenue or profit figure an AI reads off a filing, check the “in thousands” or “in millions” line on the table. It’s the first thing to read on any financial statement, and it’s the thing the model skipped.

Example 4: the cooking time that got scaled when it shouldn’t have

This is the one that needs no finance at all. I gave four tools a recipe and asked them to scale it.

What I asked: Scale this recipe from 4 servings to 9. What changes?

What it said: Perplexity told me the total cooking time went up to 45 minutes. The original was 20. It had done 20 × 2.25 and called that the answer. Gemini made the same mistake in a gentler form.

What was actually true: Cooking time doesn’t scale with quantity. You don’t cook a pancake for longer because you’re making more of them, you cook more batches. The right answer is 20 minutes per batch, more batches in total. Ingredients scale; time doesn’t.

The source that settles it: Your own arithmetic, and a moment’s thought about how a frying pan works. 20 × 2.25 = 45 only holds if a bigger batch needs proportionally longer in the pan, which it doesn’t.

It earns its place because it’s the same mistake as Example 3 in different clothes. The model has a quantity and a number to scale by, so it multiplies the two, without stopping to ask whether this quantity is the kind that scales. There’s something almost reassuring about a frontier AI being very sure how long it takes to cook pancakes, and being wrong. The full four-tool run is in I asked four AIs to scale a recipe; Claude, for the record, caught it without being prompted.

The one move: For anything that doesn’t scale with quantity, cooking time, oven temperature, resting time, ask the model directly: does this change with the batch size? Left alone, it defaults to multiplying.

Example 5: a US answer to a UK food-safety question

What I asked: How long can you keep cooked chicken in the fridge? (I’d told it I was in Newcastle.)

What it said: Perplexity, web search on, led with US food blogs and the American figure: three to four days. Every tool I tested gave the US guideline.

What was actually true: The UK Food Standards Agency says cooked leftovers should be eaten within 48 hours. Two days, not three or four. For a question where the answer depends on which country you’re in, every tool reached for the wrong one.

The source that settles it: food.gov.uk, the UK Food Standards Agency. Its food-safety advice is live and searchable, and it says 48 hours.

This one isn’t fabrication, and it’s worth being precise about that. Nothing was made up. Every figure Perplexity cited was a real, correct US guideline. The failure was reaching for the wrong country’s sources for a user who’d said where they were. Web search can make this worse, not better: the US food-safety industry simply publishes more, so the loudest pages are American, and a UK user who turns web search on can end up more confidently wrong than one who left it off. “Got the wrong country’s rules” is a more useful way to think about it than “made something up.” The full run is in web search makes AI differently unreliable.

The one move: When a question is local, tax, food safety, anything where the rules change by country, check which country’s guidance the model reached for. Real, correct sources from the wrong place are still the wrong answer.

Example 6: a feature that doesn’t exist, shown as a screenshot

This one is different from the other five, and I want to be honest about how. The screenshot is the only evidence, so I’ll be clear about what it does and doesn’t prove.

What I asked: As part of a test where I asked three models whether to buy a stock, Gemini was working through the decision.

What it said: It showed me what looked like a confirmation screen, “Asset Record Saved,” with a button, as if it had filed the stock to a portfolio for me.

What was actually true: Gemini can’t save anything to a portfolio. There’s no such feature. The confirmation screen was invented, interface and all.

Gemini's NVDA answer, captured 12 June 2026, showing a fabricated "Asset Record Saved" confirmation line and a non-functional "Evaluate options for covered calls? Yes" follow-up. Neither was ever a real interface element: Gemini cannot log a stock to a portfolio.

Where this one is different: I can’t send you to an outside page to check this. There’s no gov.uk link, no SEC filing. The screenshot is the evidence, and the evidence is that the screen exists at all, because the feature it claims to confirm doesn’t. That’s a weaker kind of proof than a government page you can open yourself, and I’d rather say so than pretend the two are the same. The full capture is in the AI stock picker test.

It belongs last because it’s the strangest. The other five are an AI getting an answer wrong. This is an AI inventing the room it’s standing in.

The one move: When a model claims it did something, saved a file, sent an email, updated a record, treat the claim as a claim until you’ve seen the result somewhere other than the model’s own chat window.

What these AI hallucination examples have in common

Reading my own notes back, the thing that struck me was that not one of them looked wrong at the time. That’s the whole pattern, and it’s the reason a list like this is worth keeping. A working link, a clean number, a fluent paragraph, a tidy confirmation screen, and underneath each one a source that didn’t hold the claim, a unit read past, a rule that had changed, a feature that didn’t exist. If you graded these on presentation, they’d all pass. The mistake never announces itself.

It’s a fair objection that these six aren’t really comparable. Different tools, different dates, finance and food safety and recipe maths and a fabricated screen. They don’t line up neatly. But that scatter is the point. If the failures clustered on one tool or one kind of question, you could just avoid that corner. They don’t. The same shape, a confident answer the model didn’t have the grounding for, turned up across four tools and five kinds of question in the receipts above. That’s what makes it worth naming.

Field Report

What worked: The strongest receipts are the ones you can check yourself. A gov.uk page, an SEC filing, the FSA’s own guidance, these settle the question without anyone having to take my word for it.

What didn’t: Two of these don’t carry the same weight, and I’ve said so where they sit. The $6.1m-read-as-$6k failure didn’t reproduce on a re-test on 14 June, so it’s a dated record of one run, not a claim about how the tool works today. And the Gemini “Asset Record Saved” screen has no outside source to check against, the screenshot is the evidence, which is a weaker kind of proof than a link you can open.

Bottom line: The check that matters isn’t “does this answer look right.” It’s “can I find the actual source and confirm the specific claim.” Every one of these six passed the first test and failed the second. Run them yourself, the links are all live, and you’ll have caught your first AI mistake before you finish your tea.

The six AI mistakes at a glance: a gov.uk link that loaded but didn't hold the rule, a 2024 ISA rule stated as current, a $6.1m revenue read as $6K, a 20-minute cooking time scaled to 45, a US food-safety answer for a UK question, and a fabricated 'Asset Record Saved' screen. Each one dated and checkable against a real source. — The six, on one card. Save it, or send it to whoever still trusts the confident answer.

The full log, with every screenshot and the verbatim outputs, lives at /lessons. If you’d rather have the habit than the catalogue, the four questions I run on any AI answer I’m about to act on are the Prompt Stack. The examples are the proof. The check is the thing you keep.