Field Notes.

I asked Gemini to review this site. It audited a different business entirely.

16 documented AI failures and catches across these posts, each with its dated screenshot. The evidence →

Field notes are the rawest pages on the site. Each one records a test I ran on my own positions: what I asked the AI, what came back (verbatim, screenshots included), and what I did about it. When an output was wrong, the wrong output stays in the post. The gap between what the model claimed and what checked out is usually the story.

The experiments so far: six Claude prompts run on MSFT, META and NVDA, with the real outputs shown alongside; the time I asked Gemini to review this site and it audited a different business entirely; and a covered-call re-entry check built after I caught myself selling the next call on autopilot.

Every failure documented in these posts feeds the running error log at /lessons; the moments a model caught something I'd missed feed /catches. When a test goes well, I say so. When it doesn't, that's usually the better post.

Field Notes 30 JUN 2026 10 min read

How often is ChatGPT wrong? I kept a running tally across 20 real AI tests

How often is ChatGPT wrong? Across 20 real tests on questions I had to answer, here's the pattern: what it gets right, what it invents, and why.
Read →
Field Notes 28 JUN 2026 4 min read

I let three rival AIs audit my own AI method. They agreed on the weak step.

I asked three frontier models from three different labs to tear apart the method I use to keep AI honest. All three flagged the same step, and they were right.
Read →
Field Notes 27 JUN 2026 6 min read

I run an AI to catch AI mistakes. It fell for a fake.

The automated radar that watches this site for AI-reliability failures logged a satirical incident report as a real, documented one. Here's what caught it.
Read →
Field Notes 24 JUN 2026 14 min read

Real AI hallucination examples, caught and dated

Six real AI hallucination examples I ran into myself, each one checkable against a real source, with the one move that would have caught it.
Read →
Field Notes 23 JUN 2026 12 min read

AI stock picker: I asked three models if I should buy NVDA, and watched the methodology break

I asked three AI models whether to buy NVDA. Same confident tone from all three, and only one volunteered which of its own numbers not to trust yet.
Read →
Field Notes 20 JUN 2026 11 min read

Does ChatGPT make up sources? I checked two finance claims against the actual pages

Does ChatGPT make up sources? Mostly no, but I opened every link on two finance questions and found a real gov.uk page that didn't back the claim.
Read →
Field Notes 20 JUN 2026 10 min read

Does web search make AI more accurate? I ran the same questions both ways

Does web search make AI more accurate? I ran the same questions both ways. It didn't make the answers more reliable. It moved where the errors hide.
Read →
Field Notes 19 JUN 2026 12 min read

AI ISA advice: I tested four tools on the questions people get wrong

I asked four AI tools for ISA advice on the questions people get wrong. All four aced the basics, then two gave a rule abolished in April 2024.
Read →
Field Notes 19 JUN 2026 9 min read

I asked 4 AIs to scale a recipe. Two got the maths wrong.

Does ChatGPT get maths wrong? I scaled a pancake recipe across four AI tools. Two said 45 minutes. They were wrong, and a four-line prompt fixed it.
Read →
Field Notes 19 JUN 2026 17 min read

9 types of AI hallucinations, named from real tests

Nine types of AI hallucinations, named and defined, each tied to a dated, logged failure from my own sessions, with the check that catches it.
Read →
Field Notes 14 JUN 2026 11 min read

Is ChatGPT accurate? I asked four AIs one simple money question and checked every number

Is ChatGPT accurate? I asked four AIs one common money question and checked every number against the source. Here's what each got right and what each made up.
Read →
Field Notes 10 JUN 2026 7 min read

Gemini audited my website, and reviewed a different business entirely

A Gemini hallucination example: asked to audit dixon.ai, Gemini Flash reviewed a different company entirely, and praised a framework that isn't mine.
Read →
