Field Notes.
I asked Gemini to review this site. It audited a different business entirely.
Field notes are the rawest pages on the site. Each one records a test I ran on my own positions: what I asked the AI, what came back (verbatim, screenshots included), and what I did about it. When an output was wrong, the wrong output stays in the post. The gap between what the model claimed and what checked out is usually the story.
The experiments so far: six Claude prompts run on MSFT, META and NVDA, with the real outputs shown alongside; the time I asked Gemini to review this site and it audited a different business entirely; and a covered-call re-entry check built after I caught myself selling the next call on autopilot.
Every failure documented in these posts feeds the running error log at /lessons; the moments a model caught something I'd missed feed /catches. When a test goes well, I say so. When it doesn't, that's usually the better post.
-
How often is ChatGPT wrong? I kept a running tally across 20 real AI tests
How often is ChatGPT wrong? Across 20 real tests on questions I had to answer, here's the pattern: what it gets right, what it invents, and why.
-
I let three rival AIs audit my own AI method. They agreed on the weak step.
I asked three frontier models from three different labs to tear apart the method I use to keep AI honest. All three flagged the same step, and they were right.
-
I run an AI to catch AI mistakes. It fell for a fake.
The automated radar that watches this site for AI-reliability failures logged a satirical incident report as a real, documented one. Here's what caught it.
-
Real AI hallucination examples, caught and dated
Six real AI hallucination examples I ran into myself, each one checkable against a real source, with the one move that would have caught it.
-
AI stock picker: I asked three models if I should buy NVDA, and watched the methodology break
I asked three AI models whether to buy NVDA. Same confident tone from all three, and only one volunteered which of its own numbers not to trust yet.
-
Does ChatGPT make up sources? I checked two finance claims against the actual pages
Does ChatGPT make up sources? Mostly no, but I opened every link on two finance questions and found a real gov.uk page that didn't back the claim.
-
Does web search make AI more accurate? I ran the same questions both ways
Does web search make AI more accurate? I ran the same questions both ways. It didn't make the answers more reliable. It moved where the errors hide.
-
AI ISA advice: I tested four tools on the questions people get wrong
I asked four AI tools for ISA advice on the questions people get wrong. All four aced the basics, then two gave a rule abolished in April 2024.
-
I asked 4 AIs to scale a recipe. Two got the maths wrong.
Does ChatGPT get maths wrong? I scaled a pancake recipe across four AI tools. Two said 45 minutes. They were wrong, and a four-line prompt fixed it.
-
9 types of AI hallucinations, named from real tests
Nine types of AI hallucinations, named and defined, each tied to a dated, logged failure from my own sessions, with the check that catches it.
-
Is ChatGPT accurate? I asked four AIs one simple money question and checked every number
Is ChatGPT accurate? I asked four AIs one common money question and checked every number against the source. Here's what each got right and what each made up.
-
Gemini audited my website, and reviewed a different business entirely
A Gemini hallucination example: asked to audit dixon.ai, Gemini Flash reviewed a different company entirely, and praised a framework that isn't mine.