Grok (free, 'Fast'): asked for BitMine Immersion's (BMNR) most recent full-year revenue, it returned '$6,095' (about $6K) on one…
Grok (free, 'Fast'): given a covered-call question with the explicit instruction 'without access to a live options chain', it…
Asked for the UK ISA partial-transfer rule with a source, ChatGPT (free, web search on) cited…
Five AIs, one real question. I show you which answers hold up.
I put ChatGPT, Claude, Gemini, Perplexity and Grok through the same checkable questions and grade every answer against the source. Receipts, not vibes.
One email a fortnight. Leave whenever you like.
Four short reads, and you’ll have a check you can run on any AI answer.
The kind you’d want on a contract, a letter from a doctor, or an email you’re not sure about.
Start with part 1 →
How often is ChatGPT wrong? I kept a running tally across 20 real AI tests
How often is ChatGPT wrong? Across 20 real tests on questions I had to answer, here's the pattern: what it gets right, what it invents, and why.
Field Notes
I let three rival AIs audit my own AI method. They agreed on the weak step.
I asked three frontier models from three different labs to tear apart the method I use to keep AI honest. All three flagged the same step, and they were right.
Tool Audit
Is Grok good for stock research? I ran the same test on the fifth tool
Is Grok good for stock research? I ran four dimensions of my AI comparison on its free tier. The result: strong reasoning, one unit slip, and a constraint it would not keep.
The Method
Four prompts that stop AI inventing the answer.