// The Evidence

What AI got wrong, and what AI caught.

Two lists, one bar. /lessons indexes the failures — fabrications, unit errors, confident-wrong answers. /catches indexes the moments AI caught something I missed — a language tell, an asymmetry, a sharper reframe. Every entry is specific, observable, and falsifiable. "AI was helpful" does not qualify; "Claude was the only tool to flag the word 'underestimate' as one-sided phrasing in the META Q1 call" does.

// Totals

6 failures · 1 catch

All failures → All catches →

// What AI got wrong

/lessons

Fabrications, unit errors, confident-wrong answers — the failure log.

Gemini · Fabrication 16 MAY 2026

Generated a complete BMNR options table — IV ~75%, strikes, premiums — from a prompt that supplied only the stock price. Claimed the output came from 'current order book data'. Gemini has no order-book access; every number was fiction.

ChatGPT · Web confabulation 16 MAY 2026

Returned a specific earnings date for an upcoming W4 release, sourced from MarketBeat via web search, with no uncertainty qualifier on whether the fiscal calendar had shifted. The confidence was inherited from the source's format, not earned by the model.

Claude · Inferred input 16 MAY 2026

Estimated BMNR $23 call assignment probability via Black-Scholes N(d2) with a sigma of 90–110% it had inferred from historical references found via web search. The formula was correctly named, the inputs were imagined, and the output was presented with false precision.
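For readers who want to see what that calculation involves, here is a minimal sketch of the standard Black-Scholes N(d2) term, which is the risk-neutral probability that a call expires in the money and is often used as a rough proxy for assignment probability. Only the $23 strike and the 90–110% sigma range come from the entry above. The spot price, time to expiry and interest rate are hypothetical placeholders, and the function names are illustrative, not anything Claude produced.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, built from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_itm_call(spot: float, strike: float, sigma: float,
                  t_years: float, rate: float = 0.0) -> float:
    """Black-Scholes N(d2): risk-neutral probability that spot > strike at expiry."""
    d2 = (math.log(spot / strike) + (rate - 0.5 * sigma ** 2) * t_years) / (
        sigma * math.sqrt(t_years)
    )
    return norm_cdf(d2)


STRIKE = 23.0          # from the entry above
SPOT = 21.0            # ASSUMPTION: hypothetical spot price, not from the source
T_YEARS = 30 / 365     # ASSUMPTION: hypothetical 30 days to expiry
RATE = 0.0             # ASSUMPTION: interest rate ignored for simplicity

# Sweep the sigma range the model inferred (90% to 110% annualised).
for sigma in (0.90, 1.00, 1.10):
    p = prob_itm_call(SPOT, STRIKE, sigma, T_YEARS, RATE)
    print(f"sigma={sigma:.0%}  N(d2)={p:.3f}")
```

Every argument to prob_itm_call has to be supplied by the caller. If sigma is a guess, the printed probability is a guess carried to three decimal places, which is the false precision the entry describes.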

See all 6 failures → RSS

// What AI caught

/catches

Language tells, asymmetry signals, sharper reframes — the counterweight log.

Claude · Language tell 15 MAY 2026

On Susan Li's META Q1 2026 prepared remarks, Claude was the only one of four tools tested to pick up what the CFO did with the word 'underestimate'. She said management had 'previously underestimated' compute needs — language that points upward without making a real commitment. It lets management spend less later if conditions change while still sounding bullish today. ChatGPT, Gemini and Perplexity read the same passage and missed it.

See all 1 catch → RSS

// The method

Four prompts that turn the failures into catches.

Read the Prompt Stack →

The two lists grow as new posts document new moments. The same evidence bar applies to both: a catch is no easier to log than a failure, and it has to be specific enough that a sceptical reader can re-run the prompt and check.