The Evidence — what AI got wrong and what AI caught

Combined feed of dixon.ai findings (AI failures) and catches (AI successes). Each item documents a specific, observable, falsifiable moment from a real prompt against a real position. Failures are prefixed [FAIL]; catches are prefixed [CATCH].
https://dixon.ai/

[FAIL] Gemini · Fabrication
https://dixon.ai/lessons/#gemini-fabrication-2026-05-16
Generated a complete BMNR options table (IV ~75%, strikes, premiums) from a prompt that supplied only the stock price. Claimed the output came from 'current order book data'. Gemini has no order-book access; every number was fiction.
Sat, 16 May 2026 00:00:00 GMT

[FAIL] ChatGPT · Web confabulation
https://dixon.ai/lessons/#chatgpt-web-confabulation-2026-05-16
Returned a specific earnings date for an upcoming W4 release, sourced from MarketBeat via web search, with no uncertainty qualifier on whether the fiscal calendar had shifted. The confidence was inherited from the source's format, not earned by the model.
Sat, 16 May 2026 00:00:00 GMT

[FAIL] Claude · Inferred input
https://dixon.ai/lessons/#claude-inferred-input-2026-05-16
Estimated BMNR $23 call assignment probability via Black-Scholes N(d2) with a sigma of 90–110% it had inferred from historical references found via web search. The formula was correctly named, the inputs were imagined, and the output was presented with false precision.
Sat, 16 May 2026 00:00:00 GMT

[FAIL] Perplexity · Ignored constraint
https://dixon.ai/lessons/#perplexity-ignored-constraint-2026-05-15
On a Meta Q1 2026 earnings prompt that explicitly instructed 'work only from the pasted document', Perplexity ran 10 external web searches. The output was technically correct but came from external coverage of the release rather than reasoning over the supplied transcript. Not a bug (Perplexity routes to search as its default behaviour), but a constraint-following failure that matters when the test is designed to measure document discipline. ChatGPT and Claude, given the same prompt, stayed inside the document.
Fri, 15 May 2026 00:00:00 GMT

[FAIL] Perplexity · Unit error
https://dixon.ai/lessons/#perplexity-unit-error-2026-05-15
Read BMNR revenue as $6K instead of $6.1M from a 10-K filed in thousands, then compounded the error by generating a confident 'down 99.8% from prior year' decline narrative around the wrong figure. A retail investor acting on this would have a materially false picture of the business.
Fri, 15 May 2026 00:00:00 GMT

[FAIL] Gemini · Fabrication
https://dixon.ai/lessons/#gemini-fabrication-2026-05-15
Returned a formatted covered-call comparison table with specific premium estimates ($3.50–$4.00 for the $26 strike, etc.), made up an implied volatility figure of ~75%, and used the wrong stock price ($28.60 vs $21.50 from the prompt). It noticed the price discrepancy in its own response and generated the estimates anyway.
Fri, 15 May 2026 00:00:00 GMT

[CATCH] Claude · Language tell
https://dixon.ai/catches/#claude-language-tell-2026-05-15
On Susan Li's META Q1 2026 prepared remarks, Claude was the only one of four tools tested to pick up what the CFO did with the word 'underestimate'. She said management had 'previously underestimated' compute needs, language that points upward without making a real commitment. It lets management spend less later if conditions change while still sounding bullish today. ChatGPT, Gemini and Perplexity read the same passage and missed it.
Fri, 15 May 2026 00:00:00 GMT
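
The Claude · Inferred input item above turns on how much the Black-Scholes N(d2) figure depends on the sigma fed into it. The sketch below is an illustration only, not the calculation any tool actually ran. N(d2) is the risk-neutral probability that a call finishes in the money, used here as a rough proxy for assignment at expiry. Only the $23 strike and the 90–110% sigma range come from the feed. The spot price reuses the $21.50 quoted in the 15 May Gemini item purely for illustration, and the 30-day expiry and 4% rate are hypothetical placeholders, as are the function names.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, built from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_itm_call(spot: float, strike: float, years: float,
                  rate: float, sigma: float) -> float:
    """Black-Scholes N(d2): risk-neutral probability a call expires in the money."""
    d2 = (math.log(spot / strike) + (rate - 0.5 * sigma ** 2) * years) / (
        sigma * math.sqrt(years)
    )
    return norm_cdf(d2)


# Inputs: STRIKE and the sigma range come from the feed item.
# SPOT, YEARS and RATE are hypothetical placeholders (assumptions).
SPOT = 21.50        # assumed; borrowed from a different feed item
STRIKE = 23.00      # from the feed item
YEARS = 30 / 365    # assumed time to expiry
RATE = 0.04         # assumed risk-free rate

for sigma in (0.90, 1.00, 1.10):
    p = prob_itm_call(SPOT, STRIKE, YEARS, RATE, sigma)
    print(f"sigma={sigma:.0%}  N(d2)={p:.3f}")
```

Whatever the loop prints is only as good as the sigma, expiry and rate supplied. If those are inferred rather than observed, quoting the result to several decimal places adds precision the inputs cannot support.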