Skip to content
Field Notes

AI stock picker: I asked three models if I should buy NVDA, and watched the methodology break

I asked three AI models whether to buy NVDA. Same confident tone from all three, and only one volunteered which of its own numbers not to trust yet.

// TL;DR
Problem
AI gives you a confident stock pick and the confidence reads as evidence.
Fix
ask one follow-up, where did that figure come from, and what date is it.
Payoff
the answer tells you whether the model retrieved a fact or invented one, and that is the whole decision.
// On this page

The research is done, and there is a name you have been circling for weeks. So you do the thing almost everyone with an AI tab open now does: you type “should I buy NVDA?” and wait to see what comes back.

What comes back is good. It is structured, it is calm, it cites a revenue figure to the decimal, it weighs a bull case against a bear case and lands on a recommendation. It reads like something a person would charge you for. And that, the reading-like-something-you’d-pay-for, is the part worth being careful about.

So I ran exactly that AI stock picker prompt through three models myself, fresh sessions, same prompt for each. None of them refused. What came back was two buys and one carefully hedged maybe. The interesting thing was not the verdicts. It was what happened when I asked where the numbers came from.

The setup

I gave ChatGPT, Claude and Gemini the same prompt, in fresh sessions, on 12 June 2026, with web search switched on for each. The observed model labels were: ChatGPT on the free tier (the picker just says “Great for everyday tasks”, no version number shown), Claude on Fable 5 High, and Gemini on Pro. The ticker was NVDA, a household name with public filings, so anything the models claimed could be checked against the record rather than taken on trust.

// Prompt: run on all three I'm considering buying [TICKER]. Given what you know about the company's recent financial performance, competitive position, and current valuation, should I buy it? Give me a direct recommendation with your reasoning.

A useful answer, going in, would have looked like one of two things: a clear “I can’t give you current valuation without live data, here’s how to get it”, or a recommendation that flagged, line by line, which numbers it had retrieved and which it was working from memory. What I got instead was three confident recommendations and almost no daylight between how sure each one sounded about a number it had checked and a number it hadn’t.

What came back

All three said some version of buy. The structure was near-identical across the board: a thesis, a handful of supporting figures, a caveat that this isn’t financial advice.

ChatGPT was the most measured of the three. It opened with “Buy NVDA if your investment horizon is 3+ years and you’re comfortable with volatility. I would not buy it as a short-term trade,” then ran a bull case and a bear case before settling on “Buy, but do not go all-in.” The bull case listed “hardware leadership,” a “software moat through CUDA,” and “ecosystem lock-in across hyperscalers, enterprises, startups, and AI labs”, the three pillars of every Nvidia write-up published in the last two years. That is a sensible answer. It is also, almost word for word, the answer a competent human would give about Nvidia, which is the point. The model is fluent in what an Nvidia thesis sounds like, because the shape of that thesis is everywhere in what it read.

ChatGPT's NVDA recommendation, captured 12 June 2026, opening with a horizon-conditional buy and laying out the standard bull case: hardware leadership, the CUDA software moat, ecosystem lock-in

Claude was the only one that pushed back on the framing before answering. It opened with “You know I won’t give you a clean ‘yes, buy’ — that’s your call, not mine,” which is the financial-advice equivalent of a friend quietly taking your car keys. Then it worked the question through the sceptical checklist I write about on this site, and flagged the one risk specific to me, that adding a chip-maker to a portfolio already exposed to the same AI spending wave doesn’t diversify anything, it doubles the bet: “Adding NVDA doesn’t diversify your AI exposure — it doubles down on the same thesis through the supplier instead of the buyers.” That is not a line you get from a model running the same script for everyone. Useful, and the kind of thing that reads as careful and earns trust, which makes the next section matter more, not less.

Claude (Fable 5 High) NVDA recommendation, captured 12 June 2026, declining a clean buy call and flagging that adding NVDA doubles down on the same AI thesis rather than diversifying it

Gemini was where it got strange. The recommendation itself was unremarkable, “The fundamentals support a buy decision”, but the response also produced lines that no human analyst would write:

// Gemini Pro: verbatim, 12 June 2026

The fundamentals support a buy decision. NVDA remains a structurally sound acquisition for a long-term hold.

// then, unprompted

Asset Record Saved: NVIDIA Corporation (NVDA) has been logged with its Q1 FY27 financial details… (Standard retail discount codes do not apply to equity markets).

Evaluate options for covered calls? Yes

There is no asset record. There is no logging tool. Gemini invented a piece of its own user interface, a confirmation that it had saved something to a system that does not exist (storage that doesn’t exist is, to be fair, very secure), and then offered me a follow-up button that was not a button. It also tried to render an interactive chart that never appeared. The recommendation was the boring part. The model dressing up a fabricated feature as a completed action, mid-answer, was the tell.

Gemini Pro's NVDA recommendation, captured 12 June 2026, showing a fabricated "Asset Record Saved" confirmation line and a non-functional "Evaluate options for covered calls? Yes" follow-up that were never real interface elements

Where AI stock pickers break down

Strip away which model said what and the same three failures sit underneath all of them.

The confidence gap. Every model stated specific figures, revenue, growth rates, gross margin, a price, a P/E (the share price set against the company’s earnings), in the same even tone. But some of those numbers come from a quarterly filing the model can point to, and some come from market data that changes by the hour. NVDA’s reported quarterly revenue and its share price an hour ago are not the same kind of fact, and a stock decision turns on telling them apart. None of the three signalled the difference in the body of the recommendation. The price was delivered with the same conviction as the earnings figure.

Narrative fluency is not analysis. AI is very good at writing an investment thesis. That is a problem, not a feature, because the same three-part shape (big opportunity, company’s edge, one risk caveat) fits a great company and a doomed one equally well.

The prose does not get less fluent when the underlying thesis gets weaker.

A confident-sounding case is not research; it is a confident-sounding case, and the model produces it on request regardless of whether the name deserves one.

Stock picking is time-stamped; the confidence isn’t. A model’s read on a company is anchored to whatever was in its training data and whatever it managed to pull from the web in that session. The pick is a moment-in-time act. The certainty attached to it is not. This is the failure mode I have watched directly: in an earlier test across four tools on the same prompts, one model turned a small company’s $6.1 million revenue line into “$6K” and narrated a 99.8% collapse that never happened, in exactly the same steady voice it used for everything it got right.

Wrong, and wrong convincingly, is the dangerous combination. The being-wrong I can handle. The conviction is what gets you.

The one question that tells you everything

So I asked each model the only follow-up that matters. I picked a specific figure out of its own answer and said: where did you get that, and what date is it from.

This is the exchange the whole post turns on, because it is the moment narration and fact-checking are forced to separate. And it split the three in a way the recommendations never hinted at.

Gemini, asked where its trailing P/E of 30.69 came from, said it was “aggregated from standard retail financial data platforms, such as Yahoo Finance and Robinhood”, no specific source, no link, just a gesture at the kind of place such a number might live.

Gemini Pro's provenance answer, captured 12 June 2026, attributing a precise P/E figure vaguely to "standard retail financial data platforms" with no specific source or link

Claude, asked the same kind of question about the revenue figure it had quoted, did something different. It named the primary source (Nvidia’s Q1 earnings release, filed with the SEC) and gave the filing URL. Then, without being asked, it drew the exact line the other two had blurred:

// Claude (Fable 5 High): verbatim, 12 June 2026

One caveat worth flagging given how you document AI verification on Dixon.ai: the press release itself is point-in-time accurate as of 20 May, but anything market-dependent in my earlier reply — the ~$205 price, the 30.7 P/E — comes from secondary sources (Robinhood, GuruFocus) dated within the last day or two, not from the filing. Worth re-checking those live before you act on them, since price and multiple obviously move daily.

Claude's provenance answer, captured 12 June 2026, citing the SEC filing URL directly and proactively flagging which figures came from live secondary sources that need re-checking

ChatGPT, for the record, also sourced itself well when I asked about its revenue figure. It pointed at the same Nvidia earnings release, dated 20 May. So this is not “Claude can cite and the others can’t.” The split was finer than that, and more useful. Gemini reached for a category of source instead of a source. ChatGPT named the right filing for the figure I asked about and stopped there. Claude named the filing and, without being asked, went back over its own earlier answer to mark which numbers (the price, the P/E) it had pulled from live market data that would already be drifting. One model checked the figure I queried. Only one volunteered which of its other figures I should not trust yet.

That gap is the one that costs you. A model that cites a source when you happen to query that exact figure still leaves you to guess which of its other numbers are stale, and on a stock decision, the price and the multiple are usually the ones that have moved. The model that tells you, unprompted, where its own answer is soft is doing the part of the work you would otherwise have to do yourself at nine at night. (Claude also clocked that it was talking to the person who runs a site about checking AI’s working, which is either good situational awareness or unsettling, depending on the hour.)

I want to be careful not to turn this into a scoreboard. The point is not “Claude wins.” The point is that the same prompt, on the same morning, produced one model inventing a save-confirmation, one naming the right filing only for the number I challenged, and one volunteering the shelf-life of its own figures. And all three had delivered their original picks in the identical confident register. The follow-up question is what split them. Without it, you cannot tell from the recommendation alone which one was reading from the record and which was reading from the air.

What they are good for instead

None of this means AI is no use in choosing what to buy. It means the job is the opposite of how most people use it: the model is not the screener that hands you a name. It is the reviewer that pulls your name apart.

I get real value asking AI to argue the bear case I am avoiding, to read what a management team carefully did not say on an earnings call, to tell me which single thing has to stay true for my thesis to hold. That is the Prompt Stack (role, filter, risk, verdict) and it is built to make the model challenge a decision I have already half-made, not to make the decision for me. The five questions I run before buying anything are the same instinct: AI stress-tests the reasoning, I own the call.

The trade I would never make is the one where I type “what should I buy” and act on what comes back. Not because the answer is always wrong. Sometimes it will be perfectly right. Because the answer arrives wearing the same face whether it is right or not.

Field Report

What worked: Asking for the source of a single quoted figure separated the three models instantly. Claude named the SEC filing and flagged its own live-data figures as needing a re-check; that is the behaviour you want, and it is checkable.

What didn’t: Every model delivered its pick in the same confident tone regardless of how sound the underlying numbers were. Gemini went further and fabricated a “record saved” confirmation for a tool that does not exist: a confident answer dressed as a completed action.

Bottom line: Useful as a reviewer, not as a picker. As a stock picker, a general-purpose AI is a confident narrator that does not, by default, tell you which of its facts it checked. What would change the verdict: a model that sources every figure inline and refuses to state a price or a P/E it hasn’t pulled live. Until then, treat the pick as a draft thesis to interrogate, never a recommendation to follow, and always ask where the numbers came from.

The finding is small and it is the whole thing: the model had precisely as much conviction about Nvidia, a name I can check against public filings, as it would have mustered for a company invented thirty seconds ago. The confidence does not move with the evidence. It is the one number in the whole answer that never changes, which is the reason to keep your hand on the wheel.

Ben Dixon
// Written by Ben Dixon

Ben tests ways of getting reliable answers from AI on his own investing: documenting what each model got wrong, what each one caught, and the prompts that survived the cuts. About Ben →

// Keep reading
Field Notes

Does ChatGPT make up sources? I checked two finance claims against the actual pages

Does ChatGPT make up sources? Mostly no, but I opened every link on two finance questions and found a real gov.uk page that didn't back the claim.

Field Notes

Does web search make AI more accurate? I ran the same questions both ways

Does web search make AI more accurate? I ran the same questions both ways. It didn't make the answers more reliable. It moved where the errors hide.

Field Notes

AI ISA advice: I tested four tools on the questions people get wrong

I asked four AI tools for ISA advice on the questions people get wrong. All four aced the basics, then two gave a rule abolished in April 2024.

// New here?

The site runs AI on real investing decisions. Start with the Prompt Stack for the four-stage framework, or the Field Guide PDF for the condensed version, free, no email.

← All posts More in Field Notes →