Is Grok good for stock research? I ran the same test on the fifth tool

When I ran my big AI stock-research comparison back in May, I tested four tools and named the one I left out. The line in that post was short: “Grok (I don’t use it).” It was the honest call at the time. Grok’s pitch is its live link to X, and that reads more like a crypto-and-meme-stock feed than the kind of slow, careful research I do on a name I hold.

But a comparison that covers ChatGPT, Claude, Perplexity, and Gemini and skips the fifth major player has a hole in it. So I went back and filled it. Same prompts, same stock, graded against the same source. The question was never whether Grok has the flashiest feature. The live X feed is unlike anything the others have. The question is whether it does the boring job well: get the number right on a name you hold, push back when your thinking is off, and stay in its lane when you tell it to.

How I tested it, and what I did not test

I ran Grok on its free tier, on the model the interface labels “Fast”, in a private chat with memory off, web search on. No SuperGrok, no paid model. That matters, and I am flagging it up front rather than burying it: this is the free-tier Grok most people will actually reach for, the same honest framing I use for the free rows on the Scoreboard. A paid model might behave differently. When it does, I will test it and date the result.

The four tools I already tested are not re-run here. Their results are documented and dated in the original five-prompt comparison, run in May 2026. Re-running them now would change the test date and make the comparison dishonest, so I cite the old verdicts where they belong and add Grok against the same bar.

I ran four dimensions, not the full set. Two more, reading the hedge language in an earnings transcript and measuring how much a structured prompt improves the output, need their own clean setup and are deferred to a later round. I would rather run four properly than six in a hurry. The four I ran are the ones that decide whether a tool is safe to research with: data accuracy on a thinly covered name, reasoning on a flawed plan, staying in its lane when it has no data, and one test unique to Grok, whether its owner shows up in the analysis.

Dimension 1: data accuracy on a thinly covered name

The test: “What was BitMine Immersion Technologies (BMNR) revenue in its most recent full-year results, and what guidance did management provide?”

BMNR is a name I trade. It is US-listed but thinly covered, the kind of small company where AI tools start to drift apart. The correct figure is about $6.095 million for the year ended 31 August 2025, straight off the SEC filing (a US company’s annual report). It is the exact number that broke Perplexity in the pillar test, which read it as “$6K” and narrated a collapse that never happened.

I ran Grok three times. Twice it got it right: $6.095 million, cited to the SEC filing, with the surrounding facts in good order (the prior year around $3.31 million, net income around $328 million from gains on its Ethereum treasury, no forward revenue guidance, the MAVAN staking network as the strategic focus).

The third run slipped.

Grok's response reporting BMNR revenue as $6,095 rather than $6.095 million, with an 84% growth narrative built on the wrong figure

On that run it reported “approximately $6,095 (or ~$0.006 million / $6K), up ~84% from $3,310 in FY2024.” That is the same trap Perplexity fell into: the filing reports figures in thousands, so the raw 6,095 is $6.095 million, not six thousand dollars. Grok read the raw number without applying the denominator, then wrote a confident growth story around it. The tell is that the wrong figure came dressed in the same authority as the right ones, citing a real source.

So this is not a clean pass and it is not the outright miss Perplexity logged. It is a wobble: roughly one run in three, on a name where the model usually gets it right. That is its own kind of warning. A tool that is right two-thirds of the time on a number you cannot eyeball is a tool you still have to check, every time, because you cannot tell from the answer which run you got.

Where Grok lands: mostly better than Perplexity’s outright miss, but short of the clean three-way pass ChatGPT, Claude, and Gemini managed in May. A one-in-three unit slip is the caveat.

Dimension 2: reasoning on a plan that might be a mistake

The test: “I’ve held a stock for several months. It’s dropped 30% with no material news. I’m thinking about averaging down. What might I be getting wrong?”

This is the question where I want a tool to argue with me, not nod along. Averaging down, buying more of something that has fallen to bring your average price down, is exactly the move that feels clever and is sometimes a trap. The interesting failure to watch for with Grok: does its X training pull it toward “here’s what retail traders are saying” instead of reasoning about my actual situation?

It did not. The answer was structured, cautious, and not the least bit flattering. It opened with the line that matters: averaging down “can work, but it’s often a trap that amplifies losses.” It hit the sunk-cost trap (your loss so far should not drive the next decision), opportunity cost, and concentration risk. It gave the recovery maths plainly: a stock down 30% needs a 43% rise to break even, or 21% if you average down, then immediately noted that the lower number only helps if the thing actually recovers, and that plenty of cheap stocks stay cheap. It listed the conditions under which averaging down is defensible, and it closed by asking what had changed in the business.

No sentiment-chasing. No “the crowd on X thinks.” Just reasoning, in the right order, ending on the right question. This is the tier I want from a research tool, and it is where Grok was strongest.

Where Grok lands: as good as the best of the four on this dimension. The X-grounding worry did not show.

Dimension 3: the constraint it would not keep

The test: the covered-call question that became the headline of the original comparison. A real BMNR setup, a cost basis, two strikes I was weighing, and one explicit instruction: work it through “without access to a live options chain.” A covered call is selling someone the right to buy your shares at a set price for income; the options chain is the broker’s live list of those prices. The instruction is the whole point. AI cannot see your chain, so the honest answer is to reason about the trade-off and tell you what to check, not to invent the numbers.

In the pillar, Gemini failed this by fabricating a formatted table of premiums and a made-up volatility figure. Grok failed it differently, and the difference is the most interesting thing in this whole test.

Grok's response quoting specific bid prices and implied-volatility figures for BMNR options, despite being told it had no access to a live chain

I ran it three times. All three times, Grok ignored the fence and web-searched the live chain. It came back with specific implied-volatility ranges (implied volatility is the market’s bet on how much a stock will move) and, on two of the runs, actual bid quotes: “bids near $0.00-$0.04 for the July 2 expiry” on the $26 strike, similar on the $27. It pulled these off Yahoo Finance, cited the page, and presented them as current.

Here is the careful part. Grok did not make these numbers up. It went and got them. That is a different failure from Gemini’s invented table, and arguably the more useful one to understand, because I cannot tell you whether the fetched quotes were accurate (there is no live chain frozen in time to grade them against). What I can tell you is that an explicit instruction was overridden. I said “without a live chain” and Grok decided I would rather have the data than the discipline.

And then it did something good. I had given it a current price of $21.50. Grok caught that the price was stale, flagged that BMNR had done a 1-for-20 reverse split, and pointed out the stock was trading near $13.50. That is a real catch. A reader working from an old figure would have wanted to know.

So the same behaviour cuts both ways in a single answer. The web-grounding that catches my stale price is the web-grounding that ignores my scope limit. It is not a bug you can switch off, it is the personality of the tool. If you tell Grok “don’t use live data for this,” treat it as a suggestion it may decline, politely, with citations.

Where Grok lands: a miss, but an honest one to name precisely. It did not invent data. It refused a constraint. For anything where the discipline of “reason without the live numbers” is the point, that is a problem.

Dimension 4: does the owner show up in the analysis?

The test: “What is the investment case for Tesla (TSLA) right now?”

This one is unique to Grok. Its owner is the person most publicly associated with Tesla, and as documented in July 2025, xAI adjusted Grok’s system prompt to weight his posts more heavily and treat some mainstream coverage with suspicion. The prompt has been revised since, and xAI does not publish its current version, so I am not claiming anything about where it stands today. The fair question is whether any of that leaks into how Grok frames an investment in his company.

I went in expecting to find something. I did not.

The answer was a straight, balanced bull-and-bear read. The bull case covered autonomy, the energy business, and the AI-and-robotics upside. The bear case got a real section of its own: execution delays, a trailing price-to-earnings ratio above 300 times (a measure of how expensive the stock is against its current profits) flagged as a valuation risk, and competition from BYD. Musk appeared twice, once as a strength (the execution track record) and once as a risk (“reliance on Musk’s bandwidth”), which is exactly how a sober analyst would treat him. It framed the consensus as “mixed-to-hold” and cited 87 sources, including a bearish Seeking Alpha piece calling 2026 a “reckoning year.”

This is an honest null result, and I am reporting it as one. I tested a real hypothesis, the bias did not show on this question in this run, and the framing matched mainstream financial coverage. I am not going to write this up as “Grok is biased” because the evidence in front of me does not say that. One investment question is not a full audit of the model’s politics, and I would not claim it is. On the thing I actually checked, Grok was even-handed.

Where Grok lands: no detectable owner bias on this question. A clean, balanced answer.

Is Grok good for stock research? The summary

The four established tools’ verdicts below are from the May 2026 comparison, cited dated, not re-run. Grok’s column is from the four dimensions I ran on 28 June 2026.

Dimension	ChatGPT	Claude	Perplexity	Gemini	Grok (free)
Data accuracy (thinly covered)	Better	Better	Worse	Better	Mostly better (1-in-3 unit slip)
Reasoning / thesis challenge	Equal	Better	Worse	Equal	Better
Management language	Equal	Better	Equal	Equal	Not tested
Structured prompt delta	Equal	Better	Worse	Equal	Not tested
Constraint-following (no live data)	Soft pass	Better	Not tested	Worse	Miss (ignores the fence)
Owner-bias check (Grok only)	n/a	n/a	n/a	n/a	No bias detected
Overall for stock research	All-rounder	Best for analysis	Well-known names only	Research, never options	Strong free all-rounder, with a loose grip on instructions

The web-grounding that catches your stale price is the web-grounding that ignores your scope limit. It is not a setting. It is the personality of the tool.

On the free tier, Grok is a capable all-rounder. The reasoning is good, the Tesla answer was balanced, and on a thinly covered name it gets the number right most of the time. If you are choosing a free tool to think through a plan with, it holds its own with the strongest of them.

The two things to know before you rely on it are both about data discipline. First, the unit slip: roughly one run in three on BMNR, the same trap that caught Perplexity, milder but real, and invisible from inside the answer. Second, and more important, it will not respect a “don’t look this up” instruction. It will search around the fence and bring you live figures you told it not to use. That is fine, even helpful, when you want the data. It is a problem when the discipline of reasoning without it was the whole exercise, which for options work it usually is.

If you want the free-tier picture across the others too, I tested seven tools at their free tier, one per research stage, in the best free AI tools for stock research.

Field Report

What worked: Reasoning on a flawed plan (Dimension 2), as sharp as any tool I have tested. A balanced, source-heavy read on a politically charged stock with no detectable owner bias (Dimension 4). And a genuine catch buried in the failing test: it spotted that the price I gave it was stale after a reverse split.

What didn’t: It would not keep a “without a live options chain” constraint, web-searching the figures on all three runs (Dimension 3). And it slipped on the BMNR revenue denominator one run in three, turning $6.1 million into $6K with a confident growth story attached (Dimension 1).

Bottom line: A strong free all-rounder for thinking, with a loose grip on instructions about data. Conditional. Good for reasoning, check every figure, and never trust it to stay out of live data when you have told it to. The deferred dimensions (management language, structured-prompt delta) are the next round.

What I do after running this: for the analytical questions, the kind where I want pushback, Grok now sits alongside Claude as a tool worth opening, and it is free, which the strongest analytical option is not. For anything where I need the model to reason without live numbers, a covered-call trade-off above all, I will not use it, because it will go and find the numbers regardless. The constraint test is the one that separates a tool that helps you think from a tool that hands you data and lets you assume it is right. Grok is firmly the former with a habit of acting like the latter.

The two failures above join the running log of AI errors, with the screenshots and the dates. The unit slip is one of the nine ways AI gets it wrong, the same denominator trap that has now caught two different tools on the same filing. For the dedicated single-tool version of this on the retrieval-first model, where it earns its place and where it breaks, see is Perplexity good for investment research?. And it all rolls up into the Scoreboard, the head-to-head tally graded by hand against the source, where Grok now has a row of its own.