Direct AI questions produce confident verdicts; staged questions produce honest ones. The four stages — Role, Filter, Risk, Verdict — exist to stop the model from skipping the work.
The first time I asked a language model to review a small-cap position for me, it did exactly what I’d have done if I were trying to sound clever at a dinner party: it produced a confident, well-structured answer that sounded like analysis but was actually just a tour of the available narrative.
It wasn’t wrong, exactly. It knew the sector, understood the company, could name the risks. But it said all of this in a tone of mild approval that bore no relationship to the quality of the evidence it was drawing on. The observable facts and the editorialising were blended together so smoothly you couldn’t tell where one ended and the other began.
That’s not analysis. That’s a very expensive echo chamber.
The problem with asking directly
When you ask a model a direct question — “Is this a good investment at this price?” — you are essentially asking it to produce a verdict. Models are very good at producing verdicts. They’re trained to be helpful, which in practice means they’re trained to give you something that sounds like an answer.
The issue is that a good verdict requires a specific discipline: separating what you know from what you’re inferring, surfacing what would change your mind, and quantifying what you can’t see clearly. That discipline doesn’t happen automatically when you ask a direct question. The model skips straight to the confident part.
I started thinking about this the way I think about interviewing an analyst. If you ask a junior analyst “what do you think of this stock?”, you get their opinion. That might be useful. But if you ask them to first tell you the five most important observable facts about the business — not opinions, observable facts — and then separately tell you which of those facts are actually established versus assumed, you get something different. You get a map of what they actually know.
The four-stage approach
After enough failed experiments with direct prompting, I developed what I now call the Prompt Stack. It’s not complicated. It’s just a way of preventing the model from skipping steps.
Stage 1 — ROLE: Before anything else, I establish that the model is reviewing this as a cautious analyst, not a cheerleader. This sounds trivial, but it changes the register meaningfully. The model produces less marketing language, fewer unsupported superlatives, and more of what I actually want.
Stage 2 — FILTER: This is the most important step. I ask the model to separate observable, verifiable facts from assumptions, inferences, and model filler. The output here is often surprising. What looks like solid analysis frequently turns out to be built on two facts and a lot of extrapolation.
Stage 3 — RISK: I ask specifically for timing risk, downside scenarios, and what evidence would invalidate the positive thesis. Note that I’m not asking “what are the risks” in the generic sense — that produces a canned list. I’m asking what would prove the thesis wrong. That’s a different question.
Stage 4 — VERDICT: Only after the three stages above do I ask for a practical conclusion. Importantly, I also ask the model to state a confidence level and explain what’s driving it. A verdict of “cautiously positive, 55/100, primarily on X and Y but with material uncertainty about Z” is useful. “This looks like a compelling opportunity” is not.
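The four stages above can be sketched as a small pipeline. This is a minimal illustration, not my exact wording: the `ask` callable is a hypothetical stand-in for whatever model API you use, and the prompt text is paraphrased from the stage descriptions. The one structural point it captures is that each stage sees the previous stages' answers, so the verdict cannot be produced before the filtering and risk work exists.

```python
# A minimal sketch of the Prompt Stack. `ask` is a hypothetical callable
# (prompt string in, answer string out) standing in for any model API.
# The prompt wording here is illustrative, not a canonical template.

ROLE = (
    "You are a cautious analyst reviewing this position. "
    "No marketing language, no unsupported superlatives."
)

STAGES = [
    ("FILTER",
     "List the observable, verifiable facts about {subject}. "
     "Then, separately, list the assumptions and inferences. "
     "Do not blend the two."),
    ("RISK",
     "What is the timing risk, what are the downside scenarios, "
     "and what specific evidence would prove the positive thesis wrong?"),
    ("VERDICT",
     "Given only the facts, assumptions, and risks above, give a practical "
     "conclusion with a confidence score out of 100, and state what is "
     "driving that score."),
]

def run_prompt_stack(subject, ask):
    """Run the stages in order, feeding each answer into the next prompt."""
    transcript = []
    context = ""
    for name, template in STAGES:
        prompt = (
            f"{ROLE}\n\n{context}"
            f"{name}: {template.format(subject=subject)}"
        )
        answer = ask(prompt)
        transcript.append((name, answer))
        # Later stages see earlier answers, so VERDICT is grounded in
        # the FILTER and RISK output rather than produced from scratch.
        context += f"{name} answer:\n{answer}\n\n"
    return transcript
```

Whether you chain the stages programmatically like this or just paste them as successive messages in a chat matters less than the ordering itself: the point is that the model cannot reach VERDICT without the FILTER and RISK answers sitting in front of it.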
What changes
The output is less flattering. That’s the point.
When you force the model to establish the factual foundation first and separate inference from evidence, you get a cleaner picture of what it actually knows — and more importantly, of what nobody knows yet. The resulting verdict is less confident, more conditional, and much more useful for making an actual decision.
I’ve also found that this process often reveals that I haven’t asked the right question in the first place. The FILTER stage in particular has a habit of surfacing assumptions I was treating as facts. That’s uncomfortable, but it’s exactly what I need.
A practical note
The Prompt Stack isn’t a magic formula and it doesn’t make the model reliable on empirical questions. If you ask it to tell you a company’s debt-to-equity ratio, it might get it wrong. You still have to verify the facts against primary sources.
What it does is force a structure that makes the model’s reasoning more legible and less bullshit-friendly. The impressive, confident, slightly wrong analysis gets harder to hide once you’ve pulled the reasoning apart into its component stages.
That’s the trade. More friction upfront, less noise at the other end. In my experience, it’s a fair one.