Lab Report

I asked AI to find the downside in a position I was already long. It found things I'd missed.

An adversarial prompt against my own portfolio. The model surfaced three risks I'd glossed over, and one I'd actively been wrong about.


The natural way to use AI on a position you already own is to ask it for confirmation. “What’s the bull case for this stock?” The model gives you a tidy three-paragraph summary that sounds like you’ve done your homework. You haven’t. You’ve outsourced your validation to a machine that will agree with you for as long as you keep asking the right kind of question.

I wanted to test the opposite. Take a position I’m long, brief the model on the situation as honestly as I could, and then explicitly ask it to argue against me. Not “what are the risks” — that just produces a checklist. Argue against me. Find the things I haven’t priced in. Tell me where I’m wrong.

This is the lab report.


The setup

The position was a UK small-cap I’d held for around eight months at the time of the experiment. Decent run since I bought, no recent news to speak of, broker estimates broadly stable. The kind of position where complacency is the real risk — there’s no obvious thing to react to, so you stop looking.

I gave the model a context document that included:

  • The basic business description (one paragraph in my own words)
  • The thesis as I’d written it down at the time of buying
  • Recent results highlights, in numbers
  • The current valuation, with a note on what assumptions were embedded in it
  • A line saying “I am long this position. Argue against me.”

I deliberately did not include a list of risks. I wanted to see what the model would surface unprompted, not what it would re-summarise from my own framing.

The prompt structure followed the Prompt Stack — Role, Filter, Risk, Verdict — but with the Role specifically calibrated to be adversarial: “You are a sceptical analyst whose job is to find what the long holder is missing. Assume the long holder is intelligent and informed. Look for the second-order risks they may have rationalised away.”
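
For concreteness, here is a minimal sketch of how the brief and the four stages fit together, assuming you script the assembly rather than paste it by hand. The Role and Verdict wording below is the wording from this experiment; the Filter and Risk lines are paraphrases, and the field labels and function names are illustrative, not a fixed format.

```python
# Sketch only: assembling the context document and the Prompt Stack
# stages (Role, Filter, Risk, Verdict) into a single prompt.

CONTEXT_TEMPLATE = """\
BUSINESS: {business}
THESIS AT PURCHASE: {thesis}
RECENT RESULTS: {results}
VALUATION AND EMBEDDED ASSUMPTIONS: {valuation}
POSITION: I am long this position. Argue against me.
"""

STAGES = {
    "Role": ("You are a sceptical analyst whose job is to find what the "
             "long holder is missing. Assume the long holder is intelligent "
             "and informed. Look for the second-order risks they may have "
             "rationalised away."),
    # Filter and Risk below are paraphrases, not the verbatim text.
    "Filter": ("Use only the context document. Flag any figure that looks "
               "internally inconsistent instead of building on it."),
    "Risk": ("Surface specific, falsifiable risks the holder has not "
             "priced in. No generic macro caveats."),
    "Verdict": ("Summarise the strongest single argument against the "
                "position, and assess your own confidence."),
}

def build_prompt(business: str, thesis: str, results: str, valuation: str) -> str:
    """Join the four stages in order, then append the context document."""
    stages = "\n\n".join(f"{name}: {text}" for name, text in STAGES.items())
    context = CONTEXT_TEMPLATE.format(
        business=business, thesis=thesis, results=results, valuation=valuation
    )
    return f"{stages}\n\n--- CONTEXT DOCUMENT ---\n{context}"
```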


What it found

Four things came back. I’ll describe them in the order the model surfaced them, not the order I think they matter.

1. Customer concentration in a way I’d been counting as a strength

I’d treated the company’s largest customer relationship as a positive — long contract, structural lock-in, the kind of stickiness that justifies a premium multiple. The model pointed out that the same lock-in, viewed from the customer’s side, was a procurement risk that would absolutely be reviewed at the next contract cycle, and that the renewal date was inside the window where my thesis assumed stable revenues.

I knew about the contract. I’d never thought about how the customer’s procurement team would think about the renewal. That’s not a fact I was missing — it’s a perspective I wasn’t taking. Cost: a week of background reading on the customer’s recent procurement disclosures, which surfaced exactly the kind of cost-pressure language I should have seen.

2. A regulatory adjacency I hadn’t thought about

The company’s primary market is fine. An adjacent market — one I’d treated as a potential upside option — was being reshaped by a piece of regulation I hadn’t read carefully. The model noted that the way the regulation was written, the adjacency wasn’t an option for growth; it was a mandatory cost line that nobody on the call had quantified.

That’s a specific, falsifiable claim. I went and read the regulation. The model was directionally right and slightly overstated: the cost was real, the timing was further out than the model implied, and the option I’d been carrying was worth less than I’d priced it.

3. Capex discipline as the actual swing variable

This one I bristled at. The model argued that the entire thesis was, in practice, a bet on capital allocation discipline that the management team had only demonstrated for two years. Two years isn’t a track record. Two years is a sample.

I didn’t want this to be true. I’d been treating the recent capex restraint as a structural feature of the new management team. The model was right that I hadn’t tested whether it was structural or circumstantial — capex had been low partly because there hadn’t been many opportunities to deploy capital. The discipline hadn’t really been put under pressure yet.

This didn’t change the position. It changed what I’d be watching for.

4. The thing I was actively wrong about

The fourth point was where the model was useful in the way a junior analyst is occasionally useful: it caught a factual error in my context document. I’d written the wrong figure for working capital intensity, and the entire downstream argument I’d built on top of that figure was therefore wrong.

This was not subtle. It was an arithmetic mistake. I noticed it because the model said “if your figure is correct, the implication would be X, which seems inconsistent with the segment margin you cited.” The inconsistency was real. The model had detected it and asked. I went back and checked, and I’d transcribed the figure wrong.
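
The actual figures stay out of this post, but the class of check is mechanical enough to sketch. My case involved working capital intensity against segment margin, and the exact relationship there depends on details I haven’t reproduced, so this sketch runs the same kind of tie-out on a simpler identity (revenue × margin should roughly equal segment profit), with hypothetical numbers.

```python
# Sketch of the class of tie-out the model performed, on a simpler
# identity than my actual case. All figures below are hypothetical.

def tie_out(revenue, segment_margin, stated_segment_profit, tolerance=0.02):
    """Flag when a stated profit figure disagrees with revenue * margin."""
    implied = revenue * segment_margin
    gap = abs(implied - stated_segment_profit) / max(abs(implied), 1e-9)
    if gap > tolerance:
        return (f"Inconsistent: revenue {revenue} at margin "
                f"{segment_margin:.0%} implies profit ~{implied:.1f}, but "
                f"{stated_segment_profit} was stated ({gap:.0%} apart). "
                f"Check the transcription.")
    return "Ties out within tolerance."

# A transcription error of the kind described above: 21.6 entered as 31.6.
print(tie_out(revenue=120.0, segment_margin=0.18, stated_segment_profit=31.6))
```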

I have made this kind of mistake more often than I would like to admit. Having a machine ask “are you sure?” every time the numbers don’t tie out is, on its own, a strong argument for the workflow.


What it didn’t find

Worth listing, because the absence of false positives is the real measure of whether this is useful.

The model did not invent a risk. It did not flag a “potential macro headwind” or a “deteriorating sentiment” or any of the other content-free phrases that AI output tends toward when you ask for risks. Each of the four things it raised was a specific, verifiable claim about the company or the situation. Some held up under inspection, some were partially right, one was an arithmetic catch.

It also did not, despite the adversarial framing, overreach into a sell call. I had set the Role to adversarial and the Verdict stage to “summarise the strongest single argument against the position, and assess your own confidence.” The model’s confidence in the argument-against was 6/10. That felt about right to me — not a “sell immediately” signal, but a “this is more uncertain than you have it priced as” signal.

I’d rather have a 6/10 honest assessment than a 9/10 confident one that’s wrong half the time.


The change in my workflow

I now run this prompt against every position I own at least quarterly, and against every position I’m seriously considering buying before I buy. The cost is roughly twenty minutes of preparing the context document and another ten of reading the output carefully. The benefit is, conservatively, one corrected error or surfaced perspective per run, on a process that previously had no formal step for adversarial review.

The thing I want to flag is that this only works if the context document is honest. The model can only argue against what you’ve told it. If you write a flattering brief, you’ll get a flattering critique. The discipline of writing the context document — laying out your own thesis in plain language, pricing your own assumptions — is doing at least half the work. The model is the second pair of eyes. The first pair is yours, when you write it down.


What this didn’t replace

It didn’t replace reading the report. It didn’t replace knowing the management team. It didn’t replace the slow accumulation of context that lets you tell the difference between a real signal and a plausible-sounding model output. The fourth point — the arithmetic catch — only worked because I’d already built the context document carefully enough that an inconsistency could be detected.

What it replaced was the failure mode of going long, watching the price, and never again putting the position under deliberate adversarial pressure. That’s a failure mode I’d been quietly running for years.

If the only thing this method does is force one structured argument-against conversation per quarter for every meaningful position, that is, on the evidence so far, more rigour than I had in my process before.
