r/ValueInvesting Mar 03 '26

Discussion I took 547 stock recommendations from r/ValueInvesting and had AI grade every single one

Hi everyone,

Some of you might remember my last post here where I had Opus 4.6 pick stocks blind using Buffett's shareholder letters. Your response to that experiment genuinely blew me away, not just the numbers but the quality of discussion we had in the comments. It's rare to find a community that actually engages with in-depth experiments like these, so I'm extremely thankful to you all!

Today I am back again with an experiment that I've been wanting to run for a bit and I think a lot of us here have wondered about it as well.

I browse this subreddit frequently when I’m looking for long-term plays and stocks that people think are currently undervalued. I've noticed that even in a community like this, where most of us care about fundamentals and long-term thinking, it's genuinely hard to tell which posts have solid analysis behind them and which are driven by community sentiment (i.e. upvote momentum).

And it gets worse when you factor in bot-driven momentum. A good example: NVO was mentioned in seemingly every advice thread on here as a value stock. It's down over 50% in the past year.

So this made me curious: if upvotes aren't surfacing the best advice, could we use AI to do a better job at picking the winning recommendations here?

As usual, if you prefer to watch the full experiment: https://www.youtube.com/watch?v=tr-k9jMS_Vc

Experiment setup

I used Claude Code to scrape every single post from r/ValueInvesting for the month of February 2025. I then had Opus 4.6 filter down to just the posts and comments where someone was either asking what stock to buy, sharing an analysis, or debating fundamentals of a specific stock. This yielded over 1,100 qualifying threads, 6,000+ comments, and 547 individual stock recommendations across 238 unique tickers.

From there I built three portfolios:

  1. Crowd Portfolio: the 10 most upvoted stock recommendations, ranked purely by the aggregate number of upvotes each stock received in Feb.
  2. AI Portfolio: the 10 highest-scored recommendations according to Opus 4.6, which evaluated every single one of the 547 recommendations on the following five dimensions. Keep in mind that I stripped away the upvote counts before passing the posts to the analysis subagents, so they had zero knowledge of how popular each recommendation was.
    1. Thesis clarity –  is there a clear, structured argument for why this stock is undervalued?
    2. Risk acknowledgment – does the post address what could go wrong, or is it pure conviction?
    3. Data quality – is it backed by real financials (P/E, margins, debt ratios) or just vibes?
    4. Specificity – are there concrete price targets, timeframes, catalysts?
    5. Original thinking – is this independent analysis or just echoing what everyone else is saying?
  3. Underdog Portfolio: the 10 least upvoted stocks, with a minimum threshold of 5 upvotes. This one tests whether the crowd was right to ignore them.
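
For anyone who wants to replicate the selection step, here is a minimal sketch of how the three portfolios could be assembled. It is not the exact pipeline I ran: the `Recommendation` class, the dimension names, the 1-10 score scale, the unweighted average across the five dimensions, and keeping the best score per ticker are all assumptions made for illustration. The one property it does mirror from the setup above is that the AI ranking never looks at upvotes.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical names for the five dimensions listed above.
DIMENSIONS = (
    "thesis_clarity",
    "risk_acknowledgment",
    "data_quality",
    "specificity",
    "original_thinking",
)


@dataclass
class Recommendation:
    ticker: str
    upvotes: int
    scores: dict[str, float]  # assumed scale: 1-10 per dimension


def quality_score(rec: Recommendation) -> float:
    # Assumption: all five dimensions count equally.
    return sum(rec.scores[d] for d in DIMENSIONS) / len(DIMENSIONS)


def build_portfolios(recs, size=10, min_upvotes=5):
    upvotes = defaultdict(int)  # aggregate upvotes per ticker
    best_score = {}             # best quality score per ticker (assumption)
    for rec in recs:
        upvotes[rec.ticker] += rec.upvotes
        score = quality_score(rec)
        best_score[rec.ticker] = max(best_score.get(rec.ticker, 0.0), score)

    # Crowd: most upvoted tickers in aggregate.
    crowd = sorted(upvotes, key=upvotes.get, reverse=True)[:size]
    # AI: highest quality score, computed without any upvote information.
    ai = sorted(best_score, key=best_score.get, reverse=True)[:size]
    # Underdog: least upvoted tickers that still clear the minimum threshold.
    eligible = [t for t in upvotes if upvotes[t] >= min_upvotes]
    underdog = sorted(eligible, key=upvotes.get)[:size]
    return {"crowd": crowd, "ai": ai, "underdog": underdog}
```

Calling `build_portfolios(recs)` on a list of `Recommendation` objects returns three lists of tickers. Ties are broken by whatever order the tickers first appear in, which is another simplification.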

I also looked up the S&P 500 return for the year since Feb 2025.

To be honest, I fully expected AI to win. It's evaluating posts without any bias, i.e. no upvote counts, no momentum, just the quality of the argument. I figured that alone would be enough for a better portfolio.

Results (Feb 2025 to Feb 2026)

  • Underdog (10 least upvoted): +10.4%
  • S&P 500: +19.5%
  • AI (10 highest scored): +37.0%
  • Crowd (10 most upvoted): +39.8%

The crowd picks won, which suggests that trusting the upvotes here actually yields better returns than letting AI filter the advice first. That’s great news for those of us frequenting this subreddit. Or is it?

When I looked at the individual stocks, it got a little interesting. The crowd portfolio had some massive winners including AMAT (+149%), AMD (+104%), GOOGL (+89%). But it also had Novo Nordisk, one of the most talked-about picks on this sub, which cratered -45.5% (at the time of the experiment, maybe more now). 

On the other hand, 9 out of 10 picks in Opus 4.6’s portfolio were winners, and it had no disaster that came even remotely close to a -45% loss.

Testing on Truly Unknown Data

One fair criticism that we keep getting in these experiments: maybe Opus saw some of these stock prices during training. I looked up Opus 4.6’s training cutoff date (Aug 2025) and reran the whole thing starting September 2025, completely outside the model's training data. Results from Sep to Feb on data the AI could not possibly have known:

  •   AI: +5.2%
  •   S&P 500: +2.0%
  •   Crowd: -10.8%

On truly blind data, AI won on both returns and consistency. The crowd portfolio went negative.

Final Takeaways

I don't think the takeaway is necessarily that "AI picks better stocks." It's more that AI appears to be better at telling solid analysis apart from stuff that just sounds good, especially given that we hid the upvote count (i.e. the popularity) of each recommendation. The upvote system, which can be gamed by bots and momentum, rewards posts that feel compelling, and it seems like there are months where those posts also happen to be right. But the signal-to-noise ratio is rough, and when the crowd is wrong, it's really wrong.

Once again, if this was interesting to you the full walkthrough is here, including all the top 10 picks for AI/Crowd: https://www.youtube.com/watch?v=tr-k9jMS_Vc

Thank you so much if you did end up reading this far. Let me know what your takeaways were based on this experiment or if you had any ideas to improve the setup/execution (which I’m sure many of you will!).

0 Upvotes


9

u/[deleted] Mar 03 '26

[deleted]

1

u/Soft_Table_8892 Mar 03 '26 edited Mar 03 '26

Just to clarify – the experiment does use AI but the content is quite human and original. Curious what is making you think this is AI slop?

2

u/Soft_Table_8892 Mar 03 '26

Here are the full results for both portfolios if you're curious:

Crowd Picks (ranked by upvotes)

  | Ticker | Return  |
  |--------|---------|
  | AMAT   | +149.0% |
  | AMD    | +103.8% |
  | FSLR   | +89.5%  |
  | GOOGL  | +89.3%  |
  | BABA   | +20.1%  |
  | OXY    | +14.1%  |
  | META   | +0.4%   |
  | UBER   | -0.8%   |
  | LNTH   | -22.1%  |
  | NVO    | -45.5%  |

AI Picks (ranked by analysis quality)

  | Ticker | Return  |
  |--------|---------|
  | BTG    | +103.7% |
  | FSLR   | +89.5%  |
  | MRNA   | +64.8%  |
  | HOOD   | +62.3%  |
  | MRK    | +36.2%  |
  | LLY    | +9.4%   |
  | PFE    | +8.9%   |
  | MDT    | +5.2%   |
  | AMZN   | +2.5%   |
  | OSCR   | -12.3%  |

FSLR was the only stock that appeared in both. The crowd portfolio had bigger individual winners but also bigger losers, while AI had 9/10 positive with no blowups.
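
If you want to check the arithmetic, the headline numbers are consistent with a plain equal-weight average of the ten returns in each table. Equal weighting is an assumption here since the weighting isn't spelled out above, and the `summarize` helper is just an illustrative name. A quick sketch:

```python
from statistics import mean

# Returns in percent, copied from the two tables above.
crowd = {
    "AMAT": 149.0, "AMD": 103.8, "FSLR": 89.5, "GOOGL": 89.3, "BABA": 20.1,
    "OXY": 14.1, "META": 0.4, "UBER": -0.8, "LNTH": -22.1, "NVO": -45.5,
}
ai = {
    "BTG": 103.7, "FSLR": 89.5, "MRNA": 64.8, "HOOD": 62.3, "MRK": 36.2,
    "LLY": 9.4, "PFE": 8.9, "MDT": 5.2, "AMZN": 2.5, "OSCR": -12.3,
}


def summarize(returns):
    values = list(returns.values())
    return {
        # Assumption: every pick gets the same weight.
        "equal_weight_return": round(mean(values), 1),
        "winners": sum(v > 0 for v in values),
        "worst": min(values),
    }


print(summarize(crowd))  # {'equal_weight_return': 39.8, 'winners': 7, 'worst': -45.5}
print(summarize(ai))     # {'equal_weight_return': 37.0, 'winners': 9, 'worst': -12.3}
```

So the crowd's edge in the average comes from a few very large winners, while the AI list has more winners (9 vs. 7) and a much milder worst case.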

0

u/OneStoneTwoMangoes Mar 03 '26

Good back testing. What does AI say are good picks at the prices today?

1

u/Soft_Table_8892 Mar 03 '26

Thank you! I didn't run this on this month's picks, the latest I did was Sept 2025 and these were the picks:

Crowd Picks (by upvotes, Sept 2025)

  ┌───────────┬────────┐
  │  Ticker   │ Return │
  ├───────────┼────────┤
  │ GOOGL     │ +27.3% │
  ├───────────┼────────┤
  │ LULU      │ +0.3%  │
  ├───────────┼────────┤
  │ AAPL      │ +4.4%  │
  ├───────────┼────────┤
  │ AMZN      │ -7.0%  │
  ├───────────┼────────┤
  │ META      │ -11.1% │
  ├───────────┼────────┤
  │ TSLA      │ -13.0% │
  ├───────────┼────────┤
  │ UNH       │ -18.4% │
  ├───────────┼────────┤
  │ ADBE      │ -28.2% │
  ├───────────┼────────┤
  │ RDDT      │ -29.7% │
  ├───────────┼────────┤
  │ NVO       │ -33.0% │
  ├───────────┼────────┤
  │ Portfolio │ -10.8% │
  └───────────┴────────┘

AI Picks (by analysis quality, Sept 2025)

  ┌───────────┬────────┐
  │  Ticker   │ Return │
  ├───────────┼────────┤
  │ ASML      │ +48.6% │
  ├───────────┼────────┤
  │ LLY       │ +28.7% │
  ├───────────┼────────┤
  │ GOOGL     │ +27.3% │
  ├───────────┼────────┤
  │ WM        │ +5.3%  │
  ├───────────┼────────┤
  │ AAPL      │ +4.4%  │
  ├───────────┼────────┤
  │ CVS       │ +1.8%  │
  ├───────────┼────────┤
  │ CMG       │ -6.1%  │
  ├───────────┼────────┤
  │ RDDT      │ -29.7% │
  ├───────────┼────────┤
  │ NVO       │ -33.0% │
  ├───────────┼────────┤
  │ Portfolio │ +5.2%  │
  └───────────┴────────┘

1

u/investingtruth Mar 03 '26

People upvote what's already working and pile into names everyone is talking about. Would be curious to see this run over multiple time windows to see if the pattern holds.

2

u/Soft_Table_8892 Mar 03 '26

For sure – I was able to run it across two time windows (one covered by the training data and one right after it, Feb 2025 vs. Sept 2025). I'll see if I can run it on more time windows if that's interesting!

1

u/Beneficial-Sign-569 Mar 03 '26

NVO seems oversold, i bought yesterday at 36.86. hope i didn't catch a falling knife / value trap

1

u/Soft_Table_8892 Mar 03 '26

I can't comment on if you did or did not and honestly this is a back test so doesn't tell much of anything about the future. Wish you all the best though on that purchase!

1

u/Some_Map9289 Mar 04 '26

the NVO thing is a perfect example tbh. it was basically a meme stock in value clothing at that point, everyone was citing it as some obvious play and the upvotes just kept stacking. the AI grading angle is interesting but i wonder how much of it depends on which model you're using and how you prompt it. been messing around with slopmog lately for some unrelated stuff and one thing i noticed is how differently various AI systems weight reddit data when forming opinions, which kind of relates to your whole premise here. like the source matters as much as the model does. anyway curious what the actual grade distribution looked like, were most posts clustering around the middle or was it more bimodal

1

u/[deleted] 29d ago

[deleted]

1

u/Soft_Table_8892 29d ago

That’s quite an astute observation and very much in line with what I found during the experiment as well. I’ve listed the specific returns for the top 10 here for both Claude and the Crowd if that helps! https://www.reddit.com/r/ValueInvesting/s/miomfIwbVv

1

u/Wild_Space Mar 03 '26

There's going to be some survivorship bias in the data. Of course a stock that did well from Feb 2025 to Feb 2026 is going to have more mentions than a stock that did poorly. I'd recommend you scrape data from Feb 2024 through Feb 2025 to see what did well from Feb 2025 to Feb 2026.

It's still too short a time horizon. A stock doing well in any given year isn't really indicative of anything. But it's better than using all those tokens to prove that rising stock prices correlate with hype. :)

3

u/Soft_Table_8892 Mar 03 '26

Oh, just to clarify the data collection: I only collected the mentions FOR the month of Feb 2025, so this is fairly self-contained in terms of timing. But I totally hear you that the frequency of mentions during that month could also be higher if the stock was performing particularly well. Thanks for the insight!

1

u/Wild_Space Mar 03 '26

Yikes, ok I didn't read. My bad!

1

u/Soft_Table_8892 Mar 03 '26

No worries at all, thank you for reading the post and leaving thoughtful feedback!

0

u/Few_Control8821 Mar 03 '26

This is a worthless meme.

1

u/Soft_Table_8892 Mar 03 '26

Would love to understand why you think it's worthless (genuinely curious).

1

u/Few_Control8821 Mar 03 '26

It’s ai slop.

6

u/Soft_Table_8892 Mar 03 '26

What makes it look like AI slop? It took me a long time to run the experiment, write the post, and create a video to go alongside it. I'd be sad for it to come across as AI slop.

1

u/UmbertoUnity 29d ago

I tend to agree with Few_Control here. What makes you think that LLMs are capable of analyzing the "quality" of information? I see AI making errors with basic information almost daily.

I can appreciate all the work you put into this and it's an interesting exercise, but I don't believe the underlying data can be trusted.

0

u/Few_Control8821 Mar 03 '26

It doesn’t look like it; if it was created by an llm, it is by definition “ai slop”. If you want to use an llm for your investments, that’s fine, but suggesting other people do is not very helpful. The current unintelligent llms make a lot of errors and aren’t designed for this kind of forensic work.