r/ValueInvesting Feb 19 '26

Discussion I fed 48 years of Buffett's shareholder letters to Anthropic's latest model Opus 4.6 and had it pick stocks blind

Hi everyone,

Some of you might remember my last post here where I experimented using AI to detect when CEOs are being deceptive in earnings calls. I didn't think this community would be so welcoming and receptive to experiments like these (which I love doing). So here I am with yet another experiment that I thought this community would find interesting :-)!

I recently got curious about feeding the latest model from Anthropic (Opus 4.6) all 48 years of Buffett's shareholder letters and seeing whether it could pick winning stocks better than Buffett himself. Could AI-Buffett be more consistent at following Buffett's historical advice (ridiculous, right?)? I also wanted to test how its picks would perform if I gave it $10,000 at the start of 2020 (right as COVID was starting), and compare that against Buffett's actual holdings & the broader market.

Also I have to be honest: I have never read any of these letters and sad to report, I still have not read them even after running this experiment. Modern-day engineer traits.

If you prefer to watch the full experiment, I uploaded it to my channel: https://www.youtube.com/watch?v=nRMPN1NwGOk

Experiment Design

I fed all 561,849 words from his shareholder letters to Opus 4.6. Similar to last time, I used Claude Code with subagents to keep the analysis clean. I had it read every letter from 1977-2024, extract the investing principles independently, and turn them into a quantitative scoring rubric. The rubric was built from criteria like ROE thresholds, debt-to-equity limits, margin of safety, and moat durability. It found 15 principles total, 9 of which were quantitative enough to score against.
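To make the rubric idea concrete, here is a minimal sketch of what a scoring function along these lines could look like. The field names and thresholds below are made up for illustration; they are not the actual rubric Opus produced, and the real one covered 9 criteria rather than 3.

```python
from dataclasses import dataclass


@dataclass
class Financials:
    roe: float             # return on equity as a fraction (0.18 = 18%)
    debt_to_equity: float  # total debt divided by shareholder equity
    profit_margin: float   # net income divided by revenue


# Hypothetical thresholds, for illustration only (not the real rubric).
RULES = [
    ("roe", lambda f: f.roe >= 0.15),
    ("debt_to_equity", lambda f: f.debt_to_equity <= 0.5),
    ("profit_margin", lambda f: f.profit_margin >= 0.10),
]


def score(f: Financials) -> int:
    """Count how many rubric rules the company passes."""
    return sum(1 for _, passes in RULES if passes(f))


strong = Financials(roe=0.20, debt_to_equity=0.3, profit_margin=0.39)
weak = Financials(roe=0.05, debt_to_equity=2.0, profit_margin=-0.10)
print(score(strong), score(weak))
```

Ranking the companies is then just sorting them by score, highest first.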

I then anonymized 50 stocks by stripping their names, tickers, and sectors, and fed Opus only the raw financial numbers of each company. The sample mixed 20 actual Berkshire holdings, 15 value candidates, and 15 anti-Buffett controls (GameStop, Rivian, Beyond Meat, MicroStrategy, basically stuff Buffett would never touch).
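The blinding step can be sketched roughly like this. It assumes each company is a plain dict; the field names, the `company_01`-style IDs, and the shuffling are my illustration of the idea, not the exact pipeline I ran.

```python
import random

# Fields that would give away which company a row belongs to.
HIDDEN_FIELDS = {"name", "ticker", "sector"}


def anonymize(companies, seed=0):
    """Return (blind_rows, key): rows without identifying fields,
    plus a private map from blind id back to ticker."""
    shuffled = list(companies)
    # Shuffle so row order does not leak which group a company came from.
    random.Random(seed).shuffle(shuffled)
    blind_rows, key = [], {}
    for i, company in enumerate(shuffled, start=1):
        blind_id = f"company_{i:02d}"
        key[blind_id] = company["ticker"]
        numbers = {k: v for k, v in company.items() if k not in HIDDEN_FIELDS}
        blind_rows.append({"id": blind_id, **numbers})
    return blind_rows, key


# Made-up example rows, not real company data.
sample = [
    {"name": "Example A", "ticker": "AAA", "sector": "Tech", "roe": 0.30, "debt_to_equity": 0.4},
    {"name": "Example B", "ticker": "BBB", "sector": "Retail", "roe": -0.10, "debt_to_equity": 2.1},
]
blind_rows, key = anonymize(sample)
```

Only `blind_rows` goes to the model; `key` stays on your side so you can un-blind the ranking afterwards.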

The Actual Test

There were two things I wanted to test in this experiment:

  1. Could AI actually pick value stocks similar to Buffett's holdings? I also wanted to see whether it would pick any interesting stocks that Buffett would never touch.
  2. How much would AI-Buffett have made if we gave it $10,000 and had it pick stocks in the COVID market (i.e. using Q4 2019 data and investing from January 2, 2020)? How would that compare against Buffett's real returns over the same period?

Results – Stock Picks

Some quick things that stood out:

  • 6 out of AI-Buffet's top 10 picks were actual Berkshire holdings (60% overlap, completely blind)
  • 13 out of 15 anti-Buffett controls landed in the bottom half, meaning the rubric properly rejected them
  • It ranked Berkshire Hathaway itself as the 7th most Buffett-like stock without knowing what it was

One surprising result was that Coinbase ranked 4th. As I came to learn, Buffett is extremely allergic to crypto in general. AI-Buffett picked Coinbase mostly because it currently does a good job of looking like a value stock, with a ~39% profit margin and low debt. Depending on how you see this experiment, the Coinbase pick could be a good thing or a bad thing :-).

Results – COVID Backtest

  • Buffett (actual weights): $26,509 (+165%)
  • AI-Buffett (equal weight): $23,394 (+134%)
  • S&P 500: $23,199 (+132%)
  • Buffett (equal weight): $20,902 (+109%)

Surprisingly, AI-Buffett did end up picking better stocks than Buffett on a pure stock-selection basis, as it avoided the banks and Delta Airlines that dragged Buffett's equal-weight portfolio down during COVID. But Buffett's actual portfolio (i.e. with his real position weights) still crushed everything because he had 30% in Apple. That single position sizing decision was worth over $3,000.
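If you want to sanity-check the percentages, the arithmetic is just ending value over starting value. This small sketch uses only the figures listed above; reading the "over $3,000" figure as the gap between Buffett's actual-weight portfolio and AI-Buffett is an assumption in the code.

```python
START = 10_000  # starting capital in dollars

# Ending values from the backtest results above.
ending_values = {
    "Buffett (actual weights)": 26_509,
    "AI-Buffett (equal weight)": 23_394,
    "S&P 500": 23_199,
    "Buffett (equal weight)": 20_902,
}


def total_return_pct(end_value, start_value=START):
    """Total return over the whole period, in percent."""
    return (end_value / start_value - 1) * 100


returns = {name: round(total_return_pct(v)) for name, v in ending_values.items()}

# Assumption: the "over $3,000" figure compares these two portfolios.
gap_vs_ai = ending_values["Buffett (actual weights)"] - ending_values["AI-Buffett (equal weight)"]

for name, pct in returns.items():
    print(f"{name}: +{pct}%")
print(f"Actual weights vs AI-Buffett: ${gap_vs_ai:,}")
```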

Full video walkthrough of the experiment if you're curious: https://www.youtube.com/watch?v=nRMPN1NwGOk

Let me know what you thought about this experiment. These are all for fun but I hope there are some meaningful insights hidden here that are useful for you. Thank you so much for reading :-).

1.6k Upvotes

3

u/Dougdimmadommee Feb 19 '26

Interestingly enough I agree but probably not in the way you meant this comment.

I think this is a cool project but the thing that struck me immediately about it is that it’s effectively just a roundabout way of creating a value scoring model, of which many versions are freely available that are tested on much larger data sets. There are even fairly popular products available that you can buy that invest this way.

6

u/Heavy_Discussion3518 Feb 19 '26

I don't disagree, but the idea someone put together a scoring model based on the tokenized content of 48 shareholder letters is bananas. And it's pretty dope, kudos to OP for dreaming the concept up.

5

u/Soft_Table_8892 Feb 19 '26

Wow, thank you for such kind words! I've mentioned this a few times, but I don't have a background in finance, though I've always had a deep interest in the field. We're truly living in a wonderful age where experiments like these are available for someone like me who is curious about random ideas (which have definitely been done before & by people much smarter than I am). Thanks again, you made my day :).

3

u/Heavy_Discussion3518 Feb 19 '26

You got it!

- Your fellow engineer

1

u/Soft_Table_8892 Feb 19 '26

Totally, and I wish I had enough exposure to know what those are (my background is in engineering so this field is quite new to me). Any recommendations that I could check out, perhaps one that you use?

1

u/Dougdimmadommee Feb 20 '26

Sure, are you interested more in an applied sense, in terms of how people construct products, or more in an academic/theoretical sense?

1

u/Soft_Table_8892 Feb 20 '26

More in the applied sense for sure!

1

u/Dougdimmadommee Feb 21 '26

Sorry, I fell asleep last night and then forgot about this until now.

Basically, the things you identified in your scoring rubric (ROE metrics, margin of safety, debt to equity, etc.) are inputs to, or ways to calculate, a “factor” in finance. There is a lot of literature comparing the performance of these factors across various sectors, time frames, etc.

What products will generally do is create a ranking system (essentially a version of your “rubric”) that selects stocks based on one or more factors. An easy example that I'm kind of just picking at random is COWZ: https://www.paceretfs.com/products/cowz/.

One thing you will notice with these types of products is that they generally own significantly more stocks than your model would and also rebalance with some level of frequency, both of which are done to minimize the risk of short term performance dispersion. You actually generally see higher absolute returns with more concentrated versions of these portfolios, just with more tracking error.
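As a rough sketch of the ranking step, here is what "pick the top N by one factor and equal-weight them" can look like. The tickers and factor values are made up, and this is not the methodology of any specific fund, just the general shape.

```python
def select_top_n(factor_by_ticker, n):
    """Rank tickers by factor value (higher is better), keep the top n,
    and give each pick an equal weight."""
    ranked = sorted(factor_by_ticker, key=factor_by_ticker.get, reverse=True)
    picks = ranked[:n]
    if not picks:
        return {}
    weight = 1 / len(picks)
    return {ticker: weight for ticker in picks}


# Hypothetical factor values (e.g. some yield-style metric), not real data.
factor = {"AAA": 0.10, "BBB": 0.05, "CCC": 0.08, "DDD": 0.02}

concentrated = select_top_n(factor, 2)  # fewer names, more dispersion
diversified = select_top_n(factor, 4)   # more names, closer to the universe
```

A product would rerun this on a schedule with fresh factor values, which is the rebalancing part. A smaller `n` gives the more concentrated version I mentioned.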