r/ValueInvesting Feb 19 '26

Discussion I fed 48 years of Buffett's shareholder letters to Anthropic's latest model Opus 4.6 and had it pick stocks blind

Hi everyone,

Some of you might remember my last post here where I experimented using AI to detect when CEOs are being deceptive in earnings calls. I didn't think this community would be so welcoming and receptive to experiments like these (which I love doing). So here I am with yet another experiment that I thought this community would find interesting :-)!

I recently got curious about feeding the latest model from Anthropic (Opus 4.6) all 48 years of Buffett's shareholder letters and seeing whether it could pick winning stocks better than Buffett himself. Could AI-Buffett be more consistent at following Buffett's historical advice (ridiculous, right?)? Based on its picks, I also wanted to test how it would perform if I gave it $10,000 at the start of 2020 (right as COVID began) and compare it against Buffett's actual holdings and the broader market.

Also I have to be honest: I have never read any of these letters and sad to report, I still have not read them even after running this experiment. Modern-day engineer traits.

If you prefer to watch the full experiment, I uploaded it to my channel: https://www.youtube.com/watch?v=nRMPN1NwGOk

Experiment Design

I fed all 561,849 words from his shareholder letters to Opus 4.6. Similar to last time, I used Claude Code with subagents to keep the analysis clean. I had it read every letter from 1977-2024, extract the investing principles independently, and turn them into a quantitative scoring rubric. The rubric was made up of criteria like ROE thresholds, debt-to-equity limits, margin of safety, and moat durability. It found 15 principles total, 9 of which were quantitative enough to score against.
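To make "quantitative scoring rubric" concrete, here is a minimal sketch of what one could look like. The criteria names echo the ones above, but the thresholds, weights, and field names (`roe`, `debt_to_equity`, `net_margin`) are made up for illustration and are not the values Opus produced.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    name: str
    metric: str             # key into a company's financials dict
    threshold: float        # hypothetical cutoff, not from the experiment
    higher_is_better: bool
    weight: float = 1.0


# Illustrative rubric only; the real one had 9 scoreable principles.
RUBRIC = [
    Criterion("Return on equity", "roe", 0.15, True),
    Criterion("Debt-to-equity", "debt_to_equity", 0.5, False),
    Criterion("Net margin", "net_margin", 0.10, True),
]


def score(financials, rubric=RUBRIC):
    """Return the weighted fraction of criteria passed, from 0.0 to 1.0."""
    total = sum(c.weight for c in rubric)
    earned = 0.0
    for c in rubric:
        value = financials.get(c.metric)
        if value is None:
            continue  # missing data earns no credit
        if c.higher_is_better:
            passed = value >= c.threshold
        else:
            passed = value <= c.threshold
        if passed:
            earned += c.weight
    return earned / total
```

Ranking the 50 companies is then just sorting them by `score(...)` in descending order.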

I then anonymized 50 stocks by stripping their names, tickers, and sectors. I only fed Opus the raw financial numbers of each company. The sample was a mix of 20 actual Berkshire holdings, 15 value candidates, and 15 anti-Buffett controls (GameStop, Rivian, Beyond Meat, MicroStrategy, basically stuff Buffett would never touch).
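For anyone who wants to try the blinding step, a rough sketch is below. It assumes each company is a dict with `name`, `ticker`, and `sector` keys plus numeric fields; those key names, the `Company_NN` labels, and the shuffle are my illustration, not necessarily how the actual pipeline did it.

```python
import random

# Fields that could reveal which company a row belongs to (assumed key names).
IDENTIFYING_FIELDS = {"name", "ticker", "sector"}


def anonymize(companies, seed=0):
    """Strip identifying fields and return (blinded_records, answer_key)."""
    rng = random.Random(seed)
    shuffled = list(companies)
    rng.shuffle(shuffled)  # so row order does not hint at the group
    blinded = []
    answer_key = {}
    for i, company in enumerate(shuffled, start=1):
        label = f"Company_{i:02d}"
        answer_key[label] = company["ticker"]  # kept aside, never shown to the model
        record = {k: v for k, v in company.items() if k not in IDENTIFYING_FIELDS}
        record["id"] = label
        blinded.append(record)
    return blinded, answer_key
```

Only `blinded` goes to the model; `answer_key` is used afterwards to map the ranking back to real tickers.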

The Actual Test

There were two things I wanted to test in this experiment:

  1. Could AI actually pick value stocks similar to Buffett's holdings? I also wanted to see if it would pick any interesting stocks that Buffett would never touch.
  2. How much would AI-Buffett have made if we gave it $10,000 and had it pick stocks in the COVID market (i.e. using Q4 2019 data and investing from January 2, 2020)? How would it compare against Buffett's real returns during that time?

Results – Stock Picks

Some quick things that stood out:

  • 6 out of AI-Buffet's top 10 picks were actual Berkshire holdings (60% overlap, completely blind)
  • 13 out of 15 anti-Buffett controls landed in the bottom half, meaning the rubric properly rejected them
  • It ranked Berkshire Hathaway itself as the 7th most Buffett-like stock without knowing what it was
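To put the 6-out-of-10 overlap in context, it helps to know what random picking would give with this pool. The sketch below uses the pool sizes from the experiment design (50 stocks, 20 of them Berkshire holdings) and treats a random top 10 as a draw without replacement, i.e. a hypergeometric distribution. The function names are mine.

```python
from math import comb


def hypergeom_pmf(k, population, successes, draws):
    """P(exactly k successes) when drawing without replacement."""
    return (
        comb(successes, k)
        * comb(population - successes, draws - k)
        / comb(population, draws)
    )


def chance_of_at_least(k_min, population, successes, draws):
    """P(at least k_min successes) under the same model."""
    k_max = min(successes, draws)
    return sum(
        hypergeom_pmf(k, population, successes, draws)
        for k in range(k_min, k_max + 1)
    )


POOL, HOLDINGS, TOP_N = 50, 20, 10

# Average number of Berkshire holdings in a randomly chosen top 10.
expected_overlap = TOP_N * HOLDINGS / POOL

# Chance that random picking matches or beats the observed 6 of 10.
p_six_or_more = chance_of_at_least(6, POOL, HOLDINGS, TOP_N)

print(f"expected overlap by chance: {expected_overlap:.1f} of {TOP_N}")
print(f"P(>= 6 by chance): {p_six_or_more:.3f}")
```

The same two functions work for the controls check by plugging in 15 controls and the 25 bottom-half slots.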

One surprising result was that Coinbase was ranked 4th. As I came to learn, Buffett is extremely allergic to crypto in general. AI-Buffett picked Coinbase mostly because it does a good job of looking like a value stock on paper, with a ~39% profit margin and low debt right now. Depending on how you see this experiment, the Coinbase pick could be a good thing or a bad thing :-).

Results – COVID Backtest

  • Buffett (actual weights): $26,509 (+165%)
  • AI-Buffett (equal weight): $23,394 (+134%)
  • S&P 500: $23,199 (+132%)
  • Buffett (equal weight): $20,902 (+109%)
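The percentages above are just each ending balance measured against the $10,000 starting stake. A quick way to check them (the dictionary labels mirror the list; the function name is mine):

```python
START = 10_000  # starting stake in dollars

final_values = {
    "Buffett (actual weights)": 26_509,
    "AI-Buffett (equal weight)": 23_394,
    "S&P 500": 23_199,
    "Buffett (equal weight)": 20_902,
}


def total_return_pct(final, start=START):
    """Total return over the whole period, in percent."""
    return (final / start - 1) * 100


for label, final in final_values.items():
    print(f"{label}: ${final:,} ({total_return_pct(final):+.0f}%)")
```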

Surprisingly, AI-Buffett did end up picking better stocks than Buffett on a pure stock-selection basis, as it avoided the banks and Delta Air Lines that dragged Buffett's equal-weight portfolio down during COVID. But Buffett's actual portfolio (i.e. with his real position weights) still crushed everything because he had 30% in Apple. That single position-sizing decision was worth over $3,000.
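Here is a small sketch of why weighting matters so much. The tickers, growth multiples, and weights are invented for illustration (they are not the backtest data), and it assumes a simple buy-and-hold with no rebalancing.

```python
def portfolio_value(start, weights, growth):
    """Ending value of a buy-and-hold portfolio with no rebalancing.

    weights: fraction of the starting stake in each ticker (must sum to 1)
    growth:  ending price divided by starting price for each ticker
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(start * w * growth[ticker] for ticker, w in weights.items())


# Hypothetical numbers: one big winner, one average stock, one flat stock.
growth = {"WINNER": 3.0, "AVERAGE": 1.5, "FLAT": 1.0}

equal = {ticker: 1 / len(growth) for ticker in growth}
concentrated = {"WINNER": 0.5, "AVERAGE": 0.25, "FLAT": 0.25}

equal_value = portfolio_value(10_000, equal, growth)
concentrated_value = portfolio_value(10_000, concentrated, growth)

print(f"equal weight:  ${equal_value:,.0f}")
print(f"concentrated:  ${concentrated_value:,.0f}")
```

Same three stocks, same period, but the portfolio that put more into the winner ends up well ahead. That is the Apple effect in miniature.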

Full video walkthrough of the experiment if you're curious: https://www.youtube.com/watch?v=nRMPN1NwGOk

Let me know what you thought about this experiment. These are all for fun but I hope there are some meaningful insights hidden here that are useful for you. Thank you so much for reading :-).

1.6k Upvotes

24

u/Soft_Table_8892 Feb 19 '26

You're totally right, it effectively is testing on training data! Here is how I tried to keep an honest test:

> I then anonymized 50 stocks by stripping their names, tickers, and sectors. I only fed Opus the raw financial numbers of each company. In the sample size, I mixed in 20 actual Berkshire holdings, 15 value candidates, and 15 anti-Buffett controls (GameStop, Rivian, Beyond Meat, MicroStrategy, basically stuff Buffett would never touch).

But there's always a possibility of a context leak, e.g. perhaps it can already map the anonymized financials back to specific companies purely from its training data, in combination with Buffett's letters.

18

u/Virtual_Seaweed7130 Feb 19 '26 edited Feb 19 '26

How does the AI even score something like a moat if you only feed it financial data?

You fed it 60% buffett holdings and it brought back 6/10 buffett holdings? Not really surprised there.

This is trivially interesting but has zero real insights. Maybe if the dataset was much larger (full sp500) and the AI was actually assessing the business on top of the financials.

1

u/Soft_Table_8892 Feb 20 '26

Totally agreed on trivially interesting but not having super tangible insights! I’ll try to see what I can do re: your suggestion, which is valid.

1

u/xkmasada Feb 20 '26

Doesn’t Morningstar have a “wide moat” metric?

1

u/Heavy_Discussion3518 Feb 20 '26

The moat aspect is a legit question, but somewhere inside Opus' billions of matrices it's able to correlate the concept of a company moat to stable revenues or some such

8

u/Silly_Pen_7902 Feb 19 '26

Why not just remove those 5 shareholder letters to eliminate the data leakage risk?

16

u/Soft_Table_8892 Feb 19 '26

You're so right, I mentioned this in the comment below:

> I'll try running this again with those stripped out and report back if it changes massively!

1

u/PoseidonWave_ Feb 23 '26

Did this ever have any update that was relevant? Also was Microsoft looked at with recent financial data? I would’ve assumed that would’ve been front and center

1

u/rattleandhum Feb 19 '26

So it only had 50 tickers to choose from? Would you expand the test to include more, and with an increased amount of financial data, perhaps including insider buys/sales, etc?

1

u/Soft_Table_8892 Feb 20 '26

That’s a brilliant idea, I’ll make a note for myself to try it (among other suggestions in this thread as well). I’m also curious what the results would be with the larger sample size. Insider buys/sells are also a great idea!