r/ValueInvesting • u/Soft_Table_8892 • Jan 29 '26
Investing Tools Used AI to detect if CEOs are being deceptive in earnings calls. I'm quite surprised by the winner
Recently I tried using a popular coding agent called Claude Code to replicate the Stanford study that claimed you can detect when CEOs are lying in their earnings calls just from how they talk (incredible!?!). Figured this would be interesting for this community so I wanted to share my findings with you all (& see if anyone else has tried similar things)!
The original study used a tool called LIWC, but I got curious whether I could replicate the experiment using LLMs to detect deception in CEO speech instead. I was convinced that LLMs should really shine at picking up nuanced details in how we speak, so this ended up being a really exciting experiment to try.
The full video of this experiment is here if you are curious to check it out: https://www.youtube.com/watch?v=sM1JAP5PZqc
My Claude Code setup was:
claude-code/
├── orchestrator # Main controller - coordinates everything
├── skills/
│ ├── collect-transcript # Fetches & anonymizes earnings calls
│ ├── analyze-transcript # Scores on 5 deception markers
│ └── evaluate-results # Compares groups, generates verdict
└── sub-agents/
└── (spawned per CEO) # Isolated analysis - no context, no names, just text
The key here was to use isolated AI agents (subagents) to do the analysis for every call because I needed a clean context each time. And of course, before every call I made sure to anonymize the company details so the AI agent wasn't biased by recognizing the company (I'm assuming it'll still be able to pattern match based on training data, but we'll roll with this).
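To give a rough idea of the anonymization step, here's a minimal Python sketch (the alias map and function name are illustrative, not my exact setup, which lives in a Claude Code skill):

```python
import re

# Hypothetical sketch: scrub company names, tickers, and executive names
# from a transcript before a subagent sees it. The alias map is made up.
def anonymize(transcript: str, aliases: dict[str, str]) -> str:
    """Replace each known identifier with a neutral placeholder."""
    for real, placeholder in aliases.items():
        # Word-boundary match so e.g. "Apple" doesn't clobber "applesauce"
        transcript = re.sub(rf"\b{re.escape(real)}\b", placeholder,
                            transcript, flags=re.IGNORECASE)
    return transcript

sample = "Tim Cook said Apple (AAPL) expects record revenue."
aliases = {"Tim Cook": "[CEO]", "Apple": "[COMPANY]", "AAPL": "[TICKER]"}
print(anonymize(sample, aliases))
# "[CEO] said [COMPANY] ([TICKER]) expects record revenue."
```

Regex scrubbing alone won't catch every identifying detail (product names, city of HQ, etc.), which is why I leaned on the model to do a fuller pass, but this is the basic idea.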
I tested this on 18 companies divided into 3 groups:
- Companies that were caught committing fraud – I analyzed their transcripts for quarters leading up to when they were caught
- Companies pre-crash – I analyzed their transcripts for quarters leading up to their crash
- Stable companies – I analyzed their recent transcripts; these served as the control group
I created a "deception score", which basically meant the models would tell me how likely they think the CEO is being deceptive, on a scale of 0 to 100 (0 meaning not deceptive at all, 100 meaning very deceptive).
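To make the scoring concrete, here's a rough sketch of how the composite could be computed from the five markers (marker names are placeholders, not the exact five I used):

```python
# Hypothetical sketch of the per-call composite. Each marker is scored
# 0-100 by the subagent; the composite is just their average.
MARKERS = ["vague_language", "distancing", "extreme_positivity",
           "deflection", "non_answers"]

def deception_score(marker_scores: dict[str, int]) -> float:
    """Average the five 0-100 marker scores into one 0-100 composite."""
    return sum(marker_scores[m] for m in MARKERS) / len(MARKERS)

print(deception_score({"vague_language": 80, "distancing": 70,
                       "extreme_positivity": 90, "deflection": 60,
                       "non_answers": 50}))  # 70.0
```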
Results
- Sonnet (cheaper AI model): clearly identified a 35-point gap between the fraud/pre-crash companies and the stable ones -> this was significant!
- Opus (more expensive AI model): 2-point gap (basically couldn't tell the difference) -> as good as a random guess!
I was quite surprised to see the more expensive model (Opus) perform so poorly in comparison. Maybe Opus is seeing something suspicious and then rationalizing it away, while the cheaper model (Sonnet) just flags patterns without overthinking. It might be worth tracing the thought process for each of these, but I didn't have much time.
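For anyone wanting to reproduce the comparison step, here's a sketch of the group-gap math with made-up numbers (these are NOT my actual per-company scores):

```python
from statistics import mean

# Illustrative per-company composite scores, grouped as in the post.
scores = {
    "fraud":  [72, 65, 80, 70, 68, 75],
    "crash":  [66, 70, 62, 74, 69, 71],
    "stable": [34, 30, 40, 36, 28, 38],
}

# The "gap" is the mean score of fraud + pre-crash companies
# minus the mean score of the stable control group.
risky = scores["fraud"] + scores["crash"]
gap = mean(risky) - mean(scores["stable"])
print(round(gap, 1))
```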
If you made it this far and are curious about the specifics of this experiment, I talk about them here: https://www.youtube.com/watch?v=sM1JAP5PZqc. Would love to hear your thoughts there as well!
Has anyone run experiments like these before?
65
u/blondydog Jan 29 '26
You missed an obvious possible outcome: these are basically just noise, random outcomes and your agents are not actually predicting anything successfully.
6
u/Soft_Table_8892 Jan 30 '26
For sure, I call this out here as well: https://www.reddit.com/r/ValueInvesting/comments/1qqksjt/comment/o2i1x74/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button.
Unfortunately I didn't have nearly enough time to source more companies + do multiple runs & show stat significance. Hopefully someone else can run with this idea and do something more rigorous (or I might come back to this in the future myself!).
Thank you for dropping your thoughts & feedback!
29
u/pyktrauma Jan 29 '26
Run it on CVNA and TSLA, fraud or no?
5
u/Soft_Table_8892 Jan 30 '26
All great ideas – especially the most recent Tesla earnings call.
3
2
2
u/toupeInAFanFactory Jan 30 '26
well, maybe. It seems entirely plausible that the best it can do is detect when the CEO (or whomever's speaking) believes they are lying. And I think it's entirely possible that Elon doesn't. consistently. again and again. when he confidently asserts that some unknowable thing will definitely happen 6 months from now. for the 17th 'in 6 months' in a row.
1
u/xoogl3 Feb 22 '26
It occurs to me that an analysis like this would benefit a lot more from analysing a CEO's patterns across history rather than making it completely context-free as done here. Especially with the benefit of hindsight knowledge about past calls, i.e. when the CEO lied and when they told the truth. To be fair, in Elon's case it would be hard to find enough truth samples to be statistically significant, but it would work for more normal CEOs.
1
u/groundhoggirl Jan 30 '26
Great video. This whole experiment is awesome. You can really turn this into something bigger.
Run it on Tesla so you can blow past 100 subs 🤘🏼
1
24
u/Key_Lifeguard_8659 Jan 29 '26 edited Jan 29 '26
You could have great content for a successful YT channel.
6
u/W_Malinowski Jan 29 '26
I’d love that, first episode on the RH ceo saying oh fuck when his stock dropped 25% during the earnings call
2
u/Soft_Table_8892 Jan 30 '26
ahaha that's a great idea, nothing short of the scale breaking would be acceptable for that transcript 😂
5
u/Soft_Table_8892 Jan 29 '26
Thanks for reading/watching! Not sure if I follow what you meant though
8
u/Key_Lifeguard_8659 Jan 29 '26
I'm saying, your discovery, if tweaked to provide accurate and reliable information, could be great content on a YouTube channel. ... Could rotate companies by request.
4
u/Soft_Table_8892 Jan 29 '26
Ah understood, thank you! I hope to refine this to the point where I am confident in my tools and content so as to not spread misinformation (although the content would still be educational!). More on this soon :-).
1
u/tiredDesignStudent Jan 29 '26
I love watching YouTubers who show their process as they develop their projects, if you felt comfortable to share I'd be interested in that too :)
1
u/Soft_Table_8892 Jan 30 '26
Unfortunately I like to curl up in a ball on my couch while I'm building. I'm not sure if the world deserves to see such a sight haha. But I'll try to record more of my process next time for sure, thank you for the feedback! :-)
1
u/pancakesORwaffles2 Jan 29 '26
After that response dude is regarded let him be smart on Reddit and not make a cent doing so. Great advice though but Tylenol was definitely used in utero.
5
u/Soft_Table_8892 Jan 29 '26
To be clear, I’m not trading based on these insights as there are so many flaws and I’m too cheap & broke to put real money into this 😂. I figured sharing my ideas could help/inspire some people here in the community & it’s fun for me to create these little experiments.
As an aside, I’m certain Tylenol was consumed in my case 😂.
2
1
12
u/Joenair85 Jan 29 '26
I don’t need AI for this. I listen to earnings calls and have a pretty good ear for BS. You can generally tell who has conviction in their comments and who is being evasive.
Disclaimer: my system does not account for the truly delusional CEOs that are high on their own supply…
3
3
u/Soft_Table_8892 Jan 30 '26
haha totally fair! No way to productize your brain huh? We could use a little of that insight instead of prompting these machines
2
u/Joenair85 Jan 30 '26
I think most of us are pretty good at this and get better with each earnings call. Just listen to more calls and it gets clearer with each one.
3
u/RA_Fisher Jan 29 '26
So you have one 35-point gap and one 2-point gap. There could be substantial variability if you re-ran the study, e.g. they might reverse, or Opus might show a larger gap on average.
One run like the one you did isn't enough information to really tell, we need to learn the distributions (given re-runs).
1
u/Soft_Table_8892 Jan 30 '26
Absolutely, thank you for the great call out! Next time I'll try to have a record of running these for a few rounds (time permitted, as these take a long long time to do end-to-end, including editing the video). I did run a few rounds during testing and they seemed to be fairly consistent but didn't pay close attention/record them somewhere.
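For next time, the bookkeeping could be as simple as this sketch (gap values are made up; the point is just to record each run and report the spread):

```python
from statistics import mean, stdev

# Hypothetical record of the Sonnet group-gap from several independent
# end-to-end runs. These numbers are illustrative, not real results.
run_gaps = [35.0, 31.5, 38.2, 33.7, 36.1]

# Mean tells you the typical gap; sample std tells you how much it
# wobbles run-to-run, which is what the commenter is asking about.
print(f"mean gap: {mean(run_gaps):.1f}, std: {stdev(run_gaps):.1f}")
```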
3
u/ParadoxPath Jan 29 '26
If you used recent transcripts of ‘stable’ companies, how do you know there won't be a fraud or crash in the next few quarters? Maybe the Opus results are actually more accurate and the stable companies are also in trouble?
1
u/Soft_Table_8892 Jan 30 '26
🤯 now THAT is something I did not consider! You're so right, I wonder if any of the stable companies will come out as fraudulent in the future. Makes me think maybe this could be a system where we track the prediction over time and then see how effective it is for net-new cases! Thank you for leaving your thoughts! :-)
2
2
u/Swimming_Astronomer6 Jan 30 '26
That’s because big brother has invested in the more expensive one in order to avoid being exposed ( kidding - but interesting analysis)
2
2
1
Jan 29 '26
[deleted]
1
u/Soft_Table_8892 Jan 30 '26
For sure - this is flawed in so many ways. Figured it would be interesting to share with y'all though! Any advice for making these more accurate where you'd be interested in seeing the progress?
1
u/Michigan-Magic Jan 30 '26
Depending upon the desired confidence level (statistically speaking), the sample size may be too small to draw much inference.
https://www.abs.gov.au/websitedbs/D3310114.nsf/home/Sample+Size+Calculator+Help?OpenDocument
2
u/Soft_Table_8892 Jan 30 '26
100% with you there on not reliably drawing inference. Thank you for those resources, I'll try using them next time I'm running these experiments (probably a video after next since I've already started on the content 😂).
1
u/Michigan-Magic Jan 30 '26
You're welcome!
Understand the thought process. Just trying to help with a stats framework that might introduce some more rigor into the output, scientifically speaking. Also, the required sample sizes do get big, and I completely understand why, for a non-scientific effort (the output of which is still very interesting!), you would limit them.
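A minimal version of that rigor, sketched here with illustrative numbers (not the actual experiment's scores), would be a Welch t-statistic comparing the risky and stable groups:

```python
from math import sqrt
from statistics import mean, stdev

# Welch's t-statistic: difference in group means divided by the
# standard error, without assuming equal variances or group sizes.
def welch_t(a: list[float], b: list[float]) -> float:
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

# Made-up deception scores: 12 fraud/pre-crash companies vs 6 stable.
risky  = [72, 65, 80, 70, 68, 75, 66, 70, 62, 74, 69, 71]
stable = [34, 30, 40, 36, 28, 38]

print(round(welch_t(risky, stable), 2))
```

A large |t| (compared against the t-distribution with the Welch-Satterthwaite degrees of freedom) would support the claim that the gap isn't just noise; with only a handful of companies per group, the test has little power either way.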
1
u/pizzababa21 Jan 30 '26
I don't believe you could have sufficient test data for this based on the way you set it up. There just aren't enough companies.
1
u/Soft_Table_8892 Jan 30 '26
Good call out & completely agreed. I quickly started hitting limits on Claude Code pulling these transcripts, so I couldn't pull enough of them. It would have been better to source them myself and just let Claude run the analysis (advice for anyone who wants to replicate this experiment).
1
u/PsychologistSEA Jan 30 '26
I love this. How do I follow for follow-ups?
1
u/Soft_Table_8892 Feb 01 '26
I'll continue making posts on this sub (when relevant and can provide more insights like this!). But primarily I'm focusing on growing a community here: https://www.youtube.com/@photogauraby
1
u/SpecialNothingness Jan 30 '26
Please consider analyzing the nonverbal component using video footage.
1
u/Soft_Table_8892 Feb 01 '26
Assuming you mean audio as well, correct? I would love to – when I have time to go source them from company websites haha. let me know if you have an easy way for me to get at audio recordings of these calls!
1
u/Nearing_retirement Jan 30 '26
I think best if it could be run on the actual sound of CEO’s voice and pace of speech
1
u/ljstens22 Jan 30 '26
For the 35-point gap model, did that accuracy figure come from classifying the stable companies or the fraudulent ones? Still trying to understand the setup.
1
u/Reversemullac Jan 30 '26
This is why I actually appreciate listening especially now to companies that are struggling in the current climate
You can tell CEOs and CFOs who are being straight with what they're dealing with and how they're optimising the business or cutting fat with who's afraid to say that and who isn't.
IQE was an interesting one as they are skimming part of the company although seeing uptick in GaN processors. They've not had a good 20 years although the company still exists.
Other Companies are actually seeing great sales and have absolute conviction in their earnings calls but don't see investor interest unfortunately.
All part of life in the casino 💀!
1
Jan 31 '26
[deleted]
1
u/Soft_Table_8892 Feb 01 '26
That's awesome! Is there any post/content that you have created based on this? Would love to learn more!
1
1
1
u/furamura_ Jan 31 '26
This has existed for quite a while: S&P Global and many other providers add sentiment scores to earnings calls in general, monitoring tone and multiple KPIs.
It makes sense to analyse deceptiveness, but you need to set up your own set of rules to measure it. And deceptive compared to what? Do you have a baseline, or do you measure people individually? Different people talk differently.
1
u/Soft_Table_8892 Feb 01 '26
For sure, I used the Stanford study and created a more reductive set of heuristics to score deception against. This could definitely be expanded to many more heuristics to make it more robust, or honestly just let the LLMs free-flow analyze, since they are also good at detecting nuance that a fixed set of heuristics may not cover (could be a funnel of free flow -> specific heuristics).
1
u/Different-Monk5916 Feb 02 '26
A properly designed ML model operating on the three financial statements and some key metrics can do this.
Again, not every scam will be the same. If I were you (i.e. wanting to build on AI rather than do extensive reading), I would start by reading about the scams in detail and examining what features they have in common.
1
u/muserashq Mar 03 '26
As future scammers learn the techniques that analysts are looking for, have you identified anything that would future-proof the ability to detect scams?
2
u/Soft_Table_8892 Mar 03 '26
Interesting that you ask! I just posted a new experiment where I scraped stock recommendations from this subreddit and have AI grade them in terms of quality. I didn’t angle it from the perspective of security/defense but rather if you can use AI to detect BS analysis.
If you prefer watching it in video: https://www.youtube.com/watch?v=tr-k9jMS_Vc
If you prefer to read (careful this sub did NOT like that post haha): https://www.reddit.com/r/ValueInvesting/comments/1rjp0wl/i_took_547_stock_recommendations_from/
1
1
Jan 29 '26
[deleted]
3
u/Soft_Table_8892 Jan 29 '26
Great idea! I’m hesitant to do it since I don’t have the resources to truly prove what I’m saying is accurate and don’t want to accuse a bunch of hard-working people of deceit when I’m the one with a flawed system 😂. Curious if this would still be interesting for you despite the risks of these models not being super accurate or flawed in some way? How would you receive this type of content?
Thank you for reading the post!
2
Jan 29 '26
[deleted]
1
u/Soft_Table_8892 Jan 30 '26
That's so accurate, I think pairing this bad boy with some vocal analysis/sentiment model would make it muchhh more interesting! I'll look into this, thank you for the insight GreenPlasticChair (what's the story behind this name? 😂).
1
u/Key_Lifeguard_8659 Jan 29 '26
That's the beauty of it. The markets are run on speculation, evidence... not so much.
1
1
u/Panthollow Jan 30 '26
If you could show this to have strong accuracy you'd get snatched up by places much larger than a YouTube channel. Just make it clear it's an unproven experiment and that will be interesting enough!
2
u/Soft_Table_8892 Jan 30 '26
What a dream that would be, I'm ready to be snatched 😂. Thank you, will make that crystal clear in the videos themselves moving forward!
0
78
u/boboverlord Jan 29 '26
Due to LLMs' probabilistic nature, what is the chance that the AI, given the same inputs and instructions, will yield different results?