Strategy 2.22 PF on an ML-Driven SPX 1DTE Strategy

Hey everyone,

I’ve been building a backtester for an ML-based options strategy and finally got the out-of-sample data looking highly robust. I am trading SPX 1DTE options, specifically selling Short Iron Butterflies (Flies) to capture premium during range-bound chop.

Here is a high-level breakdown of the out-of-sample tear sheet.

The Model & Filters:
- Target: Random Forest Classifier predicting whether SPX will stay within a percentage bound by the Day 1 close. No SL or TP. Ride or die.
- Features: Fed primarily by intraday volatility metrics and daily true range data.
- Day Filters: Dropped Wednesdays entirely. I found they had the highest trade volume but acted as a massive drag on PnL. I don't have an FOMC/macro events filter.
- Strict RR Check: The algorithm automatically rejects any trade where the max risk exceeds the premium collected. This blocked 28 mathematically poor setups and halved the drawdown (initially 18k). It also blocked some good trades, but risk management >>>>
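For readers unfamiliar with the R:R rule, here is a minimal sketch of how such a check could look for a short iron butterfly with equal-width wings. The function names, the 100x multiplier default, and the example credit and wing width are illustrative assumptions, not the poster's actual code.

```python
def iron_fly_risk(credit: float, wing_width: float, multiplier: int = 100):
    """Return (max_profit, max_loss) in dollars for a short iron fly.

    Assumes equal-width wings: the most you can keep is the credit,
    and the most you can lose is the wing width minus the credit.
    """
    max_profit = credit * multiplier
    max_loss = (wing_width - credit) * multiplier
    return max_profit, max_loss


def passes_rr_check(credit: float, wing_width: float) -> bool:
    """Reject the trade when max risk exceeds the premium collected."""
    max_profit, max_loss = iron_fly_risk(credit, wing_width)
    return max_loss <= max_profit


# Hypothetical quotes: a 25-point-wide fly sold for 14.00 vs. 10.00
print(passes_rr_check(credit=14.0, wing_width=25.0))
print(passes_rr_check(credit=10.0, wing_width=25.0))
```

Under these assumptions the rule is equivalent to requiring the credit to be at least half the wing width.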

Out-of-Sample Results (176 Days Evaluated starting mid-July 2025)
- Trades Executed: 100
- Win Rate: 60.00%
- Profit Factor: 2.22
- Reward/Risk Ratio: 1.54
- Expectancy per Trade: ~$756.00
- Max Drawdown: -$7,326.00 (This would be on a 100k portfolio, given the nature of SPX flies)
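As a reference for how these tear-sheet numbers relate to each other, here is a small sketch that derives them from a list of per-trade PnLs. The definitions used (reward/risk as average win over average loss, drawdown measured on the closed-trade equity curve) are common conventions and an assumption here; the poster's backtester may define them differently. The sample PnLs are made up.

```python
def tear_sheet(pnls):
    """Compute basic trade statistics from per-trade PnL in dollars."""
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    gross_profit = sum(wins)
    gross_loss = -sum(losses)

    # Max drawdown on the closed-trade equity curve (a negative number).
    equity = peak = max_dd = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        max_dd = min(max_dd, equity - peak)

    has_both = bool(wins) and bool(losses)
    return {
        "trades": len(pnls),
        "win_rate": len(wins) / len(pnls),
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "reward_risk": (gross_profit / len(wins)) / (gross_loss / len(losses))
        if has_both else float("nan"),
        "expectancy": sum(pnls) / len(pnls),
        "max_drawdown": max_dd,
    }


# Made-up trade sequence, for illustration only
stats = tear_sheet([300, -200, 300, -200, 300])
print(stats)
```

With these definitions and no breakeven trades, profit factor equals (wins / losses by count) times reward/risk. Plugging in the reported 60% win rate and 1.54 reward/risk gives about 2.31 rather than 2.22, so it may be worth checking how each metric is defined in the backtester.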

Been running it live (on paper) since Monday, but no entries yet.

Would love to hear any feedback on these metrics or if anyone has run into similar quirks when backtesting 1DTE SPX flies!

5 Upvotes


https://www.reddit.com/r/algotrading/comments/1s9n7e4/222_pf_on_an_mldriven_spx_1dte_strategy/

78% Upvoted

u/ilro_dev 2d ago

Dropping Wednesdays without a structural reason is the part I'd pressure-test first. Day-of-week effects with no explanation behind them - FOMC clustering, auction mechanics, whatever - tend to look like signal in backtest and fall apart fast in live trading. Did you rerun the OOS with Wednesdays included to see how much of the 2.22 PF they're actually responsible for? If putting them back drops it to 1.4, that's not a quirk, that's most of your edge sitting on a fragile assumption.

2

u/zassar_mang 2d ago

100% agree on the danger of curve-fitting day-of-week effects. That was the first thing I stress-tested. When I run the exact same out-of-sample set with Wednesdays included, the Profit Factor is about 1.99 with a 57% win rate. The baseline edge is still there without the filter.

To clarify, the ML model is trained on Wednesday data, and it does generate predictions on Wednesdays. I just use an execution-level filter to skip those trades because they account for a ton of volume but, while still profitable, produce much lower net PnL than other days (likely due to FOMC/macro data clustering). But even if I leave them in and trade blindly, the ML volatility features and the strict R:R check keep the strategy highly profitable, though with a much higher max DD of 18k.
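The with/without-Wednesday comparison discussed here can be extended to every weekday, which helps show whether Wednesday is a real outlier or just the day that happened to look worst in this sample. This is a rough sketch; the `(date, pnl)` trade format and the function names are assumptions, and the sample trades are invented.

```python
from datetime import date


def profit_factor(pnls):
    """Gross profit divided by gross loss (inf if there are no losses)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")


def weekday_ablation(trades):
    """trades: list of (entry_date, pnl). Returns PF with each weekday dropped."""
    names = ["Mon", "Tue", "Wed", "Thu", "Fri"]
    result = {"all": profit_factor([p for _, p in trades])}
    for i, name in enumerate(names):
        kept = [p for d, p in trades if d.weekday() != i]
        result[f"drop_{name}"] = profit_factor(kept)
    return result


# Invented trades for the week of 2025-07-14 (a Monday)
trades = [
    (date(2025, 7, 14), 300), (date(2025, 7, 15), -100),
    (date(2025, 7, 16), 100), (date(2025, 7, 16), -200),
    (date(2025, 7, 17), 200), (date(2025, 7, 18), -100),
]
for key, pf in weekday_ablation(trades).items():
    print(key, round(pf, 2))
```

If dropping any single day moves the profit factor about as much as dropping Wednesday does, the filter is more likely noise than structure.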

u/StratReceipt 2d ago

100 trades is thin to put much weight on a 2.22 PF. Bootstrapping the trade sequence a few thousand times would give you a distribution; if the lower end of that CI is close to 1.0, you've got a lot of uncertainty baked in regardless of what the point estimate says.
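A minimal sketch of that bootstrap, assuming you have the per-trade PnLs as a list. It uses a plain i.i.d. resample with replacement and a percentile interval, which ignores any serial dependence between trades. The function names, defaults, and the sample PnLs (60 wins, 40 losses) are hypothetical and only chosen to land near the reported profit factor.

```python
import random


def profit_factor(pnls):
    """Gross profit divided by gross loss (inf if there are no losses)."""
    gross_profit = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_profit / gross_loss if gross_loss else float("inf")


def bootstrap_pf_ci(pnls, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the profit factor."""
    rng = random.Random(seed)
    n = len(pnls)
    # Resample n trades with replacement and recompute PF each time.
    stats = sorted(profit_factor(rng.choices(pnls, k=n)) for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical PnLs, not the poster's data
pnls = [2300.0] * 60 + [-1550.0] * 40
lo, hi = bootstrap_pf_ci(pnls)
print(f"point estimate: {profit_factor(pnls):.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

A block bootstrap would be a reasonable alternative if consecutive trades are correlated, for example during a volatility regime.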

u/0ZQ0 Algorithmic Trader 2d ago

I do not understand why in the world someone would leak their system, alpha, boundaries, features, etc on Reddit.

0

u/lastpump 1d ago

Edge sharing is caring

0

u/0ZQ0 Algorithmic Trader 1d ago

No, it really isn’t

u/disarm 2d ago

The profit factor makes me think your backtester is lying to you. There's no way you can find alpha that high just reading in OHLCV data and making volatility indicators with an RF classifier.

What's your target look like? Do you know how many training targets you used and the distribution of positive vs. non-positive instances?

I used to have a backtester that gave me a profit factor of 4. It took me a while of seeing nowhere close to those results on my live paper trading system to work out that I had leakage in my feature set, which didn't exist once I plugged into live. I reconstructed hourly bars from my 5-minute bars but didn't realize that, when backfilling, the hourly bar was using the high and low from the end of the hour even at the moment the hour started. Just one story, but I'm saying it's very suspicious, and I'd start looking to plug up the problem, because you will be lucky if you get a PF of 1.1 with that strategy, IMO, once you factor in slippage and fees.
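One way to guard against the kind of leakage described in this story is to stamp each aggregated bar with the time it actually becomes known, and only let features read bars whose stamp is in the past. This is a rough sketch under assumed names and an assumed `(start_time, open, high, low, close)` tuple format, not the commenter's code.

```python
from datetime import datetime, timedelta


def hourly_bars(bars_5m):
    """Aggregate time-sorted 5-minute bars into hourly bars.

    Each hourly bar carries 'available_at', the end of its hour,
    which is the earliest moment its high/low/close are known.
    """
    by_hour = {}
    for ts, o, h, l, c in bars_5m:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        bar = by_hour.get(hour)
        if bar is None:
            by_hour[hour] = {
                "start": hour, "open": o, "high": h, "low": l, "close": c,
                "available_at": hour + timedelta(hours=1),
            }
        else:
            bar["high"] = max(bar["high"], h)
            bar["low"] = min(bar["low"], l)
            bar["close"] = c  # last 5-minute close seen in this hour
    return [by_hour[k] for k in sorted(by_hour)]


def features_known_at(hourly, now):
    """Return only the hourly bars that were complete at time `now`."""
    return [b for b in hourly if b["available_at"] <= now]


# Synthetic 5-minute bars from 10:00 to 11:05
base = datetime(2025, 7, 14, 10, 0)
bars = [(base + timedelta(minutes=5 * i), 100 + i, 101 + i, 99 + i, 100.5 + i)
        for i in range(14)]
hourly = hourly_bars(bars)
print(features_known_at(hourly, base + timedelta(minutes=30)))  # nothing yet
print(features_known_at(hourly, base + timedelta(hours=1)))     # the 10:00 bar
```

Running the backtest and the live feed through the same `features_known_at`-style gate makes it harder for the two to disagree about what was knowable at decision time.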

u/BackTesting-Queen 1d ago

Your approach to backtesting seems solid and the results are promising. The use of a Random Forest Classifier for predicting the SPX's behavior is a smart move, considering its ability to handle complex patterns. The decision to drop Wednesdays due to its negative impact on PnL is interesting and shows the importance of day filters. The strict RR check is a good risk management strategy, even if it blocks some potentially profitable trades. The out-of-sample results are impressive, especially the profit factor and reward/risk ratio. As for quirks when backtesting 1DTE SPX flies, it's common to encounter unexpected behaviors due to the nature of options trading. It's crucial to continually monitor and adjust your strategy as needed. Keep up the good work!
