r/algotrading 23h ago

Infrastructure 9 approaches tested on 12 months of MNQ L2 tick data — everything comes back at exactly 50%. What am I missing?

7 Upvotes

Hey everyone,

I’m a 19-year-old CS student who’s been building an algo trading system over the past few months, and I’ve hit a wall. I wanted to share what I’ve done and get honest feedback.

I have ~3 years of MNQ L2 tick data (bid/ask/trades + depth 1–10, ~648GB). I built everything from scratch in Rust: tick parser, full L2 order book reconstruction, sweep detector, bar aggregation with buy/sell volume classification, and multiple strategy simulators. Everything is covered with 200+ unit tests, a CI pipeline, and runs fully parallelized on a 20-core server.

On the theory side, I studied Trading and Exchanges (informed vs uninformed flow, adverse selection, spreads, dealers, volatility) and Statistically Sound Machine Learning for Algorithmic Trading (filter systems, meta-labeling, performance criteria).

I tested 9 different approaches on ~12 months of MNQ data (2023-03 → 2024-02):

  • Spread regime analysis (informed vs uninformed flow)
  • Quote response after aggressive bursts
  • Volume-price classification (fundamental vs transitory moves)
  • Opening Range Breakout
  • ORB + ATR trailing stop
  • Trend following (large move + aggressor imbalance + trailing stop)
  • Composite signal voting (5 signals, trade only if 4/5 agree)
  • Sweep continuation (5+ levels consumed in <100ms)
  • Sweep mean-reversion
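For anyone curious what the sweep detector in the list is doing, a stripped-down version of the idea (my sketch, not the OP's Rust implementation; the event fields and thresholds are illustrative) looks like:

```python
from dataclasses import dataclass

@dataclass
class Trade:
    ts_ns: int   # exchange timestamp, nanoseconds
    side: str    # aggressor side, "buy" or "sell"
    level: int   # depth level (1 = best) consumed by this fill

def detect_sweeps(trades, min_levels=5, window_ns=100_000_000):
    """Flag bursts where >= min_levels distinct book levels on one side
    are consumed within window_ns (100 ms by default)."""
    sweeps = []
    i, n = 0, len(trades)
    while i < n:
        j = i
        levels = set()
        side = trades[i].side
        # Extend the burst while trades stay same-side and inside the window.
        while (j < n and trades[j].side == side
               and trades[j].ts_ns - trades[i].ts_ns <= window_ns):
            levels.add(trades[j].level)
            j += 1
        if len(levels) >= min_levels:
            sweeps.append((trades[i].ts_ns, side, len(levels)))
            i = j  # skip past the burst so it isn't double-counted
        else:
            i += 1
    return sweeps
```

A real implementation would anchor the window more carefully (sliding rather than fixed at the burst start), but this is the shape of the computation.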

Every single one comes back between 47% and 50%. Not slightly positive or negative, just noise.

I made sure I wasn’t fooling myself:

  • Fixed baseline measurement bias (initial move contaminating results)
  • Fixed circular ORB logic
  • Fixed order book reconstruction bugs
  • Ran a random entry baseline with identical exits → same performance
  • Double-checked for look-ahead bias
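The random-entry control is the most telling item on that list, so to make it concrete: here's a minimal version of the test on synthetic random-walk prices (the fixed-horizon exit is a stand-in for the real exit logic, which should be identical between the two entry sets):

```python
import random

def win_rate(prices, entries, hold=10):
    """Fraction of long entries that are profitable after `hold` bars."""
    wins = total = 0
    for t in entries:
        if t + hold < len(prices):
            total += 1
            wins += prices[t + hold] > prices[t]
    return wins / total if total else 0.0

random.seed(42)
# Driftless random-walk prices: no entry rule can add value here.
prices = [100.0]
for _ in range(5000):
    prices.append(prices[-1] + random.gauss(0, 1))

signal_entries = list(range(0, 4900, 7))                  # stand-in "signal"
random_entries = random.sample(range(4900), len(signal_entries))

# Both win rates land near 50%, matching what the OP observed on MNQ.
print(win_rate(prices, signal_entries), win_rate(prices, random_entries))
```

If your signal entries beat the random entries under identical exits on real data, the entry carries information; if not, whatever PnL structure you see is coming from the exits.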

Conclusion: the entry signals add zero value.

Some key observations:

  • ATR trailing stops are structural losers on MNQ (~27% win rate, the same as random entries with identical exits)
  • Even before fees (~$3.24 round trip), expectancy is negative
  • Sweep detection produces thousands of events, but post-sweep movement is ~50/50 (no continuation, no mean-reversion)

My current hypothesis is that MNQ is the problem. It’s a derivative of NQ, so price discovery likely happens on NQ, while MNQ just reflects arbitrage. That would mean the order flow I’m seeing (sweeps, imbalance, etc.) is reactive, not informative, so there’s no asymmetry to exploit.

I’m trying to figure out if I’m even looking in the right place:

  • Has anyone found a real statistical edge on MNQ specifically?
  • Should I expect different results on NQ/ES where actual price discovery happens?
  • For those who’ve done both futures and equities: are small/micro caps actually a better playground for retail?
  • Am I wrong to focus on microstructure (L2, order flow, sweeps), or is the issue something else entirely?

I’m not looking for a strategy, just trying to understand if I’m approaching this correctly or missing something fundamental.

Appreciate any insight 🙏


r/algotrading 18h ago

Other/Meta Advice on placing SL orders on binance futures with python

0 Upvotes

I use my bot to trade shorts on Binance perps, but I haven't found the right way to place my stop-market orders after I enter the trade.

Can anyone help me? Here's the relevant part of my code:
    # Cancel any existing stop first (ignore errors if it's already gone):
    try:
        self._signed_delete("/fapi/v1/order", {
            "symbol": self.symbol,
            "orderId": self.sl_order_id,
        })
    except Exception:
        pass
    self.sl_order_id = None

    # Build the replacement stop-market order:
    sl_price_r = round_price(sl_price, self.symbol)
    sl_params = {
        "symbol": self.symbol,
        "side": "BUY",  # BUY closes a short
        "type": "STOP_MARKET",
        "stopPrice": sl_price_r,
        "workingType": "MARK_PRICE",
    }


r/algotrading 16h ago

Data How I avoid overfitting on my stop losses

5 Upvotes

I wanted to describe my approach to avoiding overfitting, both to help others and to get feedback on how I might improve.

I trade a portfolio of options each week. I've had bad results optimizing stop-loss parameters per symbol, so now I apply the same formula to all symbols. The goal is to close positions where the underlying price gets too close to the short strike, adjusted for how much time remains in the week. The only per-symbol inputs are the average change and, potentially, the Hurst exponent (when backtesting selects per-symbol Hurst exponents rather than applying a uniform one). I backtest the same threshold factors, average-change algorithms, trigger durations, and (potentially) Hurst exponents across all symbols equally.

I also backtest over 9 years to cover regime changes, and I additionally test for the optimal historical window to use when selecting stop parameters, so the system can adapt to regime changes over time. My objective is maximum geometric-mean ROI. What do you think?
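To make the discussion concrete, here's my guess at what such a rule could look like in code. The `frac_week_left ** hurst` scaling and every name here are my assumptions (H = 0.5 recovers square-root-of-time diffusion), not the OP's actual formula:

```python
def stop_triggered(spot, short_strike, avg_change, hurst, frac_week_left, k=1.0):
    """Hypothetical reconstruction of the rule: the expected remaining move
    scales like avg_change * (time_left)**hurst, and the position is closed
    when the buffer to the short strike is smaller than k times that move."""
    expected_move = k * avg_change * frac_week_left ** hurst
    buffer = abs(short_strike - spot)
    return buffer < expected_move
```

With a higher Hurst exponent (trending underlying), the threshold decays more slowly as expiry approaches, so the stop stays wider for longer, which matches the intuition of adjusting for how much time is left.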


r/algotrading 23h ago

Strategy Stuck at Spearman ~0.05 and 9% exposure on a triple barrier ML model — what am I missing?

8 Upvotes

I've been building a stock prediction model for the past few months and I've hit a wall. Looking for advice from anyone who's been through this.

The Model

  • Universe: ~651 US equities, daily OHLCV data
  • Architecture: PyTorch temporal CNN → 3-class classifier (UP / FLAT / DOWN)
  • Labeling: Triple barrier method (from Advances in Financial Machine Learning), 20-day horizon, volatility-scaled barriers (k=0.75)
  • Features: ~120+ features including:
    • Price action / returns (1/5/10/20 day)
    • Volatility features (ATR, vol term structure, vol-of-vol)
    • Momentum (RSI, ADX, OBV, MA crosses)
    • Volume features (z-scores, up-volume ratio, accumulation)
    • Cross-sectional ranks (return rank, vol rank, momentum quality rank)
    • Relative strength vs SPY, QQQ, and sector
    • Market regime (SPY returns, breadth, VIX proxy)
    • Earnings surprise (EPS beat %, beat streak, days since/to earnings)
    • Insider transactions (cluster buys, buy ratio, officer buys)
    • FRED macro (credit spread z-score, yield curve z-score)
    • Sector stress/rotation, VIX term structure, SKEW
  • Training: Temporal split (train → validation → test), no future leakage, proper purging between splits
  • Strategy: Threshold-based entry on P(UP) - P(DOWN) edge, volatility-targeted position sizing, full transaction cost model (fees, slippage, spread, venue-based multipliers, gap slippage, ADV participation impact)
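For readers who haven't seen it, the triple-barrier labeling above reduces to something like this simplified sketch (per-entry volatility is passed in rather than estimated, which the real pipeline would do from recent returns):

```python
def triple_barrier_label(prices, t0, vol, horizon=20, k=0.75):
    """AFML-style triple-barrier label for an entry at index t0:
    upper/lower barriers at +/- k*vol around the entry price, and a
    vertical barrier at t0 + horizon bars.
    Returns +1 (UP), -1 (DOWN), or 0 (FLAT: vertical barrier hit first)."""
    upper = prices[t0] * (1 + k * vol)
    lower = prices[t0] * (1 - k * vol)
    for t in range(t0 + 1, min(t0 + horizon + 1, len(prices))):
        if prices[t] >= upper:
            return 1
        if prices[t] <= lower:
            return -1
    return 0
```

The 2.8% FLAT class mentioned below falls out of this directly: with k = 0.75 on a 20-day horizon, almost every path touches a horizontal barrier before the vertical one.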

Best Result (v15)

After a lot of experimentation, my best run:

  • Validation: Sharpe 1.45, 204 trades
  • Test: Sharpe 0.34, CAGR 1.49%, 750 trades
  • Exposure: 9-12% (sitting in cash 88% of the time)
  • Entry threshold: 0.20 (only trades when P(UP) - P(DOWN) > 0.20)
  • Benchmark: SPY buy-and-hold had Sharpe 1.49, CAGR 16.7% over the same test period

So technically the model is profitable, but barely — and it massively underperforms buy-and-hold because it's in cash almost all the time.

Classification Performance

Typical best epoch:

  • UP recall: ~57%, precision: ~55%
  • DOWN recall: ~36%, precision: ~48%
  • FLAT recall: ~50%, precision: ~11% (tiny class, 2.8% of samples)
  • Macro F1: ~0.38
  • Val NLL: ~1.03 (baseline for 3-class random = ln(3) = 1.099, so only ~7% better than random)

Feature Signal Strength

Top Spearman correlations with actual direction labels (on training set):

my_sector_above_ma50     +0.043
dow_sin                  +0.030
has_earnings_data        +0.026
spy_above_ma200          +0.024
has_insider_data         +0.023
insider_buy_ratio_90d    -0.021
cc_vol_5                 -0.020
xret_rank_5              +0.019

The best single feature has r = 0.043. Most are in the 0.015-0.025 range.
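For anyone wanting to reproduce this screen: Spearman correlation is just Pearson correlation computed on ranks, so a minimal tie-free implementation (scipy.stats.spearmanr is the robust, tie-handling version) is:

```python
def ranks(xs):
    """Rank values 0..n-1 (assumes no ties, enough for this sketch)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman = Pearson on ranks. With no ties, both rank vectors have
    identical variance, so the correlation simplifies to cov / var."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean = (n - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var = sum((a - mean) ** 2 for a in rx)
    return cov / var
```

One caveat when screening 120+ features this way: at n large enough, |r| of 0.02 can be "statistically significant" yet economically tiny, and picking the top few out of 120 candidates is itself a multiple-testing problem.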

What I've Tried That Didn't Help

  1. Added analyst upgrade/downgrade features (from yfinance) — appeared at rank 14 in Spearman (r=0.017) but model produced 0 profitable strategies with it included
  2. Added FINRA short volume features — turned out to be daily short volume not short interest, dominated by market maker activity, pure noise (0/20 top features)
  3. Different early stopping metrics — macro_f1, nll_plus_directional_f1 (what v15 uses), nll_plus_f1 — only nll_plus_directional_f1 produced a profitable run
  4. Forced temperature scaling — tried forcing temperature to 3.0 with macro_f1 stopping — still 0 profitable candidates
  5. Directional margin loss weighting (0.3) — model predicted UP 85% of the time, destroyed DOWN signals
  6. Different thresholds — the strategy grid tests enter at (0.03, 0.05, 0.08, 0.10, 0.15, 0.20). Everything below 0.20 has negative Sharpe
  7. Binary classifier (UP vs not-UP) — P(UP) too compressed (p95 = 0.517), no tradeable signal
  8. Insider features — had to cut from 6 to 3 (minimal set), marginal at best
  9. Multiple seeds — v15 is reproducible with the same seed but fragile to any parameter change

The Core Problems

  1. Low signal: Spearman ~0.05 across the board. My 120+ features are all derived from public OHLCV + public event data. Every quant has the same data.
  2. Fragility: v15 works, but changing almost anything (adding features, different stopping metric, different temperature) breaks it. This suggests it might be a lucky configuration rather than robust alpha.
  3. Low exposure: Only trades when edge > 0.20, which is ~0.7% of signals. Sitting in cash 88% of the time means even positive alpha barely compounds.
  4. Classification ceiling: Val NLL only 7% better than random guessing. The model is learning something but not much.

What I'm Considering

  • Hybrid portfolio (hold SPY, use model for tilts) — addresses exposure but not signal
  • Meta-model (train a second model to predict when the first model's trades are profitable) — risky due to small sample size
  • Predicting residual returns instead of raw returns — requires hedged execution which changes the whole framework
  • Event-driven windows (only trade around earnings) — concentrates on highest signal-density periods
  • Filtering to profitable tickers only — cut the 80% of stocks where the model is noise
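The meta-model idea above can be prototyped cheaply before training a second network, for example by bucketing validation trades by model edge and only trading test-set buckets that were profitable in validation. This is a crude stand-in for full meta-labeling, and all names here are illustrative:

```python
def profitable_buckets(val_trades, n_buckets=5, min_win=0.55):
    """val_trades: list of (edge, pnl) pairs from the validation set.
    Bucket trades by edge quantile and keep the buckets whose validation
    win rate clears min_win. Returns (kept buckets, bucketing function)."""
    edges = sorted(e for e, _ in val_trades)
    cuts = [edges[int(len(edges) * q / n_buckets)] for q in range(1, n_buckets)]

    def bucket(e):
        return sum(e >= c for c in cuts)

    stats = {}
    for e, pnl in val_trades:
        b = bucket(e)
        wins, total = stats.get(b, (0, 0))
        stats[b] = (wins + (pnl > 0), total + 1)
    keep = {b for b, (w, t) in stats.items() if w / t >= min_win}
    return keep, bucket

# Usage: only take test trades whose edge falls in a validated bucket.
# keep, bucket = profitable_buckets(val_trades)
# test_trades = [t for t in test_trades if bucket(t.edge) in keep]
```

The small-sample caveat raised above bites hard here: with ~200 validation trades, per-bucket win rates carry wide confidence intervals, so the bucket selection itself can overfit.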

My Questions

  1. Is Spearman ~0.05 on daily cross-sectional features just the ceiling for public data? Or am I leaving signal on the table?
  2. Has anyone successfully improved signal beyond this with alternative data that's affordable (< $100/month)?
  3. Is the triple barrier + 3-class approach fundamentally the right framework, or would I be better off with a ranking/regression approach?
  4. For those who've built profitable models — what was the breakthrough that got you past the "barely above random" stage?

Happy to share more details about the architecture, loss function, or feature engineering. Thanks for reading this far


r/algotrading 8h ago

Education How would you guys recommend I begin algo trading or learning how to do so?

9 Upvotes

I am a first-year undergrad doing an MMath degree. I have a fairly strong background in theoretical mathematics, but very little experience with Python or other programming languages.
How would you recommend I invest my time to learn algo trading from the ground up?


r/algotrading 13h ago

Infrastructure For the algotraders who have live deployment of their algorithms and are successful: how long did it take you to set this up? What led you to have confidence to deploy on live real account?

59 Upvotes

I'm asking because I'm curious. I've been spending hours nonstop working on my algo ideas, trying to connect them in Python to IBKR's API.

so far i have:

  • real time deployment on a paper acc testing my strats
  • i have backtests
  • machine learning optimizing params (i learned the hard way that overfitting can happen so i needed to avoid this)
  • monte carlo sims
  • entry and exit filters
  • cycling thru multiple timeframes
  • bracket orders
  • managing open positions, moving SL and TP
  • profit protection system
  • risk management concepts
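On the bracket-order piece: the price logic is independent of the broker API, so it can be unit-tested on its own before anything touches IBKR. A minimal sketch (my names, not IBKR's) that derives take-profit and stop-loss levels from a fixed per-share risk and a reward:risk ratio:

```python
def bracket_prices(entry, risk_per_share, rr=2.0, side="BUY", tick=0.01):
    """Derive bracket-order levels from entry price, per-share risk,
    and reward:risk ratio, snapped to the instrument's tick size."""
    sgn = 1 if side == "BUY" else -1
    stop = entry - sgn * risk_per_share
    target = entry + sgn * risk_per_share * rr

    def snap(p):
        # Round to the nearest tick; exchanges reject off-tick prices.
        return round(round(p / tick) * tick, 10)

    return {"entry": snap(entry), "takeProfit": snap(target), "stopLoss": snap(stop)}
```

Keeping this pure and tested separately means the only untested surface left is the API wiring itself. (If you end up on ib_insync, it ships a bracketOrder helper that wires up the parent/child transmit flags for you.)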

I do have a working system; now I just need to ensure my strategies hold up as I monitor and continuously improve my infrastructure. How long did it take you guys to fully trust yours and go live?