Statistical Arbitrage
For traders who want the real thing, not the pitch. Cointegration and pairs, baskets, dynamic hedges and market-neutral books, on NSE equities, with every result shown gross and net, in-sample and out-of-sample. Brutally honest about the line between a statistical relationship and a tradable edge.
Foundations and Honesty
What stat arb really is, the data it stands on, and the statistics that decide whether a relationship is real.
What Statistical Arbitrage Really Is
Trade a stationary relationship, not a price. The family tree from single-name reversion to pairs, baskets and cross-sectional books, why the edge has decayed, and what you can honestly expect.
Building the Historical Database
Download once, read fast. How the local DuckDB cache of 50 NSE names plus the index is built with OpenAlgo, verified for gaps, and kept current, so every later chapter reads in milliseconds.
The Data Layer and Its Biases
Most stat-arb results die from data problems before they die from bad statistics. The aligned price panel, liquidity filters, survivorship and symbol-change bias, and adjusted versus tradable close.
Research vs Real Markets
A long/short equity backtest is a research abstraction. This chapter covers the real vehicle the short leg needs in India (intraday cash, SLB, or a stock-futures proxy) and the gap between a statistical relationship and a tradable edge.
Stationarity and Random Walks
Why a single stock cannot be traded as a mean-reverter. Random walk versus stationary, prices versus returns, the ADF and KPSS tests done properly, and the Hurst exponent, on real NSE names.
Correlation Is Not Cointegration
The lesson almost everyone gets wrong, computed live on NSE data: the most correlated pairs need not be cointegrated, and a less correlated pair can be. Spurious regression and why correlation traders blow up.
Cointegration Mechanics
The machinery end to end on a real cointegrated NSE pair: Engle-Granger, the hedge ratio (OLS versus total least squares), the spread, and the Ornstein-Uhlenbeck half-life that says how long you would hold.
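As a rough illustration of the half-life idea, the sketch below fits a discrete AR(1) to a spread by regressing the bar-to-bar change on the previous level, then converts the slope into a half-life in bars. It assumes evenly spaced bars and a spread that is already built; the function name `ou_half_life` and the toy series are hypothetical, not the course's own code.

```python
import math

def ou_half_life(spread):
    """Half-life (in bars) from an AR(1) fit: ds_t = a + b * s_{t-1} + e_t.

    With phi = 1 + b, a shock decays as phi**k, so half-life = -ln(2) / ln(phi).
    Returns infinity when the fit shows no mean reversion (phi outside (0, 1)).
    """
    lagged = spread[:-1]
    change = [cur - prev for prev, cur in zip(spread[:-1], spread[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_y = sum(change) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, change))
    slope = sxy / sxx  # OLS slope b
    if not -1.0 < slope < 0.0:
        return math.inf
    return -math.log(2) / math.log(1.0 + slope)

# Toy, noise-free spread that loses 10% of its deviation every bar (phi = 0.9).
toy_spread = [0.9 ** k for k in range(60)]
half_life = ou_half_life(toy_spread)
print(round(half_life, 2))
```

A half-life of a few bars suggests short holding periods; an infinite or very long one suggests the spread is not reverting on a tradable horizon. On real data the estimate is noisy and should be checked out-of-sample.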
Building a Pair, Then Breaking It
Find a pair without fooling yourself, build the signal, watch the pretty backtest, then take it apart honestly.
Finding Pairs Without Fooling Yourself
Economic prior first, then the cointegration scan, then the multiple-testing trap: scanning many pairs guarantees false positives. Bonferroni and false-discovery control, and how few survive out-of-sample.
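To make the multiple-testing trap concrete: 50 names give 50 × 49 / 2 = 1,225 candidate pairs, so a 5% test would flag roughly 61 pairs even if none were truly cointegrated. The sketch below applies Bonferroni and Benjamini-Hochberg corrections to a list of p-values. The p-values are made up for illustration and the function names are hypothetical.

```python
def bonferroni(pvalues, alpha=0.05):
    """Keep a test only if p <= alpha / m (controls the family-wise error rate)."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]

def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up rule: find the largest rank k with p_(k) <= alpha * k / m,
    then keep the k smallest p-values (controls the false-discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha * rank / m:
            k_max = rank
    keep = [False] * m
    for idx in order[:k_max]:
        keep[idx] = True
    return keep

# Hypothetical p-values from ten cointegration tests (not real results).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

naive = sum(p <= 0.05 for p in pvals)    # uncorrected count
bonf = sum(bonferroni(pvals))            # strictest
bh = sum(benjamini_hochberg(pvals))      # in between
print(naive, bonf, bh)
```

On these made-up inputs, five tests pass uncorrected, two survive Benjamini-Hochberg, and one survives Bonferroni. Surviving a correction is still an in-sample statement; it does not replace the out-of-sample check.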
The Spread and the Signal
The z-score strategy on one real pair: the spread, rolling and robust z-scores, entry, exit and stop, rupee-neutral sizing, and a first in-sample backtest with next-bar fills that looks great, on purpose.
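A minimal sketch of the signal logic, assuming a precomputed spread series: a rolling z-score that uses only bars up to and including the current one, and a position rule whose decision at bar t takes effect at bar t+1 (next-bar fills). The thresholds (entry 2.0, exit 0.5, stop 4.0) and function names are illustrative assumptions, not the course's parameters.

```python
from statistics import mean, stdev

def rolling_zscore(spread, window):
    """z_t from the last `window` bars ending at t; None until enough history."""
    z = [None] * len(spread)
    for t in range(window - 1, len(spread)):
        hist = spread[t - window + 1 : t + 1]
        sd = stdev(hist)
        if sd > 0:
            z[t] = (spread[t] - mean(hist)) / sd
    return z

def positions(z, entry=2.0, exit_=0.5, stop=4.0):
    """Spread position per bar: +1 long, -1 short, 0 flat.

    The position recorded for bar t was decided on bar t-1, so a signal
    never trades on the bar that produced it.
    """
    pos, held = 0, []
    for zt in z:
        held.append(pos)
        if zt is None:
            continue
        if pos == 0:
            if entry <= zt < stop:
                pos = -1          # spread rich: short it
            elif -stop < zt <= -entry:
                pos = 1           # spread cheap: long it
        elif abs(zt) <= exit_ or abs(zt) >= stop:
            pos = 0               # reverted, or stopped out
    return held

# Hand-written z-scores to show the one-bar delay between signal and position.
demo_z = [None, 0.0, 2.5, 1.0, 0.3, -2.2, -4.5, 0.0]
demo_positions = positions(demo_z)
print(demo_positions)
```

The one-bar lag is the point of the design: the short position appears one bar after the z-score crosses the entry level, and the stop at a z-score of 4 closes the long one bar after it is breached.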
The Brutal Reality Check
Take the pretty backtest apart: out-of-sample collapse, realistic NSE costs that turn gross returns into net, the spread de-cointegrating, and look-ahead bias, with the losing net curve shown, not hidden.
Doing It Properly
Dynamic hedges, baskets, cross-sectional books, and the risk and portfolio construction that keep a book alive.
Dynamic Hedge Ratios with the Kalman Filter
A static hedge ratio is wrong because the relationship drifts. A hand-rolled Kalman filter for a time-varying hedge, and the honest finding that more flexibility can lose to a simple static beta.
Baskets and the Johansen Test
Beyond two names: cointegrating vectors among several stocks with the Johansen test, choosing the rank, reading the basket weights, and how unstable the vector is out-of-sample.
Cross-Sectional, Factor-Neutral Stat Arb
The version that scales: remove market and sector returns, rank by short-term residual reversal, and build a market-neutral long/short book across the universe. Then watch turnover and costs invert the gross edge.
Risk, Sizing and Portfolio Construction
From signals to a survivable book: beta versus rupee neutrality, volatility targeting, covariance shrinkage, the stop-loss-on-a-spread dilemma, risk contribution, and a rule for retiring a broken pair.
Research to Reality
Validate without fooling yourself, understand what a live implementation needs, and put it all together.
Honest Backtesting and Validation
The full validation scorecard: walk-forward, purged and embargoed cross-validation, the deflated Sharpe, the probability of backtest overfitting, a blocked bootstrap, and parameter-stability maps.
Implementation Pathways
What a live version actually needs: the leg vehicles, legging risk and the two-legged fill problem, impact estimated from intraday bars, participation caps, monitoring and the kill switch, and the research-to-live gap.
Capstone: The Honest Stat-Arb Workflow
Everything together as one disciplined workflow, from universe to a validated, cost-aware, research-grade market-neutral model, with the pre-live checklist and the companion notebooks to run it all yourself.
For education only - not investment advice. A research-grade model on NSE equity data, with honest notes on what it takes to trade it. 17 chapters, built on the OpenAlgo SDK.