NumPy for Traders
NumPy fundamentals for traders - creating arrays, indexing, vectorised math and the everyday functions, on simple price data.
- Creating arrays & helpers
- Shape, size & dtype
- Indexing & slicing
- Vectorised math & broadcasting
- Aggregations, masks & np.where
- 2D arrays & the axis argument
Before you can compute returns, indicators or a backtest, you need the tool that does fast number-crunching in Python: NumPy (say "num-pie"). It's the engine underneath pandas, the indicator library and the backtester you'll meet later, so a little NumPy now pays off everywhere.
This chapter keeps it simple. We'll learn the core ideas on small, hand-typed arrays of Indian stock prices - no data download, no SDK, just import numpy as np and the basics every trader actually uses. Work through it once and the rest of the course will feel familiar.
Creating an array
An array is NumPy's version of a list - an ordered row of numbers - but built for maths. You make one from a Python list, and NumPy also gives you ready-made arrays for common needs.
# NumPy basics for traders -- everything starts with creating an array.
import numpy as np
# An array from a Python list: five daily closes of RELIANCE (in rupees).
closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
print("From a list :", closes)
# Handy ready-made arrays:
print("arange :", np.arange(1, 6)) # 1..5, like range()
print("zeros :", np.zeros(5)) # five 0.0s (empty P&L slots)
print("ones :", np.ones(3)) # three 1.0s
print("full :", np.full(4, 100)) # four 100s (e.g. lot sizes)
print("linspace :", np.linspace(100, 200, 5)) # 5 evenly spaced, 100 to 200
Output:
From a list : [1305. 1298.5 1310.4 1318.6 1312.7]
arange : [1 2 3 4 5]
zeros : [0. 0. 0. 0. 0.]
ones : [1. 1. 1.]
full : [100 100 100 100]
linspace : [100. 125. 150. 175. 200.]
np.array([...]) wraps your own numbers; arange, zeros, ones, full and linspace build common patterns without typing every value.
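As a small illustration of these helpers, the sketch below builds a ladder of option strikes with arange and a set of evenly spaced price levels with linspace. The strike range, step and levels are made-up values chosen for the example, not taken from any real contract.

```python
import numpy as np

# Hypothetical strike ladder: 25800 up to (but not including) 26300, in steps of 100.
strikes = np.arange(25800, 26300, 100)
print("strikes:", strikes)

# arange excludes the stop value; linspace includes both ends.
levels = np.linspace(1300, 1320, 5)
print("levels :", levels)

# zeros gives a float array to fill in later, e.g. one P&L slot per strike.
pnl = np.zeros(len(strikes))
print("pnl    :", pnl)
```

The thing to remember is the end point: arange stops before its stop value, while linspace includes it.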
Shape, size and dtype
Every array carries a few facts about itself. You'll check these constantly when something doesn't line up.
# Every array knows its own shape, size and data type.
import numpy as np
closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
print("Array :", closes)
print("shape :", closes.shape) # (5,) -> 5 elements in one row
print("ndim :", closes.ndim) # 1 -> one dimension
print("size :", closes.size) # 5 -> total count
print("dtype :", closes.dtype) # float64 -> decimal numbers
qty = np.array([10, 25, 15])
print("int dtype:", qty.dtype) # int64 -> whole numbers (e.g. quantities)
Output:
Array : [1305. 1298.5 1310.4 1318.6 1312.7]
shape : (5,)
ndim : 1
size : 5
dtype : float64
int dtype: int64
shape is its dimensions, ndim the number of them, size the total count, and dtype the kind of number inside - float64 for prices, int64 for whole quantities (the default integer type can be int32 on some platforms, such as older NumPy versions on Windows).
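A quick sketch of why dtype is worth checking: an array holds a single dtype, so mixing whole numbers with one decimal turns everything into floats. The values below are invented for illustration.

```python
import numpy as np

# Mixing whole numbers and decimals: NumPy picks one dtype for the whole array.
mixed = np.array([10, 25, 15.5])
print(mixed.dtype)          # float64, because one value has a decimal part

# Converting on purpose with astype. This truncates towards zero; it does not round.
whole = mixed.astype(int)
print(whole)

# Asking for a dtype up front when creating the array.
qty = np.array([10, 25, 15], dtype=np.float64)
print(qty.dtype)
```

If a quantity array unexpectedly shows float64, a stray decimal or missing value somewhere in the input is a likely cause.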
Indexing and slicing
You reach into an array by position, exactly like the lists from Chapter 2. The one you'll use most is [-1] - "the latest value".
# Reach into an array by position -- the same indexing you used for lists.
import numpy as np
# Ten days of NIFTY closing values.
nifty = np.array([25900, 25960, 25840, 25990, 26050,
26010, 25880, 25930, 26100, 26075])
print("First day :", nifty[0])
print("Latest day :", nifty[-1]) # -1 is "most recent" -- used constantly
print("Days 2 to 4 :", nifty[1:4]) # positions 1,2,3 (the stop is excluded)
print("Last 5 days :", nifty[-5:])
print("Every 2nd :", nifty[::2]) # a step of 2
Output:
First day : 25900
Latest day : 26075
Days 2 to 4 : [25960 25840 25990]
Last 5 days : [26010 25880 25930 26100 26075]
Every 2nd : [25900 25840 26050 25880 26100]
A slice a[1:4] takes positions 1, 2 and 3 (the end is excluded), a[-5:] takes the last five, and a[::2] steps in twos.
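One difference from Python lists is worth a quick sketch: a basic NumPy slice is a view onto the original data rather than a copy, so writing to the slice also changes the source array. The shortened five-value array here is only for illustration.

```python
import numpy as np

nifty = np.array([25900, 25960, 25840, 25990, 26050])

last_two = nifty[-2:]        # a view onto the same data, not a new array
safe = nifty[-2:].copy()     # an independent copy

last_two[0] = 0              # this also overwrites nifty[3]
print(nifty)
print(safe)                  # unaffected, because it was copied first
```

When you plan to modify a slice and want the original series left alone, calling .copy() first is the usual safeguard.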
Vectorised math and broadcasting
Here's the superpower. Do maths on a whole array in one stroke - no loop. Combine an array with a single number and NumPy "broadcasts" that number across every element; combine two equal-length arrays and it works element by element.
# Vectorisation: do the math on the WHOLE array at once -- no loops.
import numpy as np
closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
# Broadcasting: combine an array with a single number.
print("After 5% rise :", np.round(closes * 1.05, 2))
print("Minus 10 rs :", closes - 10)
# Two arrays, multiplied element by element: price x quantity held.
qty = np.array([10, 10, 20, 20, 15])
value = closes * qty
print("Holding value :", value)
print("Total value :", value.sum())
Output:
After 5% rise : [1370.25 1363.42 1375.92 1384.53 1378.34]
Minus 10 rs : [1295. 1288.5 1300.4 1308.6 1302.7]
Holding value : [13050. 12985. 26208. 26372. 19690.5]
Total value : 98305.5
Whenever you're about to write a for loop over prices, pause - there's almost always a vectorised one-liner. On large arrays it is typically far faster than a Python loop, often by one or two orders of magnitude, and it reads like the maths it represents.
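To make the loop-versus-vectorised comparison concrete, here is a minimal sketch that computes the same total holding value both ways, using the closes and qty arrays from the example above. It shows that the two results agree; it does not measure speed.

```python
import numpy as np

closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
qty = np.array([10, 10, 20, 20, 15])

# Loop version: walk the two arrays in step and add up price x quantity.
total_loop = 0.0
for price, q in zip(closes, qty):
    total_loop += price * q

# Vectorised version: one expression, no explicit loop.
total_vec = (closes * qty).sum()

print(total_loop, total_vec)
```

Both lines of arithmetic do the same work; the vectorised form simply hands the loop to NumPy's compiled code.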
The summary functions
These are the everyday questions you ask of any price series - the average, the extremes, the spread - each a single call.
# The summary functions a trader reaches for every day.
import numpy as np
closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
print("mean :", round(closes.mean(), 2)) # average close
print("max :", closes.max()) # highest
print("min :", closes.min()) # lowest
print("std :", round(closes.std(), 2)) # spread -- a volatility proxy
print("median :", np.median(closes))
print("sum :", closes.sum())
print("cumsum :", np.cumsum(closes)) # running total
Output:
mean : 1309.04
max : 1318.6
min : 1298.5
std : 6.84
median : 1310.4
sum : 6545.2
cumsum : [1305. 2603.5 3913.9 5232.5 6545.2]
std (standard deviation) measures how spread-out the numbers are - your first taste of volatility - and cumsum gives a running total.
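One detail behind that std figure, sketched below: NumPy's std divides by n by default (ddof=0, the population form), whereas pandas defaults to n - 1 (ddof=1, the sample form), so the same data can give two slightly different numbers. The sketch also uses argmax and argmin, which were not in the example above, to find where the extremes sit.

```python
import numpy as np

closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])

pop_std = closes.std()            # default ddof=0: divide by n
sample_std = closes.std(ddof=1)   # ddof=1: divide by n - 1

print(round(pop_std, 2), round(sample_std, 2))

# argmax / argmin report WHERE the extreme sits (its position), not its value.
print(closes.argmax(), closes.argmin())
```

On five values the gap between the two forms is noticeable; on long price series it shrinks to almost nothing.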
Boolean masks and np.where
Compare an array to something and you don't get one True/False - you get a mask, a True/False for every element. Use it to filter (keep only the True ones) or count, and use np.where to label each element in one go.
# Ask a yes/no question of every element, then filter or count.
import numpy as np
closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
avg = closes.mean()
mask = closes > avg # True/False for each day
print("Above average?:", mask)
print("Those closes :", closes[mask]) # keep only the True ones
print("How many days :", mask.sum()) # True counts as 1
# np.where labels every element in one stroke (1 = above avg, 0 = not).
tags = np.where(closes > avg, 1, 0)
print("Tags :", tags)
Output:
Above average?: [False False True True True]
Those closes : [1310.4 1318.6 1312.7]
How many days : 3
Tags : [0 0 1 1 1]
This filter-by-condition pattern is the literal seed of every trading signal you'll build later - an entry is just a mask that's True when your rule is met.
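Real rules usually combine more than one condition. The sketch below, with an arbitrary upper bound of 1315 chosen only for illustration, joins two masks with & and shows two other common uses of np.where.

```python
import numpy as np

closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
avg = closes.mean()

# Combine masks with & (and), | (or), ~ (not).
# Each comparison needs its own brackets.
in_band = (closes > avg) & (closes < 1315)
print(closes[in_band])

# np.where can return text labels instead of 1/0.
labels = np.where(closes > avg, "above", "below")
print(labels)

# With a single argument, np.where returns the positions where the mask is True.
positions = np.where(closes > avg)[0]
print(positions)
```

Python's plain and / or keywords do not work element by element on arrays, which is why the symbols are used here.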
Popular functions to know
A short grab-bag you'll reach for again and again on price data.
# A grab-bag of popular NumPy functions you'll use on price data.
import numpy as np
closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])
print("diff :", np.diff(closes)) # day-to-day change
print("abs :", np.abs(np.diff(closes))) # size of move, sign dropped
print("round :", np.round(closes, 0)) # round to whole rupees
print("sqrt :", np.round(np.sqrt(closes), 1))
print("maximum :", np.maximum(closes, 1310)) # floor each value at 1310
print("sort :", np.sort(closes)) # ascending
print("unique :", np.unique([5, 5, 10, 20, 20])) # distinct values, sorted
Output:
diff : [-6.5 11.9 8.2 -5.9]
abs : [ 6.5 11.9 8.2 5.9]
round : [1305. 1298. 1310. 1319. 1313.]
sqrt : [36.1 36. 36.2 36.3 36.2]
maximum : [1310. 1310. 1310.4 1318.6 1312.7]
sort : [1298.5 1305. 1310.4 1312.7 1318.6]
unique : [ 5 10 20]
diff gives day-to-day changes, abs drops the sign, round tidies decimals (NumPy rounds exact halves to the nearest even number, which is why 1298.5 became 1298), maximum floors values, and sort/unique reorder and de-duplicate.
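A natural next step from diff is the daily percentage return: each day's change divided by the previous day's close. The sketch below is one simple way to write it with the functions from this section; it is not the only definition of return in use.

```python
import numpy as np

closes = np.array([1305.0, 1298.5, 1310.4, 1318.6, 1312.7])

# Day-to-day change divided by the previous day's close.
# closes[:-1] drops the last value so both arrays have length 4.
returns = np.diff(closes) / closes[:-1]
print(np.round(returns * 100, 2))   # in percent

# Position and size of the largest absolute one-day move.
biggest = np.argmax(np.abs(returns))
print(biggest, round(abs(returns[biggest]) * 100, 2))
```

Note that five closes give only four returns, because the first day has no previous close to compare against.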
A quick look at 2D arrays
Real market data is often a table - several stocks over several days. That's a 2D array: rows and columns. The key new idea is axis: axis=1 works across each row, axis=0 down each column.
# A 2D array is a table: here three stocks (rows) over four days (columns).
import numpy as np
# rows = RELIANCE, TCS, INFY ; columns = Mon..Thu closes
prices = np.array([
[1305.0, 1298.5, 1310.4, 1312.7],
[2068.0, 2090.5, 2075.0, 2081.25],
[1052.0, 1041.3, 1055.8, 1048.5],
])
print("shape :", prices.shape) # (3, 4) -> 3 rows, 4 columns
print("INFY row :", prices[2]) # the third stock
print("Thu column :", prices[:, -1]) # last day, every stock
print("Per-stock avg:", np.round(prices.mean(axis=1), 1)) # across each row
print("Per-day avg :", np.round(prices.mean(axis=0), 1)) # down each column
print("Reshaped :", np.arange(6).reshape(2, 3))
Output:
shape : (3, 4)
INFY row : [1052. 1041.3 1055.8 1048.5]
Thu column : [1312.7 2081.25 1048.5 ]
Per-stock avg: [1306.6 2078.7 1049.4]
Per-day avg : [1475. 1476.8 1480.4 1480.8]
Reshaped : [[0 1 2]
 [3 4 5]]
prices[2] picks a row (one stock), prices[:, -1] picks a column (one day across all stocks), and reshape rearranges the same numbers into a new shape.
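Broadcasting also works in 2D. As a sketch, the code below rebases each stock so that its first day equals 100, which makes stocks at very different price levels comparable. The base of 100 is a common convention picked for illustration, and keepdims is an option not used in the example above.

```python
import numpy as np

prices = np.array([
    [1305.0, 1298.5, 1310.4, 1312.7],
    [2068.0, 2090.5, 2075.0, 2081.25],
    [1052.0, 1041.3, 1055.8, 1048.5],
])

# prices[:, :1] keeps the first column as shape (3, 1),
# so it broadcasts across every column of its own row.
rebased = prices / prices[:, :1] * 100
print(np.round(rebased, 2))

# keepdims=True does the same job for an aggregation: the result stays 2D.
row_means = prices.mean(axis=1, keepdims=True)   # shape (3, 1)
demeaned = prices - row_means
print(demeaned.shape)
```

Writing prices[:, 0] instead would give shape (3,), which does not line up with the four columns and raises a broadcasting error.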
Try it yourself
- Change the numbers in closes to a stock you follow and re-run the aggregations - is its std larger or smaller than RELIANCE's?
- In the mask example, switch > avg to < avg to keep the below-average days instead.
- In the 2D example, add a fourth stock as a new row and confirm the per-stock average still lines up.
Recap
- An array (np.array) is a row of numbers built for maths; arange, zeros, ones, full, linspace build common ones.
- Check shape, ndim, size and dtype to understand any array.
- Index and slice by position - [-1] is the latest, a[-5:] the last five, a[::2] every second.
- Vectorised math and broadcasting apply an operation to the whole array at once - no loops.
- Summaries (mean, max, min, std, median, cumsum) and handy functions (diff, abs, round, maximum, sort, unique) turn raw numbers into answers.
- Boolean masks + np.where ask a question of every element - the seed of every signal.
- 2D arrays are tables; the axis argument chooses rows (axis=1) or columns (axis=0).
That's the fast-math foundation. Next we wrap these arrays in pandas - labelled tables with dates, rolling windows and group-bys - where real market analysis gets comfortable.