A primer for the animal behaviour scientist
Tesler’s theorem:
“AI is whatever hasn’t been done yet.”
Figure from stackoverflow.com
Figure from Waymo
Figure from Deep Learning book
Figure from Deep Learning book
Figure from Deep Learning book
Variables that we are able to observe.
Extraction of increasingly abstract features.
“Hidden” (i.e., not observable)
Recognition of objects in the image
Figure modified from Deep Learning book
The digit recognition task as an intuitive example
Multilayer perceptron
The simplest neural network
Input layer
Output layer
Hidden layers
→ np.reshape(image, (28*28, 1))
→ 2
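In NumPy, flattening the input might look like this (a minimal sketch; `image` stands in for a real 28×28 digit):

```python
import numpy as np

# A minimal sketch: `image` stands in for a real 28x28 greyscale digit.
image = np.random.rand(28, 28)

# Flatten the 2D image into a column vector: one input unit per pixel.
x = np.reshape(image, (28 * 28, 1))
print(x.shape)  # (784, 1)
```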
Hidden layers
\[ f(x_1, x_2, x_3, \ldots) = \mathbf{\color{rgb(225, 174, 65)}h} \]
\[ \mathbf{\color{rgb(225, 174, 65)}h} \]
Compute weighted sum \[ {\color{rgb(225, 65, 185)}\Sigma} = {\color{rgb(137, 225, 65)}w_1} x_1 + {\color{rgb(137, 225, 65)}w_2} x_2 + {\color{rgb(137, 225, 65)}w_3} x_3 \]
Apply non-linearity \[ \mathbf{\color{rgb(225, 174, 65)}h} = \max({\color{rgb(225, 65, 185)}\Sigma}, 0) \]
\[ \mathbf{\color{rgb(225, 174, 65)}h} \]
Compute weighted sum \[ {\color{rgb(225, 65, 185)}\Sigma} = {\color{rgb(137, 225, 65)}w_1} x_1 + {\color{rgb(137, 225, 65)}w_2} x_2 + {\color{rgb(137, 225, 65)}w_3} x_3 \]
Apply non-linearity \[ \mathbf{\color{rgb(225, 174, 65)}h} = \max({\color{rgb(225, 65, 185)}\Sigma} + {\color{rgb(213, 24, 24)}b}, 0) \]
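As a NumPy sketch (toy input and weight values, chosen only for illustration), a single neuron computes:

```python
import numpy as np

# One artificial neuron (toy values chosen for illustration).
x = np.array([0.5, -1.0, 2.0])   # inputs x1, x2, x3
w = np.array([0.1, 0.4, -0.2])   # weights w1, w2, w3
b = 0.3                          # bias

sigma = np.dot(w, x)             # weighted sum: w1*x1 + w2*x2 + w3*x3
h = max(sigma + b, 0.0)          # add the bias, then apply the ReLU non-linearity
```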
In an MLP:
\[ \small \qquad \color{rgb(0, 0, 255)}{h_0^1} = \text{ReLU}(\sum_{i=0}^{n} \color{rgb(137, 225, 65)}{w_i^{0,1}} \color{rgb(173, 216, 230)}{h_i^0} + \color{rgb(213, 24, 24)}{b_0^1}) \]
\[ \scriptstyle \text{ReLU}\left( \begin{bmatrix} w_{0,0} & \cdots & w_{0,n} \\ \vdots & \ddots & \vdots \\ w_{k,0} & \cdots & w_{k,n} \end{bmatrix} \begin{bmatrix} h_0^0 \\ \vdots \\ h_n^0 \end{bmatrix} + \begin{bmatrix} b_0^1 \\ \vdots \\ b_k^1 \end{bmatrix} \right) = \begin{bmatrix} h_0^1 \\ \vdots \\ h_k^1 \end{bmatrix} \]
\[ \text{ReLU}(\color{rgb(137, 225, 65)}{\mathbf{W}'} \color{rgb(173, 216, 230)}{h^{0\prime}}) = \color{rgb(0, 0, 255)}{h^1} \]
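In NumPy, one layer is a single matrix product followed by an element-wise ReLU (a sketch with illustrative sizes: 784 inputs, 16 units):

```python
import numpy as np

# One MLP layer as a matrix product (illustrative sizes: 784 inputs, 16 units).
rng = np.random.default_rng(0)
h0 = rng.random((784, 1))             # activations of the previous layer
W = rng.standard_normal((16, 784))    # one row of weights per unit
b = np.zeros((16, 1))                 # one bias per unit

h1 = np.maximum(W @ h0 + b, 0)        # element-wise ReLU; h1 has shape (16, 1)
```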
\[ \color{rgb(255, 176, 0)}{y} = \color{rgb(143, 204, 143)}{W_1} \color{rgb(92, 144, 224)}{\text{ReLU}({W_0}x)} \]
\[ \scriptstyle \text{ReLU}(W h^n) = h^{n+1} \]
\[ \scriptstyle \text{ReLU}(\color{rgb(255,0,0)}{W} h^n) = h^{n+1} \]
→ 2
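Stacked together, the whole forward pass is just repeated matrix products and ReLUs (a sketch with untrained random weights; biases omitted as in the formula above):

```python
import numpy as np

# Two-layer MLP forward pass, as in y = W1 ReLU(W0 x) above
# (untrained random weights; biases omitted as in that formula).
rng = np.random.default_rng(0)
x = rng.random((784, 1))              # flattened input image
W0 = rng.standard_normal((16, 784))   # input -> hidden
W1 = rng.standard_normal((10, 16))    # hidden -> 10 digit scores

y = W1 @ np.maximum(W0 @ x, 0)        # forward pass
digit = int(np.argmax(y))             # predicted digit: index of the largest score
```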
How to choose them?
(digit image, 2)
Labelled data
Supervised learning
(digit image, …)
Training set
(image) → 7?
…
Test set → TRAINED network → Accuracy
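Evaluating the trained network then reduces to comparing its test-set predictions with the held-out labels (hypothetical values below):

```python
import numpy as np

# Test-set accuracy (hypothetical predictions and labels).
predictions = np.array([7, 2, 1, 0, 4])   # the trained network's outputs
labels = np.array([7, 2, 1, 0, 9])        # the held-out true labels

accuracy = np.mean(predictions == labels)  # fraction predicted correctly
print(accuracy)  # 0.8
```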
\[ \tiny \widehat{p} (z_{j}) = \frac{e^{z_j}}{\sum_{k} e^{z_k}} \]
\[ \tiny \text{H}(p, \widehat{p}) = -\sum_{k} p_k \log(\widehat{p}_{k}) \]
Loss: \[ \tiny \text{L}_{i} = - \log(\widehat{p} (z_{j=y_i}) ) \]
\[ \tiny \text{L} = \frac{1}{N} \sum_{i} \text{L}_{i} \]
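In NumPy, the softmax and the per-sample loss might be computed like this (toy scores; subtracting the maximum is a standard trick for numerical stability):

```python
import numpy as np

# Softmax and cross-entropy loss for one sample (toy scores).
z = np.array([2.0, -1.0, 0.5])         # raw output scores for 3 classes
y_i = 0                                # index of the true class

z = z - np.max(z)                      # standard trick for numerical stability
p_hat = np.exp(z) / np.sum(np.exp(z))  # softmax: scores -> probabilities
L_i = -np.log(p_hat[y_i])              # per-sample loss
# The total loss L is the mean of L_i over all N training samples.
```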
The loss function as a high-dimensional “surface”.
The gradient is a vector that at any point in the loss “surface” gives us the direction of steepest ascent.
The negative gradient gives us the direction of steepest descent.
Gradient descent is an optimisation procedure that iteratively adjusts the parameters based on the gradient.
Until when?…
Figure from Kaggle tutorial: Overfitting and Underfitting
To update the parameters we take a small step in the direction of the negative gradient. \[ W_{new} = W_{old} - \alpha \nabla_W \text{L} \]
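As code, one update step is a single line (toy values; `grad_W` stands in for the gradient computed in the backward pass):

```python
import numpy as np

# One gradient-descent step (toy values; grad_W stands in for the gradient
# of the loss with respect to the weights, from the backward pass).
alpha = 0.01                       # learning rate: the step size
W_old = np.array([[0.5, -0.2]])
grad_W = np.array([[0.1, -0.3]])

W_new = W_old - alpha * grad_W     # small step against the gradient
```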
Stochastic gradient descent is a more efficient variant of gradient descent that computes the gradient on small batches of training samples rather than on the full training set.
An epoch is a single pass through the complete training set. A training process will consist of multiple epochs.
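A runnable sketch of mini-batch SGD over several epochs, using a toy linear model in place of the network (all names and sizes here are illustrative):

```python
import numpy as np

# Mini-batch SGD on a toy linear model, standing in for the network above.
rng = np.random.default_rng(0)
X = rng.random((1000, 5))                     # 1000 samples, 5 features
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true                                # toy regression targets

w = np.zeros(5)
alpha, batch_size = 0.1, 32
for epoch in range(20):                       # one epoch = one full pass
    order = rng.permutation(len(X))           # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        err = X[idx] @ w - y[idx]             # predictions minus targets
        grad = 2 * X[idx].T @ err / len(idx)  # gradient on this batch only
        w -= alpha * grad                     # parameter update
```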
In training: forward and backward pass
In testing and inference: only forward pass
Figure from CS4780 lecture notes
Figure from cs231n.github.io/convolutional-networks
Three common types:
Figure modified from CS231n
Figure modified from CS231n
Figure from Convolution arithmetic
Figure modified from CS231n
A few hyperparameters:
Stride = 1. Figure from Convolution arithmetic
Stride = 2. Figure from Convolution arithmetic
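A naive NumPy implementation makes the effect of the stride explicit (a sketch; real libraries use much faster routines):

```python
import numpy as np

# Naive 2D convolution (no padding) to show how stride changes the output size.
def conv2d(image, kernel, stride=1):
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh,
                          j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)   # weighted sum over the patch
    return out

image = np.random.rand(5, 5)
kernel = np.random.rand(3, 3)
print(conv2d(image, kernel, stride=1).shape)  # (3, 3)
print(conv2d(image, kernel, stride=2).shape)  # (2, 2)
```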
Three common types:
ImageNet 2014 challenge (1000 categories)
Figure from Neuralception
Figure from LearnOpenCV
Random shifts. Figure from https://github.com/Prachi-Gopalani13/Image-Augmentation-Using-Keras
Random rotations. Figure from https://github.com/Prachi-Gopalani13/Image-Augmentation-Using-Keras
Random flips. Figure from https://github.com/Prachi-Gopalani13/Image-Augmentation-Using-Keras
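The same augmentations can be sketched with NumPy/SciPy (the linked examples use Keras instead; ranges here are illustrative):

```python
import numpy as np
from scipy.ndimage import rotate, shift

# Random shift, rotation, and flip of one image (illustrative ranges).
rng = np.random.default_rng(0)
image = rng.random((28, 28))

shifted = shift(image, shift=rng.uniform(-3, 3, size=2))            # random shift
rotated = rotate(image, angle=rng.uniform(-20, 20), reshape=False)  # random rotation
flipped = np.fliplr(image)                                          # horizontal flip
```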
What are tasks?
Figure from Snapshot Serengeti dataset
Figure from https://dattalab.github.io/moseq2-website/
Figure from Happy Whale dataset
animals-in-motion.neuroinformatics.dev | 2025