<style>
.reveal { font-size: 32px; }
.reveal h1 { font-size: 1.8em; }
.reveal h2 { font-size: 1.3em; }
.reveal h3 { font-size: 1.1em; }
.reveal pre { font-size: 0.75em; }
.reveal table { font-size: 0.85em; }
.reveal blockquote { font-size: 0.9em; }
</style>
# The Perceptron in the AI Landscape
### Where today's session fits in modern AI
*Neural Networks – ARTI – Session 2*
---
## Today's roadmap
1. 🗺️ **The AI landscape** – where does ML live?
2. 🧠 **Machine Learning** – learning from data
3. ⚡ **The perceptron** – today's focus
4. 🕸️ **Neural networks** – what comes next
5. 🌍 **The modern world** – LLMs, CNNs, RL
---
<!-- .slide: data-background="#0d1117" -->
## What is AI?
> Any technique that allows machines to **mimic** aspects of human intelligence
```
┌─ AI ─────────────────────────────┐
│ ┌─ Machine Learning ───────────┐ │
│ │ ┌─ Deep Learning ──────────┐ │ │
│ │ │ Foundation Models        │ │ │
│ │ └──────────────────────────┘ │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────┘
```
Note: Not all AI is ML – expert systems and logic solvers are AI too. But all modern breakthroughs are ML-based.
---
## A brief history
| Era | Paradigm | Example |
|-----|----------|---------|
| 1950s–80s | Symbolic AI | Expert systems |
| **1957** | **Perceptron ←** | **Rosenblatt** |
| 1980s–2000s | Statistical ML | SVM, Trees |
| 2006+ | Deep Learning | CNNs, RNNs |
| 2017+ | Transformers | BERT, GPT |
| 2022+ | Foundation Models | ChatGPT, Claude |
> The perceptron is the **ancestor** of all neural approaches
---
<!-- .slide: data-background="#0d1117" -->
## ML: the core idea
**Classical programming**
```
Rules + Data → Output
```
**Machine Learning**
```
Data + Labels → Rules (learned automatically)
```
> Instead of *writing* the rules, we *learn* them from examples
Note: The perceptron is the simplest implementation of this idea.
---
## Three families of ML
| | Supervised | Unsupervised | Reinforcement |
|--|--|--|--|
| **Data** | Labeled | Unlabeled | Rewards |
| **Goal** | Predict | Find structure | Maximize score |
| **Example** | Spam filter | Clustering | Game agent |
| **Perceptron** | ✅ | ❌ | ❌ |
The perceptron is a **supervised binary classifier**
---
<!-- .slide: data-background="#0d1117" -->
## Where the perceptron fits
```
Supervised Learning
├── Linear models            ← TODAY
│   ├── Perceptron           ← SESSION 2
│   ├── Logistic Regression
│   └── Linear SVM
│
└── Non-linear models        ← LATER
    ├── Neural Networks
    ├── Random Forest
    └── XGBoost
```
Note: The perceptron is the simplest linear classifier. Its limits motivate everything that comes after.
---
## β‘ The perceptron β Session 2
A perceptron takes inputs, computes a weighted sum, and fires or not:
$$z = w_1 x_1 + w_2 x_2 + b$$
$$\hat{y} = f(z)$$
| Symbol | Role |
|--------|------|
| $w_i$ | weight – importance of input $i$ |
| $b$ | bias – shifts the threshold |
| $f$ | activation function |
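The formula above in code, as a minimal sketch (variable names are illustrative, not necessarily the notebook's):

```python
import numpy as np

def perceptron(x, w, b, f):
    """Weighted sum z = w·x + b, passed through activation f."""
    z = np.dot(w, x) + b
    return f(z)

step = lambda z: 1 if z >= 0 else 0

# Hand-picked weights that happen to implement an AND gate
print(perceptron([1, 1], [1.0, 1.0], -1.5, step))  # 1
print(perceptron([0, 1], [1.0, 1.0], -1.5, step))  # 0
```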
Note: This is what you will implement today.
---
## Geometric view
A perceptron draws a **line** (or hyperplane) in the input space:
$$w_1 x_1 + w_2 x_2 + b = 0$$
- Points on one side → **output 1**
- Points on the other side → **output 0**
> This is called a **decision boundary**
> Changing $w$ rotates it – changing $b$ shifts it
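Concretely, with an illustrative boundary $x_1 - x_2 + 0.5 = 0$ (weights chosen by hand, just for the picture):

```python
import numpy as np

w, b = np.array([1.0, -1.0]), 0.5   # boundary: x1 - x2 + 0.5 = 0

def side(x):
    """1 if the point lies on or above the line, else 0."""
    return 1 if np.dot(w, x) + b >= 0 else 0

print(side([2.0, 0.0]))  # 1 -> one side of the line
print(side([0.0, 2.0]))  # 0 -> the other side
```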
---
<!-- .slide: data-background="#0d1117" -->
## Activation functions
| Name | Formula | Output |
|------|---------|--------|
| Step | $\mathbf{1}_{z \geq 0}$ | $\{0, 1\}$ |
| Sign | $\text{sgn}(z)$ | $\{-1, +1\}$ |
| Sigmoid | $\frac{1}{1+e^{-z}}$ | $(0, 1)$ |
| ReLU | $\max(0, z)$ | $[0, +\infty)$ |
You will use **all four** in today's notebook
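All four fit in a few lines of NumPy (one possible sketch; note that this version takes $\text{sgn}(0) = +1$):

```python
import numpy as np

step    = lambda z: np.where(z >= 0, 1, 0)
sign    = lambda z: np.where(z >= 0, 1, -1)   # sgn(0) taken as +1 here
sigmoid = lambda z: 1 / (1 + np.exp(-z))
relu    = lambda z: np.maximum(0, z)

print(step(0.3), sign(-2.0), sigmoid(0.0), relu(-3.0))  # 1 -1 0.5 0.0
```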
Note: Step and sign are for classic perceptrons. Sigmoid and ReLU are what modern networks use – they're differentiable, which enables learning.
---
## The perceptron's limit
Some problems have **no** separating line – they are **not linearly separable**
```
XOR truth table:
(0,0) → 0   (0,1) → 1
(1,0) → 1   (1,1) → 0
No single line separates 0s from 1s
```
**Solution:** stack perceptrons → **neural network**
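A coarse brute-force check makes this concrete (a search over a small grid, not a proof): no weight/bias combination classifies all four XOR points.

```python
import itertools
import numpy as np

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def fits(w1, w2, b):
    """Does a step perceptron with these parameters reproduce XOR?"""
    return all((w1*x1 + w2*x2 + b >= 0) == bool(y) for (x1, x2), y in xor.items())

grid = np.linspace(-2, 2, 9)          # coarse search over weights and bias
found = any(fits(*p) for p in itertools.product(grid, repeat=3))
print(found)  # False
```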
Note: This is the key motivation for everything in Modules 3β5. XOR is the historically famous example (Minsky & Papert, 1969) that stalled neural network research for 15 years.
---
<!-- .slide: data-background="#0d1117" -->
## From perceptron to network
```
1 perceptron          2-layer network
x₁ ──┐                x₁ ──┬──▶ h₁ ──┐
     ├──▶ ŷ                │         ├──▶ ŷ
x₂ ──┘                x₂ ──┴──▶ h₂ ──┘
1 boundary            2 boundaries combined
```
Each hidden neuron learns **one boundary**
The output neuron **combines** them
> Same formula $z = \mathbf{w}^\top\mathbf{x} + b$, applied repeatedly
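Hand-set weights that solve XOR with two stacked layers (one possible choice of weights; the notebook may use others):

```python
step = lambda z: 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden neuron 1: an OR boundary
    h2 = step(x1 + x2 - 1.5)    # hidden neuron 2: an AND boundary
    return step(h1 - h2 - 0.5)  # output: OR and not AND = XOR

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```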
---
## How does a network learn?
Three steps, repeated:
1. **Forward pass** β compute $\hat{y}$
2. **Loss** β measure how wrong we are: $\mathcal{L}(y, \hat{y})$
3. **Backprop** β update weights via gradient:
$$W \leftarrow W - \eta\,\frac{\partial \mathcal{L}}{\partial W}$$
> This is why we need **differentiable** activations
> (sigmoid, ReLU – not the step function)
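The three steps on a single sigmoid neuron, as a toy sketch with squared-error loss (real frameworks compute the gradient automatically):

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

x, y = np.array([1.0, 0.0]), 1.0       # a single training example
w, b, eta = np.zeros(2), 0.0, 0.5

for _ in range(100):
    y_hat = sigmoid(w @ x + b)                    # 1. forward pass
    loss = (y_hat - y) ** 2                       # 2. loss
    grad = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # 3. chain rule through sigmoid
    w, b = w - eta * grad * x, b - eta * grad     # gradient step

print(sigmoid(w @ x + b))  # prediction has moved toward y = 1
```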
Note: Sessions 5 and 6 cover this in detail. Today you'll set weights manually β next sessions you'll learn them automatically.
---
<!-- .slide: data-background="#0d1117" -->
## Going deeper
| Depth | Architecture | Used for |
|-------|-------------|----------|
| 1 | **Perceptron** | Binary classification |
| 2β3 | MLP | Tabular data |
| 5β20 | CNN | Images |
| 100+ | Transformer | Text, code, speech |
> Depth allows **hierarchical** feature learning
> Early layers: edges → Middle: shapes → Deep: semantics
---
## Key architectures
----
### Convolutional Networks (CNN)
A filter **slides** across the image → shared weights
$$h_{i,j} = f\!\left(\sum_{m,n} W_{m,n} \cdot x_{i+m,\,j+n} + b\right)$$
**→ Images, medical imaging, video** *(Session 10–11)*
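The sliding-filter sum from the formula, written out directly (a toy 4×4 input and a 2×2 filter with made-up values):

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)        # toy 4x4 "image"
W = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2x2 filter (illustrative values)
b = 0.0
relu = lambda z: np.maximum(0, z)

out = np.zeros((3, 3))                   # valid positions: (4-2+1) x (4-2+1)
for i in range(3):
    for j in range(3):
        patch = x[i:i+2, j:j+2]          # the window the filter sees
        out[i, j] = relu((W * patch).sum() + b)

print(out[0, 0], out.shape)  # 5.0 (3, 3)
```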
----
### Recurrent Networks (RNN)
Previous output fed back as input → handles **sequences**
$$\mathbf{h}_t = f(W_h \mathbf{h}_{t-1} + W_x \mathbf{x}_t + \mathbf{b})$$
**→ Time series, speech** *(historical context)*
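The recurrence, unrolled over a short sequence (scalar toy weights, just to show the reuse):

```python
import numpy as np

W_h, W_x, b = 0.5, 1.0, 0.0             # same weights reused at every step
h = 0.0                                  # initial hidden state
for x_t in [1.0, 0.0, 1.0]:              # a length-3 input sequence
    h = np.tanh(W_h * h + W_x * x_t + b) # previous h feeds back in

print(h)  # final hidden state summarizes the whole sequence
```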
----
### Transformers
Each token attends to all others → **attention mechanism**
$$\text{Attention}(Q,K,V) = \text{softmax}\!\left(\tfrac{QK^\top}{\sqrt{d_k}}\right)V$$
**→ GPT, Claude, BERT, AlphaFold** *(frontier)*
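Scaled dot-product attention is only a few lines of NumPy (self-attention on random toy vectors):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # every token scores every token
    return softmax(scores) @ V        # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))   # 3 tokens, d_k = 4
print(attention(Q, K, V).shape)       # (3, 4)
```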
Note: All three are perceptrons at their core β just different connectivity patterns.
---
## The modern landscape
| Modality | Architecture | Example |
|----------|-------------|---------|
| Text | Transformer | GPT-4, Claude |
| Images | CNN / ViT | ResNet, DALL·E |
| Speech | CNN + RNN | Whisper |
| Proteins | Transformer | AlphaFold |
| Tabular | **MLP / perceptron** | **← you, today** |
| Games | RL + MLP | AlphaGo |
---
<!-- .slide: data-background="#0d1117" -->
## The unbroken chain
| | Your perceptron | GPT-4 |
|--|--|--|
| Parameters | 3 | ~$10^{12}$ |
| Layers | 1 | ~120 |
| Activation | Step | GELU |
| Training | Manual | Gradient descent |
$$\underbrace{z = w_1 x_1 + w_2 x_2 + b}_{\text{session 2}} \;\longrightarrow\; \underbrace{\hat{y} = \text{Transformer}(\mathbf{x})}_{\text{frontier}}$$
> **Every** weight in every LLM is a perceptron weight
---
## Course roadmap
```
[S2] Perceptron             ← TODAY
        ↓
[S3] Learning rule          – auto weight update
        ↓
[S4] MLP                    – stacking perceptrons
        ↓
[S5–6] Loss + Backprop      – learning by gradient
        ↓
[S7] Logistic + Softmax     – probabilities
        ↓
[S8–9] Generalization + PyTorch
        ↓
[S10–11] CNNs               – images
        ↓
[S12] Final project
```
---
## What you'll do today
In the **Session 2 notebook**:
1. Implement the perceptron formula
2. Try all 4 activation functions
3. Compute outputs for logic gates (AND, OR, NOT…)
4. Visualize decision boundaries in 2D
5. Discover why XOR breaks it – motivation for Session 4
> Everything you build today is the **foundation**
> of every architecture in this course
---
# Let's start
$$z = w_1 x_1 + w_2 x_2 + b \qquad \hat{y} = f(z)$$
*Update your files: `git pull`, then open the notebook →*
Note: From here, switch to the Jupyter notebook for Session 2.
{"title":"The Perceptron in the AI Landscape","type":"slide","slideOptions":{"theme":"night","transition":"slide"}}