<style>
.reveal { font-size: 32px; }
.reveal h1 { font-size: 1.8em; }
.reveal h2 { font-size: 1.3em; }
.reveal h3 { font-size: 1.1em; }
.reveal pre { font-size: 0.75em; }
.reveal table { font-size: 0.85em; }
.reveal blockquote { font-size: 0.9em; }
</style>
# The Perceptron in the AI Landscape
### Where today's session fits in modern AI
*Neural Networks – ARTI – Session 2*
---
## Today's roadmap
1. 🗺️ **The AI landscape** – where does ML live?
2. 🧠 **Machine Learning** – learning from data
3. ⚡ **The perceptron** – today's focus
4. 🕸️ **Neural networks** – what comes next
5. 🌍 **The modern world** – LLMs, CNNs, RL
---
<!-- .slide: data-background="#0d1117" -->
## What is AI?
> Any technique that allows machines to **mimic** aspects of human intelligence
```
┌─ AI ─────────────────────────────┐
│ ┌─ Machine Learning ───────────┐ │
│ │ ┌─ Deep Learning ──────────┐ │ │
│ │ │ Foundation Models        │ │ │
│ │ └──────────────────────────┘ │ │
│ └──────────────────────────────┘ │
└──────────────────────────────────┘
```
Note: Not all AI is ML – expert systems and logic solvers are AI too. But all modern breakthroughs are ML-based.
---
## A brief history
| Era | Paradigm | Example |
|-----|----------|---------|
| 1950s–80s | Symbolic AI | Expert systems |
| **1957** | **Perceptron ←** | **Rosenblatt** |
| 1980s–2000s | Statistical ML | SVM, Trees |
| 2006+ | Deep Learning | CNNs, RNNs |
| 2017+ | Transformers | BERT, GPT |
| 2022+ | Foundation Models | ChatGPT, Claude |
> The perceptron is the **ancestor** of all neural approaches
---
<!-- .slide: data-background="#0d1117" -->
## ML: the core idea
**Classical programming**
```
Rules + Data → Output
```
**Machine Learning**
```
Data + Labels → Rules (learned automatically)
```
> Instead of *writing* the rules, we *learn* them from examples
Note: The perceptron is the simplest implementation of this idea.
---
## Three families of ML
| | Supervised | Unsupervised | Reinforcement |
|--|--|--|--|
| **Data** | Labeled | Unlabeled | Rewards |
| **Goal** | Predict | Find structure | Maximize score |
| **Example** | Spam filter | Clustering | Game agent |
| **Perceptron** | ✅ | ❌ | ❌ |
The perceptron is a **supervised binary classifier**
---
<!-- .slide: data-background="#0d1117" -->
## Where the perceptron fits
```
Supervised Learning
├── Linear models            ← TODAY
│   ├── Perceptron           ← SESSION 2
│   ├── Logistic Regression
│   └── Linear SVM
│
└── Non-linear models        ← LATER
    ├── Neural Networks
    ├── Random Forest
    └── XGBoost
```
Note: The perceptron is the simplest linear classifier. Its limits motivate everything that comes after.
---
## β‘ The perceptron β Session 2
A perceptron takes inputs, computes a weighted sum, and fires or not:
$$z = w_1 x_1 + w_2 x_2 + b$$
$$\hat{y} = f(z)$$
| Symbol | Role |
|--------|------|
| $w_i$ | weight – importance of input $i$ |
| $b$ | bias – shifts the threshold |
| $f$ | activation function |
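The formula above in code, as a minimal sketch (variable names are illustrative, not necessarily the notebook's):

```python
import numpy as np

def perceptron(x, w, b, f):
    """Weighted sum z = w·x + b, passed through activation f."""
    z = np.dot(w, x) + b
    return f(z)

step = lambda z: 1 if z >= 0 else 0

# Hand-picked weights that happen to implement an AND gate
print(perceptron([1, 1], [1.0, 1.0], -1.5, step))  # 1
print(perceptron([0, 1], [1.0, 1.0], -1.5, step))  # 0
```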
Note: This is what you will implement today.
---
## Geometric view
A perceptron draws a **line** (or hyperplane) in the input space:
$$w_1 x_1 + w_2 x_2 + b = 0$$
- Points on one side → **output 1**
- Points on the other side → **output 0**
> This is called a **decision boundary**
> Changing $w$ rotates it – changing $b$ shifts it
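Concretely, with an illustrative boundary $x_1 - x_2 + 0.5 = 0$ (weights chosen by hand, just for the picture):

```python
import numpy as np

w, b = np.array([1.0, -1.0]), 0.5   # boundary: x1 - x2 + 0.5 = 0

def side(x):
    """1 if the point lies on or above the line, else 0."""
    return 1 if np.dot(w, x) + b >= 0 else 0

print(side([2.0, 0.0]))  # 1 -> one side of the line
print(side([0.0, 2.0]))  # 0 -> the other side
```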
---
<!-- .slide: data-background="#0d1117" -->
## Activation functions
| Name | Formula | Output |
|------|---------|--------|
| Step | $\mathbf{1}_{z \geq 0}$ | $\{0, 1\}$ |
| Sign | $\text{sgn}(z)$ | $\{-1, +1\}$ |
| Sigmoid | $\frac{1}{1+e^{-z}}$ | $(0, 1)$ |
| ReLU | $\max(0, z)$ | $[0, +\infty)$ |
You will use **all four** in today's notebook
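All four fit in a few lines of NumPy (one possible sketch; note that this version takes $\text{sgn}(0) = +1$):

```python
import numpy as np

step    = lambda z: np.where(z >= 0, 1, 0)
sign    = lambda z: np.where(z >= 0, 1, -1)   # sgn(0) taken as +1 here
sigmoid = lambda z: 1 / (1 + np.exp(-z))
relu    = lambda z: np.maximum(0, z)

print(step(0.3), sign(-2.0), sigmoid(0.0), relu(-3.0))  # 1 -1 0.5 0.0
```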
Note: Step and sign are for classic perceptrons. Sigmoid and ReLU are what modern networks use – they're differentiable, which enables learning.
---
## The perceptron's limit
Some problems have **no** separating line – they are **not linearly separable**
```
XOR truth table:
(0,0) → 0   (0,1) → 1
(1,0) → 1   (1,1) → 0
No single line separates 0s from 1s
```
**Solution:** stack perceptrons → **neural network**
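A coarse brute-force check makes this concrete (a search over a small grid, not a proof): no weight/bias combination classifies all four XOR points.

```python
import itertools
import numpy as np

xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def fits(w1, w2, b):
    """Does a step perceptron with these parameters reproduce XOR?"""
    return all((w1*x1 + w2*x2 + b >= 0) == bool(y) for (x1, x2), y in xor.items())

grid = np.linspace(-2, 2, 9)          # coarse search over weights and bias
found = any(fits(*p) for p in itertools.product(grid, repeat=3))
print(found)  # False
```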
Note: This is the key motivation for everything in Modules 3β5. XOR is the historically famous example (Minsky & Papert, 1969) that stalled neural network research for 15 years.
---
<!-- .slide: data-background="#0d1117" -->
## From perceptron to network
```
1 perceptron          2-layer network
x₁ ──┐                x₁ ──┬──▶ h₁ ──┐
     ├──▶ ŷ                │         ├──▶ ŷ
x₂ ──┘                x₂ ──┴──▶ h₂ ──┘
1 boundary            2 boundaries combined
```
Each hidden neuron learns **one boundary**
The output neuron **combines** them
> Same formula $z = \mathbf{w}^\top\mathbf{x} + b$, applied repeatedly
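Hand-set weights that solve XOR with two stacked layers (one possible choice of weights; the notebook may use others):

```python
step = lambda z: 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden neuron 1: an OR boundary
    h2 = step(x1 + x2 - 1.5)    # hidden neuron 2: an AND boundary
    return step(h1 - h2 - 0.5)  # output: OR and not AND = XOR

print([xor_net(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```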
---
## How does a network learn?
Three steps, repeated:
1. **Forward pass** β compute $\hat{y}$
2. **Loss** β measure how wrong we are: $\mathcal{L}(y, \hat{y})$
3. **Backprop** β update weights via gradient:
$$W \leftarrow W - \eta\,\frac{\partial \mathcal{L}}{\partial W}$$
> This is why we need **differentiable** activations
> (sigmoid, ReLU – not the step function)
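The three steps on a single sigmoid neuron, as a toy sketch with squared-error loss (real frameworks compute the gradient automatically):

```python
import numpy as np

sigmoid = lambda z: 1 / (1 + np.exp(-z))

x, y = np.array([1.0, 0.0]), 1.0       # a single training example
w, b, eta = np.zeros(2), 0.0, 0.5

for _ in range(100):
    y_hat = sigmoid(w @ x + b)                    # 1. forward pass
    loss = (y_hat - y) ** 2                       # 2. loss
    grad = 2 * (y_hat - y) * y_hat * (1 - y_hat)  # 3. chain rule through sigmoid
    w, b = w - eta * grad * x, b - eta * grad     # gradient step

print(sigmoid(w @ x + b))  # prediction has moved toward y = 1
```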
Note: Sessions 5 and 6 cover this in detail. Today you'll set weights manually β next sessions you'll learn them automatically.
---
<!-- .slide: data-background="#0d1117" -->
## Going deeper
| Depth | Architecture | Used for |
|-------|-------------|----------|
| 1 | **Perceptron** | Binary classification |
| 2β3 | MLP | Tabular data |
| 5β20 | CNN | Images |
| 100+ | Transformer | Text, code, speech |
> Depth allows **hierarchical** feature learning
> Early layers: edges → Middle: shapes → Deep: semantics
---
## Key architectures
----
### Convolutional Networks (CNN)
A filter **slides** across the image → shared weights
$$h_{i,j} = f\!\left(\sum_{m,n} W_{m,n} \cdot x_{i+m,\,j+n} + b\right)$$
**→ Images, medical imaging, video** *(Session 10–11)*
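The sliding-filter sum from the formula, written out directly (a toy 4×4 input and a 2×2 filter with made-up values):

```python
import numpy as np

x = np.arange(16.0).reshape(4, 4)        # toy 4x4 "image"
W = np.array([[1.0, 0.0], [0.0, 1.0]])   # 2x2 filter (illustrative values)
b = 0.0
relu = lambda z: np.maximum(0, z)

out = np.zeros((3, 3))                   # valid positions: (4-2+1) x (4-2+1)
for i in range(3):
    for j in range(3):
        patch = x[i:i+2, j:j+2]          # the window the filter sees
        out[i, j] = relu((W * patch).sum() + b)

print(out[0, 0], out.shape)  # 5.0 (3, 3)
```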
----
### Recurrent Networks (RNN)
Previous output fed back as input → handles **sequences**
$$\mathbf{h}_t = f(W_h \mathbf{h}_{t-1} + W_x \mathbf{x}_t + \mathbf{b})$$
**→ Time series, speech** *(historical context)*
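The recurrence, unrolled over a short sequence (scalar toy weights, just to show the reuse):

```python
import numpy as np

W_h, W_x, b = 0.5, 1.0, 0.0             # same weights reused at every step
h = 0.0                                  # initial hidden state
for x_t in [1.0, 0.0, 1.0]:              # a length-3 input sequence
    h = np.tanh(W_h * h + W_x * x_t + b) # previous h feeds back in

print(h)  # final hidden state summarizes the whole sequence
```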
----
### Transformers
Each token attends to all others → **attention mechanism**
$$\text{Attention}(Q,K,V) = \text{softmax}\!\left(\tfrac{QK^\top}{\sqrt{d_k}}\right)V$$
**→ GPT, Claude, BERT, AlphaFold** *(frontier)*
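Scaled dot-product attention is only a few lines of NumPy (self-attention on random toy vectors):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # every token scores every token
    return softmax(scores) @ V        # weighted mix of the value vectors

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))   # 3 tokens, d_k = 4
print(attention(Q, K, V).shape)       # (3, 4)
```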
Note: All three are perceptrons at their core β just different connectivity patterns.
---
## The modern landscape
| Modality | Architecture | Example |
|----------|-------------|---------|
| Text | Transformer | GPT-4, Claude |
| Images | CNN / ViT | ResNet, DALL·E |
| Speech | CNN + RNN | Whisper |
| Proteins | Transformer | AlphaFold |
| Tabular | **MLP / perceptron** | **← you, today** |
| Games | RL + MLP | AlphaGo |
---
<!-- .slide: data-background="#0d1117" -->
## The unbroken chain
| | Your perceptron | GPT-4 |
|--|--|--|
| Parameters | 3 | ~$10^{12}$ |
| Layers | 1 | ~120 |
| Activation | Step | GELU |
| Training | Manual | Gradient descent |
$$\underbrace{z = w_1 x_1 + w_2 x_2 + b}_{\text{session 2}} \;\longrightarrow\; \underbrace{\hat{y} = \text{Transformer}(\mathbf{x})}_{\text{frontier}}$$
> **Every** weight in every LLM is a perceptron weight
---
## Course roadmap
```
[S2] Perceptron             ← TODAY
        ↓
[S3] Learning rule          – auto weight update
        ↓
[S4] MLP                    – stacking perceptrons
        ↓
[S5–6] Loss + Backprop      – learning by gradient
        ↓
[S7] Logistic + Softmax     – probabilities
        ↓
[S8–9] Generalization + PyTorch
        ↓
[S10–11] CNNs               – images
        ↓
[S12] Final project
```
---
## What you'll do today
In the **Session 2 notebook**:
1. Implement the perceptron formula
2. Try all 4 activation functions
3. Compute outputs for logic gates (AND, OR, NOT…)
4. Visualize decision boundaries in 2D
5. Discover why XOR breaks it – motivation for Session 4
> Everything you build today is the **foundation**
> of every architecture in this course
---
# Let's start
$$z = w_1 x_1 + w_2 x_2 + b \qquad \hat{y} = f(z)$$
*Update your files: `git pull`, then open the notebook →*
Note: From here, switch to the Jupyter notebook for Session 2.
{"title":"The Perceptron in the AI Landscape","type":"slide","slideOptions":{"theme":"night","transition":"slide"}}