<style> .reveal { font-size: 32px; } .reveal h1 { font-size: 1.8em; } .reveal h2 { font-size: 1.3em; } .reveal h3 { font-size: 1.1em; } .reveal pre { font-size: 0.75em; } .reveal table { font-size: 0.85em; } .reveal blockquote { font-size: 0.9em; } </style>

# The Perceptron in the AI Landscape

### Where today's session fits in modern AI

*Neural Networks — ARTI — Session 2*

---

## Today's roadmap

1. 🗺️ **The AI landscape** — where does ML live?
2. 🧠 **Machine Learning** — learning from data
3. ⚡ **The perceptron** — today's focus
4. 🔗 **Neural networks** — what comes next
5. 🚀 **The modern world** — LLMs, CNNs, RL

---

<!-- .slide: data-background="#0d1117" -->

## What is AI?

> Any technique that allows machines to **mimic** aspects of human intelligence

```
┌── AI ─────────────────────────────┐
│ ┌── Machine Learning ───────────┐ │
│ │ ┌── Deep Learning ──────────┐ │ │
│ │ │    Foundation Models      │ │ │
│ │ └───────────────────────────┘ │ │
│ └───────────────────────────────┘ │
└───────────────────────────────────┘
```

Note: Not all AI is ML — expert systems and logic solvers are AI too. But all modern breakthroughs are ML-based.
---

## A brief history

| Era | Paradigm | Example |
|-----|----------|---------|
| 1950s–80s | Symbolic AI | Expert systems |
| **1957** | **Perceptron ←** | **Rosenblatt** |
| 1980s–2000s | Statistical ML | SVM, Trees |
| 2006+ | Deep Learning | CNNs, RNNs |
| 2017+ | Transformers | BERT, GPT |
| 2022+ | Foundation Models | ChatGPT, Claude |

> The perceptron is the **ancestor** of all neural approaches

---

<!-- .slide: data-background="#0d1117" -->

## ML: the core idea

**Classical programming**

```
Rules + Data → Output
```

**Machine Learning**

```
Data + Labels → Rules (learned automatically)
```

> Instead of *writing* the rules, we *learn* them from examples

Note: The perceptron is the simplest implementation of this idea.

---

## Three families of ML

| | Supervised | Unsupervised | Reinforcement |
|--|--|--|--|
| **Data** | Labeled | Unlabeled | Rewards |
| **Goal** | Predict | Find structure | Maximize score |
| **Example** | Spam filter | Clustering | Game agent |
| **Perceptron** | ✅ | — | — |

The perceptron is a **supervised binary classifier**

---

<!-- .slide: data-background="#0d1117" -->

## Where the perceptron fits

```
Supervised Learning
├── Linear models           ← TODAY
│   ├── Perceptron          ← SESSION 2
│   ├── Logistic Regression
│   └── Linear SVM
│
└── Non-linear models       ← LATER
    ├── Neural Networks
    ├── Random Forest
    └── XGBoost
```

Note: The perceptron is the simplest linear classifier. Its limits motivate everything that comes after.

---

## ⚡ The perceptron — Session 2

A perceptron takes inputs, computes a weighted sum, and fires or not:

$$z = w_1 x_1 + w_2 x_2 + b$$

$$\hat{y} = f(z)$$

| Symbol | Role |
|--------|------|
| $w_i$ | weight — importance of input $i$ |
| $b$ | bias — shifts the threshold |
| $f$ | activation function |

Note: This is what you will implement today.
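---

## The formula in code

The two equations above, as a minimal Python sketch (the function and variable names are illustrative, not the notebook's):

```python
def step(z):
    """Step activation: fire (1) if z >= 0, stay silent (0) otherwise."""
    return 1 if z >= 0 else 0

def perceptron(x1, x2, w1, w2, b, f=step):
    """z = w1*x1 + w2*x2 + b, then y_hat = f(z)."""
    z = w1 * x1 + w2 * x2 + b
    return f(z)

# Hand-picked weights, in the spirit of today's exercises
print(perceptron(1, 1, w1=1.0, w2=1.0, b=-1.5))  # 1: both inputs active
print(perceptron(0, 1, w1=1.0, w2=1.0, b=-1.5))  # 0: weighted sum below threshold
```

With these hand-picked weights the neuron behaves like an AND gate, one of the logic gates you will wire up today.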
---

## Geometric view

A perceptron draws a **line** (or hyperplane) in the input space:

$$w_1 x_1 + w_2 x_2 + b = 0$$

- Points on one side → **output 1**
- Points on the other side → **output 0**

> This is called a **decision boundary**
> Changing $w$ rotates it — changing $b$ shifts it

---

<!-- .slide: data-background="#0d1117" -->

## Activation functions

| Name | Formula | Output |
|------|---------|--------|
| Step | $\mathbf{1}_{z \geq 0}$ | $\{0, 1\}$ |
| Sign | $\text{sgn}(z)$ | $\{-1, +1\}$ |
| Sigmoid | $\frac{1}{1+e^{-z}}$ | $(0, 1)$ |
| ReLU | $\max(0, z)$ | $[0, +\infty)$ |

You will use **all four** in today's notebook

Note: Step and sign are for classic perceptrons. Sigmoid and ReLU are what modern networks use — they're differentiable, which enables learning.

---

## The perceptron's limit

Some problems have **no** separating line — they are **not linearly separable**

```
XOR truth table:
(0,0) → 0    (0,1) → 1
(1,0) → 1    (1,1) → 0

No single line separates 0s from 1s
```

**Solution:** stack perceptrons → **neural network**

Note: This is the key motivation for everything in Modules 3–5. XOR is the historically famous example (Minsky & Papert, 1969) that stalled neural network research for 15 years.

---

<!-- .slide: data-background="#0d1117" -->

## From perceptron to network

```
1 perceptron         2-layer network

x₁ ─┐                x₁ ─┬─▶ h₁ ─┐
    ├─▶ ŷ                │       ├─▶ ŷ
x₂ ─┘                x₂ ─┴─▶ h₂ ─┘

1 boundary           2 boundaries combined
```

Each hidden neuron learns **one boundary**
The output neuron **combines** them

> Same formula $z = \mathbf{w}^\top\mathbf{x} + b$, applied repeatedly

---

## How does a network learn?

Three steps, repeated:

1. **Forward pass** — compute $\hat{y}$
2. **Loss** — measure how wrong we are: $\mathcal{L}(y, \hat{y})$
3. **Backprop** — update weights via gradient:

$$W \leftarrow W - \eta\,\frac{\partial \mathcal{L}}{\partial W}$$

> This is why we need **differentiable** activations
> (sigmoid, ReLU — not the step function)

Note: Sessions 5 and 6 cover this in detail. Today you'll set weights manually — next sessions you'll learn them automatically.

---

<!-- .slide: data-background="#0d1117" -->

## Going deeper

| Depth | Architecture | Used for |
|-------|-------------|----------|
| 1 | **Perceptron** | Binary classification |
| 2–3 | MLP | Tabular data |
| 5–20 | CNN | Images |
| 100+ | Transformer | Text, code, speech |

> Depth allows **hierarchical** feature learning
> Early layers: edges → Middle: shapes → Deep: semantics

---

## Key architectures

----

### Convolutional Networks (CNN)

A filter **slides** across the image — shared weights

$$h_{i,j} = f\!\left(\sum_{m,n} W_{m,n} \cdot x_{i+m,\,j+n} + b\right)$$

**→ Images, medical imaging, video** *(Sessions 10–11)*

----

### Recurrent Networks (RNN)

The previous hidden state is fed back as input — handles **sequences**

$$\mathbf{h}_t = f(W_h \mathbf{h}_{t-1} + W_x \mathbf{x}_t + \mathbf{b})$$

**→ Time series, speech** *(historical context)*

----

### Transformers

Each token attends to all others — **attention mechanism**

$$\text{Attention}(Q,K,V) = \text{softmax}\!\left(\tfrac{QK^\top}{\sqrt{d_k}}\right)V$$

**→ GPT, Claude, BERT, AlphaFold** *(frontier)*

Note: All three are perceptrons at their core — just different connectivity patterns.
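---

## The learning loop in code

The forward → loss → backprop cycle, sketched for a single sigmoid neuron (a toy illustration: the squared loss and the learning rate 0.5 are assumptions for this sketch, not the course's choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w, b, x, y, lr=0.5):
    """One update W <- W - lr * dL/dW for y_hat = sigmoid(w.x + b)
    with squared loss L = (y_hat - y)^2 on a single example."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b       # forward pass
    y_hat = sigmoid(z)
    # chain rule: dL/dz = 2*(y_hat - y) * sigmoid'(z), and sigmoid' = s*(1 - s)
    dz = 2.0 * (y_hat - y) * y_hat * (1.0 - y_hat)
    w = [wi - lr * dz * xi for wi, xi in zip(w, x)]    # dL/dw_i = dz * x_i
    b = b - lr * dz
    return w, b

# Repeated steps pull the prediction toward the label y = 1
w, b = [0.0, 0.0], 0.0
for _ in range(100):
    w, b = gradient_step(w, b, x=[1.0, 1.0], y=1.0)
```

Note that the step function has zero gradient everywhere it is defined, so `dz` would always be 0: that is why learning needs sigmoid or ReLU.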
---

## The modern landscape

| Modality | Architecture | Example |
|----------|-------------|---------|
| Text | Transformer | GPT-4, Claude |
| Images | CNN / ViT | ResNet, DALL·E |
| Speech | Transformer | Whisper |
| Proteins | Transformer | AlphaFold |
| Tabular | **MLP / perceptron** | **← you, today** |
| Games | RL + MLP | AlphaGo |

---

<!-- .slide: data-background="#0d1117" -->

## The unbroken chain

| | Your perceptron | GPT-4 |
|--|--|--|
| Parameters | 3 | ~$10^{12}$ |
| Layers | 1 | ~120 |
| Activation | Step | GELU |
| Training | Manual | Gradient descent |

$$\underbrace{z = w_1 x_1 + w_2 x_2 + b}_{\text{session 2}} \;\longrightarrow\; \underbrace{\hat{y} = \text{Transformer}(\mathbf{x})}_{\text{frontier}}$$

> **Every** weight in every LLM is a perceptron weight

---

## Course roadmap

```
[S2]     Perceptron           ← TODAY
 │
[S3]     Learning rule        — auto weight update
 │
[S4]     MLP                  — stacking perceptrons
 │
[S5–6]   Loss + Backprop      — learning by gradient
 │
[S7]     Logistic + Softmax   — probabilities
 │
[S8–9]   Generalization + PyTorch
 │
[S10–11] CNNs                 — images
 │
[S12]    Final project
```

---

## What you'll do today

In the **Session 2 notebook**:

1. Implement the perceptron formula
2. Try all 4 activation functions
3. Compute outputs for logic gates (AND, OR, NOT…)
4. Visualize decision boundaries in 2D
5. Discover why XOR breaks it → motivation for Session 4

> Everything you build today is the **foundation**
> of every architecture in this course

---

# Let's start

$$z = w_1 x_1 + w_2 x_2 + b \qquad \hat{y} = f(z)$$

*Update your files with `git pull` and open the notebook →*

Note: From here, switch to the Jupyter notebook for Session 2.
{"title":"The Perceptron in the AI Landscape","type":"slide","slideOptions":{"theme":"night","transition":"slide"}}