---
title: Linear algebra
author: Pierre Veron
---

![](https://hub.bio.ens.psl.eu/index.php/s/Ritc7Lggz9SLKkn/download/logos_67.png)

<p><strong><span style="font-size: xx-large;">Linear algebra memo</span></strong></p>

:::info
École normale supérieure -- PSL, Département de biologie
:mortar_board: [Maths training 2024-2025](https://codimd.math.cnrs.fr/s/hmbX8GuA4#)
:bust_in_silhouette: Pierre Veron
:email: `pveron [at] bio.ens.psl.eu`
:::

[TOC]

# Preliminary definitions

## Vector space

:::danger
**Definition: vector space**
An $\mathbb{R}$-vector space $E$ is a set of elements with
* an *additive law*: we can define $\vec x+\vec y$ where $\vec x$ and $\vec y$ are elements of $E$,
* a *scalar multiplication*: we can define $\lambda \vec x \in E$ where $\lambda \in \mathbb{R}$ and $\vec x \in E$,
* some requirements, e.g.
  * $\vec{x} + \vec{y} = \vec y+ \vec x$
  * there is a null vector $\vec 0$ such that $\vec x+\vec 0 = \vec x$
  * $\forall \vec x \in E, 1\vec x = \vec x$
  * ...
:::

:::spoiler All axioms
* Associativity of vector addition: $\vec x+(\vec y+\vec z)=(\vec x+\vec y)+\vec z$
* Commutativity of vector addition: $\vec x+\vec y=\vec y+\vec x$
* Existence of a zero vector: $\vec x+\vec 0=\vec x$
* Additive inverse: for every $\vec x\in E$ there is an element $-\vec x$ such that $\vec x+(-\vec x)=\vec 0$
* Compatibility of scalar multiplication with regular multiplication: $\lambda (\mu \vec x)= (\lambda \mu) \vec x$
* $\forall \vec x \in E, 1\vec x = \vec x$
* Distributivity of scalar multiplication with respect to vector addition: $\lambda (\vec x+\vec y) = \lambda \vec x + \lambda \vec y$
* Distributivity of scalar multiplication with respect to regular addition: $(\lambda+\mu)\vec x = \lambda \vec x + \mu \vec x$.
:::

*Examples*
* $\mathbb{R}$ is a vector space.
* For $n\ge 1$ an integer, $\mathbb{R}^n = \{(x_1,...,x_n), x_i \in \mathbb{R}\}$ is a vector space with the following laws:
  * if $\vec x=(x_1,\dots,x_n)$, $\vec y = (y_1,...,y_n)$, then $\vec x + \vec y = (x_1+y_1, \dots, x_n+y_n)$,
  * for $\lambda \in \mathbb{R}$, $\lambda \vec x= (\lambda x_1, \dots , \lambda x_n)$.
* The set of numeric functions $\mathbb{R}\to \mathbb{R}$ is an $\mathbb{R}$-vector space.

## Linear map

A **map** is a function between two vector spaces.

:::danger
**Definition: linear map**
Let $f:E\to F$ be a map. $f$ is a **linear map** if it satisfies these two properties $\forall \vec x, \vec y \in E, \forall \lambda \in \mathbb{R}$:
* $f(\vec x+\vec y) = f(\vec x)+f(\vec y)$,
* $f(\lambda \vec x) = \lambda f(\vec x)$.
:::

Consequences: if $f$ is a linear map, then
* $f(\vec 0) = \vec 0$,
* $f(\lambda \vec x+\mu \vec y)=\lambda f(\vec x)+\mu f(\vec y)$,
* $f(\vec{x_1}+...+\vec{x_n}) = f(\vec{x_1}) + ... + f(\vec{x_n})$,
* if $g: F \to G$ is a linear map, then the composition $g\circ f$ is also a linear map.

*Examples*
* Homothety: let $c\in \mathbb{R}$. The function \begin{align*} f\colon E & \longrightarrow E \\ \vec x & \longmapsto c\vec x \end{align*} is a linear map.
* Identity: the function \begin{align*} \mathrm{Id}\colon E & \longrightarrow E \\ \vec x & \longmapsto \vec x \end{align*} is a linear map (particular case of homothety with $c=1$).
* Symmetry: for instance, in $\mathbb{R}^2$ the symmetry relative to the $x$-axis \begin{align*} f\colon\quad &\mathbb{R}^2 & \longrightarrow &\mathbb{R}^2 \\ &(x,y) &\longmapsto &(x,-y) \end{align*} is a linear map.

> ![illustration_symmetry](https://codimd.math.cnrs.fr/uploads/upload_1f14399e06ca0345f9997d2ff02a7768.PNG =200x200)
> The symmetry relative to the $x$-axis is a linear map.

What is **not** a linear map (counter-examples):
* Affine function: with $\vec b\in E$, $\vec b\neq \vec 0$, the function $f: \vec x\mapsto c\vec x+\vec b$ is not a linear map.
* Square function: in $\mathbb{R}$, the function $f: x \mapsto x^2$ is not linear.
* A norm function is not linear: \begin{align*} f: &\mathbb{R}^n & \longrightarrow& \mathbb{R} \\ &(x_1, \dots, x_n) &\longmapsto& \sqrt{x_1^2 + ... + x_n^2} \end{align*}

# Finite dimension

## Definition

In the rest of the course, we will mostly work with finite-dimensional vector spaces. If $E$ has a finite dimension $n$, it means that there is a **basis** $\mathcal{B} = (\vec e_1,...,\vec e_n)$ such that any vector $\vec x \in E$ can be decomposed in a unique way
\begin{equation} \vec x = x_1 \vec e_1 + \cdots + x_n \vec e_n \end{equation}
where $x_1,...,x_n$ are scalars. In this basis $\mathcal{B}$ the vector can be represented as a column of $n$ numbers:
\begin{equation} \vec x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}_{\mathcal{B}}. \end{equation}

:::warning
A finite-dimensional vector space has several bases; there is not one unique basis. However, all of them have the same size, which is the dimension of the vector space.
:::

## Calculations in a basis

If $E$ is a finite-dimensional vector space with a basis $\mathcal{B}$, if $\lambda \in \mathbb{R}$,
\begin{equation} \vec x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}_{\mathcal{B}} \quad \text{and} \quad \vec y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}_{\mathcal{B}} \end{equation}
then
\begin{equation} \vec x + \vec y = \begin{pmatrix} x_1 + y_1 \\ x_2 +y_2 \\ \vdots \\ x_n + y_n \end{pmatrix}_{\mathcal{B}} \quad \text{and} \quad \lambda \vec x = \begin{pmatrix} \lambda x_1 \\ \lambda x_2 \\ \vdots \\ \lambda x_n \end{pmatrix}_{\mathcal{B}}. \end{equation}

## Canonical basis of $\mathbb{R}^n$

$\mathbb{R}^n$ is a vector space of dimension $n$ and it has one particular basis $\mathcal{B} = (\vec e_1,...,\vec e_n)$ called the **canonical basis**, defined by
\begin{equation} \vec e_1 = (1, 0, 0, \dots , 0),\; \vec e_2 = (0, 1, 0, \dots, 0),\; \dots \;,\; \vec e_n = (0, 0, 0, \dots , 1). \end{equation}

# Matrix representation of linear maps

## Definition

$E$ and $F$ are finite-dimensional vector spaces such that $\mathrm{dim}(E) = n$ and $\mathrm{dim}(F) = m$. Let $\mathcal{B}= (\vec e_1, ..., \vec e_n)$ be a basis of $E$ and $\mathcal{G} = (\vec g_1, ..., \vec g_m)$ be a basis of $F$. Let $f: E\to F$ be a linear map. The image of each basis vector of $E$ under $f$ can be decomposed in a unique way in the basis $\mathcal{G}$ of $F$:
\begin{equation} \left\lbrace \begin{array}{lll} f(\vec e_1) &= &y_{11} \vec g_1 + y_{21} \vec g_2 + \dots + y_{m1} \vec g_m \\ f(\vec e_2) &= &y_{12} \vec g_1 + y_{22} \vec g_2 + \dots + y_{m2} \vec g_m \\ \vdots & & \\ f(\vec e_n) &= &y_{1n} \vec g_1 + y_{2n} \vec g_2 + \dots + y_{mn} \vec g_m \end{array}\right. \end{equation}
:::danger
**Definition**
The matrix representation of $f$ from the basis $\mathcal{B}$ to the basis $\mathcal{G}$, denoted $\mathrm{Mat}_{\mathcal{B}, \mathcal{G}} (f)$, is the matrix with $m$ rows and $n$ columns:
\begin{equation} \begin{pmatrix} y_{11} & y_{12}& \dots &y_{1n} \\ y_{21} & y_{22} & \dots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \dots & y_{mn} \end{pmatrix}. \end{equation}
The $i$-th column of the matrix is the column representation of $f(\vec e_i)$ in the basis $\mathcal{G}$.
:::

>_Example_
> $f: \mathbb{R}^3 \to \mathbb{R}^2$ defined as
> \begin{equation}
> f((x,y,z)) = (2x+z, -x+3y+5z)
> \end{equation}
> has a matrix representation in the canonical bases:
> \begin{equation}
> \begin{pmatrix}2 & 0 & 1 \\ -1 & 3 & 5 \end{pmatrix}.
> \end{equation}
>
>:::spoiler Explanation
> The canonical basis of $\mathbb{R}^3$ is $(\vec e_1, \vec e_2, \vec e_3) = ((1,0,0), (0,1,0), (0,0,1))$. The canonical basis of $\mathbb{R}^2$ is $(\vec g_1, \vec g_2) = ((1,0), (0,1))$. Let us compute the image of each of these vectors by the linear map $f$:
>* $f(\vec e_1) = f((1,0,0)) = (2, -1) = 2 \vec g_1 - 1 \vec g_2$
>* $f(\vec e_2) = f((0,1,0)) = (0, 3) = 0 \vec g_1 + 3 \vec g_2$
>* $f(\vec e_3) = f((0,0,1)) = (1, 5) = 1 \vec g_1 + 5 \vec g_2$.
>Then we apply the definition of the matrix representation and stack these three decompositions by column.
>:::

## Properties

:::success
1. $M$ is the matrix representation of a linear map $f: E\to F$ in the bases $\mathcal{B}, \mathcal{G}$. Let $\vec x \in E$. Let us write \begin{equation} X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}_{\mathcal B} \end{equation} the decomposition of the vector $\vec x$ in the basis $\mathcal B$. Then the column vector \begin{equation} Y = MX \end{equation} is the column representing the decomposition of $f(\vec x)$ in $\mathcal G$.
2. Let $f:E \to E$ and $g: E\to E$ be linear maps with matrix representations $M_1$ and $M_2$ respectively in a basis $\mathcal B$. Then the matrix \begin{equation} M_2 M_1 \end{equation} is the matrix representation of the composition $g\circ f$ in the basis $\mathcal B$.
3. The matrix \begin{equation} M_1 + M_2 \end{equation} is the matrix representation of $f+g$.
4. The identity matrix \begin{equation} I_n = \begin{pmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{pmatrix} \end{equation} is the representation of the identity function $\mathrm{Id}: E\to E, \vec x \mapsto \vec x$ in any basis.
:::

# Inversion of a square matrix

## Definition and properties

:::danger
**Definition**
A square matrix $A$ of size $n\times n$ is said to be **invertible** if there exists another square matrix $B$ of the same size such that
\begin{equation} AB = BA = I_n \end{equation}
with $I_n$ the identity matrix of size $n$. In this case, the matrix $B$ is the only one to satisfy this property and is called the **inverse** of the matrix $A$, denoted $A^{-1}$ (or, rarely, $\mathrm{inv}(A)$).
:::

:::warning
**Properties of the inverse of a matrix**
Let $A$ be an invertible matrix of size $n$.
* $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$
* For any integer $k \ge 0$, $A^k$ is invertible and $(A^k)^{-1} = (A^{-1})^k$
* If $B$ is an invertible matrix, then $AB$ is invertible and $(AB)^{-1} = B^{-1} A^{-1}$
* The transpose of $A$ is invertible and $(A^{\intercal})^{-1} = (A^{-1})^{\intercal}$.
* For any real $c \ne 0$, the matrix $cA$ is invertible and $(cA)^{-1} = \frac{1}{c} A^{-1}$.
:::
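These properties can be checked numerically. Below is a minimal sketch in Python using `numpy` (the matrices `A` and `B` are arbitrary invertible examples chosen for illustration):

```python=
import numpy as np

A = np.array([[2., 1.], [1., 1.]])  # an invertible matrix (det = 1)
B = np.array([[1., 2.], [0., 3.]])  # another invertible matrix (det = 3)
A_inv = np.linalg.inv(A)

# A times its inverse gives the identity matrix
print(np.allclose(A @ A_inv, np.eye(2)))         # True
# (A^-1)^-1 = A
print(np.allclose(np.linalg.inv(A_inv), A))      # True
# (AB)^-1 = B^-1 A^-1
print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ A_inv))  # True
# (A^T)^-1 = (A^-1)^T
print(np.allclose(np.linalg.inv(A.T), A_inv.T))  # True
```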
:::warning
**Characterisation of invertible matrices**
The square matrix $A$ of size $n\times n$ is invertible
$\Leftrightarrow \det A \ne 0$
$\Leftrightarrow$ for any $Y \in \mathbb R ^n$ the system $AX = Y$, with the unknown $X \in \mathbb R ^n$, has a unique solution (in this case the solution is $X = A^{-1} Y$)
$\Leftrightarrow$ the system $AX = 0$, with the unknown $X \in \mathbb R ^n$, has the null vector as its only solution
$\Leftrightarrow$ 0 is not an eigenvalue of $A$
$\Leftrightarrow$ the columns of $A$ are linearly independent, i.e. they form a basis of $\mathbb R^n$
$\Leftrightarrow$ the rows of $A$ are linearly independent
$\Leftrightarrow$ the linear map $f$ associated with the matrix $A$ in the canonical basis of $\mathbb R^n$ is bijective (in this case $A^{-1}$ is the representation of $f^{-1}$ in the same basis).
:::

## How to calculate the inverse of a matrix?

### Case of $2\times 2$ matrices

Let
\begin{equation} A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \end{equation}
be a square matrix of size 2. Then the determinant of $A$ has an explicit expression:
\begin{equation} \det A = ad-bc. \end{equation}
$A$ is invertible if and only if $\det A \ne 0$, and in this case the expression of the inverse of $A$ is:
\begin{equation} A^{-1} = \frac{1}{ad-bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}. \end{equation}

### General case

The general way to calculate the inverse of a square matrix
\begin{equation} A = \begin{pmatrix} a_{11} & a_{12} & ... & a_{1n} \\ a_{21} & a_{22} & ... & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & ... & a_{nn} \end{pmatrix} \end{equation}
of size $n$ is to solve the linear system:
\begin{equation} AX = Y \quad \text{with} \quad X = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}. \end{equation}
The explicit expression of this linear system is:
\begin{equation} \begin{cases} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= y_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= y_2 \\ & \vdots \\ a_{n1} x_1 + a_{n2} x_2 + \cdots + a_{nn} x_n &= y_n \end{cases} \end{equation}
For instance, Gaussian elimination can be used to solve it. If the solution is unique, the matrix is invertible. In this case the values $x_i$ are linear combinations of the $y_j$:
\begin{equation} \begin{cases} x_1 &= b_{11} y_1 + b_{12} y_2 + \cdots + b_{1n} y_n \\ x_2 &= b_{21} y_1 + b_{22} y_2 + \cdots + b_{2n} y_n \\ & \vdots \\ x_n &= b_{n1} y_1 + b_{n2} y_2 + \cdots + b_{nn} y_n \end{cases} \end{equation}
Then the inverse of $A$ is the matrix formed by the coefficients of these expressions:
\begin{equation} A^{-1} = \begin{pmatrix} b_{11} & b_{12} & ... & b_{1n} \\ b_{21} & b_{22} & ... & b_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & ... & b_{nn} \end{pmatrix}. \end{equation}

# Eigenvectors and eigenvalues, matrix diagonalization

## Definition

:::danger
**Definition: eigenvector and eigenvalue of a linear map**
Let $f:E \to E$ be a linear map. An **eigenvector** is a non-zero vector $\vec u \ne \vec 0$ of $E$ such that there is a scalar $\lambda \in \mathbb R$ that satisfies the condition:
\begin{equation} f(\vec u) = \lambda \vec u \end{equation}
The value $\lambda$ is called the **eigenvalue** associated with $\vec u$.
:::
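As a quick illustration (a small numerical sketch, not part of the original text): for the symmetry $f(x,y) = (x,-y)$ seen earlier, $(1,0)$ and $(0,1)$ are eigenvectors, with eigenvalues $1$ and $-1$ respectively. This can be checked directly from the definition in Python:

```python=
import numpy as np

def f(v):
    """Symmetry relative to the x-axis: (x, y) -> (x, -y)."""
    return np.array([v[0], -v[1]])

u1 = np.array([1.0, 0.0])  # candidate eigenvector
u2 = np.array([0.0, 1.0])  # candidate eigenvector

print(np.allclose(f(u1), 1.0 * u1))   # True: f(u1) =  1 * u1, eigenvalue  1
print(np.allclose(f(u2), -1.0 * u2))  # True: f(u2) = -1 * u2, eigenvalue -1
```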
Because of the matrix representation, we can also define eigenvalues and eigenvectors for matrices: an eigenvector for the square matrix $A$ is a column vector $X$ such that there is a number $\lambda$ (the eigenvalue) such that $AX = \lambda X$. All the properties for the two definitions are basically the same.

The set of all eigenvalues of a linear map/matrix is called the **spectrum** of this map/matrix.

:::warning
**Alternative definition for a matrix**
Let $A$ be a square matrix and $\lambda \in \mathbb R$.
\begin{align*} \lambda \text{ is an eigenvalue of } A \quad & \Leftrightarrow A - \lambda I_n \text{ is not invertible} \\ &\Leftrightarrow \det (A - \lambda I_n) = 0 \end{align*}
:::

:::warning
**Properties of eigenvectors and eigenvalues**
Let $A$ be a square matrix of size $n$.
* If $X$ is an eigenvector for the value $\lambda$ and $c\ne0$ a number, then $cX$ is also an eigenvector for the value $\lambda$.
* If $X$ and $Y$ are two eigenvectors for the value $\lambda$, then any combination $cX+dY$ is also an eigenvector for the value $\lambda$ (unless $cX+dY = 0$).
* There are **at most** $n$ distinct eigenvalues. There can be fewer eigenvalues, or even none.
:::

## Characteristic polynomial

:::danger
**Definition: characteristic polynomial of a matrix**
Let $A$ be a square matrix. The function
\begin{equation} \chi_A : \left(\begin{array}{lll} \mathbb R &\to &\mathbb R \\ x &\mapsto &\det (x I_n -A) \end{array}\right) \end{equation}
is a polynomial and is called the **characteristic polynomial**.
:::

:::warning
**Properties of the characteristic polynomial**
* $\chi_A(0) = \det (-A) = (-1)^n \det (A)$
* the roots of $\chi_A$ are the eigenvalues of $A$
* the transpose of a matrix has the same characteristic polynomial.
:::

## Diagonalization of a matrix, eigendecomposition

**Notation**: let $x_1, ..., x_n$ be $n$ numbers. We denote by $\mathrm{diag}(x_1, ..., x_n)$ the matrix:
\begin{equation} \mathrm{diag} (x_1, x_2, ..., x_n) = \begin{pmatrix} x_1 & 0 & ... & 0 \\ 0 & x_2 & ... & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & ... & x_n \end{pmatrix}. \end{equation}
For instance $\mathrm{diag}(1,1,...,1) = I_n$. The interesting property of diagonal matrices is that $(\mathrm{diag} (x_1, ..., x_n))^k = \mathrm{diag} (x_1^k, ..., x_n^k)$ and, if all $x_i$ are non-zero, $(\mathrm{diag} (x_1, ..., x_n))^{-1} = \mathrm{diag} (1/x_1, ..., 1/x_n)$.

:::danger
**Diagonalizable matrix**
Let $A$ be a square matrix of size $n$. We say that $A$ is **diagonalizable** if there exists an invertible matrix $P$ and a diagonal matrix $D$ such that
\begin{equation} A = P D P^{-1} \end{equation}
:::

Diagonalization is the process of finding matrices $P$ and $D$ that satisfy this equation. Note that they are not unique. The concept of diagonalization is strongly linked to the concept of eigenvalues.

:::warning
**Eigenvalues of a diagonalizable matrix**
Let $A$ be a square matrix with its diagonalization $A = P D P^{-1}$ with $D = \mathrm{diag}(\lambda_1,...,\lambda_n)$. Then each $\lambda_i$ is an **eigenvalue** of the matrix $A$, and an eigenvector for this eigenvalue is given by the $i$<sup>th</sup> column of the matrix $P$.
:::

:::spoiler Proof
The $i$<sup>th</sup> column of the matrix $P$ is the column given by $P E_i$, where $E_i = (0,0,...,0,1,0,...,0)^{\intercal}$ is the $i$<sup>th</sup> element of the canonical basis (a column with a 1 on the $i$<sup>th</sup> row and 0 elsewhere).
Let us calculate $A P E_i$:
\begin{align} A (P E_i) &= P \, \mathrm{diag} (\lambda_1, ..., \lambda_n) \, P^{-1} P E_i \\ &= P \, \mathrm{diag} (\lambda_1, ..., \lambda_n) E_i \\ &= P \lambda_i E_i \\ &= \lambda_i (P E_i) \end{align}
Since $P E_i$ is a non-zero vector (because $P$ is invertible), $\lambda_i$ is an eigenvalue of the matrix $A$ with an eigenvector $P E_i. \quad \square$
:::

The main interest of diagonalizing a matrix is to compute its powers easily.

:::warning
**Powers and inverse of a diagonalizable matrix**
Let $A$ be a square matrix with its diagonalization $A = P D P^{-1}$ with $D = \mathrm{diag}(x_1,...,x_n)$. Then
1. for any $k \in \mathbb N$, \begin{equation} A^k = P \, \mathrm{diag}(x_1^k, ..., x_n^k)\, P^{-1} \end{equation}
2. $A$ is invertible if and only if all $x_i$ are non-zero. In this case: \begin{equation} A^{-1} = P \, \mathrm{diag}\left(\frac{1}{x_1}, ..., \frac{1}{x_n}\right)\, P^{-1} \end{equation}
:::

**Particular case of matrices with $n$ distinct eigenvalues**

Let us assume that $A$ has $n$ distinct eigenvalues $\lambda_1, ..., \lambda_n$. Let us denote by $X_i$ an eigenvector associated with the eigenvalue $\lambda_i$. Then (*admitted*, i.e. stated without proof) the set of eigenvectors $X_1,..., X_n$ is a basis of $\mathbb R^n$. So the matrix $P$ formed by placing the columns $X_i$ side by side is invertible. We therefore have:
\begin{align} A = P \, \mathrm{diag}( \lambda_1, ..., \lambda_n) \, P^{-1}. \end{align}

:::warning
A square matrix of size $n$ with $n$ distinct eigenvalues is diagonalizable.
:::

The converse is not true: a matrix can be diagonalizable and have fewer than $n$ distinct eigenvalues. In such a case, some eigenvalues are repeated in the diagonal decomposition. The number of times each eigenvalue appears in the decomposition is the dimension of the eigenspace associated with this eigenvalue.

:::danger
**Definition**
The eigenspace associated with an eigenvalue $\lambda$ is the set of all vectors $X$ such that $AX = \lambda X$. The dimension of this eigenspace is the maximum number of linearly independent vectors that can be found in this space, i.e. the number of linearly independent eigenvectors associated with this eigenvalue.
:::

:::warning
**Characterization of a diagonalizable matrix**
A square matrix $A$ of size $n$ is diagonalizable if and only if the sum of the dimensions of the eigenspaces associated with its eigenvalues is equal to $n$. In this case, there is a basis of $\mathbb R^n$ composed of eigenvectors of $A$.
:::
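To make the eigendecomposition concrete, here is a minimal numerical sketch in Python with `numpy` (the matrix `A` is an arbitrary example with distinct eigenvalues):

```python=
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])           # example matrix, eigenvalues 1 and 3

eigenvalues, P = np.linalg.eig(A)  # the columns of P are eigenvectors of A
D = np.diag(eigenvalues)           # D = diag(lambda_1, ..., lambda_n)

# Check the decomposition A = P D P^{-1}
print(np.allclose(A, P @ D @ np.linalg.inv(P)))       # True

# Compute A^5 as P diag(lambda_i^5) P^{-1}
A5 = P @ np.diag(eigenvalues**5) @ np.linalg.inv(P)
print(np.allclose(A5, np.linalg.matrix_power(A, 5)))  # True
```

The functions used here (`np.linalg.eig`, `np.diag`, `np.linalg.matrix_power`) are summarised in the next section.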
# Numerical linear algebra

## With `Python`

```python=
import numpy as np
A = np.array([[1,2,3],[-1,2,-1],[0,4,1]]) # create a matrix
B = np.diag([-2,1,3]) # create a diagonal matrix
R = np.array([1,1,2]) # create a row vector
C = np.array([[1],[2],[3]]) # create a column vector

A+2*B # linear combination of same-sized matrices
np.transpose(A) # transpose a matrix
np.trace(A) # calculate the trace of a matrix
np.dot(A,B) # multiply matrices or vectors
np.linalg.inv(A) # invert a matrix
np.linalg.det(A) # calculate the determinant of a matrix
np.linalg.eig(A) # calculate eigenvalues and find eigenvectors
np.linalg.matrix_power(A, 5) # calculate a power of a matrix (here A^5)

# Warning
A * B # is not the matrix product but the term-to-term product of same-size matrices
```

## With `R`

```r=
A <- matrix(c(1,-1,0,2,2,4,3,-1,1), ncol = 3) # create a matrix
B <- diag(c(2,1,3)) # create a diagonal matrix
R <- c(1,1,2) # create a row vector
C <- matrix(c(1,2,3), ncol = 1) # create a column vector

A+2*B # linear combination of same-sized matrices
t(A) # transpose a matrix
A %*% B # matrix multiplication
sum(diag(A)) # trace of a matrix
solve(A) # inverse of a matrix
det(A) # determinant of a matrix
eigen(A) # find eigenvalues and eigenvectors of a matrix

# Warning
A * B # is not the matrix product but the term-to-term product
```

*[composition]: The composition of two functions f and g is the function denoted f ∘ g such that f ∘ g (x) = f(g(x)). Note that a priori f ∘ g ≠ g ∘ f.
*[bijective]: A function f:E → F is bijective if for any y∈F there is a unique element x∈E such that f(x) = y. In this case the value x is denoted f⁻¹(y), and the function f⁻¹ is called the inverse of f.