
Linear Algebra for AI

Linear algebra is a fundamental mathematical discipline that is critical for understanding and implementing various AI algorithms. This guide covers essential linear algebra concepts and illustrates their applications in AI with examples.

Vectors

A vector is a quantity that has both magnitude and direction. It can be represented as an ordered list of numbers.

Operations

  • Addition
\mathbf{u} + \mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} + \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} u_1 + v_1 \\ u_2 + v_2 \end{pmatrix}

Example:

\mathbf{u} = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \mathbf{v} = \begin{pmatrix} 3 \\ 4 \end{pmatrix} \\
\mathbf{u} + \mathbf{v} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \end{pmatrix} = \begin{pmatrix} 1 + 3 \\ 2 + 4 \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}
  • Subtraction
\mathbf{u} - \mathbf{v} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} - \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} u_1 - v_1 \\ u_2 - v_2 \end{pmatrix}

Example:

\mathbf{u} = \begin{pmatrix} 5 \\ 7 \end{pmatrix}, \mathbf{v} = \begin{pmatrix} 2 \\ 3 \end{pmatrix} \\
\mathbf{u} - \mathbf{v} = \begin{pmatrix} 5 \\ 7 \end{pmatrix} - \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 5 - 2 \\ 7 - 3 \end{pmatrix} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}
  • Scalar Multiplication
c \mathbf{v} = c \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} c v_1 \\ c v_2 \end{pmatrix}

Example:

c = 3, \mathbf{v} = \begin{pmatrix} 4 \\ 5 \end{pmatrix} \\
c \begin{pmatrix} 4 \\ 5 \end{pmatrix} = \begin{pmatrix} 3 \cdot 4 \\ 3 \cdot 5 \end{pmatrix} = \begin{pmatrix} 12 \\ 15 \end{pmatrix}
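These operations carry over directly to code. Below is a minimal sketch with NumPy (the library choice is just one common option), repeating the numbers from the examples above:

```python
import numpy as np

# Vector addition, subtraction, and scalar multiplication,
# using the same numbers as the worked examples above.
u = np.array([1, 2])
v = np.array([3, 4])
print(u + v)                                 # [4 6]

print(np.array([5, 7]) - np.array([2, 3]))   # [3 4]

print(3 * np.array([4, 5]))                  # [12 15]
```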

Vectors are used to represent data points in machine learning. For instance, a data point in a feature space can be represented as a vector x:

x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}

Matrices

A matrix is a rectangular array of numbers arranged in rows and columns.

  • Addition

If A and B are two matrices of the same size, their sum C is defined as:

C = A + B \quad where \, each \, element \, c_{ij} \, of \, C \, is \, given \, by; \, c_{ij} = a_{ij} + b_{ij}

Example:

\mathbf{A} = \begin{pmatrix}
1 & 2 \\
3 & 4 
\end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix}
5 & 6 \\
7 & 8 
\end{pmatrix} \\
A+B = \begin{pmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{pmatrix} = \begin{pmatrix} 6 & 8 \\ 10 & 12 \end{pmatrix}
  • Subtraction

If A and B are two matrices of the same size, their difference C is defined as:

C = A - B \quad where \, each \, element \, c_{ij} \, of \, C \, is \, given \, by; \, c_{ij} = a_{ij} - b_{ij}

Example:

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} , B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \\
A - B = \begin{pmatrix} 1-5 & 2-6 \\ 3-7 & 4-8 \end{pmatrix} = \begin{pmatrix} -4 & -4 \\ -4 & -4 \end{pmatrix}
  • Scalar Multiplication

If A is a matrix and c is a scalar, the product B is given by;

B = cA \quad where \, each \, element \, b_{ij} \, of \, B \, is \, given \, by; b_{ij} = c  \cdot a_{ij}

Example:

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} , c = 2 \\
2A = 2 \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} = \begin{pmatrix} 2 \cdot 1 & 2 \cdot 2 \\ 2 \cdot 3 & 2 \cdot 4 \end{pmatrix} = \begin{pmatrix} 2 & 4 \\ 6 & 8 \end{pmatrix}
  • Matrix multiplication

If A is an m x n matrix and B is an n x p matrix, their product C is an m x p matrix defined by;

C = A \cdot B \quad where \, each \, element \, c_{ij} \, of \, C \, is \, given \, by; 
c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

Example:

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} , B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix} \\
AB = \begin{pmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{pmatrix} = \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
  • Transpose of Matrix

If A is an m x n matrix, A^T is an n x m matrix defined by;

( A^T )_{ij} = a_{ji}

Example:

A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \\
A^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}
  • Inverse of Matrix

The inverse of a matrix A is a matrix A^{-1} such that;

AA^{-1} = I \quad where \, I \, is \, the \, identity \, matrix.

Example:

A = \begin{pmatrix} 4 & 7 \\ 2 & 6 \end{pmatrix} \\
A^{-1} = { 1 \over 4 \cdot 6 - 7 \cdot 2 } \begin{pmatrix} 6 & -7 \\ -2 & 4 \end{pmatrix} = { 1 \over 10 } \begin{pmatrix} 6 & -7 \\ -2 & 4 \end{pmatrix} = \begin{pmatrix} 0.6 & -0.7 \\ -0.2 & 0.4 \end{pmatrix}
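In code, the same matrix operations might look as follows; this is a small NumPy sketch that simply repeats the 2 x 2 examples worked out above:

```python
import numpy as np

# The 2 x 2 matrix operations worked out above, repeated with NumPy.
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

print(A + B)        # [[ 6  8] [10 12]]
print(A - B)        # [[-4 -4] [-4 -4]]
print(2 * A)        # [[2 4] [6 8]]
print(A @ B)        # [[19 22] [43 50]]  (matrix product, not element-wise)
print(A.T)          # [[1 3] [2 4]]

M = np.array([[4, 7],
              [2, 6]])
M_inv = np.linalg.inv(M)
print(M_inv)        # [[ 0.6 -0.7] [-0.2  0.4]]
print(M @ M_inv)    # numerically close to the 2 x 2 identity matrix
```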

Matrices are used to represent datasets, transformations, and weights in neural networks. For example, in image processing, an image can be represented as a matrix of pixel values.

Matrix-Vector Multiplication

If A is a matrix with elements a_{ij} and x is a vector with elements x_j, the product y = Ax is a vector with elements y_i, defined as follows:

y_i =  \sum_{j=1}^{n} a_{ij}x_j \quad for \, i= 1,2,3,......, m

Example:

A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}, x = \begin{pmatrix} 7 \\ 8 \\ 9 \end{pmatrix} \\
y = Ax \\
y = \begin{pmatrix} 1 \cdot 7 + 2 \cdot 8 + 3 \cdot 9 \\ 4 \cdot 7 + 5 \cdot 8 + 6 \cdot 9 \end{pmatrix} = \begin{pmatrix} 50 \\ 122 \end{pmatrix}
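A quick NumPy check of the same product:

```python
import numpy as np

# The matrix-vector product y = Ax from the example above.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
x = np.array([7, 8, 9])

print(A @ x)   # [ 50 122]
```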

Matrix-vector multiplication is ubiquitous in AI and machine learning. Here are a few key applications:

  • Linear Transformations: In neural networks, weights are represented as matrices, and inputs are vectors. Multiplying the weight matrix by the input vector applies a linear transformation to the data (a minimal layer sketch follows this list).
  • Linear Regression: The hypothesis function in linear regression is typically a matrix-vector product.
  • Feature Extraction: Transforming data vectors using matrices to extract or combine features.
  • Dimensionality Reduction: Techniques like PCA (Principal Component Analysis) involve matrix-vector multiplications to project data onto lower-dimensional spaces.
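As a rough illustration of the first point, a dense (fully connected) layer is just a matrix-vector product plus a bias; the shapes and values below are invented purely for illustration:

```python
import numpy as np

# A minimal sketch of a dense layer as a matrix-vector product y = Wx + b.
rng = np.random.default_rng(0)

W = rng.standard_normal((4, 3))    # weight matrix: 4 outputs, 3 inputs
b = rng.standard_normal(4)         # bias vector
x = np.array([0.5, -1.2, 2.0])     # one input feature vector

y = W @ x + b                      # linear transformation of the input
print(y.shape)                     # (4,)
```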

Systems of Linear Equations

A system of linear equations consists of multiple linear equations involving the same set of variables. Solving these systems is a fundamental task in linear algebra, with applications ranging from engineering to computer science and, notably, AI.

A system of linear equations can be written in the general form;

\begin{cases}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m
\end{cases} \\
Here, \, a_{ij} \, are \, the \, coefficients, \, x_j \, are \, the \, variables, \, and \, b_i \, are \, the \, constants.

The system of linear equations can be represented in matrix form as;

Ax = b \\
Where \, A \, is \, the \, coefficient \, matrix: \, A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix} \\
x \, is \, the \, column \, vector \, of \, variables: x = \begin{pmatrix}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{pmatrix} \\
b \, is \, the \, column \, vector \, of \, constants: \, b = \begin{pmatrix}
b_1 \\
b_2 \\
\vdots \\
b_m
\end{pmatrix} 

Methods of Solving Systems of Linear Equations

  1. Gaussian Elimination

Gaussian elimination transforms the system of equations into an upper triangular form, making it easier to solve by back substitution. The steps are:

  1. Forward Elimination: Convert the system to an upper triangular matrix.
  2. Back Substitution: Solve for the variables starting from the last equation.

Example:

Consider the system;

\begin{cases}
2x + 3y = 5 \\
4x + y = 11
\end{cases}

In matrix form;

\begin{pmatrix} 2 & 3 \\ 4 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 11 \end{pmatrix}

Step 1: Forward Elimination

\begin{pmatrix} 2 & 3 & | & 5 \\ 4 & 1 & | & 11 \end{pmatrix}

Divide the first row by 2;

\begin{pmatrix} 1 & {3 \over 2} & | & {5 \over 2 } \\ 4 & 1 & | & 11 \end{pmatrix}

Subtract 4 times the first row from the second row;

\begin{pmatrix} 1 & {3 \over 2} & | & {5 \over 2} \\ 0 & -5 & | & 1 \end{pmatrix}

Step 2: Back Substitution

\begin{cases}
-5y = 1 => y = -{1 \over 5} \\
x + {3 \over 2}y = {5 \over 2} => x + {3 \over 2} \left(-{1 \over 5}\right) = {5 \over 2} => x = {5 \over 2} + {3 \over 10} = {14 \over 5}
\end{cases} \\
The \, solution \, is \, x = {14 \over 5} \, and \, y= -{1 \over 5}
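The same system can also be handed to a numerical routine; a minimal check with NumPy's np.linalg.solve:

```python
import numpy as np

# Solving the same 2 x 2 system numerically instead of by hand.
A = np.array([[2.0, 3.0],
              [4.0, 1.0]])
b = np.array([5.0, 11.0])

x = np.linalg.solve(A, b)
print(x)   # [ 2.8 -0.2], i.e. x = 14/5 and y = -1/5
```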

Solving systems of linear equations is essential in optimization problems, such as finding the weights of a linear regression model.

  • Linear Regression: Solving normal equations in linear regression models involves solving systems of linear equations.
  • Neural Networks: The linear layers inside a network are matrix equations, and several training and analysis steps reduce to solving linear systems.
  • Optimization: Many optimization problems in machine learning involve solving systems of linear equations.
  • Computer Vision: Image reconstruction and other tasks often reduce to solving systems of linear equations.

Determinants

The determinant is a scalar value that can be computed from the elements of a square matrix. It provides important information about the matrix, including whether it is invertible and certain geometric properties. Determinants are widely used in linear algebra, particularly in solving systems of linear equations, analyzing matrix properties, and understanding linear transformations.

For an n x n square matrix A, the determinant, denoted det(A) or |A|, is a scalar value computed using a formula that depends on the size of the matrix.

For a 2 x 2 matrix;

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \\
det(A) = ad - bc

For a 3 x 3 matrix;

A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} \\
det(A) = a(ei -fh) -b(di - fg) + c(dh-eg)
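A quick numeric check of these formulas with NumPy (the 3 x 3 entries below are chosen arbitrarily for illustration):

```python
import numpy as np

# Determinants of a 2 x 2 and a 3 x 3 matrix via np.linalg.det.
A2 = np.array([[4, 7],
               [2, 6]])
print(np.linalg.det(A2))   # 10.0  (= 4*6 - 7*2)

A3 = np.array([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 10]])
print(np.linalg.det(A3))   # about -3.0 (floating-point rounding aside)
```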

Eigenvalues and Eigenvectors

Eigenvalues and eigenvectors are fundamental concepts in linear algebra, playing a crucial role in various fields such as physics, engineering, and particularly in AI and machine learning. They provide insight into the properties of linear transformations and are key in simplifying many complex matrix operations.

To find eigenvalues of a matrix A, we should solve the characteristic equation;

det(A - \lambda I) = 0

Here, I is the identity matrix of the same size as A, and λ represents the eigenvalues.

Example:

A = \begin{pmatrix} 4 & 1 \\ 2 & 3 \end{pmatrix} \\
det( A - \lambda I) = det \begin{pmatrix} 4-\lambda & 1 \\ 2 & 3- \lambda \end{pmatrix} = 0 \\
this \, expands \, to; ( 4-\lambda )(3-\lambda) - 2 \cdot 1 = 0 \\
\lambda ^2 - 7\lambda + 10 = 0 \\
\lambda = 2 \, and \, \lambda = 5
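The same eigenvalues can be recovered numerically; a small NumPy sketch (the ordering of the returned eigenvalues is not guaranteed):

```python
import numpy as np

# Eigenvalues and eigenvectors of the matrix from the example above.
A = np.array([[4, 1],
              [2, 3]])

eigvals, eigvecs = np.linalg.eig(A)
print(eigvals)   # 5 and 2 (the order may vary)

# Check the defining relation A v = lambda v for the first pair.
v = eigvecs[:, 0]
print(np.allclose(A @ v, eigvals[0] * v))   # True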

We use eigenvalues and eigenvectors in:

  • Principal Component Analysis (PCA): Eigenvalues and eigenvectors are used to reduce the dimensionality of data, capturing the most significant features (a small PCA sketch follows this list).
  • Spectral Clustering: Eigenvalues and eigenvectors of similarity matrices are used to identify clusters in data.
  • Stability Analysis: In neural networks and control systems, eigenvalues are used to analyze stability.
  • Markov Chains: The steady-state distribution of a Markov chain is found using eigenvectors.
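As a rough sketch of the PCA idea (the data points below are invented purely for illustration), the principal direction is the eigenvector of the covariance matrix with the largest eigenvalue:

```python
import numpy as np

# A rough PCA sketch: center the data, eigendecompose its covariance
# matrix, and project onto the direction with the largest eigenvalue.
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

X_centered = X - X.mean(axis=0)             # center each feature
cov = np.cov(X_centered, rowvar=False)      # 2 x 2 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)      # eigh: for symmetric matrices
principal = eigvecs[:, np.argmax(eigvals)]  # top principal direction

projected = X_centered @ principal          # 1-D coordinates along it
print(projected)
```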

Orthogonality and Projections

Orthogonality and projections are fundamental concepts in linear algebra with significant applications in various fields, including AI and machine learning. They provide the basis for understanding vector spaces, optimizing algorithms, and performing dimensionality reduction techniques like Principal Component Analysis (PCA).

Orthogonality

Two vectors u and v in a vector space are said to be orthogonal if their dot product is zero;

u \cdot v = 0

Example:

u = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \, and \, v=\begin{pmatrix} 2 \\ -1 \end{pmatrix} \\
u \cdot v = 1 \cdot 2 + 2 \cdot (-1) = 2 -2 =0

Since the dot product is zero, u and v are orthogonal.

Projections

A projection decomposes one vector into components that are parallel and perpendicular to another vector. The projection of a vector a onto a vector b is given by;

proj_b \, a = { a \cdot b \over b \cdot b } b

Example:

a = \begin{pmatrix}  3 \\ 4\end{pmatrix} \, and \, b = \begin{pmatrix} 1 \\ 2 \end{pmatrix} \\
a \cdot b = 3 \cdot 1 + 4 \cdot 2 = 3 + 8 = 11 \\
b \cdot b = 1 \cdot 1 + 2 \cdot 2 = 1 + 4 = 5 \\
proj_b \, a = { 11 \over 5 } \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \begin{pmatrix} { 11 \over 5 } \\ { 22 \over 5 } \end{pmatrix} = \begin{pmatrix} 2.2 \\ 4.4 \end{pmatrix}
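A short NumPy check of the orthogonality and projection examples above:

```python
import numpy as np

# The orthogonality check and the projection example above, in NumPy.
u = np.array([1, 2])
v = np.array([2, -1])
print(np.dot(u, v))     # 0, so u and v are orthogonal

a = np.array([3.0, 4.0])
b = np.array([1.0, 2.0])
proj = (np.dot(a, b) / np.dot(b, b)) * b
print(proj)             # [2.2 4.4]
```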

Orthogonality and projections are widely applied in:

  • Principal Component Analysis (PCA): PCA uses orthogonal projections to transform data into a set of linearly uncorrelated variables called principal components, which are used for dimensionality reduction.
  • Least Squares Method: In regression analysis, projections are used to minimize the error between the observed values and the values predicted by the model.
  • Orthogonalization: Algorithms like the Gram-Schmidt process use orthogonality to create orthonormal bases, simplifying matrix operations and improving numerical stability.

Linear algebra is indispensable for understanding and implementing AI algorithms. From representing data to performing transformations and solving systems of equations, its concepts form the backbone of many AI techniques. Mastery of linear algebra equips you with the tools to comprehend and apply advanced AI algorithms effectively.
