Putting it in simple terms, a norm in linear algebra lets us measure how big a tensor is. In machine learning, tensors (vectors, matrices, etc.) are heavily used as the basic unit of representation. Learning algorithms work on numbers and need a proper representation of input data (text, images, audio, etc.) in the form of tensors.
In the case of a scalar quantity, there is a single number, and the value itself tells us its significance. Comparisons are easy to make on scalar quantities. For example, if the mass of person A is 79 kg and the mass of person B is 80 kg, we can easily deduce that person B’s mass is greater and that the difference between their masses is small. However, a higher-order tensor is a bunch of numbers that collectively form one object, and in that case we need some mathematical operation to tell us how big the tensor is.
In short, the norm of a tensor tells us how big the tensor is. A norm can also be understood simply as a mapping from a tensor to a scalar. Besides estimating the size of a tensor, the same idea generalizes to finding the distance between two tensors as well (i.e., how close two tensors are).
Mathematical view
A norm is a function f that maps a tensor to a scalar while satisfying a few properties.
The first property says that the norm must always be non-negative, and it is zero only for the zero vector \overset{\rightarrow}{x} = \overset{\rightarrow}{0}. A zero vector has all elements equal to zero and sits at the origin. This means that if all the elements of a vector are 0, then the norm of that vector is 0, and conversely, a norm of 0 implies the zero vector.
The second property is the triangle inequality. Let’s recall our school days: the triangle inequality says that the sum of the lengths of any two sides of a triangle is always greater than or equal to the length of the third side. Bringing the same rule here, any norm function must satisfy f(\overset{\rightarrow}{x} + \overset{\rightarrow}{y}) \leq f(\overset{\rightarrow}{x}) + f(\overset{\rightarrow}{y}).
The third property is absolute homogeneity (sometimes loosely called linearity). If the elements of a vector are scaled by a constant factor \alpha, then its norm also scales by |\alpha| (the absolute value of the factor): f(\alpha \overset{\rightarrow}{x}) = |\alpha| f(\overset{\rightarrow}{x}).
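Since these three properties fully characterize a norm, we can check them numerically. Below is a minimal NumPy sketch (the random vectors and the factor alpha are just illustrative assumptions) that verifies all three for the Euclidean norm:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)   # illustrative random vectors
y = rng.standard_normal(5)
alpha = -3.7                 # arbitrary scaling factor

norm = np.linalg.norm        # Euclidean (L2) norm by default

# 1. Non-negativity; the zero vector has norm 0
assert norm(x) >= 0 and norm(np.zeros(5)) == 0

# 2. Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert norm(x + y) <= norm(x) + norm(y)

# 3. Absolute homogeneity: ||alpha * x|| == |alpha| * ||x||
assert np.isclose(norm(alpha * x), abs(alpha) * norm(x))

print("All three norm properties hold.")
```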
Standard Norms
Working with real examples makes it easy to understand what exactly is happening behind the scenes. Let’s consider the vector \overset{\rightarrow}{x} = \begin{bmatrix} -2\\ 5\\ 3 \end{bmatrix} as an example.
Euclidean Norm (L2 norm)
We’ve been using this norm since our school days: it is the same “distance formula”. It is also known as the L2 norm (we’ll see later why it is called so). The Euclidean norm of a vector \overset{\rightarrow}{x} is given by the square root of the sum of the squares of the vector’s elements.
The mathematical representation:
||\overset{\rightarrow}{x}||_{2} = \sqrt{\sum_{i=1}^{n} x_{i}^{2}}
The L2 norm of \overset{\rightarrow}{x} is:
||\overset{\rightarrow}{x}||_{2} = (x_{1}^{2}+x_{2}^{2}+x_{3}^{2})^{\frac{1}{2}} = ((-2)^{2}+5^{2}+3^{2})^{\frac{1}{2}} = \sqrt{38} \approx 6.164
Sometimes the subscript 2 is omitted, and the norm is written simply as ||x||.
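To make this concrete, here is a small NumPy sketch that computes the L2 norm of our example vector:

```python
import numpy as np

x = np.array([-2, 5, 3])

# Euclidean (L2) norm: sqrt((-2)^2 + 5^2 + 3^2) = sqrt(38)
l2 = np.linalg.norm(x)   # ord=2 is the default for vectors
print(l2)                # 6.164414002969432
```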
1-norm (L1 norm)
In the 1-norm, we simply add up the absolute values of the individual elements of the vector. The mathematical representation of the 1-norm is given below:
||\overset{\rightarrow}{x}||_{1} = \sum_{i=1}^{n}|x_{i}|
The 1-norm of \overset{\rightarrow}{x} is:
||\overset{\rightarrow}{x}||_{1} = |x_{1}|+|x_{2}|+|x_{3}| = |-2|+|5|+|3| = 10
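The same NumPy call computes the 1-norm when we pass ord=1 (reusing the example vector from above):

```python
import numpy as np

x = np.array([-2, 5, 3])

# 1-norm: |-2| + |5| + |3| = 10
l1 = np.linalg.norm(x, ord=1)
print(l1)   # 10.0
```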
p-norm (Lp norm)
The Euclidean norm and the 1-norm discussed above are two special cases of the Lp norm (p-norm). The mathematical representation of the Lp norm is:
||\overset{\rightarrow}{x}||_{p} = \left( \sum_{i=1}^{n}{|x_{i}|}^{p} \right)^{1/p}
||\overset{\rightarrow}{x}||_{p} = (|x_{1}|^{p}+|x_{2}|^{p}+|x_{3}|^{p})^{1/p}
The Lp norm is defined for p \geq 1.
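In NumPy, np.linalg.norm accepts an arbitrary p through its ord argument, so we can sweep over a few values (a quick sketch):

```python
import numpy as np

x = np.array([-2, 5, 3])

# Lp norm for a few values of p; p=1 and p=2 recover the norms above
for p in [1, 2, 3, 10]:
    print(p, np.linalg.norm(x, ord=p))
```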
max-norm (\infty-norm)
Using this norm, we pick out the largest of the absolute values of the vector elements.
||\overset{\rightarrow}{x}||_{\infty} = \max(|x_{1}|,|x_{2}|,|x_{3}|,\ldots,|x_{n}|)
In our example, the max-norm of \overset{\rightarrow}{x} is 5.
If we look closely at the Lp norm’s mathematical representation, we can see that as p tends to infinity, the term with the largest absolute value dominates the sum (the other terms become insignificant relative to it when raised to such a large power). Taking the 1/p-th root then recovers that largest absolute value, so the Lp norm converges to the max-norm.
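We can watch this convergence numerically; ord=np.inf gives the max-norm directly:

```python
import numpy as np

x = np.array([-2, 5, 3])

# As p grows, the Lp norm approaches the max-norm (largest |x_i| = 5)
for p in [1, 2, 4, 16, 64]:
    print(p, np.linalg.norm(x, ord=p))

print("inf", np.linalg.norm(x, ord=np.inf))   # 5.0
```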
In a similar way, we can define the L1 norm, L2 norm, and so on for matrices as well. There is, however, a norm that is commonly used specifically for matrices: the Frobenius norm.
The Frobenius norm is very similar to the Euclidean norm: the matrix is unrolled into a vector (just for ease of representation) and then the computation is the same as for the Euclidean norm, i.e., the square root of the sum of the squares of all the elements:
||A||_{F} = \sqrt{\sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^{2}}
The Frobenius norm satisfies all the properties of vector norms.
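NumPy computes it with ord='fro', which is also the default for matrices. For instance, for a small illustrative matrix:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])

# Frobenius norm: sqrt(1 + 4 + 9 + 16) = sqrt(30)
fro = np.linalg.norm(A, ord='fro')   # same as np.linalg.norm(A)
print(fro)                           # 5.477225575051661
```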
Okay, that’s it for today. Some more linear algebra will show up within the next week. I’ll see you in the next one. Good day.