Measuring distance is an important task for many applications like preprocessing, clustering or classification of data. In general, the distance between two points can be calculated as

$$\label{eq:EuclideanStandardizationMahalanobis_Distance} \operatorname{d}(\fvec{x}, \fvec{y}) = \sqrt{\left( \fvec{x} - \fvec{y} \right)^T S^{-1} \left( \fvec{x} - \fvec{y} \right)}$$

where $$S$$ is a $$n \times n$$ matrix which defines the distance type. It starts with the common Euclidean distance which only calculates the length of the line between two points. If the variance of the feature dimensions is very different (e.g. if the first dimension measures in meter and the second measures in kilogram, then the ranges will very likely be different), then we can compensate by standardizing each dimension first. To also compensate for the correlation between feature dimensions, the Mahalanobis distance is useful. The following table summarizes for each distance type the basic properties (for the 2D case).

Method $$S$$ Distance pattern Information from the dataset
Euclidean \begin{equation*} \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ \end{pmatrix} \end{equation*} Equal circles None
Standardization \begin{equation*} \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \\ \end{pmatrix} \end{equation*} Axis-aligned ellipses Variance in each dimension: $$\sigma_1^2, \sigma_2^2$$
Mahalanobis \begin{equation*} \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \\ \end{pmatrix} \end{equation*} Data-aligned ellipses Variance in each dimension: $$\sigma_1^2, \sigma_2^2$$, Variance across dimensions: $$\sigma_{12} = \sigma_{21}$$

Note that the definition of the Euclidean distance is indeed the same as the L2-norm of the difference vector (shown for the 2D case)

\begin{align*} \sqrt{\left( \fvec{x} - \fvec{y} \right)^T S_E^{-1} \left( \fvec{x} - \fvec{y} \right)} &= \sqrt{ \begin{pmatrix} x_1 - y_1 & x_2 - y_2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 - y_1 \\ x_2 - y_2 \end{pmatrix} } \\ &= \sqrt{ \begin{pmatrix} x_1 - y_1 & x_2 - y_2 \end{pmatrix} \begin{pmatrix} x_1 - y_1 \\ x_2 - y_2 \end{pmatrix} } \\ &= \sqrt{ (x_1 - y_1)^2 + (x_2 - y_2)^2 } \\ &= \left\| \fvec{x} - \fvec{y} \right\|_2. \end{align*}

To visualize the different distances consider the following example where we have two sets of data points which were both randomly generated from a normal distribution but with different variances. Additionally, they are translated and rotated to different positions in the coordinate frame. Consider this as a two-class problem where all points from the first dataset belong to the first and all points from the second dataset belong to the second class. We can also define the cluster centre for each class as

\begin{equation*} \fvec{c}_i = \frac{1}{\left| C_i \right|} \sum_{\fvec{x} \in C_i} \fvec{x} \end{equation*}

where $$C_i$$ is the set which stores every data from class $$i$$.

Suppose a new data point should be classified, e.g. assigned to one of the two classes ($$\omega_1$$ or $$\omega_2$$). A very simple thing to do is to calculate the distance to the centre of each class and assign the new point to the class where this distance is minimized

$$\label{eq:EuclideanStandardizationMahalanobis_Classifier} \omega_* = \operatorname{argmin}_{i\in\{1,2\}} \sqrt{\left( \fvec{x} - \fvec{c}_i \right)^T S_i^{-1} \left( \fvec{x} - \fvec{c}_i \right)}.$$

The classifier basically utilizes \eqref{eq:EuclideanStandardizationMahalanobis_Distance}, but instead of calculating the distance between arbitrary points it always calculates the distance to the centre of the current class. Note also that each class has its own (co)-variance matrix $$S_i$$. We want to take account for the different variances of the data points of each class so that the natural structure of the data points is reflected in the distance calculation as well. Hence, the points of the two classes are considered separately leading to covariance matrices of

\begin{equation*} S_1 = \begin{pmatrix} 11.4278 & 6.73963 \\ 6.73963 & 4.91336 \\ \end{pmatrix} \quad \text{and} \quad S_2 = \begin{pmatrix} 7.163 & -33.9277 \\ -33.9277 & 228.504 \\ \end{pmatrix}. \end{equation*}

These matrices also define the distance pattern around each centre which visualizes how a distance function behaves relative to a fixed point (the cluster centre $$\fvec{c}_i$$ in this case). Take a look at the following animation to explore the different methods and their distance pattern.

As you can see, using the Euclidean distance results in equal circles around each centre. The structure of the data is not taken into consideration; we only search for the nearest red point.

Using standardized variables, we incorporate that the variance of the feature dimensions ($$x_1, x_2$$) is not the same. To take this into account, the equal circles are transformed to ellipses. But note that these ellipses are still aligned with the coordinate frame.

Finally, the Mahalanobis distance also analyses the correlation between the feature dimensions in the dataset (e.g. is there a linear dependency between $$x_1$$ and $$x_2$$?) and uses this information in the distance calculation. The distance pattern consists still of ellipses but compared to standardized variables they are now transformed to the shape of the data. More precisely: they are aligned with the principal components of the data.

List of attached files:

← Back to the overview page