The correlation coefficient is a widely used measure of the linear dependency between two variables $$X$$ and $$Y$$. It is defined as

\begin{equation*} r_{XY} = \frac{s_{XY}}{s_{X} \cdot s_{Y}} \in [-1;1] \end{equation*}

where $$s_{XY}$$ denotes the covariance and $$s_{X}, s_{Y}$$ the standard deviations of the two variables. A large magnitude $$\left| r_{XY} \right|$$ indicates a strong linear relationship between the variables, and the sign gives its direction. Seen the other way round: starting from a perfect linear relationship, $$\left| r_{XY} \right|$$ shrinks as more noise is added to the variables.
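
To make the definition concrete, here is a minimal sketch that evaluates it term by term using only the Python standard library. The function name `pearson_r` and the small data sets are illustrative choices, and the sample covariance and standard deviations are assumed to use the $$n - 1$$ normalisation (the factor cancels in the ratio, so the result does not depend on this choice).

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r_XY = s_XY / (s_X * s_Y) for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance s_XY (assumed n - 1 normalisation).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample standard deviations s_X and s_Y.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_pos = pearson_r(xs, [2.0 * x for x in xs])        # exact line, positive slope
r_neg = pearson_r(xs, [-2.0 * x for x in xs])       # exact line, negative slope
r_noisy = pearson_r(xs, [2.1, 3.9, 6.2, 7.8, 10.1])  # roughly linear, made-up values

print(r_pos, r_neg, r_noisy)
```

The two exact lines sit at the ends of the interval $$[-1;1]$$, while the slightly perturbed data set lands just below 1.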

This is illustrated in the animation below. For $$X$$, points are generated from -4 to 4 in steps of 0.001, and a direct linear relationship is imposed on the second variable:

\begin{equation*} Y = 2X. \end{equation*}

Hence, without further changes, all points lie exactly on a line with positive slope, which gives $$r_{XY} = 1$$. To analyse the influence of noise, Gaussian noise is added to each variable separately:

\begin{align*} \tilde{X} &= X + N(0, \sigma_x) \\ \tilde{Y} &= Y + N(0, \sigma_y). \end{align*}
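
As a rough guide to what to expect, the effect of the noise can be worked out directly. This is a sketch under two assumptions that are not stated explicitly here: the noise terms are independent of $$X$$ and of each other, and $$\sigma_x, \sigma_y$$ denote standard deviations. The noise then leaves the covariance approximately unchanged, $$s_{\tilde{X}\tilde{Y}} \approx s_{XY} = 2 s_X^2$$, but inflates both variances, so that for a large number of points

\begin{equation*} r_{\tilde{X}\tilde{Y}} \approx \frac{2 s_X^2}{\sqrt{\left(s_X^2 + \sigma_x^2\right)\left(4 s_X^2 + \sigma_y^2\right)}}. \end{equation*}

For $$\sigma_x = \sigma_y = 0$$ this reduces to $$r_{XY} = 1$$, and it decreases towards 0 as either noise parameter grows.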

The noise parameters $$\sigma_x$$ and $$\sigma_y$$ can both be controlled in the animation.
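
The experiment can also be reproduced offline. The following is a minimal sketch, not the code behind the animation: it assumes that $$\sigma_x$$ and $$\sigma_y$$ are standard deviations, and the helper names `pearson_r` and `noisy_correlation`, the fixed `seed`, and the chosen noise levels are all illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Correlation coefficient r_XY = s_XY / (s_X * s_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)


def noisy_correlation(sigma_x, sigma_y, seed=0):
    """Correlation of the noisy variables for given noise levels (assumed standard deviations)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    xs = [-4 + 0.001 * i for i in range(8001)]  # X from -4 to 4 in steps of 0.001
    ys = [2 * x for x in xs]                    # Y = 2X
    x_noisy = [x + rng.gauss(0, sigma_x) for x in xs]
    y_noisy = [y + rng.gauss(0, sigma_y) for y in ys]
    return pearson_r(x_noisy, y_noisy)


# Same noise level on both variables, increasing from no noise to strong noise.
results = {s: noisy_correlation(s, s) for s in (0.0, 1.0, 2.0, 4.0)}
for sigma, r in results.items():
    print(f"sigma_x = sigma_y = {sigma}: r = {r:.3f}")
```

Without noise the correlation is 1 up to floating-point rounding, and it drops steadily as the noise level increases, which matches the behaviour seen when moving the controls in the animation.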
