## Introduction to kernel density estimation (Parzen window method)

In probability theory, it is common to work with distributions which describe a stochastic process and reveal information about how the process may behave. In reality, however, it is not unusual to deal only with data, without any knowledge of a formal distribution. Even though we don't know the distribution, it is still valid to assume that the data arise from a hidden one. This means we acknowledge that there is a distribution which produced our data, but we don't have a clue which one it is and there is probably no way to find out for sure. There are, however, techniques which can estimate a distribution based on the observed data. One is known as kernel density estimation (also known as Parzen window density estimation or the Parzen-Rosenblatt window method). This article is dedicated to this technique and tries to convey the basics needed to understand it.

The Parzen window method is a so-called non-parametric estimation technique since we don't even know the type of the underlying probability distribution. In contrast, when we estimate the PDF $$\hat{p}(x)$$1 in a parametric way, we know (or assume) the type of the PDF (e.g. a Gaussian) and only have to estimate the parameters of the assumed distribution. But in our case, we really know nothing and hence need a different approach which is not based on finding the best-suited parameters.

So, to formulate the problem: suppose we have a set of $$N$$ data points $$X = (x_1, x_2, \ldots, x_N)^T$$. Then, our goal is to find a PDF which satisfies the following conditions:

1. The PDF should resemble the data. This means that the general structure of the data should still be visible. We don't require an exact mapping but groups in the dataset should also be noticeable as high-density values in the PDF.
2. We are interested in a smooth representation of the PDF. Naturally, $$X$$ contains only a discrete list of points but the underlying distribution is most likely continuous. This should be covered by the PDF as well. More importantly, we want something better than a simple step function as offered by a histogram (even though histograms will be the starting point of our discussion). Hence, the PDF should approximate the distribution beyond the evidence from the data points.
3. The last point is already implied by the word PDF itself: we want the PDF to be valid. Mathematically, the most important properties (among others) which the estimation $$\hat{p}(x)$$ must obey are $$\label{eq:ParzenWindow_PDFConstraints} \int_{-\infty}^{\infty} \! \hat{p}(x) \, \mathrm{d}x = 1 \quad \text{and} \quad \hat{p}(x) \geq 0.$$

Why the effort to find $$\hat{p}(x)$$ in the first place? Many applications rely on information about the underlying distribution. This is especially true for classifiers which, e.g., can use $$\hat{p}(x)$$ to get the likelihood of an unknown $$x$$ not present in the data. But it also gives insight into the structure of the data; a little bit like a better histogram (broadly speaking). This is, of course, application-specific, but I guess when you search for a PDF estimator, then you already know why you do so.

The rest of this article is structured as follows: I start with one-dimensional data, and even though I don't intend to give a full introduction to histograms, they are IMHO a very good starting point for our discussion. We will get to know the problems of classical histograms, and the solution indeed leads to the Parzen window estimator. We will look at the definition of the estimator and try to get some intuition about its components and parameters. It is also possible to reach the estimator from a different direction, namely from signal theory by using convolution, and we will take a look at this, too. In the last part, I generalize the estimator to the multi-dimensional case and provide an example in 2D.

### It starts with histograms

Histograms are a very easy way to get information about the distribution of the data. They show us how often certain data points occur in the whole dataset. A common example is an image histogram which shows the distribution of the intensity values in an image. In this case, we can simply count how often each of the 256 intensity values occurs and draw a line for each intensity value with the height corresponding to the number of times that intensity value is present in the image. Another nice property of the histogram is that it can be scaled to be a valid discrete probability distribution itself (though, not a PDF yet2). This means we reduce the height of each line so that the total sum is 1. In case of the image histogram, we can simply make the count a relative frequency by dividing by the total number of pixels and we end up with a valid distribution. To make it more specific, let $$b_i$$ denote the number of pixels with intensity $$i$$ and $$N$$ the total number of pixels in the image so that we can calculate the height of each line as

$$\label{eq:ParzenWindow_ImageHistogram} \hat{p}_i = \frac{b_i}{N}.$$

$$\hat{p}_i$$ is then an estimate of the probability of the $$i$$-th intensity value (the relative frequency).

An image histogram is a very simple example because we have only a finite set of values to deal with. In other cases, the border values (e.g. 0 and 255 in the image example) are not that clear. What is more, the values might not even be discrete. If every value $$x \in \mathbb{R}$$ is imaginable, the histogram is not very interesting since most of the values will occur only once. E.g. if we have the values 0.1, 0.11 and 0.112, they end up as three lines in our histogram.

Obviously, we need something more sophisticated. The solution is, instead of counting the occurrences of each possible value, we collect them in so-called bins. Each bin has a certain width and we assign each value to the bin it falls into. Instead of lines, we draw now bars and the height of each bar is proportional to the number of values which fell into the bin. For simplicity, I will assume that each bin has the same width, but note that this is not a requirement.

Let's redefine $$b_i$$ to denote the number of values which fall into the $$i$$-th bin. Can we still use \eqref{eq:ParzenWindow_ImageHistogram} to calculate the height of each bar when the requirement is now to obtain a valid PDF3? No, we can't, since the histogram estimate $$\hat{p}(x)$$ would violate the first constraint of \eqref{eq:ParzenWindow_PDFConstraints}, i.e. the areas of all bars would not integrate to 1. This is due to the binning, where we effectively expand each line horizontally to the bin width. This changes the occupied area and we need some kind of compensation. But the solution is very simple: we just divide the height by the bin width4 $$h > 0$$

$$\label{eq:ParzenWindow_Histogram} \hat{p}(x) = \begin{cases} \frac{1}{h} \cdot \frac{b_i}{N} & \text{if $$x$$ falls into bin $$b_i$$} \\ 0 & \text{else} \\ \end{cases}$$

I want to give an example which will also be used later when we dive into the Parzen window approach. We consider a short data set $$X$$ consisting only of four points (or more precisely: values)

$$\label{eq:ParzenWindow_ExampleData} X = \begin{pmatrix} 1 \\ 1.5 \\ 3 \\ 5 \\ \end{pmatrix}.$$

Suppose, we want to create a histogram with bins of width $$h=2$$. The first question is: where do we start binning, i.e. what is the left coordinate of the first bin? 1? 0? -42? This is actually application-specific since sometimes we have a natural range where it could be valid to start e.g. with 0, but in this case, there is no application context behind the data so I decide to start at $$\min(X) = 1$$. With this approach, we can cover the complete dataset with three bins ($$B_i$$ denotes the range of the $$i$$-th bin)

\begin{equation*} B_1 = [1;3[, B_2 = [3;5[ \quad \text{and} \quad B_3 = [5;7]. \end{equation*}

The first two data points fall into $$B_1$$ and the other two in $$B_2$$ respectively $$B_3$$. The heights of the bins according to \eqref{eq:ParzenWindow_Histogram} are given by

\begin{equation*} \hat{p}(B_1) = \frac{1}{2} \cdot \frac{2}{4} = 0.25 \quad \text{and} \quad \hat{p}(B_2) = \hat{p}(B_3) = \frac{1}{2} \cdot \frac{1}{4} = 0.125. \end{equation*}

It is easy to see that the total area indeed sums up to $$2 \cdot 0.25 + 2 \cdot (2 \cdot 0.125) = 1$$. The following figure shows a graphical representation of the histogram.
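As a quick sanity check, the binning and the bar heights of this example can be reproduced with a short Python sketch (the bin layout starting at $$\min(X) = 1$$ is taken from the text; the variable names are mine):

```python
# Histogram-based PDF estimate for the example data X = (1, 1.5, 3, 5)
# with bin width h = 2 and bins starting at min(X) = 1 (as in the text).
X = [1.0, 1.5, 3.0, 5.0]
h = 2.0
start = min(X)
n_bins = 3

# Count how many values fall into each bin [start + i*h, start + (i+1)*h[
counts = [0] * n_bins
for x in X:
    i = min(int((x - start) // h), n_bins - 1)  # clamp the right edge into the last bin
    counts[i] += 1

# Height of each bar: b_i / (h * N), so that the bar areas sum to 1
N = len(X)
heights = [b / (h * N) for b in counts]
print(heights)                                # [0.25, 0.125, 0.125]
print(sum(height * h for height in heights))  # 1.0
```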

So, why do we need something else when we can already obtain a valid PDF via the histogram method? The problem is that histograms provide only a rough representation of the underlying distribution. And as you can see from the last example, $$\hat{p}(x)$$ results only in a step function and usually we want something smoother. This is where the Parzen window estimator enters the field.

### From histograms to the Parzen window estimator

Our goal is to improve the histogram method by finding a function which is smoother but still a valid PDF. The general idea of the Parzen window estimator is to use multiple so-called kernel functions and to place such a function at each position where a data point is located. We then superpose all of these kernels and scale the result to our needs. The resulting function of this process is our PDF. We implement this idea by replacing the cardinality of the bins $$b_i$$ in \eqref{eq:ParzenWindow_Histogram} with something more sophisticated (still for the one-dimensional case):

Kernel density estimation (Parzen window estimation) [1D]

Let $$X = (x_1, x_2, \ldots, x_N)^T \in \mathbb{R}^{N \times 1}$$ be the column vector containing the data points $$x_i \in \mathbb{R}$$. Then, a valid PDF, which returns the probability density for an arbitrary $$x$$-value, can be estimated via the Parzen window estimator as

$$\label{eq:ParzenWindow_KernelDensity} \hat{p}(x) = {\color{Orange}\frac{1}{h \cdot N}} \cdot {\color{Aquamarine}\sum_{i=1}^N} {\color{Red}K}\left( \frac{{\color{Mahogany}x_i -} x}{{\color{LimeGreen}h}} \right).$$

The data is collected in bins of width $$h$$ and the estimation is created via superposition of kernels $$K$$. The kernels have to be valid PDFs themselves.

To calculate the density for an arbitrary value $$x$$, we now need to iterate over the complete dataset and sum up the results of the evaluated kernel functions $$K$$. The nice thing is that $$\hat{p}(x)$$ inherits the properties of the kernel functions. If $$K(x)$$ is smooth, $$\hat{p}(x)$$ tends to be smooth as well. Therefore, the kernel dictates how $$\hat{p}(x)$$ behaves, how smooth it is and what the contribution of each $$x_i$$ is to any $$x$$ which we might plug into $$\hat{p}(x)$$. A common choice is to use standard normal Gaussians, which we will also use in the example later. What is more, it can be shown that if $$K(x)$$ is a valid PDF, i.e. satisfying, among others, \eqref{eq:ParzenWindow_PDFConstraints}, then $$\hat{p}(x)$$ is a valid PDF as well!
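The iteration over the dataset can be sketched directly from \eqref{eq:ParzenWindow_KernelDensity}; a minimal Python version with the kernel passed in as a function (the function names are mine, the Gaussian kernel anticipates the example used later):

```python
from math import exp, pi, sqrt

def gaussian_kernel(u):
    """Standard normal PDF, a common kernel choice."""
    return exp(-0.5 * u * u) / sqrt(2 * pi)

def parzen_estimate(x, data, h, kernel=gaussian_kernel):
    """p(x) = 1/(h*N) * sum_i K((x_i - x) / h)."""
    return sum(kernel((xi - x) / h) for xi in data) / (h * len(data))

X = [1.0, 1.5, 3.0, 5.0]
print(parzen_estimate(1.25, X, h=2.0))  # density estimate at x = 1.25
```

Since the kernel is a valid PDF, the estimate should again integrate to 1, which can be checked with a simple Riemann sum.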

Two operations are performed in the argument of the kernel function. The kernel is shifted to the position $$x_i$$ and stretched horizontally by $$h$$ to determine the main range of the contribution. Similar to before, $$h$$ specifies how many points contribute to the bins – only that the bins are now more complicated. We don't have a clear bin-structure anymore since the bins are the result of the superposed kernel functions.

The last part is to divide the whole function by $$h \cdot N$$. As before in \eqref{eq:ParzenWindow_Histogram}, this is necessary so that the area enclosed by $$\hat{p}(x)$$ equals 1 (requirement for a valid PDF). This is also intuitively clear: the more points we have in our dataset, the more terms we sum up, and we need to scale $$\hat{p}(x)$$ down to compensate for this. Likewise, the larger the bin width $$h$$, the wider our kernels get and the more points they consider. In the end, we have to compensate for this as well.

We could now directly start and use this approach to retrieve a smooth PDF. But I first want to give an example, and this is easier when we stick to uniform kernels for now. This will not give us a smooth function in the end, but hopefully some insights into the approach itself. The uniform kernel has width 1 and is centred around the origin

$$\label{eq:ParzenWindow_KernelUniform} K_U(x) = \begin{cases} 1, & -0.5 \leq x \leq 0.5 \\ 0, & \text{else} \\ \end{cases}$$

When we want to know how this kernel behaves, we must first talk about the kernel argument $$\frac{x_i - x}{h}$$. How does the kernel change with this argument? As mentioned before, it stretches and shifts the kernel to the position of $$x_i$$:

\begin{align*} -0.5 \leq \frac{x_i - x}{h} \leq 0.5 &\Leftrightarrow -0.5 \cdot h \leq x_i - x \leq 0.5 \cdot h \\ &\Leftrightarrow -0.5 \cdot h \leq x - x_i \leq 0.5 \cdot h \\ &\Leftrightarrow -0.5 \cdot h + x_i \leq x \leq 0.5 \cdot h + x_i \\ \end{align*}

I basically moved everything except $$x$$ to the borders so that we see how each side is scaled by $$h$$ and shifted to the position $$x_i$$. Note also that this kernel is symmetric and hence has the same absolute values on both sides. The rest of \eqref{eq:ParzenWindow_KernelDensity} is straightforward since it only involves summation and a scaling at the end (to ensure that the integral over the complete range is 1). Let's move on and apply \eqref{eq:ParzenWindow_KernelDensity} to example values $$X$$ (\eqref{eq:ParzenWindow_ExampleData}) by still using $$h=2$$

\begin{align*} \hat{p}(x) &= \frac{1}{8} \left( \begin{cases} 1, & 0 \leq x \leq 2 \\ 0, & \text{else} \\ \end{cases} + \begin{cases} 1, & 0.5 \leq x \leq 2.5 \\ 0, & \text{else} \\ \end{cases} + \begin{cases} 1, & 2 \leq x \leq 4 \\ 0, & \text{else} \\ \end{cases} + \begin{cases} 1, & 4 \leq x \leq 6 \\ 0, & \text{else} \\ \end{cases} \right) \\ &\approx \frac{1}{8} \begin{cases} 1, & 0 \leq x < 0.5 \\ 2, & 0.5 \leq x < 2.5 \\ 1, & 2.5 \leq x \leq 6 \\ 0, & \text{else} \\ \end{cases} \end{align*}

As you can see, each piecewise-defined function is centred around one data point $$x_i$$ and after the summation of these functions we end up with a step function5. It is also easy to show that $$\hat{p}(x)$$ indeed integrates to 1:

\begin{equation*} \int_{-\infty}^{\infty} \! \hat{p}(x) \, \mathrm{d}x = \frac{1}{8} \cdot \left( 0.5 \cdot 1 + 2 \cdot 2 + 3.5 \cdot 1 \right) = 1. \end{equation*}
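The uniform-kernel estimate can also be verified numerically; the sketch below evaluates \eqref{eq:ParzenWindow_KernelDensity} with $$K_U$$ at a few positions inside the different steps and approximates the integral with a Riemann sum (the sample positions are my choice):

```python
def K_U(u):
    """Uniform kernel: 1 on [-0.5, 0.5], 0 elsewhere."""
    return 1.0 if -0.5 <= u <= 0.5 else 0.0

X = [1.0, 1.5, 3.0, 5.0]
h = 2.0
N = len(X)

def p_hat(x):
    """p(x) = 1/(h*N) * sum_i K_U((x_i - x) / h)."""
    return sum(K_U((xi - x) / h) for xi in X) / (h * N)

# Values of the step function above: 1/8, 2/8 and 1/8
print(p_hat(0.25), p_hat(1.5), p_hat(3.0))  # 0.125 0.25 0.125

# Riemann-sum approximation of the integral (support is [0, 6])
dx = 0.001
area = sum(p_hat(-1.0 + i * dx) for i in range(int(8 / dx))) * dx
print(area)  # ~1.0
```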

The next plot shows our estimate $$\hat{p}(x)$$ together with the previous histogram.

Even though the two PDFs are not identical, they are very similar in that both result in a step function, and you probably wouldn't call that a smooth function. That $$\hat{p}(x)$$ is not smooth is a manifestation of the fact that \eqref{eq:ParzenWindow_KernelDensity} inherits its properties from the underlying kernels: uniform kernels are not smooth and the same is true for $$\hat{p}(x)$$.

So much for the detour over uniform kernels. I hope it has helped to understand the intuition behind \eqref{eq:ParzenWindow_KernelDensity}. Things get even more interesting when we start using a more sophisticated kernel; a Gaussian kernel for instance (with mean $$\mu$$ and standard deviation $$\sigma$$)

$$\label{eq:ParzenWindow_Gaussian} g(x) = \frac{1}{\sqrt{2 \pi} \sigma} \cdot e^{-\frac{\left(x - \mu\right)^2}{2 \sigma^2}}.$$

This makes the Parzen estimator more complex, but this is required to achieve our smoothness constraint. The good thing, though, is that our recent findings remain valid. More concretely, we use standard normally distributed kernels, i.e. $$\mu=0, \sigma=1$$

\begin{equation*} K_g(x) = \frac{1}{\sqrt{2 \pi}} \cdot e^{-\frac{x^2}{2}}. \end{equation*}

If we use this kernel in \eqref{eq:ParzenWindow_KernelDensity} with our example data $$X$$ (\eqref{eq:ParzenWindow_ExampleData}), we end up with

$$\label{eq:ParzenWindow_EstimateGaussian} \hat{p}(x) = \frac{1}{4 \sqrt{2 \pi } h} \cdot \left( e^{-\frac{1}{2 h^2} \cdot (x-5)^2}+e^{-\frac{1}{2 h^2} \cdot (x-3)^2}+e^{-\frac{1}{2h^2} \cdot (x-1.5)^2}+e^{-\frac{1}{2 h^2} \cdot (x-1)^2} \right).$$

Even though the function is now composed of more terms, we can still see the individual Gaussian functions and how they are placed at the locations of the data points. The interesting thing with the standard Gaussian kernel $$K_g(x)$$ is that the shift $$x_i - x$$ and the parameter $$h$$ correspond directly to the mean $$\mu$$ and the standard deviation $$\sigma$$ of the non-standard Gaussian function $$g(x)$$. You can see this by comparing \eqref{eq:ParzenWindow_Gaussian} with \eqref{eq:ParzenWindow_EstimateGaussian} which basically leads to the following assignments:

\begin{align*} \mu &= x_i \\ \sigma &= h \end{align*}

i.e. the Gaussian kernel is scaled by $$h$$ and shifted to the position of $$x_i$$. What does $$\hat{p}(x)$$ look like now and what is the global effect of the bin width $$h$$? You can find out in the following animation.

Note how (as before) each kernel is centred at the position of a data point and how the bin width $$h$$ still influences how many points are aggregated. But compared to the uniform kernels before, we don't have the clear endings of the bins anymore. It is only noticeable that the Gaussians get wider. However, this is exactly what makes our resulting function smooth and this is, after all, precisely what we wanted to achieve in the first place.

Before I show you an example in the two-dimensional case, I first want to make a detour and reach the Parzen window estimator from a different point of view: convolution.

### From convolution to the Parzen window estimator

If you asked a specialist from the field of signal theory about the Parzen window estimator, you might get a different explanation which includes the words “convolution” and “delta functions”. Even though this approach is quite different from what we have seen so far, the result will still cohere with our previous findings. The first thing we need to do is to convert our data points into a signal $$s(x)$$ on the continuous domain. We can do so by placing a Dirac delta function $$\delta(x)$$ at each data point

\begin{equation*} s(x) = \delta (x-5)+\delta (x-3)+\delta (x-1.5)+\delta (x-1). \end{equation*}

Then, we simply convolve our signal $$s(x)$$ with the kernel we want to use

\begin{equation*} \hat{p}(x) = s(x) * \frac{1}{h \cdot N} \cdot K_g\left(\frac{x}{h}\right) = \frac{1}{4 \sqrt{2 \pi } h} \cdot \left( e^{-\frac{1}{2 h^2} \cdot (x-5)^2}+e^{-\frac{1}{2 h^2} \cdot (x-3)^2}+e^{-\frac{1}{2h^2} \cdot (x-1.5)^2}+e^{-\frac{1}{2 h^2} \cdot (x-1)^2} \right). \end{equation*}

And the response is already the estimate $$\hat{p}(x)$$! If you compare the result with the previous estimate in \eqref{eq:ParzenWindow_EstimateGaussian}, you can see that they are indeed identical. Note that the shift $$x_i - x$$ is not necessary anymore since this part is already handled by the delta functions. An important aspect to mention here is the sifting property of the convolution operation: if you convolve a function with a shifted delta function, you get the input function back, shifted to the position of the delta impulse. This property applied to the kernel means

$$\label{eq:ParzenWindow_ConvolutionIdentity} K_g(x) * \delta(x_i - x) = K_g(x_i - x)$$

and that explains how the kernels are shifted to the positions of the data points. To see how the response emerges “over time”, you can check out the following animation.
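The equivalence of the two views can also be checked numerically: sample the delta signal and the scaled kernel on a grid, convolve them and compare against the direct estimate. This is only a sketch; it assumes `numpy`, and the grid parameters are my own choice (the impulses get height $$1/\mathrm{d}x$$ so that they have unit area after discretization):

```python
import numpy as np

X = [1.0, 1.5, 3.0, 5.0]
h, N = 2.0, 4

# Discretized x-axis; every data point becomes a unit-area impulse
dx = 0.01
grid = np.arange(-5.0, 12.0, dx)
signal = np.zeros_like(grid)
for xi in X:
    signal[int(round((xi - grid[0]) / dx))] = 1.0 / dx

# Scaled kernel 1/(h*N) * K_g(x/h), sampled on a symmetric support (+-4 sigma)
support = np.linspace(-8.0, 8.0, 1601)
kernel = np.exp(-0.5 * (support / h) ** 2) / (np.sqrt(2 * np.pi) * h * N)

p_conv = np.convolve(signal, kernel, mode="same") * dx

# Direct Parzen estimate on the same grid for comparison
p_direct = sum(np.exp(-0.5 * ((xi - grid) / h) ** 2) for xi in X) / (
    np.sqrt(2 * np.pi) * h * N
)
print(np.max(np.abs(p_conv - p_direct)))  # close to zero
```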

Of course, the responses are also visually identical. This is not surprising since the same function is plotted. If you take the time, you can see why the resulting function looks the way it does. E.g. around $$t=1.5$$ the PDF $$\hat{p}(x)$$ reaches its maximum. This becomes clear if you consider the kernel functions around that position: when approaching $$t=1.5$$ from the left side, two Gaussians are rising (orange and green) and one is falling (blue). After $$t=1.5$$ only one function is rising (green) and two are falling (blue and orange). Hence, the summed kernel responses must also decrease after $$t=1.5$$.

So much for the detour into the convolution approach. It is interesting to see that both approaches lead to the same result and in essence cover the same idea (but implemented differently): use a set of kernel functions to generate a PDF which reflects the distribution of the data points. Before finishing this article, I want to generalize \eqref{eq:ParzenWindow_KernelDensity} to arbitrary dimensions and give an example for the two-dimensional case.

### Multi-dimensional data

Extending \eqref{eq:ParzenWindow_KernelDensity} to work in arbitrary dimensions is actually pretty simple. The idea is that our bin, which we assumed so far to be a structure which expands only in one dimension, is now a hypercube7 expanding to arbitrary dimensions. This results in a square in 2D and a cube in 3D. But the general idea is always the same: collect all the data which falls into the hypercube, i.e. lies inside the boundaries of the hypercube. Note that this is also true if you want to calculate a histogram of higher-dimensional data.

The factor $$\frac{1}{h}$$ in \eqref{eq:ParzenWindow_KernelDensity} (or \eqref{eq:ParzenWindow_Histogram}) compensated for the bin width so far. When we use hypercubes, we must compensate for the hypervolume instead (area in 2D, volume in 3D etc.). So, the only adjustment needed to work in $$d$$-dimensional space is to scale by $$\frac{1}{h^d}$$ instead. This leads to the generalization of \eqref{eq:ParzenWindow_KernelDensity}:

Kernel density estimation (Parzen window estimation) [$$d$$-dimensional]

Let $$X = \left( \vec{x}_1, \vec{x}_2, \ldots, \vec{x}_N \right)^T \in \mathbb{R}^{N \times d}$$ be the matrix containing the $$d$$-dimensional data points $$\vec{x}_i \in \mathbb{R}^d$$ ($$i$$-th row of $$X$$). Each column corresponds to a variable (or feature) and each row to an observation. Then, a valid PDF, which returns the probability density for $$d$$ independent variables $$\vec{x} = \left( x_1, x_2, \ldots, x_d\right)^T$$, can be estimated via the Parzen window estimator as

$$\label{eq:ParzenWindow_KernelDensity2D} \hat{p}(\vec{x}) = \frac{1}{h{\color{Mulberry}^d} \cdot N} \cdot \sum_{i=1}^N K \left( \frac{\vec{x}_i - \vec{x}} {h} \right)$$

The data is collected in hypercubes of hypervolume $$h^d$$ and the estimation is created via superposition of kernels $$K$$. The kernels have to be valid PDFs themselves.

The only new parameter is the dimension $$d$$. The other change is that the independent variable $$\vec{x} = (x_1, x_2, \ldots, x_d)^T$$ is now a vector instead of a scalar since $$\hat{p}(\vec{x})$$ is now a multi-dimensional PDF. In the following tab pane, you can see an example for the two-dimensional case.
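A small sketch of the $$d$$-dimensional formula with $$d = 2$$, using a 2D standard Gaussian as kernel (the data points here are made up for illustration and not from the article):

```python
from math import exp, pi

def parzen_2d(x, y, data, h):
    """p(x) = 1/(h^2 * N) * sum_i K((x_i - x)/h) with the 2D standard
    Gaussian kernel K(u) = exp(-|u|^2 / 2) / (2*pi)."""
    total = 0.0
    for xi, yi in data:
        u2 = ((xi - x) / h) ** 2 + ((yi - y) / h) ** 2
        total += exp(-0.5 * u2) / (2 * pi)
    return total / (h ** 2 * len(data))

# Hypothetical 2D data points
data = [(1.0, 1.0), (1.5, 1.2), (3.0, 2.5), (5.0, 4.0)]
print(parzen_2d(1.25, 1.1, data, h=0.5))  # high density: close to two points
print(parzen_2d(8.0, 8.0, data, h=0.5))   # almost zero: far away from all points
```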

Did you notice that the range of $$h$$ was $$[0.2;2.5]$$ in the first and $$[0.05;0.5]$$ in the second example? As you have also seen in the animations, the same value of $$h$$ has a different effect. This means that $$h$$ actually depends on the scaling of the data which is also intuitively clear: the width (or hypervolume) of the bins is spanned in the space of the data and hence inherits its scaling. It is therefore not possible to give general recommendations for the parameter $$h$$. However, a good starting point might be to use the average distance $$\bar{d}$$ between all point pairs. The measure $$\bar{d}$$ directly incorporates the scaling of the data, e.g. if we scale the data, we will get greater distances between the points as well. More concretely, I was told that it is common to set the bin width to

\begin{equation*} h = \frac{1}{2} \bar{d}. \end{equation*}

Not sure why we have the factor $$\frac{1}{2}$$ here, but for the one-dimensional example from above, this leads to a value of $$h = 1.125$$ which is indeed a pretty good guess here. For other automatic bandwidth selection methods, take e.g. a look at these slides (p. 17 seqq.).
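For the one-dimensional example data, this heuristic is quickly computed (a sketch; the factor $$\frac{1}{2}$$ is taken from the rule above, the function name is mine):

```python
from itertools import combinations

def mean_pairwise_distance(values):
    """Average absolute distance over all point pairs."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

X = [1.0, 1.5, 3.0, 5.0]
h = mean_pairwise_distance(X) / 2
print(h)  # 1.125, the value mentioned in the text
```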

That was it from my side regarding the Parzen window estimator. We have started to define what a good PDF is composed of (resemble the data, smoothness and validity) and then approached the goal in two ways. One way led over histograms which can be simplistic PDFs by themselves but can be extended to also fulfil the smoothness constraint. But even the extension (the Parzen window estimator) inherits the basic behaviour and ideas of the histograms; namely the bins which collect the data. We also saw how this width has a strong influence on the resulting function $$\hat{p}(x)$$ and that it is a parameter which depends on the scaling of the data. The other way led to the signal theory where we transferred our data points to a superposition of delta functions which we then simply convolved with the kernel. And as we saw, the two ways lead to identical results. In the last part, we generalized the approach to arbitrary dimensions and took a look at a two-dimensional example.


1. Statisticians like to put a $$\hat{}$$ on their variables when they want to make clear that the variable is only an estimation of the real PDF $$p(x)$$.
2. We can't integrate to 1 over a function which consists only of discrete values. Luckily, it is sufficient for a discrete probability distribution to sum the values up to 1 instead.
3. Note that we work with continuous data now and hence need the density instead of the direct probability.
4. In the $$d$$-dimensional case we would need to scale by $$\frac{1}{h^d}$$ instead.
5. I used $$\approx$$ instead of $$=$$ in the last step since at the position $$x=2$$ we have, precisely speaking, $$\hat{p}(2) = 3$$ but this makes the function only more complex and the distinction between $$\leq$$ and $$<$$ is not relevant to a PDF anyway.
6. Precisely speaking, it is not possible to draw $$\delta(x)$$. Even though $$\delta(0)\neq 0$$, the value at $$x=0$$ is still undefined. However, it is very common to draw a single line at this position instead (as also done here).
7. Hyperbin would also be a neat name :-).

## Conductivity matrix for isotropic diffusion problems in FED

In isotropic diffusion, the conductivity is adaptive to the local structure of the signal. But in contrast to anisotropic diffusion, the conductivity is still the same in all directions. Mathematically, the conductivity factor in the diffusion equation now depends on the signal $$u$$ instead of being a constant

$$\frac{\partial u(x,t)}{\partial t} = g(u(x, t)) \cdot \frac{\partial^2 u(x,t)}{\partial x^2}. \label{eq:FEDIsotropic_PDE}$$

In case of images and isotropic diffusion, the conductivity function may be related to the image gradient

$$g(u(x, t)) = \frac{1}{\left(\frac{\left\| \nabla u(x,t) \right\|}{\lambda }\right)^2+1} \label{eq:FEDIsotropic_ConductivityFunction}$$

where a concrete function is already used (more on this later) and where $$\lambda$$ controls how strongly the conductivity function reacts1. To use Fast Explicit Diffusion (FED) for isotropic problems, the matrix $$A$$ in

$$\vec{u}_{k+1} = \left( I + \tau \cdot A(\vec{u}_k) \right) \cdot \vec{u}_k \label{eq:FEDIsotropic_Discrete}$$

now depends on the discrete signal vector $$\vec{u}_k$$2 evolved to the diffusion time $$k$$. To achieve isotropic diffusion in this scheme, $$A(\vec{u}_k)$$ must somehow encode the information from \eqref{eq:FEDIsotropic_ConductivityFunction}. This article deals with this question and gives a definition of $$A(\vec{u}_k)$$ as described in this paper.

### Conductivity function

Before diving into the matrix definition, I first want to analyse the conductivity function from \eqref{eq:FEDIsotropic_ConductivityFunction} a bit more. Even though only this special definition of $$g(u(x, t))$$ will be used here, let me note that others are possible, too (see e.g. Digital Image Processing by Burger and Burge, p. 437 for an overview), which may behave slightly differently.

So, what is the point of the conductivity function? As in homogeneous diffusion, this factor controls how fast the diffusion proceeds. But since $$g(u(x, t))$$ now depends on the signal value, the diffusion behaves differently depending on the current structure described by the signal. E.g. if $$\vec{u}$$ describes an image, diffusion means blurring the image and $$g(u(x, t))$$ makes this blurring adaptive to the current image structure. More precisely, since the image gradient is used in \eqref{eq:FEDIsotropic_ConductivityFunction}, the blurring behaves differently at edges (high gradient magnitude) and in homogeneous areas (low gradient magnitude). Often it is desired to suppress blurring around edges and enhance it in homogeneous areas3.

To achieve this behaviour, $$g(u(x, t))$$ should return low values (resulting in little blurring) for higher gradient magnitudes and high values (resulting in strong blurring) for lower gradient magnitudes. This is exactly the behaviour which is modelled by \eqref{eq:FEDIsotropic_ConductivityFunction}. The following animation shows this function with control over the parameter $$\lambda$$, which scales the magnitude value. Larger $$\lambda$$ means that higher magnitude values also conduct and get blurred; or in other words: more and more edges are included in the blurring process. One additional important note about this function is that it returns values in the range $$]0;1]$$.
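The behaviour of \eqref{eq:FEDIsotropic_ConductivityFunction} is easy to check numerically; a small sketch (the sample magnitudes and $$\lambda$$ values are mine):

```python
def conductivity(grad_mag, lam):
    """g = 1 / ((|grad u| / lambda)^2 + 1): values in ]0; 1],
    close to 1 in homogeneous areas, close to 0 at strong edges."""
    return 1.0 / ((grad_mag / lam) ** 2 + 1.0)

print(conductivity(0.0, lam=1.0))    # 1.0 -> full diffusion in flat regions
print(conductivity(10.0, lam=1.0))   # ~0.0099 -> almost no diffusion at a strong edge
print(conductivity(10.0, lam=10.0))  # 0.5 -> larger lambda includes more edges
```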

The goal is to define $$A(\vec{u}_k)$$ in a way that \eqref{eq:FEDIsotropic_Discrete} behaves as a good discretization of \eqref{eq:FEDIsotropic_PDE}. The first thing we do is that we apply the conductivity function in \eqref{eq:FEDIsotropic_ConductivityFunction} to all signal values

\begin{equation*} \vec{g}_k = g(\vec{u}_k). \end{equation*}

These conductivity values will later be used in $$A(\vec{u}_k)$$. Now, let me remind you that $$A$$ from the homogeneous diffusion process is defined to be an approximation of the second-order derivative and each row can basically be seen as a discrete filter of the form $$\left( 1, -2, 1 \right)$$, or equivalently: $$\left( 1, -(1+1), 1 \right)$$. You can think of this as a weighting process which compares the centre element with its left neighbour by weighting the left side positively and the centre negatively, and analogously on the right side. $$A(\vec{u}_k)$$ should also encode this information somehow, but the weighting process will be a little bit more sophisticated. The idea is to define a filter like $$\left( a, -(a+b), b \right)$$ with4

\begin{align*} a &= \frac{1}{2} \cdot \left( g_{i-1} + g_{i} \right) \\ b &= \frac{1}{2} \cdot \left( g_{i} + g_{i+1} \right) \end{align*}

encoding the comparison of the centre element $$i$$ with its neighbours in a similar way as in the homogeneous diffusion, but based on the conductivity information rather than only the signal values. Here, the conductivity information is directly encoded, with $$a$$ being the average conductivity between the centre element and its left neighbour and $$b$$ being the average conductivity between the centre element and its right neighbour.

Let's see how this filter expands for an $$i$$ which is not a border element:

\begin{equation*} \begin{pmatrix} a \\ -(a+b) \\ b \\ \end{pmatrix} = \frac{1}{2} \cdot \begin{pmatrix} g_{i-1} + g_{i} \\ -(g_{i-1} + g_{i} + g_{i} + g_{i+1}) \\ g_{i} + g_{i+1} \\ \end{pmatrix} = \frac{1}{2} \cdot \begin{pmatrix} g_{i-1} + g_{i} \\ -g_{i-1} - 2\cdot g_{i} - g_{i+1} \\ g_{i} + g_{i+1} \\ \end{pmatrix} \end{equation*}

As you can see, the weighting process is well defined since in total the elements sum up to zero. The next step is to use this pattern in the definition of $$A(\vec{u}_k)$$. Let's start with a one-dimensional signal vector.

### 1D example

I want to use a signal vector $$\vec{u} \in \mathbb{R}^{9 \times 1}$$ containing nine discrete elements. I don't use concrete values here because I think it won't be helpful for understanding the general computation pattern. Providing an example on the index level is more informative in my opinion. The reason for a $$9 \times 1$$ vector lies in the fact that it is easily possible to use the same example later for the 2D case.

In this example, the matrix $$A(\vec{u}_k)$$ has the dimension $$9 \times 9$$ and is defined as

\begin{equation*} A(\vec{u}_k) = \frac{1}{2} \cdot \begin{pmatrix} -g_{0}-g_{1} & g_{0}+g_{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ g_{0}+g_{1} & -g_{0}-2 g_{1}-g_{2} & g_{1}+g_{2} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & g_{1}+g_{2} & -g_{1}-2 g_{2}-g_{3} & g_{2}+g_{3} & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & g_{2}+g_{3} & -g_{2}-2 g_{3}-g_{4} & g_{3}+g_{4} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & g_{3}+g_{4} & -g_{3}-2 g_{4}-g_{5} & g_{4}+g_{5} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & g_{4}+g_{5} & -g_{4}-2 g_{5}-g_{6} & g_{5}+g_{6} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & g_{5}+g_{6} & -g_{5}-2 g_{6}-g_{7} & g_{6}+g_{7} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & g_{6}+g_{7} & -g_{6}-2 g_{7}-g_{8} & g_{7}+g_{8} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & g_{7}+g_{8} & -g_{7}-g_{8} \\ \end{pmatrix} \end{equation*}

Only the first and last rows are a bit special since these are the border cases (signal values outside the border are set to zero). The rest of the matrix encodes the discussed weighting process between the centre element (diagonal components of the matrix) and its neighbours (the columns left and right of the diagonal).

Remember that in \eqref{eq:FEDIsotropic_Discrete} the actual computation is $$A(\vec{u}_k) \cdot \vec{u}_k$$, so the matrix must be multiplied with the signal vector. For the second row of $$A(\vec{u}_k)$$ this looks like

$$\begin{split} \frac{1}{2} \cdot \begin{pmatrix} g_{0}+g_{1} & -g_{0}-2 g_{1}-g_{2} & g_{1}+g_{2} \\ \end{pmatrix} \cdot \begin{pmatrix} u_{0} \\ u_{1} \\ u_{2} \\ \end{pmatrix} = \\ \frac{1}{2} \cdot \left( (g_{0}+g_{1}) \cdot u_{0} + (-g_{0}-2 g_{1}-g_{2}) \cdot u_{1} + (g_{1}+g_{2}) \cdot u_{2} \right). \end{split} \label{eq:FEDIsotropic_1DComputationLine}$$

In total, this results in

\begin{equation*} A(\vec{u}_k) \cdot \vec{u}_k = \frac{1}{2} \cdot \begin{pmatrix} \left(-g_{0}-g_{1}\right) u_{0}+\left(g_{0}+g_{1}\right) u_{1} \\ \left(g_{0}+g_{1}\right) u_{0}+\left(-g_{0}-2 g_{1}-g_{2}\right) u_{1}+\left(g_{1}+g_{2}\right) u_{2} \\ \left(g_{1}+g_{2}\right) u_{1}+\left(-g_{1}-2 g_{2}-g_{3}\right) u_{2}+\left(g_{2}+g_{3}\right) u_{3} \\ \left(g_{2}+g_{3}\right) u_{2}+\left(-g_{2}-2 g_{3}-g_{4}\right) u_{3}+\left(g_{3}+g_{4}\right) u_{4} \\ \left(g_{3}+g_{4}\right) u_{3}+\left(-g_{3}-2 g_{4}-g_{5}\right) u_{4}+\left(g_{4}+g_{5}\right) u_{5} \\ \left(g_{4}+g_{5}\right) u_{4}+\left(-g_{4}-2 g_{5}-g_{6}\right) u_{5}+\left(g_{5}+g_{6}\right) u_{6} \\ \left(g_{5}+g_{6}\right) u_{5}+\left(-g_{5}-2 g_{6}-g_{7}\right) u_{6}+\left(g_{6}+g_{7}\right) u_{7} \\ \left(g_{6}+g_{7}\right) u_{6}+\left(-g_{6}-2 g_{7}-g_{8}\right) u_{7}+\left(g_{7}+g_{8}\right) u_{8} \\ \left(g_{7}+g_{8}\right) u_{7}+\left(-g_{7}-g_{8}\right) u_{8} \\ \end{pmatrix} \end{equation*}

If we used this in a real application, we would not actually set up the matrix $$A(\vec{u}_k)$$ and perform a matrix multiplication. The matrix is way too sparse (containing too many zero elements) for this to be computationally efficient. Instead, it is more efficient to focus on the computation in \eqref{eq:FEDIsotropic_1DComputationLine} and abstract from that. If you look closely, you see that the computation consists of six additions/subtractions and three multiplications. With a little bit of restructuring

\begin{align*} &(g_{0}+g_{1}) \cdot u_{0} + (-g_{0}-2 g_{1}-g_{2}) \cdot u_{1} + (g_{1}+g_{2}) \cdot u_{2} \\ &= (g_{0}+g_{1}) \cdot u_{0} + (-g_{0}-g_{1}) \cdot u_{1} + (-g_{1}-g_{2}) \cdot u_{1} + (g_{1}+g_{2}) \cdot u_{2} \\ &= (g_{0}+g_{1}) \cdot u_{0} - (g_{0}+g_{1}) \cdot u_{1} - (g_{1}+g_{2}) \cdot u_{1} + (g_{1}+g_{2}) \cdot u_{2} \\ &= (g_{0}+g_{1}) \cdot (u_{0} - u_{1}) + (g_{1}+g_{2}) \cdot (u_{2} - u_{1}) \end{align*}

we only have 5 additions/subtractions and 2 multiplications left. The following figure depicts the general computation pattern.
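As a sketch of how this pattern translates into code, the following function performs one explicit 1D diffusion step using the optimized formulation (plain C++ with `std::vector` instead of the OpenCV types used later; the function name `isotropicStep1D` is my own choice, not taken from the article's code base):

```cpp
#include <vector>
#include <cstddef>

// One explicit isotropic diffusion step in 1D using the optimized pattern
// (g_{i-1} + g_i) * (u_{i-1} - u_i) + (g_i + g_{i+1}) * (u_{i+1} - u_i).
// Out-of-border contributions are set to zero, matching the definition of A(u_k).
std::vector<double> isotropicStep1D(const std::vector<double>& u,
                                    const std::vector<double>& g,
                                    double tau)
{
    const std::size_t N = u.size();
    std::vector<double> result(N);

    for (std::size_t i = 0; i < N; i++)
    {
        double left = 0.0;
        double right = 0.0;

        if (i > 0)
            left = (g[i - 1] + g[i]) * (u[i - 1] - u[i]);
        if (i < N - 1)
            right = (g[i] + g[i + 1]) * (u[i + 1] - u[i]);

        // The factor 0.5 comes from the definition of A(u_k)
        result[i] = u[i] + 0.5 * tau * (left + right);
    }

    return result;
}
```

With all conductivities set to 1, this reduces to a homogeneous diffusion step.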

Remember that in order to calculate $$\tau_i$$ in the FED scheme (\eqref{eq:FEDIsotropic_Discrete}), the maximum step size $$\tau_{\text{max}}$$ which is still stable in the explicit scheme is needed and can be derived from the maximum absolute eigenvalue of $$A(\vec{u}_k)$$. Since the maximum return value of $$g(u(x,t))$$ is 1, we now know that a row of $$A(\vec{u}_k)$$ has the values $$0.5 \cdot \left( 2, -4, 2 \right)$$ in the extreme case, resulting in a maximum absolute eigenvalue of 4 per dimension. In the 2D case discussed below, the contributions of both dimensions add up to a maximum absolute eigenvalue of 8, hence $$\tau_{\text{max}} = 0.25$$ can be used5.

### 2D example

Adding an additional dimension to the diffusion process basically means that the second order derivative now needs to be approximated in two directions: $$A_x(\vec{u}_k)$$ for the horizontal and $$A_y(\vec{u}_k)$$ for the vertical direction. Both matrices are additively combined, meaning the FED scheme from \eqref{eq:FEDIsotropic_Discrete} changes to

$$\vec{u}_{k+1} = \left( I + \tau_i \cdot \left( A_x(\vec{u}_k) + A_y(\vec{u}_k) \right) \right) \cdot \vec{u}_k. \label{eq:FEDIsotropic_2DDiscrete}$$

Unfortunately, the second dimension brings also more effort for the border handling since more border accesses need to be considered. As an example, I want to analyse the $$3 \times 3$$ image

\begin{equation*} \vec{u} = \begin{pmatrix} u_{0} & u_{1} & u_{2} \\ u_{3} & u_{4} & u_{5} \\ u_{6} & u_{7} & u_{8} \\ \end{pmatrix}. \end{equation*}

In the $$x$$-dimension, only the components $$u_{1}, u_{4}$$ and $$u_{7}$$ have both of their neighbours inside the matrix; in the $$y$$-dimension, these are the components $$u_{3}, u_{4}$$ and $$u_{5}$$. This is only so drastic because the example is very small. If the matrix is larger (e.g. represents an image), the border cases may be negligible. To apply the matrices $$A_x(\vec{u}_k)$$ and $$A_y(\vec{u}_k)$$ in a similar way as in the 1D example, we assume that the signal matrix is reshaped to a one-dimensional vector. This allows us to use similar computations as in the 1D example. First the definition of

\begin{equation*} A_x(\vec{u}_k) = \frac{1}{2} \cdot \begin{pmatrix} -g_{0}-g_{1} &g_{0}+g_{1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ g_{0}+g_{1} & -g_{0}-2g_{1}-g_{2} &g_{1}+g_{2} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 &g_{1}+g_{2} & -g_{1}-g_{2} & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -g_{3}-g_{4} &g_{3}+g_{4} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 &g_{3}+g_{4} & -g_{3}-2g_{4}-g_{5} &g_{4}+g_{5} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 &g_{4}+g_{5} & -g_{4}-g_{5} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & -g_{6}-g_{7} &g_{6}+g_{7} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 &g_{6}+g_{7} & -g_{6}-2g_{7}-g_{8} &g_{7}+g_{8} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 &g_{7}+g_{8} & -g_{7}-g_{8} \\ \end{pmatrix} \end{equation*}

which now includes two more border cases (third and sixth row) but otherwise has not changed from the 1D case. The calculation in \eqref{eq:FEDIsotropic_2DDiscrete} is most easily done if $$A_x(\vec{u}_k) \cdot \vec{u}_k$$ and $$A_y(\vec{u}_k) \cdot \vec{u}_k$$ are applied separately. For the first matrix, this results in

\begin{equation*} A_x(\vec{u}_k) \cdot \vec{u}_k = \frac{1}{2} \cdot \begin{pmatrix} \left(-g_{0}-g_{1}\right) u_0+\left(g_{0}+g_{1}\right) u_1 \\ \left(g_{0}+g_{1}\right) u_0+\left(-g_{0}-2 g_{1}-g_{2}\right) u_1+\left(g_{1}+g_{2}\right) u_2 \\ \left(g_{1}+g_{2}\right) u_1+\left(-g_{1}-g_{2}\right) u_2 \\ \left(-g_{3}-g_{4}\right) u_3+\left(g_{3}+g_{4}\right) u_4 \\ \left(g_{3}+g_{4}\right) u_3+\left(-g_{3}-2 g_{4}-g_{5}\right) u_4+\left(g_{4}+g_{5}\right) u_5 \\ \left(g_{4}+g_{5}\right) u_4+\left(-g_{4}-g_{5}\right) u_5 \\ \left(-g_{6}-g_{7}\right) u_6+\left(g_{6}+g_{7}\right) u_7 \\ \left(g_{6}+g_{7}\right) u_6+\left(-g_{6}-2 g_{7}-g_{8}\right) u_7+\left(g_{7}+g_{8}\right) u_8 \\ \left(g_{7}+g_{8}\right) u_7+\left(-g_{7}-g_{8}\right) u_8 \\ \end{pmatrix} \end{equation*}

and reveals a very similar computation pattern as in the 1D case. For the $$y$$-dimension the matrix is defined as

\begin{equation*} A_y(\vec{u}_k) = \frac{1}{2} \cdot \begin{pmatrix} -g_{0}-g_{3} & 0 & 0 & g_{0}+g_{3} & 0 & 0 & 0 & 0 & 0 \\ 0 & -g_{1}-g_{4} & 0 & 0 & g_{1}+g_{4} & 0 & 0 & 0 & 0 \\ 0 & 0 & -g_{2}-g_{5} & 0 & 0 & g_{2}+g_{5} & 0 & 0 & 0 \\ g_{0}+g_{3} & 0 & 0 & -g_{0}-2 g_{3}-g_{6} & 0 & 0 & g_{3}+g_{6} & 0 & 0 \\ 0 & g_{1}+g_{4} & 0 & 0 & -g_{1}-2 g_{4}-g_{7} & 0 & 0 & g_{4}+g_{7} & 0 \\ 0 & 0 & g_{2}+g_{5} & 0 & 0 & -g_{2}-2 g_{5}-g_{8} & 0 & 0 & g_{5}+g_{8} \\ 0 & 0 & 0 & g_{3}+g_{6} & 0 & 0 & -g_{3}-g_{6} & 0 & 0 \\ 0 & 0 & 0 & 0 & g_{4}+g_{7} & 0 & 0 & -g_{4}-g_{7} & 0 \\ 0 & 0 & 0 & 0 & 0 & g_{5}+g_{8} & 0 & 0 & -g_{5}-g_{8} \\ \end{pmatrix} \end{equation*}

which looks very different at first glance. But the computation is actually the same; we only need to combine elements vertically now, e.g. $$u_{0}$$ (first row) and $$u_{3}$$ (fourth row), so the reshaped signal vector offers no consecutive access here. For the combination with the signal vector as implied by \eqref{eq:FEDIsotropic_2DDiscrete} we get

\begin{equation*} A_y(\vec{u}_k) \cdot \vec{u}_k = \frac{1}{2} \cdot \begin{pmatrix} \left(-g_{0}-g_{3}\right) u_{0}+\left(g_{0}+g_{3}\right) u_{3} \\ \left(-g_{1}-g_{4}\right) u_{1}+\left(g_{1}+g_{4}\right) u_{4} \\ \left(-g_{2}-g_{5}\right) u_{2}+\left(g_{2}+g_{5}\right) u_{5} \\ \left(g_{0}+g_{3}\right) u_{0}+\left(-g_{0}-2 g_{3}-g_{6}\right) u_{3}+\left(g_{3}+g_{6}\right) u_{6} \\ \left(g_{1}+g_{4}\right) u_{1}+\left(-g_{1}-2 g_{4}-g_{7}\right) u_{4}+\left(g_{4}+g_{7}\right) u_{7} \\ \left(g_{2}+g_{5}\right) u_{2}+\left(-g_{2}-2 g_{5}-g_{8}\right) u_{5}+\left(g_{5}+g_{8}\right) u_{8} \\ \left(g_{3}+g_{6}\right) u_{3}+\left(-g_{3}-g_{6}\right) u_{6} \\ \left(g_{4}+g_{7}\right) u_{4}+\left(-g_{4}-g_{7}\right) u_{7} \\ \left(g_{5}+g_{8}\right) u_{5}+\left(-g_{5}-g_{8}\right) u_{8} \\ \end{pmatrix} \end{equation*}

Note how e.g. the middle line combines $$u_{1}$$ (top) with $$u_{4}$$ (middle) and $$u_{7}$$ (bottom) and how these elements are aligned vertically in the signal matrix $$\vec{u}$$. Like in the 1D case, it is possible to optimize the computation for a matrix line, e.g. for the middle line in $$A_y(\vec{u}_k) \cdot \vec{u}_k$$

\begin{align*} & \left(g_{1}+g_{4}\right) \cdot u_{1} + \left(-g_{1}-2 g_{4}-g_{7}\right) \cdot u_{4} +\left(g_{4}+g_{7}\right) \cdot u_{7} \\ &= \left(g_{1}+g_{4}\right) \cdot u_{1} - \left(g_{1}+g_{4}\right) \cdot u_{4} - \left(g_{4} + g_{7}\right) \cdot u_{4} +\left(g_{4}+g_{7}\right) \cdot u_{7} \\ &= \left(g_{1}+g_{4}\right) \cdot \left( u_{1} - u_{4}\right) +\left(g_{4}+g_{7}\right) \cdot \left( u_{7} - u_{4} \right). \\ \end{align*}

The optimization alongside the $$x$$-direction can directly be copied from the 1D case. To sum up, the following figure illustrates the general computation pattern for the 2D case.

You can move this pattern around to any image location. If some of the computations touch the border, that part is set to zero. E.g. when centred around the element with index 3, the left subtraction/addition is neglected. This is in agreement with the definition of $$A(\vec{u}_k)$$ where the computation is also reduced if out-of-border values are involved.

Given all this, it is relatively straightforward to implement a FED step (one iteration of a FED cycle). The following listing shows an example implementation in C++ using the OpenCV library6. The function takes the image from the previous iteration, the conductivity image for the current FED cycle (the conductivity image only changes after the FED cycle completes) and the current step size as input. The full source code is available on GitHub.


```cpp
static cv::Mat FEDInnerStep(const cv::Mat& img, const cv::Mat& cond, const double stepsize)
{
    // Copy input signal vector
    cv::Mat imgCopy = img.clone();

    // Apply the computation pattern to each image location
    for (int row = 0; row < img.rows; row++)
    {
        for (int col = 0; col < img.cols; col++)
        {
            double xLeft = 0;
            double xRight = 0;
            double yTop = 0;
            double yBottom = 0;

            if (col > 0)
            {
                // 3 <--> 4
                xLeft = (cond.at<double>(row, col - 1) + cond.at<double>(row, col)) * (img.at<double>(row, col - 1) - img.at<double>(row, col));
            }
            if (col < img.cols - 1)
            {
                // 4 <--> 5
                xRight = (cond.at<double>(row, col) + cond.at<double>(row, col + 1)) * (img.at<double>(row, col + 1) - img.at<double>(row, col));
            }
            if (row > 0)
            {
                // 1 <--> 4
                yTop = (cond.at<double>(row - 1, col) + cond.at<double>(row, col)) * (img.at<double>(row - 1, col) - img.at<double>(row, col));
            }
            if (row < img.rows - 1)
            {
                // 4 <--> 7
                yBottom = (cond.at<double>(row, col) + cond.at<double>(row + 1, col)) * (img.at<double>(row + 1, col) - img.at<double>(row, col));
            }

            // Update the current pixel location with the conductivity based derivative information and the varying step size
            imgCopy.at<double>(row, col) = 0.5 * stepsize * (xLeft + xRight + yTop + yBottom);
        }
    }

    // Update old image
    return img + imgCopy;
}
```



Note that this is basically the computation pattern from the figures with some additional logic for the border handling. If you have read my article about FED, you might have been a bit disappointed that no real application was shown. But without a definition of the conductivity matrix, it is hard to show something useful. Nevertheless, we now have all the ingredients together, so it is possible to apply isotropic diffusion to an example image. The following animation lets you control this process starting from $$T=0$$ (original image) up to the diffusion time $$T=100$$. If you switch to the right tab, you can see the result of the corresponding homogeneous diffusion process (applied via Gaussian convolution).

As you can see, the isotropic diffusion results in blurred regions but the main edges stay intact. E.g. the fine structure in the left background is nearly completely lost at the end, but the stripes of the clothes are still visible. This effect is especially noticeable when the result is compared with the Gaussian filter response, which blurs everything equally regardless of the content.


1. Note that for anisotropic diffusion the conductivity function would depend on a tensor (e.g. the structure tensor) instead of the scalar valued image gradient resulting in a response which is selective for different orientations.
2. See the first footnote of my FED article for a description of the used notation.
3. I refer to Digital Image Processing by Burger and Burge, p. 433 for a detailed introduction to edge-preserving smoothing in the image processing domain.
4. Note that I omit the evolution index $$k$$ when accessing elements in the vector for simplicity reasons. So, $$g_i$$ refers to the $$i^{\text{th}}$$ component of the vector $$\vec{g}_k$$ (with the current relevant evolution level $$k$$). But this is really only for notational simplicity, the conductivity still needs to be calculated for each evolution step $$k$$ and the signal vector still evolves over the diffusion time $$k$$ (c.f. the explanation of the notation).
5. The Gershgorin circle theorem applied to a row of $$A_x(\vec{u}_k) + A_y(\vec{u}_k)$$ gives $$\left|\lambda_{\text{max}}\right|=8$$ based on the centre value $$\left|0.5 \cdot (-8)\right| = 4$$ with radius $$0.5 \cdot \left(2 + 2 + 2 + 2\right) = 4$$, and the stability condition of the FED article then states that $$\tau_{\text{max}} = \frac{2}{\left|\lambda_{\text{max}}\right|} = \frac{1}{4}$$ is the maximum stable step size.
6. Note that this implementation is optimized for readability and not performance. E.g. you may want to change the matrix accesses to the pointer method before using the code in production environments.

## Introduction to Fast Explicit Diffusion (FED)

Diffusion, in general, is a concept which describes the propagation of particles over time within some substance. One good example is temperature diffusing in a closed body, leading to the heat equation. The question here is: how does the temperature change over time at the considered spatial locations? Assuming, for simplicity, only 1D spatial locations, this can mathematically be described by a partial differential equation (PDE)

$$\frac{\partial u(x,t)}{\partial t} = g \cdot \frac{\partial^2 u(x,t)}{\partial x^2} \label{eq:FEDIntroduction_PDE}$$

with $$g$$ being a constant describing the conductivity of the homogeneous diffusion (e.g. how fast the temperature levels out). I am not going into details here about why this equation looks the way it does (check the linked video for a great introduction). My concern is more about the discretization of this PDE. Using the Euler method, the discretization is achieved by

\begin{align*} t_{k+1} &= t_k + \tau \\ \vec{u}_{k+1} &= \vec{u}_k + \tau \cdot (g \cdot \Delta \vec{u}_k) \\ \end{align*}

which proceeds over time in steps of $$\tau$$. The part in the brackets is basically the discrete version of \eqref{eq:FEDIntroduction_PDE} where $$\Delta$$ denotes an operator for the central finite difference approximation of the second order derivative $$\frac{\partial^2 u(x,t)}{\partial x^2}$$ applied to the vector of signal values $$\vec{u}_k$$1 at the time $$t_k$$, and the constant $$g$$ is re-used as is. Note that in this explicit scheme the time step $$\tau$$ is fixed over the complete process. But one step size might not be appropriate for every step; it may be better if some are larger and others are smaller. This problem is addressed by the Fast Explicit Diffusion (FED) algorithm. Basically, it introduces varying time steps $$\tau_i$$, resulting in more accuracy and higher performance (hence the “Fast” in the name). This article aims to introduce FED and to provide examples of how it can be used.
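As a small illustration of the discrete scheme, here is one explicit Euler step in plain C++ (an illustrative sketch of my own; `eulerStep` is not a function from any library, a grid spacing of 1 is assumed and the difference stencil is simply reduced at the borders):

```cpp
#include <vector>
#include <cstddef>

// One explicit Euler step u_{k+1} = u_k + tau * g * (Delta u_k) in 1D.
// The central difference (1, -2, 1) approximates the second order derivative;
// at the borders the stencil is reduced to the available neighbours.
std::vector<double> eulerStep(const std::vector<double>& u, double tau, double g)
{
    const std::size_t N = u.size();
    std::vector<double> next(N);

    for (std::size_t i = 0; i < N; i++)
    {
        double laplacian = 0.0;
        if (i > 0)
            laplacian += u[i - 1] - u[i];
        if (i < N - 1)
            laplacian += u[i + 1] - u[i];

        next[i] = u[i] + tau * g * laplacian;
    }
    return next;
}
```

Repeated calls advance the diffusion time in fixed steps of $$\tau$$; this is exactly the scheme that FED will improve upon with varying step sizes.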

Before going into the details, let me note that FED can not only be applied to homogeneous diffusions (like the heat equation) but also to inhomogeneous, anisotropic or isotropic and even multi-dimensional diffusion problems. More precisely, FED can be applied to any finite scheme of the form [12-13]2

$$\vec{u}_{k+1} = \left( I + \tau \cdot A \right) \cdot \vec{u}_k \label{eq:FEDIntroduction_Arbitrary}$$

where $$A$$ is some symmetric and negative semidefinite matrix embedding the conductivity information. This includes the approximation of the second order derivative $$\Delta$$ and the constant $$g$$. I will come back to this point later; first, let's dive into what FED actually does.

The rest of this article is structured as follows: first, the concept behind FED is introduced by an example showing the basic rationale behind it. Next, the notation is simplified allowing for generalization of the concept behind FED to a wider range of problems. Then, a summary of the used parameters is given, and in the last section, some elaboration of the role of box filters in FED is provided.

### How does it work?

For simplicity, I stick to one-dimensional homogeneous diffusion problems and set the constant to $$g = 1$$. First, two important notes:

• If a homogeneous diffusion is applied to a signal, it is equivalent to applying a Gaussian to that signal. More precisely, the total diffusion time $$T$$ maps to a Gaussian with standard deviation $$\sigma = \sqrt{2 \cdot T}$$ (e.g. Digital Image Processing by Burger and Burge, p. 435).
• If a filter whose coefficients are non-negative and sum up to 1 (i.e. $$\sum w_i = 1$$) is applied multiple times to a signal, the result approximates a Gaussian convolution of that signal. This follows from the central limit theorem (CLT).

What does FED have to do with this? In essence, FED introduces an alternative way of applying such a filter by using iterations of explicit diffusion steps. The main finding is that filters (with the discussed properties) can be defined as

$$B_{2n+1} = \prod_{i=0}^{n-1} \left( I + \tau_i \Delta \right). \label{eq:FEDIntroduction_Cycle}$$

Let me directly be a bit more concrete here: $$B_{2n+1}$$ denotes a box filter of size $$2n+1$$ and $$I$$ the identity (e.g. simply 1). $$\Delta$$ is, again, the operator for an approximation of the second order derivative. If applied to a signal $$\vec{u}$$, it may be defined as3

$$\Delta \vec{u} = u(x-1, t) - 2 \cdot u(x, t) + u(x+1, t), \label{eq:FEDIntroduction_Delta}$$

so the spatial difference at some fixed time $$t$$. You can also think of this operation as a kernel $$\left( 1, -2, 1 \right)$$ convolved with the signal $$\vec{u}$$. All iterations of \eqref{eq:FEDIntroduction_Cycle} together are called a FED cycle. Given this, there is only $$\tau_i$$ left. This is actually the heart of FED since it denotes the varying step sizes, and in this case, is defined as

$$\tau_i = \tau_{\text{max}} \cdot \frac{1}{2 \cos ^2\left(\frac{\pi (2 i+1)}{4 n+2}\right)} \label{eq:FEDIntroduction_TimeStepsBox}$$

with $$\tau_{\text{max}} = \frac{1}{2}$$ here (more on this parameter later). Unfortunately, I can't tell you why this works (the authors provide proofs though [46 ff. (Appendix)]) but I can give an example.
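The varying step sizes are straightforward to compute; a short sketch (the helper name `fedStepSizes` is my own, with $$\tau_{\text{max}} = \frac{1}{2}$$ as default):

```cpp
#include <vector>
#include <cmath>

// Varying FED step sizes tau_i for a box filter of size 2n+1, following
// tau_i = tau_max / (2 * cos^2(pi * (2i+1) / (4n+2))).
std::vector<double> fedStepSizes(int n, double tauMax = 0.5)
{
    std::vector<double> tau(n);
    const double pi = 3.14159265358979323846;

    for (int i = 0; i < n; i++)
    {
        double c = std::cos(pi * (2 * i + 1) / (4.0 * n + 2.0));
        tau[i] = tauMax / (2.0 * c * c);
    }
    return tau;
}
```

For $$n=1$$ this yields the single step size $$\tau_0 = \frac{1}{3}$$ which is also used in the example below.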

We want to show that the FED cycle applied to the signal is indeed the same as convolving the signal with a box filter. Let's start by defining a small signal and a box filter of size $$n=1$$

\begin{equation*} \vec{u} = \begin{pmatrix} 1 \\ 4 \\ 2 \\ 6 \\ \end{pmatrix} \quad \text{and} \quad \tilde{B}_3 = \frac{1}{3} \cdot \begin{pmatrix} 1 \\ 1 \\ 1 \\ \end{pmatrix}. \end{equation*}

The convolution with reflected boundaries4 results in

$$\vec{u} * \tilde{B}_3 = \left( 2, 2.33333, 4, 4.66667 \right)^T. \label{eq:FEDIntroduction_ExampleConvolution}$$

For the FED approach, we first need to calculate the time step (only one in this case because of $$n=1$$)

\begin{equation*} \tau_0 = \frac{1}{2} \cdot \frac{1}{2 \cos ^2\left(\frac{\pi (2 \cdot 0+1)}{4 \cdot 1+2}\right)} = \frac{1}{3} \end{equation*}

and then \eqref{eq:FEDIntroduction_Cycle} can be applied to $$\vec{u}$$ by multiplication, resulting in the following explicit diffusion step

\begin{equation*} B_3 \cdot \vec{u} = \prod_{i=0}^{1-1} \left( I + \tau_i \Delta \right) \cdot \vec{u} = \vec{u} + \tau_0 \cdot \Delta \vec{u}. \end{equation*}

How is $$\Delta \vec{u}$$ defined here? It is basically \eqref{eq:FEDIntroduction_Delta} but extended to handle out of border accesses correctly

\begin{equation*} \Delta u_{j} = \begin{cases} - u_{j} + u_{j+1}, & j=1 \\ u_{j-1} - u_{j}, & j=N \\ u_{j-1} - 2 u_{j} + u_{j+1}, & \text{else} \\ \end{cases} \end{equation*}

and is applied to every vector component $$u_{j}$$. In this case, we have

\begin{equation*} \Delta \vec{u} = \begin{pmatrix} -1+4 \\ 1-2\cdot 4+2 \\ 4-2\cdot 2 + 6 \\ 2-6 \\ \end{pmatrix} = \begin{pmatrix} 3 \\ -5 \\ 6 \\ -4 \\ \end{pmatrix} \end{equation*}

and then everything can be combined together

$$\vec{u} + \tau_0 \cdot \Delta \vec{u} = \begin{pmatrix} 1 \\ 4 \\ 2 \\ 6 \\ \end{pmatrix} + \frac{1}{3} \cdot \begin{pmatrix} 3 \\ -5 \\ 6 \\ -4 \\ \end{pmatrix} = \begin{pmatrix} 2 \\ 2.33333 \\ 4 \\ 4.66667 \\ \end{pmatrix}. \label{eq:FEDIntroduction_ExampleFED}$$

As you can see, \eqref{eq:FEDIntroduction_ExampleConvolution} and \eqref{eq:FEDIntroduction_ExampleFED} produce indeed identical results.
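This equality can also be checked programmatically. The following sketch implements the border-aware $$\Delta$$ operator from above and a size-3 box filter with reflected boundaries (both helpers are my own illustration, not library functions):

```cpp
#include <vector>
#include <cstddef>

// Delta operator with the border handling defined above: the stencil
// (1, -2, 1) is reduced at the first and last component.
std::vector<double> delta(const std::vector<double>& u)
{
    const std::size_t N = u.size();
    std::vector<double> d(N, 0.0);
    for (std::size_t j = 0; j < N; j++)
    {
        if (j > 0)     d[j] += u[j - 1] - u[j];
        if (j < N - 1) d[j] += u[j + 1] - u[j];
    }
    return d;
}

// Box filter of size 3 with reflected boundaries (the border value
// is repeated outside the signal).
std::vector<double> box3(const std::vector<double>& u)
{
    const std::size_t N = u.size();
    std::vector<double> r(N);
    for (std::size_t j = 0; j < N; j++)
    {
        double left  = (j > 0)     ? u[j - 1] : u[0];
        double right = (j < N - 1) ? u[j + 1] : u[N - 1];
        r[j] = (left + u[j] + right) / 3.0;
    }
    return r;
}
```

For the example signal, $$\vec{u} + \frac{1}{3} \cdot \Delta \vec{u}$$ and the box filter response agree component by component.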

If larger filters and hence more iterations per FED cycle are needed, the same technique would be recursively applied. E.g. for $$n=2$$ we get

\begin{equation*} B_5 \cdot \vec{u}= \prod_{i=0}^{2-1} \left( I + \tau_i \Delta \right) \cdot \vec{u} = \vec{u} + \tau_0 \cdot \Delta \vec{u} + \tau_1 \cdot \Delta \vec{u} + \tau_0 \cdot \tau_1 \cdot \Delta (\Delta \vec{u}), \end{equation*}

where $$\Delta (\Delta \vec{u})$$ effectively encodes an approximation of the fourth order derivative (it calculates the second order derivative of something which is already the second order derivative of the input signal). The concept stays the same for larger filters. The iterations of the FED cycle gradually reach the same state as the corresponding box filter response. To visualize this, the following animation shows the result of a filter with $$n=10$$, each time plotting the state up to the current iteration (i.e. the previous and current multiplications).

So far we only discussed exactly one FED cycle (which consists of iterations). It is also possible to apply multiple cycles just by repeating the procedure. This is then equal to iteratively applying multiple filters to the signal.

Recapitulate what we now have:

• multiple iterations (products in \eqref{eq:FEDIntroduction_Cycle}) form a FED cycle,
• a FED cycle is the same as convolving the signal with a (box) filter,
• multiple FED cycles are the same as the iterative convolution of the signal with (box) filters,
• applying multiple filters approximates a Gaussian filter response and
• the response of a Gaussian filter is the same as the result of the diffusion process.

Therefore, as a conclusion of this chain of logic: FED can be used as an explicit diffusion scheme for homogeneous diffusion equations. But how is FED used for diffusion problems which are usually formulated like in \eqref{eq:FEDIntroduction_Arbitrary}? It basically incorporates \eqref{eq:FEDIntroduction_Cycle}:

$$\vec{u}_{k+1} = \prod_{i=0}^{n-1} \left( I + \tau_i \cdot A \right) \cdot \vec{u}_k. \label{eq:FEDIntroduction_DiscreteDiffusion}$$

This means that one diffusion step (from $$k$$ to $$k+1$$) applies a box filter of size $$2n+1$$ to the signal. And with further diffusion steps, a Gaussian and hence the diffusion itself is approximated. The matrix $$A$$ is introduced in more detail in the next section.

You can also think of this as breaking the progression in diffusion time apart into multiple blocks (= FED cycles). Each FED cycle has a fixed step size of $$\theta = \sum_{i=0}^{n-1}\tau_i$$ and this size is the same for all cycles (like the usual step size in an explicit scheme). But each FED cycle is now divided into multiple inner steps (factors of \eqref{eq:FEDIntroduction_DiscreteDiffusion}) and each inner step has its own step size $$\tau_i$$. Additionally, all FED cycles share the same inner steps, e.g. each cycle starts with an inner step of step size $$\tau_0$$ and ends with $$\tau_{n-1}$$. The following figure visualizes this for the previous example.

We have now seen how FED generally works. The next part focuses on simplifying the notation.

### Matrix notation

Calculating FED like in the previous section may become unwieldy for larger signals and/or larger filters. Luckily, there is an easier way to express the same logic using matrices. It turns out that this is also useful for generalising FED to arbitrary problems (as implied in the beginning).

The example so far is based on a PDE describing the change of the function value at each position over time; in \eqref{eq:FEDIntroduction_PDE} the derivative therefore depended on two variables. It is possible to transform the PDE into a system of ODEs by fixing space. The derivative can then be defined as

\begin{equation*} \frac{\mathrm{d} \vec{u}(t)}{\mathrm{d} t} = A \cdot \vec{u}(t). \end{equation*}

where $$\vec{u}(t)$$ is the vector containing the discrete set of spatial locations which are still continuous in time $$t$$. You can think of this as each vector component fixing some $$x$$ position of the original continuous function $$u(x, t)$$ while the function value at this position (e.g. the temperature value) still varies continuously over time. The benefit of this approach is that the operator $$\Delta$$ is discarded and replaced by the matrix $$A \in \mathbb{R}^{N \times N}$$, which returns us to \eqref{eq:FEDIntroduction_Arbitrary}. $$A$$ is now responsible for approximating the second order derivative. To be consistent with $$\Delta$$, it may be defined for the previous example as

\begin{equation*} A = \begin{pmatrix} -1 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -1 \\ \end{pmatrix}. \end{equation*}

Note that the pattern encoded in this matrix is the same as in \eqref{eq:FEDIntroduction_Delta}: left and right neighbour weighted once and the centre value compensates for this. Hence, $$A \vec{u}$$ is the same as $$\Delta \vec{u}$$. By calculating

\begin{equation*} (I + \tau_0\cdot A) \vec{u} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \end{pmatrix} \begin{pmatrix} 1 \\ 4 \\ 2 \\ 6 \\ \end{pmatrix} + \tau_0 \cdot \begin{pmatrix} -1 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ 0 & 0 & 1 & -1 \\ \end{pmatrix} \begin{pmatrix} 1 \\ 4 \\ 2 \\ 6 \\ \end{pmatrix} = \begin{pmatrix} 2 \\ 2.33333 \\ 4 \\ 4.66667 \\ \end{pmatrix} \end{equation*}

we obtain exactly the same result as in \eqref{eq:FEDIntroduction_ExampleFED}. A FED cycle with multiple iterations is done by repeatedly multiplying the brackets (c.f. \eqref{eq:FEDIntroduction_Cycle})

\begin{equation*} \vec{u}_{k+1} = \left( I + \tau_0 \cdot A \right) \left( I + \tau_1 \cdot A \right) \cdots \left( I + \tau_{n-1} \cdot A \right) \vec{u}_k. \end{equation*}

This is actually the basis to extend FED to arbitrary diffusion problems as denoted in \eqref{eq:FEDIntroduction_Arbitrary} [9-11]. The trick is to find an appropriate matrix $$A$$. If you, for example, encode the local conductivity information, i.e. let $$A$$ depend on the signal values: $$A(\vec{u}_{k})$$, it is possible to use FED for inhomogeneous diffusion problems.
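A small sketch can check that the matrix view matches the operator view. The matrix is built explicitly here purely for demonstration (in practice one would exploit its sparsity; `applyA` is my own helper name):

```cpp
#include <vector>
#include <cstddef>

// Build the second order derivative matrix A (pattern 1, -2, 1 with reduced
// borders) and apply it to a signal vector, i.e. compute A * u.
std::vector<double> applyA(const std::vector<double>& u)
{
    const std::size_t N = u.size();
    std::vector<std::vector<double>> A(N, std::vector<double>(N, 0.0));

    for (std::size_t i = 0; i < N; i++)
    {
        if (i > 0)     { A[i][i - 1] = 1.0; A[i][i] -= 1.0; }
        if (i < N - 1) { A[i][i + 1] = 1.0; A[i][i] -= 1.0; }
    }

    // Plain matrix-vector multiplication
    std::vector<double> result(N, 0.0);
    for (std::size_t i = 0; i < N; i++)
        for (std::size_t j = 0; j < N; j++)
            result[i] += A[i][j] * u[j];

    return result;
}
```

For the example signal, $$A \vec{u}$$ reproduces exactly the values of $$\Delta \vec{u}$$ from before.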

In the next section, I want to give an overview of the relevant parameters when using FED.

### Parameters

I already used some parameters, like $$\tau_{\text{max}}$$, without explaining what they do. The following list contains all previously discussed parameters plus new ones which so far only appeared implicitly.

• $$T$$: is the total diffusion time. A longer diffusion time generally means stronger blurring of the signal. I already stated that the relation to the standard deviation is $$\sigma = \sqrt{2 \cdot T}$$, i.e. $$T = \frac{1}{2} \cdot \sigma^2$$. Usually, the desired amount of blurring $$\sigma$$ is given.
• $$M$$: a new parameter denoting the number of FED cycles used. So far exactly one cycle was used in the examples but usually multiple cycles are desired. Since multiple cycles correspond to multiple filters applied to the signal and more filters mean a better Gaussian approximation, this parameter controls the quality. Larger $$M$$ means better quality (better Gaussian approximation). This parameter must also be given.
• $$n$$: is the length of one FED cycle, i.e. the number of iterations it contains. It corresponds to an equivalent kernel of size $$2\cdot n + 1$$. Usually, the total diffusion time $$T$$ should be accomplished by $$M$$ cycles of equal length. $$n$$ is therefore the same for all cycles and can be calculated as [11] \begin{equation*} n = \left\lceil -\frac{1}{2} + \frac{1}{2} \cdot \sqrt{1 + \frac{12 \cdot T}{M \cdot \tau_{\text{max}}}} \right\rceil. \end{equation*}
• $$\theta$$: a new parameter denoting the diffusion time for one FED cycle, i.e. how much further in time one cycle approaches. This is basically the sum of all steps $$\tau_i$$ \begin{equation*} \theta = \sum_{i=0}^{n-1} \tau_i \end{equation*} but it is also possible to calculate it explicitly for a given kernel, e.g. for a box filter [10] \begin{equation*} \theta_{\text{box}}(n) = \tau_{\text{max}} \cdot \frac{n^2 + n}{3}. \end{equation*}
• $$\tau_{\text{max}}$$: is the maximum possible time step of the Euler scheme which does not violate stability. It is defined by the stability region and is based on the eigenvalues of $$A$$. More precisely, the relation [12] $$\tau_{\text{max}} \leq \frac{2}{\max_i{\left|\lambda_i\right|}} \label{eq:FEDIntroduction_TauMax}$$ must hold for the eigenvalues $$\lambda_i$$ (this ensures $$\left| 1 + \tau \lambda_i \right| \leq 1$$ in the explicit scheme). It is possible to make a worst-case estimation of the eigenvalues with the Gershgorin circle theorem. This explains why $$\tau_{\text{max}} = \frac{1}{2}$$ is used here: the largest absolute eigenvalue of $$A$$ is5 $$\left|\lambda_{\text{max}}\right| = 4$$ and $$\frac{2}{4} = \frac{1}{2}$$ is valid according to \eqref{eq:FEDIntroduction_TauMax}.
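The formula for $$n$$ from the parameter list can be sketched in code as follows (the helper name `fedCycleLength` is my own):

```cpp
#include <cmath>

// Number of inner steps n per FED cycle for a total diffusion time T,
// M cycles and the maximum stable step size tauMax.
int fedCycleLength(double T, int M, double tauMax)
{
    return static_cast<int>(std::ceil(-0.5 + 0.5 * std::sqrt(1.0 + 12.0 * T / (M * tauMax))));
}
```

For example, $$T=6$$, $$M=3$$ and $$\tau_{\text{max}}=\frac{1}{2}$$ give $$n=3$$.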

After giving an overview of the relevant parameters, I want to elaborate a little bit more on the relation between filters and FED.

### Why box filters?

You may have noticed that I kind of silently used \eqref{eq:FEDIntroduction_Cycle} as a representation of a box filter. Actually, the relation is more general since any symmetric filter whose coefficients sum up to 1 can be used. So, why use a box filter? Isn't the box filter the one with such a destructive behaviour in the frequency domain? You could argue that, but let me remind you that in FED we usually apply several cycles (i.e. $$M>1$$) and therefore multiple box filters, which together behave more like a Gaussian filter in the end. But there are even better reasons to use a box filter here. They are best understood by looking at how other filters behave [6-8].

In what do the filters differ? Most importantly, the varying time steps $$\tau_i$$ used in \eqref{eq:FEDIntroduction_Cycle} depend on the concrete filter. The steps in \eqref{eq:FEDIntroduction_TimeStepsBox}, for example, correspond to a box filter; other filters result in different step sizes [7]. If the $$\tau_i$$ are different, their sum $$\theta(n)$$ is also not the same for all filters. This means that if each filter is to advance a fixed amount of diffusion time per FED cycle, $$n$$ must also be defined differently. Therefore, different filters result in differently long FED cycles.

Three filters will now be analysed under two basic criteria:

• Quality: the general goal is to approximate a Gaussian function since this is what the (homogeneous) diffusion equation implies. So the question is: how well do multiple FED cycles approximate a Gaussian?
• Performance: it is good to have high quality but it must also be achieved in a reasonable amount of time. In terms of FED, performance is determined by $$n$$, the number of iterations per cycle. The more iterations are needed, the worse the performance. So the question is: how many iterations are needed in total?

The test setup is as follows: a total diffusion time of $$T=6$$ should be achieved in $$M=3$$ FED cycles, applied to a signal of length 101 with a single peak in the middle

\begin{equation*} s(i) = \begin{cases} 1, & i = 51 \\ 0, & i \neq 51 \end{cases} \end{equation*}

The behaviour of three different filters will now be analysed: box, binomial and the so-called maximum variance (MV) kernel. The 1st column of the following table shows a graph for each kernel (for concrete definition see [6]).

| Filter | Cycle time $$\theta(n)$$ | Iterations $$n$$ | Performance | Quality |
| --- | --- | --- | --- | --- |
| Box | $$\theta_{\text{box}}(n) = \frac{n^2+n}{6}$$ | $$n_{\text{box}}=3$$ | Good: $$\mathcal{O}_{\theta_{\text{box}}}(n^2)$$ | Good |
| Binomial | $$\theta_{\text{bin}}(n) = \frac{n}{4}$$ | $$n_{\text{bin}}=8$$ | Poor: $$\mathcal{O}_{\theta_{\text{bin}}}(n)$$ | Very good |
| MV | $$\theta_{\text{MV}}(n) = \frac{n^2}{2}$$ | $$n_{\text{MV}}=2$$ | Very good: $$\mathcal{O}_{\theta_{\text{MV}}}(n^2)$$ | Poor |

The 2nd column shows the cycle time $$\theta(n)$$ for each filter as a function of $$n$$. The faster this function grows, the better, because then the diffusion proceeds faster. The following figure shows how the MV kernel is superior in this respect. But note also that the box filter, even though growing more slowly, still increases quadratically in $$n$$ (hence $$\mathcal{O}_{\theta_{\text{box}}}(n^2)$$ in the 4th column of the table).

In the test scenario, each filter is supposed to make a total step of $$\frac{T}{M} = 2$$ per cycle, so $$n$$ is adjusted for each filter in order to have proceeded the same diffusion time after each cycle. For the box filter, this leaves $$n_{\text{box}}=3$$ since $$\frac{3^2 + 3}{6} = 2$$; over the three cycles, $$3 \cdot 3 = 9$$ FED iterations are therefore necessary in total. Similarly for the other values in the 3rd column. It is also worth looking at how the time steps $$\tau_i$$ accumulate over the iterations (in only one FED cycle, since the steps are equal among all cycles). The following figure shows how the $$\tau_i$$ accumulate to 2 for each filter, revealing that the MV kernel wins this race.

So much for the performance point of view, but what are the results regarding the approximation quality of a Gaussian? To answer this question, an error value is calculated. More concretely, the sum of absolute differences between the filter response and the corresponding Gaussian after each FED cycle is used

$$E = \sum_{i=1}^{101} \left| \operatorname{FED}_{T, \text{filter}}(s(i)) - G_{\sigma}(i) \right| \label{eq:FEDIntroduction_Error}$$

where $$\operatorname{FED}_{T, \text{filter}}(s(i))$$ denotes the result after applying the FED cycle for the specific filter up to the total diffusion time $$T$$ and $$G_{\sigma}(i)$$ is the Gaussian with the same standard deviation and mean as the filter response data.

Each of the filters is now applied three times to $$s(i)$$ by using three FED cycles. Since the signal is just defined as a single peak of energy 1, the first cycle produces the (equivalent) filter itself in each case.

Even though it is visible that all tend to take the shape of a Gaussian, it is also clear that the binomial filter performs best on this task. The visual representation together with the error value now also explains the 5th column of the table.

Summarising, we can now see that no filter is perfect in both disciplines. But if we had to choose one filter, we would probably select the box filter. It offers reasonable approximation quality and still good performance, and this is exactly why FED uses the cycle times resulting from the box filter.

As a last note, let me show how the step sizes $$\tau_i$$ behave for larger $$n$$. The following figure shows the individual step sizes $$\tau_i$$ and their cumulative sum

\begin{equation*} \sum_{j=0}^{i} \tau_j \end{equation*}

for $$n=50$$. With higher cycle lengths the steps get even larger [12].

As you can see, in later iterations the step sizes get larger and larger, resulting in a great speedup (compare using a fixed time step of 0.5 with using the varying step sizes). What is more, many steps actually exceed the stability limit of the explicit scheme. It can be shown, though, that at the end of a full cycle the result stays stable [10].

Footnotes:

1. A short remark on the notation I use here: \begin{align*} u(x,t) &: \text{continuous function in space and time.} \\ \vec{u} &: \text{discrete signal vector at the start time $$t_0$$.} \\ \vec{u}_k &: \text{discrete signal vector evolved to the diffusion step $$k$$.} \\ u_j &: \text{$$j^{\text{th}}$$ component of the vector $$\vec{u}$$ (or $$\vec{u}_k$$, I usually don't repeat the evolution index $$k$$} \\ &\phantom{:} \text{when noting individual vector elements) with $$j \in \left\{ 1, \ldots, N\right\}$$ and $$N$$ the number of} \\ &\phantom{:} \text{elements in the vector.} \\ \vec{u}(t)&: \text{vector with discrete set of spatial locations which are still continuous in time.} \end{align*}
2. Numbers in square brackets are the page numbers of the journal version of the FED article: Cyclic schemes for PDE-based image analysis.
3. With $$h=1$$ here.
4. Meaning the signal values repeat at the boundaries: $$\left(\ldots, 4, 1, 1, 4, 2, 6, 6, 2, \ldots\right)$$.
5. $$\left| -2 \right| = 2$$ as the centre with radius $$1+1=2$$ leaving a total range of 4. The eigenvalues must also be real since $$A$$ is symmetric.

## Estimate eigenvalues with the Gershgorin circle theorem

Eigenvalues are probably one of the most important metrics which can be extracted from a matrix. Together with their corresponding eigenvectors, they form the fundamental basis for many applications. Calculating eigenvalues of a given matrix is straightforward and implementations exist in many libraries. But sometimes the concrete matrix is not known in advance, e.g. when the matrix values are based on some bounded input data. In this case, it may be good to give at least some estimation of the range in which the eigenvalues can lie. As the name of this article suggests, there is a theorem intended for this use case, and it will be discussed here.

### What it does

For a square $$n \times n$$ matrix $$A$$ the Gershgorin circle theorem returns a range in which the eigenvalues must lie by simply using the information from the rows of $$A$$. Before looking into the theorem though, let me remind the reader that eigenvalues may be complex valued (even for a matrix which contains only real numbers). Therefore, the estimation lives in the complex plane, meaning we can visualize the estimation in a 2D coordinate system with the real part as $$x$$- and the imaginary part as the $$y$$-axis. Note also that $$A$$ has a maximum of $$n$$ distinct eigenvalues.

For the theorem, the concept of a Gershgorin disc is relevant. Such a disc exists for each row of $$A$$; it is centred around the diagonal element $$a_{ii}$$ (which may be complex as well) and the absolute sum of the other elements in the row, $$r_i$$, constrains the radius. The disc is therefore defined as

$$\label{eq:GershgorinCircle_Disc} C_i = \left\{c \in \mathbb{C} : \left| c - a_{ii} \right| \leq r_i\right\}$$

with the corresponding row sum

$$\label{eq:GershgorinCircle_Disc_RowSum} r_i = \sum_{\substack{j=1 \\ j \neq i}}^n \left|a_{ij}\right|$$

(the absolute sum of all row values except the diagonal element itself). As an example, let's take the following definition for

\begin{equation*} A = \begin{pmatrix} 4 & 3 & 15 \\ 1 & 1+i & 5 \\ -8 & -2 & 22 \end{pmatrix}. \end{equation*}

There are three Gershgorin discs in this matrix:

• $$C_1$$ with the centre point $$a_{11} = 4$$ and radius $$r_1 = \left|3\right| + \left|15\right| = 18$$
• $$C_2$$ with the centre point $$a_{22} = 1+i$$ and radius $$r_2 = \left|1\right| + \left|5\right| = 6$$
• $$C_3$$ with the centre point $$a_{33} = 22$$ and radius $$r_3 = \left|-8\right| + \left|-2\right| = 10$$

We have now all ingredients for the statement of the theorem:

Gershgorin circle theorem

Every eigenvalue $$\lambda \in \mathbb{C}$$ of a square matrix $$A \in \mathbb{R}^{n \times n}$$ lies in at least one of the Gershgorin discs $$C_i$$ (\eqref{eq:GershgorinCircle_Disc}). The possible range of the eigenvalues is defined by the outer borders of the union of all discs

\begin{equation*} C = \bigcup_{i=1}^{n} C_i. \end{equation*}

The union, in the case of the example, is $$C = C_1 \cup C_2 \cup C_3$$ and based on the previous information of the discs we can now visualize the situation in the complex plane. In the following figure, the discs are shown together with their disc centres and the actual eigenvalues (which are all complex in this case)

\begin{equation*} \lambda_1 = 13.4811 - 7.48329 i, \quad \lambda_2 = 13.3749 + 7.60805 i \quad \text{and} \quad \lambda_3 = 0.14402 + 0.875241 i. \end{equation*}

Indeed, all eigenvalues lie in the blue area defined by the discs. But you also see from this example that not all discs have to contain an eigenvalue (the theorem does not state that each disc holds one eigenvalue). E.g. $$C_3$$ on the right side does not contain any eigenvalue. This is why the theorem makes only a statement about the complete union and not about each disc independently. Additionally, you can also see that one disc can be completely contained inside another disc, as is the case with $$C_2$$ which lies inside $$C_1$$. In this case, $$C_2$$ does not give any useful information at all, since it does not expand the union $$C$$ (if $$C_2$$ were missing, nothing would change regarding the complete union of all discs, i.e. $$C=C_1 \cup C_2 \cup C_3 = C_1 \cup C_3$$).

If we want to estimate the range in which the eigenvalues of $$A$$ will lie, we can use the extreme values of the union, e.g.

\begin{equation*} \left[4-18; 22+10\right]_{\operatorname{Re}} = \left[-14; 32\right]_{\operatorname{Re}} \quad \text{and} \quad \left[0 - 18; 0 + 18\right]_{\operatorname{Im}} = \left[-18; 18 \right]_{\operatorname{Im}} \end{equation*}

for the real and imaginary range respectively. This defines nothing else than a rectangle containing all discs. Of course, the rectangle is an even more inaccurate estimation than the discs already are, but the ranges are easier to handle (e.g. to decide if a given point lies inside the valid range or not). Furthermore, if we have more information about the matrix, e.g. that it is symmetric and real-valued and therefore has only real eigenvalues, we can discard the imaginary range completely.

In summary, with the help of the Gershgorin circle theorem, it is very easy to give an estimation of the eigenvalues of some matrix. We only need to look at the diagonal elements and corresponding sum of the rest of the row and get a first estimate of the possible range. In the next part, I want to discuss why this estimation is indeed correct.

### Why it works

Let's start again with a 3-by-3 matrix called $$B$$ but now I want to use arbitrary coefficients

\begin{equation*} B = \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix}. \end{equation*}

Any eigenvalue $$\lambda$$ with corresponding eigenvector $$\vec{u} = (u_1,u_2,u_3)^T$$ for this matrix is defined as

\begin{align*} B\vec{u} &= \lambda \vec{u} \\ \begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} \begin{pmatrix} u_{1} \\ u_{2} \\ u_{3} \end{pmatrix} &= \lambda \begin{pmatrix} u_{1} \\ u_{2} \\ u_{3} \end{pmatrix} \end{align*}

Next, let's see what the equation for each component of $$\vec{u}$$ looks like. I select $$u_1$$ and also assume that this is the largest absolute1 component of $$\vec{u}$$, i.e. $$\max_i{\left|u_i\right|} = \left|u_1\right|$$. This is a valid assumption since one component must be the maximum and there is no restriction on which component to choose for the following discussion. For $$u_1$$ this results in the following equation, which will be directly transformed a bit

\begin{align*} b_{11}u_1 + b_{12}u_2 + b_{13}u_3 &= \lambda u_1 \\ b_{12}u_2 + b_{13}u_3 &= \lambda u_1 - b_{11}u_1 \\ \left| u_1 (\lambda - b_{11}) \right| &= \left| b_{12}u_2 + b_{13}u_3 \right| \\ \end{align*}

All $$u_1$$ parts are placed on one side together with the diagonal element, and I am only interested in the absolute values. For the right side, an estimation is possible

\begin{equation*} \left| b_{12}u_2 + b_{13}u_3 \right| \leq \left| b_{12}u_2 \right| + \left| b_{13}u_3 \right| \leq \left| b_{12}u_1 \right| + \left| b_{13}u_1 \right| = \left| b_{12} \right| \left| u_1 \right| + \left| b_{13} \right| \left| u_1 \right| \end{equation*}

First come two estimations: with the help of the triangle inequality for the $$L_1$$ norm2 and with the assumption that $$u_1$$ is the largest component. Last but not least, the products are split up. In short, this results in

\begin{align*} \left| u_1 \right| \left| (\lambda - b_{11}) \right| &\leq \left| b_{12} \right| \left| u_1 \right| + \left| b_{13} \right| \left| u_1 \right| \\ \left| \lambda - b_{11} \right| &\leq \left| b_{12} \right| + \left| b_{13} \right| \\ \left| \lambda - b_{11} \right| &\leq r_1 \\ \end{align*}

where $$\left| u_1 \right|$$ is cancelled on both sides (valid since an eigenvector is never the zero vector, so $$\left| u_1 \right| > 0$$). This states that the eigenvalue $$\lambda$$ lies within the radius $$r_1$$ (cf. \eqref{eq:GershgorinCircle_Disc_RowSum}) around $$b_{11}$$ (the diagonal element!). For complex values, this defines the previously discussed discs.

Two notes on this insight:

• The result is only valid for the maximum component of the eigenvector. Note also that we usually don't know which component of the eigenvector is the maximum (if we knew, we probably wouldn't need to estimate the eigenvalues in the first place because we would already have them).
• In the explanation above only one eigenvector was considered. But usually, there are more (e.g. usually three in the case of matrix $$B$$). The result is therefore true for each maximum component of each eigenvector.

This also implies that not every eigenvector gives new information. It may well be that for multiple eigenvectors the first component is the maximum; in this case, the first disc is the relevant one for all of these eigenvalues. As an example, let's look at the eigenvectors of $$A$$. Their absolute values are (maximum component highlighted)

\begin{equation*} \left| \vec{u}_1 \right| = \begin{pmatrix} {\color{Aquamarine}1.31711} \\ 0.40112 \\ 1 \end{pmatrix}, \quad \left| \vec{u}_2 \right| = \begin{pmatrix} {\color{Aquamarine}1.33013} \\ 0.431734 \\ 1 \end{pmatrix} \quad \text{and} \quad \left| \vec{u}_3 \right| = \begin{pmatrix} 5.83598 \\ {\color{Aquamarine}12.4986} \\ 1 \end{pmatrix}. \end{equation*}

As you can see, the third component is never the maximum. But this is consistent with the example from above: the third disc $$C_3$$ did not contain any eigenvalue.

What the theorem now does is some kind of worst-case estimate. We now know that if one component of some eigenvector is the maximum, the row corresponding to this component defines a range in which the eigenvalue must lie. But since we don't know which component will be the maximum, the best we can do is to assume that every component is the maximum in some eigenvector. In this case, we need to consider all diagonal elements and their corresponding absolute sums of the rest of the rows. This is exactly what is done in the example from above. There is another nice feature which can be derived from the theorem when we have disjoint discs. This will be discussed in the next section.

### The story about disjoint discs

Additional statements can be extracted from the theorem when we deal with disjoint disc areas3. Consider another example with the following matrix

\begin{equation*} D= \begin{pmatrix} 1 & -2 & 3 \\ 0 & 6 & 1 \\ 3 & 0 & 9+10 i \\ \end{pmatrix}. \end{equation*}

Using Gershgorin discs, this results in a situation like the one shown in the following figure.

This time we have one disc (centred at $$d_{33}=9+10i$$) which does not share a common area with the other discs. In other words: we have two disjoint areas. The question is: does this give us additional information? Indeed, it is possible to state that there is exactly one eigenvalue in the third disc.

Gershgorin circle theorem with disjoint discs

Let $$A \in \mathbb{R}^{n \times n}$$ be a square matrix with $$n$$ Gershgorin discs (\eqref{eq:GershgorinCircle_Disc}). Then, each joined area defined by the discs contains as many eigenvalues as discs contributed to that area. If the set $$\tilde{C}$$ contains $$k$$ discs which are disjoint from the other $$n-k$$ discs, then $$k$$ eigenvalues lie in the range defined by the union

\begin{equation*} \bigcup_{C \in \tilde{C}} C \end{equation*}

of the discs in $$\tilde{C}$$.

In the example, we have exactly one eigenvalue in the third disc and exactly two eigenvalues somewhere in the union of discs one and two4. Why is it possible to restrict the estimation when we deal with disjoint discs? To see this, let me first remind you that the eigenvalues of any diagonal matrix are exactly the diagonal elements themselves. Next, I want to define a new function which separates the diagonal elements from the off-diagonals

\begin{align*} \tilde{D}(\alpha) &= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 6 & 0 \\ 0 & 0 & 9+10 i \end{pmatrix} + \alpha \begin{pmatrix} 0 & -2 & 3 \\ 0 & 0 & 1 \\ 3 & 0 & 0 \end{pmatrix} \\ \tilde{D}(\alpha) &= D_1 + \alpha D_2 \end{align*}

With $$\alpha \in [0;1]$$ this smoothly adds the off-diagonal elements in $$D_2$$ to the matrix $$D_1$$ containing only the diagonal elements by starting from $$\tilde{D}(0) = \operatorname{diag}(D) = D_1$$ and ending at $$\tilde{D}(1) = D_1 + D_2 = D$$. Before we see why this step is important, let us first consider the same technique for a general 2-by-2 matrix

\begin{align*} F &= \begin{pmatrix} f_{11} & f_{12} \\ f_{21} & f_{22} \end{pmatrix} \\ \Rightarrow \tilde{F}(\alpha) &= F_1 + \alpha F_2 \end{align*}

If we now want to calculate the eigenvalues for $$\tilde{F}$$, we need to find the roots of the corresponding characteristic polynomial, meaning

\begin{align*} \left| \tilde{F} - \lambda I \right| &= 0 \\ \left| F_1 + \alpha F_2 - \lambda I \right| &= 0 \\ \left| \begin{pmatrix} f_{11} & 0 \\ 0 & f_{22} \end{pmatrix} + \alpha \begin{pmatrix} 0 & f_{12} \\ f_{21} & 0 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right| &= 0. \end{align*}

The solution for the first root of this polynomial and therefore the first eigenvalue is defined as

\begin{equation*} \lambda(\alpha) = \frac{1}{2} \left(-\sqrt{{\color{Aquamarine}4 \alpha ^2 f_{12} f_{21}} +f_{11}^2+f_{22}^2-2 f_{11} f_{22}}+f_{11}+f_{22}\right). \end{equation*}

The thing I am driving at is the fact that the eigenvalue $$\lambda(\alpha)$$ changes only continuously with increasing value of $$\alpha$$ (highlighted position); the closer $$\alpha$$ gets to 1, the more of the off-diagonal part is added. In particular, $$\lambda(\alpha)$$ does not suddenly jump somewhere completely different. I chose a 2-by-2 matrix because this point is easier to see here; finding roots of higher-dimensional matrices can quickly become much more complicated. But the statement of continuously moving eigenvalues stays true, even for matrices of higher dimensions.

Now back to the matrix $$\tilde{D}(\alpha)$$ from the example. We will now increase $$\alpha$$ and see how this affects our discs. The principle is simple: just add both matrices together and apply the circle theorem to the resulting matrix. The following animation lets you perform the increase of $$\alpha$$.

As you can see, the eigenvalues start at the disc centres because here only the diagonal elements remain, i.e. $$\tilde{D}(0) = D_1$$. With increasing value of $$\alpha$$ more and more off-diagonal elements are added letting the eigenvalues move away from the centre. But note again that this transition is smooth: no eigenvalue suddenly jumps to a completely different position. Note also that at some point the discs for the first and second eigenvalue merge together.

Now the extended theorem becomes clear: if the eigenvalues start at the disc centres, don't jump around, and the disjoint areas have not merged at $$\alpha=1$$, then each union must contain as many eigenvalues as discs contributed to it. In the example, this gives us the proof that $$\lambda_3$$ must indeed lie in the disc around $$d_{33}$$.

Footnotes:

1. The absolute value is necessary here because the components may be complex.
2. For two vectors $$\vec{x}$$ and $$\vec{y}$$ the inequality $$\left\| \vec{x} + \vec{y} \right\| \leq \left\| \vec{x} \right\| + \left\| \vec{y} \right\|$$ holds (in this case $$\left\|\cdot\right\|_1$$). Intuitively, this means that the way over a detour is always longer or equal to the direct way.
3. The following discussion is based on the statements of these slides.
4. To apply the notation of the theorem to the example, $$\tilde{C}$$ could be defined as $$\tilde{C} = \{C_1, C_2\}$$ and the set of other $$n-k$$ discs contains only the disc $$C_3$$. But defining it the other way around, $$\tilde{C} = \{C_3\}$$, is also possible (and doesn't matter).

## Solving a homogeneous ODE by first-order substitution methods

Calculating analytical solutions to differential equations can be hard and sometimes even impossible. Methods exist for special types of ODEs; one of them is to solve the ODE by separation of variables. The idea of substitution is to replace some variable so that the resulting equation has the form of such a special type for which a solution method exists. In this scenario, I want to look at homogeneous ODEs, which have the form

$$y'(x) = F\left( \frac{y(x)}{x} \right). \label{eq:homogeneousODE}$$

They can be solved by replacing $$z=\frac{y}{x}$$ followed by separation of variables. Equations of this kind have the special property of being invariant against uniform scaling ($$y \rightarrow \alpha \cdot y_1, x \rightarrow \alpha \cdot x_1$$):

\begin{align*} \frac{\alpha \cdot \partial y_1}{\alpha \cdot \partial x_1} &= F\left( \frac{\alpha \cdot y_1}{\alpha \cdot x_1} \right) \\ \frac{\partial y_1}{\partial x_1} &= F\left( \frac{y_1}{x_1} \right) \end{align*}

Before analysing what this means, I want to introduce the example from the corresponding Lecture 4: First-order Substitution Methods (MIT OpenCourseWare), which is the source of this article. I derive the substitution process for this example later.

Imagine a small island with a lighthouse built on it. In the surrounding sea is a drug boat which tries to sail silently around, raising no attention. But the lighthouse spots the drug boat and targets it with its light ray. Panic-stricken, the boat tries to escape the light ray. To introduce some mathematics to the situation, we assume that the boat always tries to escape the light ray at a 45° angle. Of course, the lighthouse reacts accordingly and traces the boat. The following image depicts the situation.

We now want to know the boat's curve when the light ray always follows the boat directly and the boat, in turn, evades at a 45° angle. We don't model the situation as a parametric curve where the position would depend on time (so no $$(x(t), y(t))$$ here). This also means that we don't set the velocity of the curve explicitly. Instead, the boat's position just depends on the angle of the current light ray. Mathematically, this means that along the boat's curve in the x-y-plane the tangent always encloses a 45° angle with the light ray crossing the boat's position. $$\alpha$$ is the angle of the light ray, and when we assume that the lighthouse is placed at the origin, so that the slope of the ray is just given by the fraction $$\frac{y}{x}$$, $$\alpha$$ is simply calculated as

\begin{equation*} \tan(\alpha) = \frac{y}{x}. \end{equation*}

Now we can define the tangent of the boat's curve, which is given by its slope value

$$y'(x) = f(x,y(x)) = \tan(\alpha + 45°) = \frac{\tan(\alpha) + \tan(45°)}{1 - \tan(\alpha) \cdot \tan(45°)} = \frac{\frac{y(x)}{x} + 1}{1 - \frac{y(x)}{x}} = \frac{x + y(x)}{x - y(x)}. \label{eq:slopeBoat}$$

In the first simplification step, a trigonometric addition formula is used. This again can be simplified so that the result fulfils the definition of \eqref{eq:homogeneousODE}. This means that the ODE can be solved by separation of variables if the substitution $$z(x) = \frac{y(x)}{x}$$ is made. We want to replace $$y'(x)$$, so we first need to calculate the derivative of the substitution equation

\begin{align*} y(x) &= z(x) \cdot x \\ y'(x) &= \frac{\partial y(x)}{\partial x} = z'(x) \cdot x + z(x). \end{align*}

Note that we calculate the derivative with respect to $$x$$ and not $$y(x)$$ (which is a function depending on $$x$$ itself). Therefore the product rule was used. Next we substitute and try to separate variables.

\begin{align*} y'(x) &= \frac{x + y(x)}{x - y(x)} \\ z'(x) \cdot x + z(x) &= \frac{x + z(x) \cdot x}{x - z(x) \cdot x} \\ \frac{\partial z(x)}{\partial x} \cdot x &= \frac{1 + z(x)}{1 - z(x)} - z(x) = \frac{1 + z(x)}{1 - z(x)} - \frac{\left( 1-z(x) \right) \cdot z(x)}{1-z(x)} = \frac{1 + z(x) - z(x) + z^2(x)}{1 - z(x)} \\ \frac{\partial z(x)}{\partial x} &= \frac{1 + z^2(x)}{1 - z(x)} \cdot \frac{1}{x} \\ \frac{1 - z(x)}{1 + z^2(x)} \partial z(x) &= \frac{1}{x} \cdot \partial x \\ \int \frac{1 - z(x)}{1 + z^2(x)} \, \mathrm{d} z(x) &= \int \frac{1}{x} \, \mathrm{d} x \\ \tan^{-1}(z(x)) - \frac{1}{2} \cdot \ln \left( z^2(x)+1 \right) &= \ln(x) + C \\ 0 &= -\tan^{-1}\left(z(x)\right) + \frac{1}{2} \cdot \ln \left( z^2(x)+1 \right) + \ln(x) + C \\ 0 &= -\tan^{-1}\left(z(x)\right) + \ln \left( \sqrt{z^2(x) + 1} \cdot x \right) + C \\ 0 &= -\tan^{-1}\left(z(x)\right) + \ln \left( \sqrt{z^2(x) \cdot x^2 + x^2} \right) + C \\ \end{align*}

(I used the computer for the integration step.) We now have a solution, but we first need to substitute back to get rid of $$z(x) = \frac{y(x)}{x}$$

\begin{align*} 0 &= -\tan^{-1}\left(\frac{y(x)}{x}\right) + \ln \left( \sqrt{\frac{y^2(x)}{x^2} \cdot x^2 + x^2} \right) + C \\ 0 &= -\tan^{-1}\left(\frac{y(x)}{x}\right) + \ln \left( \sqrt{y^2(x) + x^2} \right) + C \end{align*}

Next, I want to set $$C$$ to the starting position of the boat by replacing $$x = x_0$$ and $$y(x) = y_0$$

\begin{align*} C &= \tan^{-1}\left(\frac{y_0}{x_0}\right) - \ln \left( \sqrt{y_0^2 + x_0^2} \right) \end{align*}

The final result is then an implicit function

$$0 = -\tan^{-1}\left( \frac{y}{x} \right) + \ln\left( \sqrt{x^2 + y^2} \right) + \tan^{-1}\left( \frac{y_0}{x_0} \right) - \ln \left( \sqrt{x_0^2 + y_0^2} \right). \label{eq:curveBoat}$$

So, we now have a function where we plug in the starting point $$(x_0,y_0)^T$$ of the boat and then check every relevant value for $$y$$ and $$x$$ where the equation is fulfilled. In total, this results in the boat's curve. Since the boat's position always depends on the current light ray from the lighthouse, you can think of the curve being defined by the ray. To clarify this aspect, you can play around with the slope of the ray in the following animation.

As you can see, the boat's curve originates by rotating the light ray. Also, note that there are actually two curves. This is because we can enclose a 45° angle on both sides of the light ray. The right curve encloses the angle with the top right side of the line and the left curve encloses with the bottom right side of the line. Actually, one starting point defines already both curves, but you may want to depict the situation like two drug boats starting at symmetric positions.

I marked the position $$(x_c,y_c) = (1,2)$$ as “angle checkpoint” to see if the enclosed angle is indeed 45°. To check, we first need the angle of the light ray which is just the angle $$\alpha$$ defined above for the given coordinates

\begin{equation*} \phi_{ray} = \tan^{-1}\left( \frac{y_c}{x_c} \right) = \tan^{-1}\left( \frac{2}{1} \right) = 63.43°. \end{equation*}

For the angle of the boat, we need its tangent at that position, which is given by its slope value. So we only need to plug the coordinates into the ODE previously defined

\begin{equation*} f(x_c,y_c) = \frac{1 + 2}{1 - 2} = -3. \end{equation*}

Forwarding this to $$\tan^{-1}$$ results in the angle of the tangent

\begin{equation*} \phi_{tangent} = \tan^{-1}\left(-3\right) + 180° = -71.57° + 180° = 108.43°. \end{equation*}

I added 180° so that the resulting angle is positive (enclosed angles can only be in the range $$\left[0°;180°\right[$$). Calculating the difference $$\phi_{tangent} - \phi_{ray} = 108.43° - 63.43° = 45°$$ shows indeed the desired result. Of course, this is not only true at the marked point, but at any point, because that is the way we defined it in \eqref{eq:slopeBoat}.

Another way of visualising \eqref{eq:curveBoat} is to switch to the polar coordinate system by using $$\theta = \tan^{-1}\left( \frac{y}{x} \right)$$ and $$r = \sqrt{x^2 + y^2}$$

\begin{align*} 0 &= -\theta + \ln\left( r \right) + \theta_0 - \ln \left( r_0 \right) \\ \ln \left( \frac{r_0}{r} \right) &= -\theta + \theta_0 \\ \frac{r_0}{r} &= e^{-\theta + \theta_0}, \end{align*}

and solve for $$r$$

$$r\left( \theta \right) = \frac{r_0}{e^{\theta_0 -\theta}}. \label{eq:curveBoatPolar}$$

We can now visualize this function using a polar plot where we move around a (unit) circle and adjust the radius according to \eqref{eq:curveBoatPolar}. The result is a graph which looks like a spiral. Beginning from the starting point, the light ray forces the boat to move counter-clockwise in a circle with increasing distance from the island. So, without considering physics (infinite light ray, ...) and realistic human behaviour (escaping strategy of the boat, ...), this cat-and-mouse game lasts forever.

Next, I want to analyse how the curve varies when the starting position of the boat changes. Again, each position of the curve is just given by the corresponding light ray crossing the same position. The curve in total is, therefore, the result of a complete rotation (or multiple like in the polar plot) of the light ray (like above, just with all possible slope values). In the next animation, you can change the starting position manually.

Do you remember the property of scale invariance for a homogeneous ODE introduced in the beginning? Let's have a look at what this means for the current problem. For this, it helps to analyse the different slope values which the equation $$f(x,y)$$ produces. This is usually done via a vector field. At sampled positions $$(x_s,y_s)$$ in the grid, a vector $$(1, f(x_s,y_s))^T$$ is drawn which points in the direction of the slope at that position (here, the vectors are normalized). So the vector is just a visualization technique to show the value of $$f(x,y)$$. Additionally, I added some isoclines: on all points of one such line the slope value is identical. This means that all vectors along a line have the same direction (easily checked on the horizontal line).

You can check this if you move the boat along the line $$y=x$$. This will result in different curves, but the tangent of the boat's starting point is always the same (vertical). Actually, this is already the property of scale invariance: starting from one point, you can scale your point (= moving along an isocline) and always get the same slope value.


## Intersection area of two circles with implementation in C++

In this article, I want to discuss how the intersection area of two circles can be calculated. Given are only the two circles, each defined by its centre point and radius, and the result is the area which both circles share in common. First, I want to take a look at how the intersection area can be calculated and then how the needed variables are derived from the given data. At the end of the article, I supply running code in C++.

The following figure illustrates the general problem. A small and a large circle is shown and both share a common area at the right part of the first circle.

As the figure already depicts, the problem is solved by calculating the area of the two circular segments formed by the two circles. The total intersecting area is then simply

\begin{equation*} A_0 + A_1. \end{equation*}

As equation 15 from MathWorld shows, the area of one circular segment is calculated as (all angles are in radians)

\begin{equation*} \begin{split} A &= \frac{1}{2} r^2 (\theta - \sin(\theta)), \\ &= \frac{1}{2} r^2 \theta - \frac{1}{2} r^2 \sin(\theta). \end{split} \end{equation*}

The formula consists of two parts. The left part is the formula for the area of the circular sector (the complete wedge bounded by the two radii), which generalizes the formula for the complete circle area ($$r^2\pi$$): here, the central angle is explicitly given by $$\theta$$ instead of spanning the full circle. If you plug a complete round into $$\theta$$, you get the same result: $$\frac{1}{2} r^2 2\pi = r^2\pi$$. The right part calculates the area of the isosceles triangle (the triangle with the two radii as sides and the chord as baseline), which is a little bit harder to see. With the double-angle formula

\begin{equation*} \sin(2x) = 2\sin(x)\cos(x) \end{equation*}

$$\sin(\theta)$$ can be rewritten as

\begin{equation*} \sin(\theta) = 2\sin\left(\frac{1}{2}\theta\right) \cos\left(\frac{1}{2}\theta\right). \end{equation*}

This leaves for the right part of the above formula

\begin{equation*} \frac{1}{2} r^2 \sin(\theta) = r^2 \sin\left(\frac{1}{2}\theta\right) \cos\left(\frac{1}{2}\theta\right). \end{equation*}

Also, note that $$r \sin\left(\frac{1}{2}\theta\right) = a$$ and $$r \cos\left(\frac{1}{2}\theta\right) = h$$ (imagine the angle $$\frac{\alpha}{2}$$ from the above figure in a unit circle), which results in

\begin{equation*} r^2 \sin\left(\frac{1}{2}\theta\right) \cos\left(\frac{1}{2}\theta\right) = ar \end{equation*}

and since we have an isosceles triangle, this is exactly the area of the triangle.

Originally, the formula is only defined for angles $$\theta < \pi$$ (and probably $$\theta \geq 0$$). In this case, $$\sin(\theta)$$ is non-negative and the area of the circular segment is the subtraction of the triangle area from the circular sector area ($$A = A_{sector} - A_{triangle}$$). But as far as I can see, this formula also works for $$\theta \geq \pi$$, if the angle stays in the range $$[0;2\pi]$$. In this case, the triangle area and the area of the circular sector need to get added up ($$A = A_{sector} + A_{triangle}$$), which is considered in the formula by a negative $$\sin(\theta)$$ (note the negative factor before the $$\sin(\theta)$$ function). The next figure also depicts this situation.

The following table gives a small example of these two elementary cases (circular sector for one circle).

| $$r$$ | $$\theta$$ | $$a = \frac{1}{2} r^2 \theta$$ | $$b = \frac{1}{2} r^2 \sin(\theta)$$ | $$A = a - b$$ |
| --- | --- | --- | --- | --- |
| $$2$$ | $$\frac{\pi}{3} = 60°$$ | $$\frac{2 \pi }{3}$$ | $$\sqrt{3}$$ | $$\frac{2 \pi }{3} - \sqrt{3} = 0.362344$$ |
| $$2$$ | $$\frac{4\pi}{3} = 240°$$ | $$\frac{8 \pi }{3}$$ | $$-\sqrt{3}$$ | $$\frac{8 \pi }{3} - (-\sqrt{3}) = 10.1096$$ |

It is also of interest to see the area of the circular segment as a function of $$\theta$$:

It is noticeable that the area of one circular segment (green line) starts degressively from the case where the two circles just touch each other, because here the area of the triangle is subtracted. Beginning from the middle at $$\theta = \pi$$, the area of the triangle gets added and the green line proceeds progressively until the two circles contain each other completely (full circle area $$2^2\pi=4\pi$$). Of course, the function itself is independent of any intersection scenario (it just gives the area of a circular segment), but the interpretation fits our intersection problem (remember that in total the areas of the two circular segments get added up).

Next, we want to use the formula. The radius $$r$$ of the circle is known, but we need to calculate the angle $$\theta$$. Let's start with the first circle. The second then follows easily. With the notation from the figure, we need the angle $$\alpha$$. Using trigonometric functions, this can be done by

\begin{equation*} \begin{split} \tan{\frac{\alpha}{2}} &= \frac{\text{opposite}}{\text{adjacent}} = \frac{h}{a} \\ \text{atan2}(y, x) &= \text{atan2}(h, a) = \frac{\alpha}{2} \end{split} \end{equation*}

The $$\text{atan2}(y, x)$$ function is the extended version of the $$\tan^{-1}(x)$$ function where the signs of the two arguments are used to determine a resulting angle in the range $$[-\pi;\pi]$$. Please note that the $$y$$ argument is passed first. This is common in many implementations, as in the C++ standard library's std::atan2(double y, double x) used here. For the intersection area the angle should be positive and in the range $$[0;2\pi]$$ as discussed before, so in total we have

\begin{equation*} \alpha = \text{atan2}(h, a) \cdot 2 + 2 \pi \mod 2 \pi. \end{equation*}

Firstly, the range is expanded to $$[-2\pi;2\pi]$$ (the factor $$2$$ from the previous equation, since the height $$h$$ covers only half of the triangle). Secondly, positivity is ensured by adding $$2\pi$$, leaving a resulting interval of $$[0;4\pi]$$. Thirdly, the interval is shrunk to $$[0;2\pi]$$ to stay inside one circle round.

Before we can calculate the $$\alpha$$ angle, we need to find $$a$$ and $$h$$1. Let's start with $$a$$. The two circles form two triangles (not to be confused with the previous triangle used to calculate the area of the circular segment) with the total baseline $$d=a+b = \left\| C_0 - C_1 \right\|_2$$ and the radii ($$r_0,r_1$$) as sides, which give us two equations

\begin{equation*} \begin{split} r_0^2 &= a^2 + h^2, \\ r_1^2 &= b^2 + h^2. \end{split} \end{equation*}

The parameter $$b$$ in the second equation can be eliminated (using $$b = d-a$$)

\begin{equation*} r_1^2 = b^2 + h^2 = (d-a)^2 + h^2 = d^2 - 2da + a^2 + h^2 \end{equation*}

and the equation solved for $$h^2$$

\begin{equation*} h^2 = r_1^2 - d^2 + 2da - a^2. \end{equation*}

Plugging this into the equation for the first triangle

\begin{equation*} \begin{split} r_0^2 &= a^2 + r_1^2 - d^2 + 2da - a^2 \\ r_0^2 - r_1^2 + d^2 &= 2da \\ a &= \frac{r_0^2 - r_1^2 + d^2}{2d} \end{split} \end{equation*}

results in the desired distance $$a$$. This directly gives us the height

\begin{equation*} h = \sqrt{r_0^2 - a^2}. \end{equation*}

Using the existing information the angle $$\beta$$ for the second circle can now easily be calculated

\begin{equation*} \beta = \text{atan2}(h, d-a) \cdot 2 + 2 \pi \mod 2 \pi. \end{equation*}

Now we have every parameter we need to use the area function and it is time to summarize the findings in some code.


/**
 * @brief Calculates the intersection area of two circles.
 *
 * @param center0 center point of the first circle
 * @param radius0 radius of the first circle
 * @param center1 center point of the second circle
 * @param radius1 radius of the second circle
 * @return intersection area (normally in px²)
 */
double intersectionAreaCircles(const cv::Point2d& center0, const double radius0, const cv::Point2d& center1, const double radius1)
{
	const double d_distance = cv::norm(center0 - center1);	// Euclidean distance between the two center points

	if (d_distance > radius0 + radius1)
	{
		/* Circles do not intersect */
		return 0.0;
	}

	if (d_distance <= std::fabs(radius0 - radius1)) // <= instead of <, because when the circles touch each other, it should be treated as inside
	{
		/* One circle is contained completely inside the other, just return the smaller circle area */
		const double A0 = PI * std::pow(radius0, 2);
		const double A1 = PI * std::pow(radius1, 2);

		return std::min(A0, A1);
	}

	if (d_distance == 0.0 && radius0 == radius1)
	{
		/* Both circles are equal, just return the circle area */
		return PI * std::pow(radius0, 2);
	}

	/* Calculate distances */
	const double a_distanceCenterFirst = (std::pow(radius0, 2) - std::pow(radius1, 2) + std::pow(d_distance, 2)) / (2 * d_distance); // First centre point to the middle line
	const double b_distanceCenterSecond = d_distance - a_distanceCenterFirst;	// Second centre point to the middle line
	const double h_height = std::sqrt(std::pow(radius0, 2) - std::pow(a_distanceCenterFirst, 2));	// Half of the middle line

	/* Calculate angles */
	const double alpha = std::fmod(std::atan2(h_height, a_distanceCenterFirst) * 2.0 + 2 * PI, 2 * PI); // Central angle for the first circle
	const double beta = std::fmod(std::atan2(h_height, b_distanceCenterSecond) * 2.0 + 2 * PI, 2 * PI); // Central angle for the second circle

	/* Calculate areas */
	const double A0 = std::pow(radius0, 2) / 2.0 * (alpha - std::sin(alpha));	// Area of the first circular segment
	const double A1 = std::pow(radius1, 2) / 2.0 * (beta - std::sin(beta));		// Area of the second circular segment

	return A0 + A1;
}



Basically, the code is a direct implementation of the discussed points. The treatment of the three special cases (no intersection, one circle completely inside the other, equal circles) also follows Paul Bourke's statements. Besides the functions of the C++ standard library, I also use some OpenCV data types (the code is from a project which uses this library). But they play no important role here, so you can easily replace them with your own data structures.

I also have a small test method which covers four basic cases. The reference values are calculated in a Mathematica notebook.


void testIntersectionAreaCircles()
{
/* Reference values from IntersectingCirclesArea_TestCases.nb */
const double accuracy = 0.00001;
CV_Assert(std::fabs(intersectionAreaCircles(cv::Point2d(200, 200), 100, cv::Point2d(300, 200), 120) - 16623.07332) < accuracy);	// Normal intersection
CV_Assert(std::fabs(intersectionAreaCircles(cv::Point2d(200, 200), 100, cv::Point2d(220, 200), 120) - 31415.92654) < accuracy);	// Touch, inside
CV_Assert(std::fabs(intersectionAreaCircles(cv::Point2d(200, 200), 100, cv::Point2d(400, 200), 100) - 0.0) < accuracy);			// Touch, outside
CV_Assert(std::fabs(intersectionAreaCircles(cv::Point2d(180, 200), 100, cv::Point2d(220, 200), 120) - 28434.24854) < accuracy);	// Angle greater than 180°
}




1. The derivations are based on the work by Paul Bourke.

## Representation of a line in the polar coordinate system

Recently, I read a tutorial about the Hough line transform in the OpenCV tutorials. It is a technique to find lines in an image using a parameter space. As explained in the tutorial, it is necessary to use the polar coordinate system for this. In the commonly used Cartesian coordinate system, a line would be represented by $$y=mx+b$$. In the polar coordinate system, on the other hand, a line is represented by

$$y=-\frac{\cos{\theta}}{\sin{\theta}}x + \frac{\rho}{\sin{\theta}}. \label{eq:PolarCoordinateSystem_LineRepresentation}$$

In \eqref{eq:PolarCoordinateSystem_LineRepresentation} there are two new parameters: the radius $$\rho$$ and the angle $$\theta$$, as also depicted in the following figure. $$\rho$$ is the length of a vector which always starts at the pole $$(0,0)$$ (the analogous term to the origin in the Cartesian coordinate system) and ends at the line (orange in the figure) so that $$\rho$$ is orthogonal to the line. This is important because otherwise the following conclusions wouldn't work.

So, let's first start with the $$y$$-intercept $$b=\frac{\rho}{\sin{\theta}}$$. Note that the angle $$\theta$$ comes up twice: between the $$x$$-axis and the $$\rho$$ vector and between the $$y$$-axis and the blue line (on the right side). We will use trigonometric functions to calculate the $$y$$-intercept. This is simply done by using the $$\sin$$ function

\begin{equation*} \begin{split} \sin{\theta} &= \frac{\text{opposite}}{\text{hypotenuse}} \\ \sin{\theta} &= \frac{\rho}{b} \\ b &= \frac{\rho}{\sin{\theta}} \end{split} \end{equation*}

and that is exactly what the equation said for the $$y$$-intercept.

Now it is time for the slope $$m=-\frac{\cos{\theta}}{\sin{\theta}}$$. For this, the relation

\begin{equation*} m = \tan{\alpha} \end{equation*}

is needed, where $$\alpha$$ is the slope angle of the line. $$\alpha$$ can be calculated by using our known $$\theta$$ angle:

\begin{equation*} \alpha = 180^{\circ} - (180^{\circ} - 90^{\circ} - \theta) = 90^{\circ} + \theta. \end{equation*}

Now we have $$m=\tan{\left(90^{\circ} + \theta\right)}$$, which is equivalent to $$m=\frac{\sin{\left(90^{\circ} + \theta\right)}}{\cos{\left(90^{\circ} + \theta\right)}}$$. Because of $$\sin{x}=\cos{\left(90^{\circ}-x\right)}$$ and $$\cos{x}=\sin{\left(90^{\circ}-x\right)}$$ we can do a little bit of rewriting

\begin{equation*} m=\frac {\cos\left(90^{\circ} - (90^{\circ} + \theta)\right)} {\sin\left(90^{\circ} - (90^{\circ} + \theta)\right)} = \frac {\cos\left(-\theta\right)} {\sin\left(-\theta\right)} = \frac {\cos\left(\theta\right)} {-\sin\left(\theta\right)} = -\frac {\cos\left(\theta\right)} {\sin\left(\theta\right)} \end{equation*}

and we have exactly the form we need. In the last step, I used the fact that $$\sin$$ is an odd function ($$\sin(-x) = -\sin(x)$$) and $$\cos$$ an even function ($$\cos(-x) = \cos(x)$$).

## Standardisation of a random variable

When standardising a random variable in statistics, a new random variable with specific properties is created from an existing one.

This is especially important when working with the normal distribution, i.e. when normally distributed random variables are standardised. It is useful for calculating probability values, since these can then be looked up in the corresponding precomputed tables of the standard normal distribution.

A general introduction to standardisation can be found in the attached PDF document, together with the corresponding LaTeX source files. The document is only meant to illustrate the standardisation itself (for a better understanding), not concrete applications of it.

Standardisierung.pdf
Standardisierung.zip