A limited number of mathematical concepts escaped their original narrow field of application and became an essential conceptual tool throughout mathematics, physical sciences and engineering. Some of these tools came from the Egyptians and the Babylonians and are now taught in primary school: integers, fractions, sums and so on. Others arose in the 17th century to deal with tangents, areas and infinitesimal changes. The Fourier transform is perhaps the most advanced and beautiful of these universal tools. It is difficult to overestimate the importance of the Fourier transform in mathematics, physics, and engineering. The Fourier transform is also the most important tool of signal processing. The impressive range of applicability of the Fourier transform in signal processing is arguably due to its nonparametric nature. The Fourier transform can be applied to signals of any kind without requiring additional assumptions about their nature. The reliance of Gaussian process regression on a prior distribution over the space of latent functions seems to be incompatible with methods such as the Fourier transform that work on any kind of data without the need for prior knowledge . This blog post is based on this paper about the connection between Gaussian process regression and the Fourier transform.
The discrete Fourier transform
The discrete Fourier transform (DFT) is the main computational tool of signal processing. Consider a real-valued or complex-valued signal defined on a sequence of equally spaced sampling points. The discrete Fourier coefficient is defined as follows:
where the frequency index ranges from to . The signal can be reconstructed from the discrete Fourier coefficients as a linear combination of complex exponentials:
A Bayesian interpretation of the DFT
We can now begin to establish a connection between the Fourier transform and Gaussian process regression. Assume that the discrete function is sampled through noisy measurements :
where is the variance of the measurement noise. We assume that the function is a linear combination of complex exponentials. Furthermore, we place standard normal prior distributions on the coefficients:
The functional prior induced on the discrete signal is a multivariate Gaussian with mean zero and identity covariance matrix. In fact,
This is the finite dimensional analogue of a white-noise process where the values of the functions at different time points are uncorrelated. In this sense, the functional prior distribution corresponding to the DFT is maximally uninformative because it does not bias towards any particular frequency. The posterior expectation of the weights of this Bayesian model is obtained by shrinking the Fourier coefficients of the measurements:
This expression clearly tends to the standard DFT at the limit of zero noise.
From discrete to continuous signals
In most practical situations the signal of interest is not intrinsically discrete but instead is obtained by sampling an underlying continuous-time signal. The estimation of the Fourier transform of a continuous signal from a finite number of samples is an ill-posed problem since infinitely many functions can interpolate the samples. From a Bayesian point of view, ill-posed problems can be solved by assigning a prior distribution over the parameter space. In this case we need to assign a functional prior over the functional space of continuous-time signals. The use of the DFT for estimating the Fourier transform of the underlying signal implicitly assumes a very specific functional space. To see this we can write the continuous analogue of the model introduced in the previous section:
The resulting functional prior is a Gaussian process defined by the following band-limited covariance function:
The most important feature of this covariance function is that it assigns zero correlation between the values of the signal when is a multiple of the spacing between the sample points. In other words, the band-limited covariance function behaves like a white-noise covariance function when the signal is restricted to the sample points. The only exception to this rule is when is equal to a multiple of . In this latter case the autocorrelation is equal to . Therefore, functions sampled from this functional space are always periodic with period equal to . There are two other things to note about this Gaussian process prior. First, the prior depends on the choice of the sample points. This is a rather unnatural choice for a Bayesian model since the prior knowledge about the signal should conceptually be independent from the measurement process. Second, the Gaussian process is degenerate, meaning that the underlying functional space is finite-dimensional. A consequence of this is that at the zero measurement noise limit the posterior distribution over the whole function collapses to its expected value for all values of . This deterministic limit is a consequence of the Nyquist–Shannon sampling theorem.
The Bayesian reformulation allows to easily relax some of the assumptions behind the DFT as an estimator of the Fourier transform of a continuous signal. For example in this model the number of measurements need not to be equal to the number of basis functions. Consequently, we can consider the limit without requiring an infinite amount of data points and thereby remove the assumption of periodicity. The result is a Gaussian process with a sinc covariance function:
where is the Nyquest frequency. The resulting functional prior is somehow similar to the DFT case but it is not degenerate since it is obtained from an infinite number of basis functions. Fig.~??A shows the difference between the continuous signal recovered using the BL covariance function (green line) and the sinc covariance function (blue line). The shaded areas show the standard deviation of the posterior distribution of the GP regressions. Note that the standard deviation of the analysis performed using the band-limited covariance function is equal to zero everywhere since the covariance function is degenerate. Conversely, the standard deviation corresponding to the analysis with the sinc covariance function is very high even in between the sample points. This reflects the fact that the sinc is the most uninformative covariance function with a spectrum supported from to . I also included a comparison with a squared exponential covariance function (red line). In this case the resulting posterior has remarkably higher interpolation and extrapolation performance and lower posterior variance when compared to the sinc covariance function. The figure shows the resulting estimate of the Fourier transform. The main difference between the band-limited and the sinc covariance function is that the spectral density of the former is concentrated into a series of discrete spectral lines while the latter has a continuous spectrum. In this example, the estimation obtained using the squared exponential covariance function has remarkably higher performance, as shown by the about six order of magnitude higher sidelobe suppression.
The Bayes-Gauss-Fourier transform
As is clear from the previous example, the performance of the spectral estimate depends on the properties of the covariance function. The DFT and the sinc covariance functions have the advantage of being unbiased in the frequency domain, however this translates into a strong disadvantage when some prior information is available. For example, when the underlying signal is smooth the squared exponential covariance function has higher performance. Is it possible to construct a covariance function that is both unbiased and that exploits the regularities of the underlying signal? The answer is yes if we are willing to learn the covariance function from the data. This is what we did in a recent paper about this topic. The result is the Bayes-Gauss-Fourier (BGF) transform. You can see the results of this transform in the figure. Note that a squared exponential covariance function cannot extrapolate the oscillatory waveform beyond the sampling points and a periodic covariance function requires prior knowledge about the frequency of the oscillation. Conversely, the BGF transform is not biased towards any particular frequency.