Resources
GP Regression
Data
$$\mathcal{D} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$$
where $\mathbf{x}_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$.
Model
$$y = f(\mathbf{x}) + \epsilon, \qquad f \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')), \qquad \epsilon \sim \mathcal{N}(0, \sigma_n^2)$$
where $y$ and $\epsilon$ are univariate, $\mathbf{x}$ is multivariate, and $\epsilon$ is independent of $f(\mathbf{x})$.
Example
For modelling what we believe to be a continuously varying process on $\mathbb{R}^d$ centred on the origin, it is enough to set $m(\mathbf{x}) = 0$ and $k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\lVert \mathbf{x} - \mathbf{x}' \rVert^2}{2\ell^2}\right)$ (the squared-exponential kernel).
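As a quick illustration (a minimal sketch not in the original notes, using NumPy), here is the squared-exponential kernel above and a few draws from the corresponding zero-mean prior; the grid, length scale, jitter, and seed are arbitrary choices of mine:

```python
import numpy as np

def sq_exp_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel k(x, x') = exp(-(x - x')^2 / (2 l^2)) in 1-D."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-d**2 / (2 * length_scale**2))

# Draw a few functions from the zero-mean GP prior on a 1-D grid.
xs = np.linspace(-3, 3, 50)
K = sq_exp_kernel(xs, xs)
# Small jitter keeps the covariance numerically positive definite.
K_jittered = K + 1e-9 * np.eye(len(xs))
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(xs)), K_jittered, size=3)
```

Each row of `samples` is one smooth random function evaluated on the grid; plotting them gives the familiar "spaghetti" picture of a GP prior.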
Not worrying about this topic for now. Just hand tuning to keep things simple.
Inference
First let's consider the multivariate Gaussian that we know we can extract from the GP by evaluating it at the training inputs $X = (\mathbf{x}_1, \dots, \mathbf{x}_n)$ and the test inputs $X_*$, writing $\mathbf{f} = f(X)$ and $\mathbf{f}_* = f(X_*)$. From the definition of Gaussian processes,

$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right)$$

or more simply:

$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu}_* \end{bmatrix}, \begin{bmatrix} K & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)$$

Since $\mathbf{f}$ and $\boldsymbol{\epsilon}$ are independent, the multivariate Gaussian over $(\mathbf{y}, \mathbf{f}_*)$ has means and covariances which are simply summed (the covariance of a sum of independent Gaussians is the sum of their covariances):

$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu}_* \end{bmatrix}, \begin{bmatrix} K + \sigma_n^2 I & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)$$

From this, we can get the conditional distribution:

$$\mathbf{f}_* \mid \mathbf{y} \sim \mathcal{N}(\bar{\boldsymbol{\mu}}, \bar{\Sigma})$$

where we can express $\bar{\boldsymbol{\mu}}$ and $\bar{\Sigma}$ using the complicated-looking, but simple to apply, formulas for conditioning a multivariate Gaussian:

$$\bar{\boldsymbol{\mu}} = \boldsymbol{\mu}_* + K_*^\top (K + \sigma_n^2 I)^{-1} (\mathbf{y} - \boldsymbol{\mu})$$
$$\bar{\Sigma} = K_{**} - K_*^\top (K + \sigma_n^2 I)^{-1} K_*$$
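The posterior formulas above translate almost line for line into NumPy. This is a sketch under the example's assumptions ($m(\mathbf{x}) = 0$, squared-exponential kernel, 1-D inputs); the function names, training data, and hyperparameter values are my own illustrative choices:

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel on 1-D inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * length_scale**2))

def gp_posterior(x_train, y_train, x_test, noise_var=0.1, length_scale=1.0):
    """Posterior mean and covariance of f_* | y, assuming m(x) = 0."""
    K = sq_exp_kernel(x_train, x_train, length_scale)        # K
    K_s = sq_exp_kernel(x_train, x_test, length_scale)       # K_*
    K_ss = sq_exp_kernel(x_test, x_test, length_scale)       # K_**
    A = K + noise_var * np.eye(len(x_train))                 # K + sigma_n^2 I
    # Solve linear systems rather than forming the inverse explicitly.
    mu_bar = K_s.T @ np.linalg.solve(A, y_train)
    sigma_bar = K_ss - K_s.T @ np.linalg.solve(A, K_s)
    return mu_bar, sigma_bar

x_train = np.array([-1.0, 0.0, 1.0])
y_train = np.sin(x_train)
x_test = np.linspace(-2, 2, 9)
mu, cov = gp_posterior(x_train, y_train, x_test)
```

`mu` and the diagonal of `cov` are what you would plot as the posterior mean and the uncertainty band.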
Conclusions
Prior and Posterior
The GP itself is your prior knowledge about the model. The resulting conditional distribution is the posterior.

Model is Where the Tuning Happens
Tuning your model for the GP, i.e. choosing $m(\mathbf{x})$ and $k(\mathbf{x}, \mathbf{x}')$, is where you gain control over how your inference result behaves. For example, stationary vs non-stationary kernel functions typically induce very different behaviour in different parts of the domain.
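One way to see the stationary vs non-stationary contrast concretely (a small sketch of my own, comparing the squared-exponential kernel with a linear kernel as the non-stationary example) is to look at the prior variance $k(\mathbf{x}, \mathbf{x})$ at different points of the domain:

```python
import numpy as np

def se_kernel(a, b, length_scale=1.0):
    # Stationary: depends only on the difference a - b.
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * length_scale**2))

def linear_kernel(a, b):
    # Non-stationary: depends on the absolute locations a and b.
    return a[:, None] * b[None, :]

xs = np.array([0.0, 5.0])
# Prior variance at each point is the kernel's diagonal.
se_var = np.diag(se_kernel(xs, xs))       # same everywhere
lin_var = np.diag(linear_kernel(xs, xs))  # grows with distance from the origin
```

The stationary kernel gives the same prior variance at $x = 0$ and $x = 5$, while the linear kernel's variance grows with $|x|$, so the two priors behave very differently far from the origin.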
Variance Collapses Around Training Points
For simplicity, if you set $\sigma_n = 0$ (noise free), assume $m(\mathbf{x}) = 0$ in the model, and take $X_* = X$ for a single training data point $(\mathbf{x}_1, y_1)$, then working through the posterior equations shows everything cancelling out and leaving you with just $\bar{\boldsymbol{\mu}} = y_1$ and $\bar{\Sigma} = 0$. Throwing the noise in changes things a little, but you still get the dominant collapse of variance around the training points.
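The noise-free cancellation is easy to verify numerically. A small check of my own (arbitrary training point and observed value, squared-exponential kernel): with $X_* = X = (x_1)$ and $\sigma_n = 0$, the posterior mean recovers $y_1$ exactly and the posterior variance is zero.

```python
import numpy as np

def k(a, b, length_scale=1.0):
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2 * length_scale**2))

x1 = np.array([0.5])  # single training input
y1 = np.array([2.0])  # its observed value
K = k(x1, x1)         # 1x1 matrix; k(x, x) = 1 for this kernel

# Noise-free posterior evaluated at the training point itself (X_* = X):
mu_bar = k(x1, x1).T @ np.linalg.solve(K, y1)
sigma_bar = k(x1, x1) - k(x1, x1).T @ np.linalg.solve(K, k(x1, x1))
# mu_bar equals y1 and sigma_bar equals 0: the variance has collapsed.
```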
Characteristics of the Posterior
The mean $\bar{\boldsymbol{\mu}}$ can be viewed either as 1) a linear combination of the observations $\mathbf{y}$, or 2) a linear combination of the kernel functions centred on the training data points (the columns of $K_*$).
The variance can also be intuitively interpreted. It is simply the prior variance, $K_{**}$, with a positive term subtracted due to information from the observations.
Gaussian Process vs Bayesian Regression
Gaussian process regression utilises kernels, not basis functions. However, the two can be shown to be equivalent for a given choice of basis functions/kernels. I rather like Gaussian processes for the ease of implementation.