GP Regression


Data

$$\mathcal{D} = \{ (x_i, y_i) \mid i = 1, \ldots, n \}$$

where $x_i \in \mathbb{R}$ and $y_i \in \mathbb{R}$.

Model

$$y_i = f(x_i) + \epsilon_i, \qquad f \sim \mathcal{GP}(m(x), k(x, x')), \qquad \epsilon_i \sim \mathcal{N}(0, \sigma_n^2)$$

where the noise terms $\epsilon_i$ are univariate, $f$ is multivariate (any finite collection of its values is jointly Gaussian) and $\epsilon_i$ is independent of $f$.

INFO The GP represents your prior belief about the model. Your choice of $m$ and $k$ here strongly influences the result of the inference. Free parameters in your choice of covariance function $k$ are called hyperparameters.

Example

For modelling what we believe to be a continuously varying process on $\mathbb{R}$ centred on the origin, it is enough to set $m(x) = 0$ and $k(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right)$ (the squared-exponential kernel, with hyperparameters $\sigma_f$ and $\ell$).

INFO Training data can be used to influence your selection and parameterisation of $m$ and $k$.

Not worrying about this topic for now; just hand-tuning to keep things simple.
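To make the prior concrete, here is a minimal sketch — assuming the zero-mean, squared-exponential setup from the example above, with hand-picked hyperparameters — of drawing a few sample functions from the GP prior:

```python
import numpy as np

def se_kernel(x1, x2, sigma_f=1.0, length=1.0):
    """Squared-exponential covariance matrix between two sets of points."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * sq_dist / length**2)

# Grid of points at which to evaluate the prior.
xs = np.linspace(-5.0, 5.0, 100)
K = se_kernel(xs, xs)

# m(x) = 0, so prior samples are draws from N(0, K).
# The small diagonal jitter keeps the covariance numerically positive definite.
rng = np.random.default_rng(seed=0)
samples = rng.multivariate_normal(np.zeros(len(xs)), K + 1e-9 * np.eye(len(xs)), size=3)
```

Each row of `samples` is one smooth function drawn from the prior; larger $\ell$ gives slower-varying samples, larger $\sigma_f$ gives larger amplitudes.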

Inference

DESIRED Here we are trying to infer the distribution of the function values at unknown points from the data points, i.e. the conditional $p(\mathbf{f}_* \mid X_*, X, \mathbf{y})$.
First let's consider the multivariate Gaussian $p(\mathbf{f}, \mathbf{f}_*)$ that we know we can extract from the GP using the training inputs $X$ and the test inputs $X_*$. From the definition of Gaussian processes,

$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} m(X) \\ m(X_*) \end{bmatrix}, \begin{bmatrix} K(X, X) & K(X, X_*) \\ K(X_*, X) & K(X_*, X_*) \end{bmatrix} \right)$$

or more simply:

$$\begin{bmatrix} \mathbf{f} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mathbf{m} \\ \mathbf{m}_* \end{bmatrix}, \begin{bmatrix} K & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)$$

Since $f$ and $\epsilon$ are independent, the multivariate Gaussian $p(\mathbf{y}, \mathbf{f}_*)$ has means and covariances which are simply summed (the covariance of $\mathbf{y}$ is the sum of the covariances of $\mathbf{f}$ and $\boldsymbol{\epsilon}$):

$$\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_* \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \mathbf{m} \\ \mathbf{m}_* \end{bmatrix}, \begin{bmatrix} K + \sigma_n^2 I & K_* \\ K_*^\top & K_{**} \end{bmatrix} \right)$$

From this, we can get the conditional distribution:

$$p(\mathbf{f}_* \mid \mathbf{y}) = \mathcal{N}(\bar{\mathbf{f}}_*, \mathrm{cov}(\mathbf{f}_*))$$

where we can express $\bar{\mathbf{f}}_*$ and $\mathrm{cov}(\mathbf{f}_*)$ using the complicated-looking, but simple to derive, formulas for conditional Gaussians:

$$\bar{\mathbf{f}}_* = \mathbf{m}_* + K_*^\top (K + \sigma_n^2 I)^{-1} (\mathbf{y} - \mathbf{m})$$

$$\mathrm{cov}(\mathbf{f}_*) = K_{**} - K_*^\top (K + \sigma_n^2 I)^{-1} K_*$$
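These two formulas are all you need to implement the inference directly. Below is a minimal sketch under the same assumptions as before (zero mean, squared-exponential kernel; `noise` plays the role of $\sigma_n$), using a Cholesky factorisation rather than an explicit matrix inverse:

```python
import numpy as np

def se_kernel(x1, x2, sigma_f=1.0, length=1.0):
    """Squared-exponential covariance matrix between two sets of points."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return sigma_f**2 * np.exp(-0.5 * sq_dist / length**2)

def gp_posterior(x_train, y_train, x_test, noise=0.1):
    """Posterior mean and covariance at x_test, assuming m(x) = 0."""
    K = se_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    K_star = se_kernel(x_train, x_test)   # K_*
    K_ss = se_kernel(x_test, x_test)      # K_**

    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # (K + sigma_n^2 I)^{-1} y
    v = np.linalg.solve(L, K_star)

    mean = K_star.T @ alpha               # K_*^T (K + sigma_n^2 I)^{-1} y
    cov = K_ss - v.T @ v                  # K_** - K_*^T (K + sigma_n^2 I)^{-1} K_*
    return mean, cov

# Example: noisy observations of a smooth function.
x_train = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
y_train = np.sin(x_train)
x_test = np.linspace(-5.0, 5.0, 100)
mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.diag(cov))  # pointwise posterior uncertainty
```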

Conclusions

Prior and Posterior

The GP itself is your prior knowledge about the model. The resulting conditional distribution is the posterior.

Model is Where the Tuning Happens

Tuning your model for the GP, i.e. $m$ and $k$, is where you gain control over how your inference result behaves. For example, a stationary kernel behaves the same across the whole domain, whereas a non-stationary kernel typically induces very different behaviour in different parts of the domain.
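A standard pair of examples: the squared-exponential kernel is stationary because it depends only on the separation $x - x'$, while the linear kernel is not:

$$k_{\text{SE}}(x, x') = \sigma_f^2 \exp\left(-\frac{(x - x')^2}{2\ell^2}\right) \qquad k_{\text{lin}}(x, x') = \sigma_f^2\, x x'$$

Under $k_{\text{lin}}$ the prior variance $k_{\text{lin}}(x, x) = \sigma_f^2 x^2$ grows with $|x|$, so the posterior behaves very differently near and far from the origin.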

Variance Collapses Around Training Points

For simplicity, if you set $\sigma_n = 0$ (noise free), assume $m(x) = 0$ in the model and $X_* = X$ for a single training data point $(x_1, y_1)$, then working through the posterior equations above shows everything cancelling out and leaving you with just $\bar{\mathbf{f}}_* = y_1$ and $\mathrm{cov}(\mathbf{f}_*) = 0$. Throwing the noise in changes things a little, but you still get the dominant collapse of variance around the training points.
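Working it through explicitly (every term is a scalar since there is a single training point):

$$\bar{\mathbf{f}}_* = k(x_1, x_1)\, k(x_1, x_1)^{-1}\, y_1 = y_1$$

$$\mathrm{cov}(\mathbf{f}_*) = k(x_1, x_1) - k(x_1, x_1)\, k(x_1, x_1)^{-1}\, k(x_1, x_1) = 0$$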

Characteristics of the Posterior

The mean in the posterior equations above can be viewed either as 1) a linear combination of the observations $\mathbf{y}$ or 2) a linear combination of the kernel functions centred on the training data points (the elements of $K_*$).
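The two views are just regroupings of the same expression (taking $m = 0$ for brevity):

$$\bar{\mathbf{f}}_* = K_*^\top \underbrace{(K + \sigma_n^2 I)^{-1} \mathbf{y}}_{\boldsymbol{\alpha}} = \sum_{i=1}^{n} \alpha_i\, k(x_i, x_*)$$

Reading it as the weights $K_*^\top (K + \sigma_n^2 I)^{-1}$ applied to $\mathbf{y}$ gives view 1; reading it as the coefficients $\boldsymbol{\alpha}$ applied to the kernel functions $k(x_i, \cdot)$ gives view 2.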

The variance can also be intuitively interpreted. It is simply the prior, $K_{**}$, with a positive term subtracted due to the information from the observations.

Gaussian Process vs Bayesian Regression

Gaussian process regression utilises kernels, not basis functions. However, the two can be shown to be equivalent for a given choice of basis functions/kernels. I rather like Gaussian processes for the ease of implementation.
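A standard way to make the equivalence precise: Bayesian linear regression on basis functions $\phi(x)$ with weight prior $\mathbf{w} \sim \mathcal{N}(0, \Sigma_p)$ is exactly a GP with

$$m(x) = 0, \qquad k(x, x') = \phi(x)^\top \Sigma_p\, \phi(x')$$

so every choice of basis functions induces a kernel, though not every kernel corresponds to a finite set of basis functions.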