Using Gaussian processes to learn from training data and perform inference at unknown points.

...

Resources

...

GP Regression

Data

\begin{align*}
a &= \left( 1, \dots, l \right) & \\
b &= \left( l+1, \dots, n \right) & \\
x_a &= \left(x_1, \dots, x_l \right) &\text{(inference points)}\\
x_b &= \left(x_{l+1}, \dots, x_n \right) &\text{(training points)}\\
y_a &= \left(y_1, \dots, y_l\right) &\text{(unobserved)}\\
y_b &= \left(y_{l+1}, \dots, y_n\right) &\text{(observed)} \\
\end{align*}

where x_i \in S and y_i \in \mathcal{R}.

Model

\begin{align*}
\left(Z_x\right) &\sim GP(\mu, k) \hspace{1em} \text{on} \hspace{1em} S \\
\xi &\sim \mathcal{N}\left(0, \sigma^2I\right) \\
\xi &= \left(\xi_1, \dots, \xi_n\right) \\
Y_i &= Z_{x_i} + \xi_i \\
Y &= \left(Y_1, \dots, Y_n\right) \\
\tilde{Z} &= \left(Z_{x_1}, \dots, Z_{x_n}\right) \\
Y &= \tilde{Z} + \xi \\
\end{align*}

where \xi_i, Z_{x_i}, Y_i are univariate, \xi, \tilde{Z}, Y are multivariate, and \xi is independent of \left(Z_x\right).

Info: The GP represents your prior belief about the model. Your choice of \mu and k here strongly influences the result of the inference. Free parameters in your choice of covariance function k are called hyperparameters.

Example

For modelling what we believe to be a continuously varying process \mathcal{R}^d \rightarrow \mathcal{R} centred on zero, it is enough to set \mu = 0 and k(x_1, x_2) = \exp\left(-|x_1 - x_2|^2\right).

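A minimal sketch of that prior in one dimension (the variable names, the grid of points, and the small jitter term are illustrative assumptions, not part of the model):

import numpy as np

def k(x1, x2):
    # squared exponential kernel from the example above
    return np.exp(-np.subtract.outer(x1, x2) ** 2)

x = np.linspace(-3.0, 3.0, 50)                  # points on S = R
K = k(x, x) + 1e-9 * np.eye(len(x))             # K_ij = k(x_i, x_j), jitter for numerical stability

rng = np.random.default_rng(0)
# mu = 0, so sample paths of the prior GP(0, k) restricted to these points
# are draws from N(0, K)
prior_samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)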
Info: Training data can be used to influence your selection and parameterisation of \mu and k.

We are not worrying about this topic for now; hyperparameters are just hand tuned here to keep things simple.
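For example (a standard parameterised form, shown purely as an illustration of what could be hand tuned), the squared exponential kernel is often written with a lengthscale \ell and signal variance \sigma_f^2 as hyperparameters:

k(x_1, x_2) = \sigma_f^2 \exp\left(-\frac{|x_1 - x_2|^2}{2\ell^2}\right)

A larger \ell gives smoother, more slowly varying sample paths, while a larger \sigma_f^2 allows larger excursions from the mean.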

Inference

Desired: Here we are trying to infer the distribution of the unknown points from the data points, i.e. the conditional p(y_a|y_b).

First let's consider the multivariate Gaussian \tilde{Z} \sim \mathcal{N}\left(\tilde{\mu}, K\right) that we know we can extract from the GP using the data points \left(x_1, \dots, x_n\right) on S. From the definition of Gaussian processes,

\begin{align*}
\tilde{\mu} &= \left(\mu(x_1), \dots, \mu(x_n)\right) \\
K &= \left(k_{ij}\right), \hspace{1em} k_{ij} = k(x_i, x_j) \\
\end{align*}


or, written in block form over the index sets a and b:

\tilde{\mu} = \begin{bmatrix} \mu_a \\ \mu_b \\ \end{bmatrix} \hspace{5em} K = \begin{bmatrix} K_{aa} & K_{ab} \\ K_{ba} & K_{bb} \\ \end{bmatrix}
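A small sketch of this partitioning (the example points and the NumPy indexing helpers are illustrative assumptions):

import numpy as np

def k(x1, x2):
    return np.exp(-np.subtract.outer(x1, x2) ** 2)

x = np.array([-1.0, 0.0, 0.5, 1.0, 2.0, 3.0])   # all n data points, inference points first
l, n = 2, len(x)
a = np.arange(0, l)                              # inference indices
b = np.arange(l, n)                              # training indices

K = k(x, x)                                      # full n x n covariance matrix
K_aa = K[np.ix_(a, a)]                           # inference vs inference block
K_ab = K[np.ix_(a, b)]                           # inference vs training block
K_ba = K[np.ix_(b, a)]                           # training vs inference block
K_bb = K[np.ix_(b, b)]                           # training vs training block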

Since \tilde{Z} and \xi are independent, the multivariate Gaussian Y has a mean and covariance which are simply the sums of those of \tilde{Z} and \xi (see the sum of variances result on the Fundamental Properties page):

Y \sim \mathcal{N}\left(\tilde{\mu}, K + \sigma^2I\right)

From this, we can get the conditional distribution:

\left(Y_a|Y_b = y_b\right) \sim \mathcal{N}\left(m, C\right)

where we can express m and C using the complicated looking, but simple to express, formulas for conditional Gaussians (see the conditional distribution result on the Gaussian Distributions page):

\begin{align*}
m &= \mu_a + K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1}(y_b - \mu_b) \\
C &= \left(K_{aa} + \sigma^2I\right) - K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1}K_{ba} \\
\end{align*}
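A minimal sketch of these two formulas in code (the kernel, data, and noise level below are illustrative assumptions, and the matrix inverse is replaced by linear solves):

import numpy as np

def k(x1, x2):
    # illustrative squared exponential kernel
    return np.exp(-np.subtract.outer(x1, x2) ** 2)

# illustrative training data (x_b, y_b), inference points x_a, noise level sigma, mu = 0
x_b = np.array([-2.0, -0.5, 1.0, 2.5])
y_b = np.array([0.3, -0.1, 0.8, 0.4])
x_a = np.linspace(-3.0, 3.0, 100)
sigma = 0.1

K_aa = k(x_a, x_a)
K_ab = k(x_a, x_b)
K_bb = k(x_b, x_b)
B = K_bb + sigma**2 * np.eye(len(x_b))           # K_bb + sigma^2 I

# m = mu_a + K_ab (K_bb + sigma^2 I)^{-1} (y_b - mu_b), with mu_a = mu_b = 0
m = K_ab @ np.linalg.solve(B, y_b)
# C = (K_aa + sigma^2 I) - K_ab (K_bb + sigma^2 I)^{-1} K_ba, with K_ba = K_ab.T
C = (K_aa + sigma**2 * np.eye(len(x_a))) - K_ab @ np.linalg.solve(B, K_ab.T)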

...

Conclusions

Prior and Posterior

The GP itself is your prior knowledge about the model. The resulting conditional distribution is the posterior.


Model is Where the Tuning Happens

Tuning your model for the GP, i.e. \mu and k, is where you gain control over how your inference results behave. For example, stationary and non-stationary kernel functions typically induce very different behaviour in different parts of the domain, as illustrated below.
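As a concrete illustration (two standard kernels, not ones used elsewhere on this page): the squared exponential kernel depends only on the separation of its arguments and is stationary, whereas the dot product (linear) kernel depends on where its arguments sit in the domain and is not:

\begin{align*}
k_{\text{stationary}}(x_1, x_2) &= \exp\left(-|x_1 - x_2|^2\right) \\
k_{\text{non-stationary}}(x_1, x_2) &= x_1 \cdot x_2 \\
\end{align*}

Under the dot product kernel the prior variance grows with distance from the origin, so the posterior behaves very differently far from the origin than near it; the squared exponential kernel treats all regions of the domain alike.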

Variance Collapses Around Training Points

For simplicity, if you set \sigma = 0 (noise free), assume \mu = 0 in the model, and take x_a = (x_i), a single inference point coinciding with the training data point x_i, then working through the posterior mean and variance equations above shows everything cancelling out and leaving you with just m = y_i and C = 0. Adding the noise back in changes things a little, but you still get the dominant collapse of variance around the training points.
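Worked through for that single coinciding point, where K_{aa} = K_{ab} = K_{ba} = K_{bb} = k(x_i, x_i) and \sigma = 0:

\begin{align*}
m &= 0 + k(x_i, x_i)\,k(x_i, x_i)^{-1}\left(y_i - 0\right) = y_i \\
C &= k(x_i, x_i) - k(x_i, x_i)\,k(x_i, x_i)^{-1}\,k(x_i, x_i) = 0 \\
\end{align*}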

Characteristics of the Posterior

The mean in the posterior mean and variance equations above can be viewed either as 1) a linear combination of the observations y_b, or 2) a linear combination of the kernel functions centred on the training data points (the elements of K_{ab}).
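Both readings come from the same rearrangement (taking \mu = 0 for brevity, and writing x^a_j for the j-th inference point and x^b_i for the i-th training point):

\begin{align*}
m &= K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1} y_b = K_{ab}\,\alpha, \hspace{2em} \alpha = \left(K_{bb}+\sigma^2I\right)^{-1} y_b \\
m_j &= \sum_{i} \alpha_i\, k\left(x^a_j, x^b_i\right) \\
\end{align*}

Reading 1) treats K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1} as fixed weights applied to the observations y_b; reading 2) treats \alpha as fixed weights applied to the kernel functions k(\cdot, x^b_i) evaluated at the inference points.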

The variance can also be intuitively interpreted. It is simply the prior covariance K_{aa} (plus the noise term), with a positive semi-definite term subtracted due to the information gained from the observations.

Gaussian Process vs Bayesian Regression

Gaussian process regression utilises kernels rather than basis functions. However, the two can be shown to be equivalent for a corresponding choice of basis functions and kernel. I rather like Gaussian processes for the ease of implementation.
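To hint at the correspondence (a standard result, stated here without derivation): Bayesian linear regression with basis functions \phi and a weight prior w \sim \mathcal{N}\left(0, \Sigma_p\right) models f(x) = \phi(x)^\top w, which is exactly a GP with

\begin{align*}
\mu(x) &= 0 \\
k(x_1, x_2) &= \phi(x_1)^\top \Sigma_p\, \phi(x_2) \\
\end{align*}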