Using Gaussian processes to learn from training data and do inference on unknowns.
...
Resources
- Machine Learning - MIT 2006 - the bible for MIT's machine learning groups.
- Gaussian Processes - mathematical monk @ YouTube on Gaussian processes (follows on through 10 parts).
...
GP Regression
Data
\begin{align*}
a &= \left(1, \dots, l\right) & \\
b &= \left(l+1, \dots, n\right) & \\
x_a &= \left(x_1, \dots, x_l\right) &\text{(inference points)} \\
x_b &= \left(x_{l+1}, \dots, x_n\right) &\text{(training points)} \\
y_a &= \left(y_1, \dots, y_l\right) &\text{(unobserved)} \\
y_b &= \left(y_{l+1}, \dots, y_n\right) &\text{(observed)}
\end{align*}
where $a$ indexes the $l$ inference points and $b$ the $n - l$ training points.
Model
\begin{align*}
\left(Z_x\right)_{x \in S} &\sim GP(\mu, k) \\
\xi &\sim \mathcal{N}\left(0, \sigma^2I\right) \\
\xi &= \left(\xi_1, \dots, \xi_n\right) \\
Y_i &= Z_{x_i} + \xi_i \\
Y &= \left(Y_1, \dots, Y_n\right) \\
\tilde{Z} &= \left(Z_{x_1}, \dots, Z_{x_n}\right) \\
Y &= \tilde{Z} + \xi
\end{align*}
where the $Y_i$ and $\xi_i$ are univariate, $\tilde{Z}$ is multivariate, and $\xi$ is independent of $\tilde{Z}$.
Info: The GP represents your prior belief about the model. Your choice of $\mu$ and $k$ here is very influential on the result of the inference. Free parameters in your choice of covariance function are called hyperparameters.
Example
For modelling what we believe to be a continuously varying process $\mathbb{R}^d \rightarrow \mathbb{R}$ centred on the origin, it is enough to set $\mu = 0$ and $k(x_1, x_2) = \exp\left(-\lVert x_1 - x_2\rVert^2\right)$.
Info: Training data can be used to influence your selection and parameterisation of $\mu$ and $k$.
Not worrying about this topic for now. Just hand tuning to keep things simple.
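To make the hand-tuned prior concrete, here is a minimal NumPy sketch (my own names throughout, 1-D inputs, and an arbitrary assumed noise level $\sigma = 0.1$) that draws samples of $\tilde{Z}$ from the zero-mean prior with the squared exponential kernel above and then generates noisy observations $Y = \tilde{Z} + \xi$ as in the model:

```python
import numpy as np

# Hand-tuned prior from the example: zero mean, squared exponential kernel (1-D inputs).
def kernel(x1, x2):
    return np.exp(-np.abs(x1 - x2) ** 2)

def cov_matrix(xs1, xs2):
    """K[i, j] = k(xs1[i], xs2[j])."""
    return np.array([[kernel(a, b) for b in xs2] for a in xs1])

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 100)                 # points in S (here S = R)
K = cov_matrix(xs, xs) + 1e-9 * np.eye(len(xs))  # small jitter for numerical stability
z_tilde = rng.multivariate_normal(np.zeros(len(xs)), K)  # one draw of Z~ from the GP prior
sigma = 0.1                                      # assumed noise level (not from the notes)
y = z_tilde + rng.normal(0.0, sigma, size=len(xs))       # Y = Z~ + xi
```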
Inference
Tip: Here we are trying to infer the distribution of the unknown points from the data points, i.e. the conditional $\left(Y_a \mid Y_b = y_b\right)$.
First let's consider the multivariate Gaussian $\tilde{Z} \sim \mathcal{N}\left(\tilde{\mu}, K\right)$ that we know we can extract from the GP using the data points $\left(x_1, \dots, x_n\right)$ on $S$. From the definition of Gaussian processes,
\begin{align*}
\tilde{\mu} &= \left(\mu(x_1), \dots, \mu(x_n)\right) \\
K &= \left(K_{ij}\right), \quad K_{ij} = k(x_i, x_j)
\end{align*}
or more simply:
\tilde{\mu} = \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} \hspace{5em} K = \begin{bmatrix} K_{aa} & K_{ab} \\ K_{ba} & K_{bb} \end{bmatrix}
Since $\tilde{Z}$ and $\xi$ are independent, the multivariate Gaussian for $Y$ has means and covariances which are simply summed (see the sum of variances result on the Fundamental Properties page):
Y \sim \mathcal{N}\left(\tilde{\mu}, K + \sigma^2I\right)
From this, we can get the conditional distribution:
\left(Y_a \mid Y_b = y_b\right) \sim \mathcal{N}\left(m, C\right)
where we can express $m$ and $C$ using the complicated looking, but simple to apply, formulas for conditional Gaussians (see the conditional distribution result on the Gaussian Distributions page):
\begin{align*}
m &= \mu_a + K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1}\left(y_b - \mu_b\right) \\
C &= \left(K_{aa} + \sigma^2I\right) - K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1}K_{ba}
\end{align*}
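As an illustration, here is a minimal NumPy sketch of these posterior formulas (not a reference implementation; the function and variable names are my own, and a linear solve replaces the explicit inverse for numerical stability):

```python
import numpy as np

def gp_posterior(x_a, x_b, y_b, kernel, mean_fn, sigma):
    """Posterior mean m and covariance C of Y_a given Y_b = y_b,
    following the equations above for the chosen k, mu and noise level sigma."""
    K_aa = np.array([[kernel(p, q) for q in x_a] for p in x_a])
    K_ab = np.array([[kernel(p, q) for q in x_b] for p in x_a])
    K_bb = np.array([[kernel(p, q) for q in x_b] for p in x_b])
    mu_a = np.array([mean_fn(p) for p in x_a])
    mu_b = np.array([mean_fn(p) for p in x_b])

    noisy_K_bb = K_bb + sigma**2 * np.eye(len(x_b))
    # Solve (K_bb + sigma^2 I) systems instead of inverting the matrix explicitly.
    m = mu_a + K_ab @ np.linalg.solve(noisy_K_bb, y_b - mu_b)
    C = (K_aa + sigma**2 * np.eye(len(x_a))) - K_ab @ np.linalg.solve(noisy_K_bb, K_ab.T)
    return m, C

# Example usage with the hand-tuned prior from the Example section (made-up data).
kernel = lambda x1, x2: np.exp(-(x1 - x2) ** 2)
mean_fn = lambda x: 0.0
x_b = np.array([-1.0, 0.0, 2.0])   # training points
y_b = np.array([0.5, -0.3, 1.2])   # observations
x_a = np.linspace(-3, 3, 50)       # inference points
m, C = gp_posterior(x_a, x_b, y_b, kernel, mean_fn, sigma=0.1)
```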
...
Conclusions
Prior and Posterior
The GP itself is your prior knowledge about the model. The resulting conditional distribution is the posterior.
Model is Where the Tuning Happens
Tuning your model for the GP, i.e. your choice of $\mu$ and $k$, is where you gain control over how your inference result behaves. For example, stationary vs non-stationary kernel functions typically induce very different behaviour in different parts of the domain.
Variance Collapses Around Training Points
For simplicity, if you set $\sigma = 0$ (noise free), assume $\mu = 0$ in the model and take a single training data point $(x_b, y_b)$, then working through the posterior mean and variance equations above at the training point itself shows everything cancelling out and leaving you with just $m = y_b$ and $C = 0$. Throwing the noise in changes things a little, but you still get the dominant collapse of variance around the training points.
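Writing that special case out (assuming the single inference point is the training point itself, so $K_{aa} = K_{ab} = K_{ba} = K_{bb} = k(x_b, x_b)$, with $\sigma = 0$ and $\mu = 0$):
\begin{align*}
m &= 0 + k(x_b, x_b)\,k(x_b, x_b)^{-1}\left(y_b - 0\right) = y_b \\
C &= k(x_b, x_b) - k(x_b, x_b)\,k(x_b, x_b)^{-1}\,k(x_b, x_b) = 0
\end{align*}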
Characteristics of the Posterior
The mean $m$ in the posterior equations above can be viewed either as 1) a linear combination of the observations $y_b$, or 2) a linear combination of the kernel functions centred on the training data points (the elements of $K_{ab}$). The variance can also be intuitively interpreted: it is simply the prior variance $K_{aa} + \sigma^2I$ with a positive semi-definite term subtracted due to the information from the observations.
Gaussian Process vs Bayesian Regression
Gaussian process regression utilises kernels, not basis functions. However, the two can be shown to be equivalent for a given choice of basis functions/kernels. I rather like Gaussian processes for the ease of implementation.
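One way to see the equivalence (the standard weight-space view, not derived in these notes): if Bayesian regression puts a Gaussian prior on the weights of some basis functions $\phi$, the resulting function values form a GP whose kernel is determined by those basis functions:
\begin{align*}
Z_x &= w^\top \phi(x), \quad w \sim \mathcal{N}\left(0, \Sigma_p\right) \\
\Rightarrow \quad k(x, x') &= \operatorname{Cov}\left(Z_x, Z_{x'}\right) = \phi(x)^\top \Sigma_p\, \phi(x')
\end{align*}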