Using Gaussian processes to learn from training data and do inference on unknowns.
...
Resources
- Machine Learning - MIT 2006 - the bible for MIT's machine learning groups.
- Gaussian Processes - mathematical monk @ YouTube on Gaussian processes (follows on through 10 parts).
...
GP Regression
Data
\begin{align*}
a &= \left(1, \dots, l\right) & \\
b &= \left(l+1, \dots, n\right) & \\
x_a &= \left(x_1, \dots, x_l\right) &\text{(inference points)} \\
x_b &= \left(x_{l+1}, \dots, x_n\right) &\text{(training points)} \\
y_a &= \left(y_1, \dots, y_l\right) &\text{(unobserved)} \\
y_b &= \left(y_{l+1}, \dots, y_n\right) &\text{(observed)}
\end{align*}
where $a$ indexes the $l$ inference points and $b$ the $n - l$ training points.
Model
\begin{align*}
\left(Z_x\right)_{x \in S} &\sim GP(\mu, k) \\
\xi &\sim \mathcal{N}\left(0, \sigma^2I\right) \\
\xi &= \left(\xi_1, \dots, \xi_n\right) \\
Y_i &= Z_{x_i} + \xi_i \\
Y &= \left(Y_1, \dots, Y_n\right) \\
\tilde{Z} &= \left(Z_{x_1}, \dots, Z_{x_n}\right) \\
Y &= \tilde{Z} + \xi
\end{align*}
where the $Y_i$ and $\xi_i$ are univariate, $\tilde{Z}$ is multivariate, and $\xi$ is independent of $\tilde{Z}$.
Info: The GP represents your prior belief about the model. Your choice of $\mu$ and $k$ here is very influential on the result of the inference. Free parameters in your choice of covariance function are called hyperparameters.
Example
For modelling what we believe to be a continuously varying process $\mathbb{R}^d \rightarrow \mathbb{R}$ centred on the origin, it is enough to set $\mu = 0$ and $k(x_1, x_2) = \exp\left(-\lVert x_1 - x_2\rVert^2\right)$.
Info: Training data can be used to influence your selection and parameterisation of $\mu$ and $k$.
Not worrying about this topic for now. Just hand tuning to keep things simple.
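To make the hand-tuned prior concrete, here is a minimal NumPy sketch (my own names throughout, 1-D inputs, and an arbitrary assumed noise level $\sigma = 0.1$) that draws samples of $\tilde{Z}$ from the zero-mean prior with the squared exponential kernel above and then generates noisy observations $Y = \tilde{Z} + \xi$ as in the model:

```python
import numpy as np

# Hand-tuned prior from the example: zero mean, squared exponential kernel (1-D inputs).
def kernel(x1, x2):
    return np.exp(-np.abs(x1 - x2) ** 2)

def cov_matrix(xs1, xs2):
    """K[i, j] = k(xs1[i], xs2[j])."""
    return np.array([[kernel(a, b) for b in xs2] for a in xs1])

rng = np.random.default_rng(0)
xs = np.linspace(-3.0, 3.0, 100)                 # points in S (here S = R)
K = cov_matrix(xs, xs) + 1e-9 * np.eye(len(xs))  # small jitter for numerical stability
z_tilde = rng.multivariate_normal(np.zeros(len(xs)), K)  # one draw of Z~ from the GP prior
sigma = 0.1                                      # assumed noise level (not from the notes)
y = z_tilde + rng.normal(0.0, sigma, size=len(xs))       # Y = Z~ + xi
```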
Inference
Tip: Here we are trying to infer the distribution of the unknown points from the data points, i.e. the conditional $\left(Y_a \mid Y_b = y_b\right)$.
First let's consider the multivariate Gaussian $\tilde{Z} \sim \mathcal{N}\left(\tilde{\mu}, K\right)$ that we know we can extract from the GP using the data points $\left(x_1, \dots, x_n\right)$ on $S$. From the definition of Gaussian processes,
\begin{align*}
\tilde{\mu} &= \left(\mu(x_1), \dots, \mu(x_n)\right) \\
K &= \left(K_{ij}\right), \quad K_{ij} = k(x_i, x_j)
\end{align*}
or more simply:
\tilde{\mu} = \begin{bmatrix} \mu_a \\ \mu_b \end{bmatrix} \hspace{5em} K = \begin{bmatrix} K_{aa} & K_{ab} \\ K_{ba} & K_{bb} \end{bmatrix}
Since $\tilde{Z}$ and $\xi$ are independent, the multivariate Gaussian for $Y$ has means and covariances which are simply summed (see the sum of variances result on the Fundamental Properties page):
Y \sim \mathcal{N}\left(\tilde{\mu}, K + \sigma^2I\right)
From this, we can get the conditional distribution:
\left(Y_a \mid Y_b = y_b\right) \sim \mathcal{N}\left(m, C\right)
where we can express $m$ and $C$ using the complicated looking, but simple to apply, formulas for conditional Gaussians (see the conditional distribution result on the Gaussian Distributions page):
\begin{align*}
m &= \mu_a + K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1}\left(y_b - \mu_b\right) \\
C &= \left(K_{aa} + \sigma^2I\right) - K_{ab}\left(K_{bb}+\sigma^2I\right)^{-1}K_{ba}
\end{align*}
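As an illustration, here is a minimal NumPy sketch of these posterior formulas (not a reference implementation; the function and variable names are my own, and a linear solve replaces the explicit inverse for numerical stability):

```python
import numpy as np

def gp_posterior(x_a, x_b, y_b, kernel, mean_fn, sigma):
    """Posterior mean m and covariance C of Y_a given Y_b = y_b,
    following the equations above for the chosen k, mu and noise level sigma."""
    K_aa = np.array([[kernel(p, q) for q in x_a] for p in x_a])
    K_ab = np.array([[kernel(p, q) for q in x_b] for p in x_a])
    K_bb = np.array([[kernel(p, q) for q in x_b] for p in x_b])
    mu_a = np.array([mean_fn(p) for p in x_a])
    mu_b = np.array([mean_fn(p) for p in x_b])

    noisy_K_bb = K_bb + sigma**2 * np.eye(len(x_b))
    # Solve (K_bb + sigma^2 I) systems instead of inverting the matrix explicitly.
    m = mu_a + K_ab @ np.linalg.solve(noisy_K_bb, y_b - mu_b)
    C = (K_aa + sigma**2 * np.eye(len(x_a))) - K_ab @ np.linalg.solve(noisy_K_bb, K_ab.T)
    return m, C

# Example usage with the hand-tuned prior from the Example section (made-up data).
kernel = lambda x1, x2: np.exp(-(x1 - x2) ** 2)
mean_fn = lambda x: 0.0
x_b = np.array([-1.0, 0.0, 2.0])   # training points
y_b = np.array([0.5, -0.3, 1.2])   # observations
x_a = np.linspace(-3, 3, 50)       # inference points
m, C = gp_posterior(x_a, x_b, y_b, kernel, mean_fn, sigma=0.1)
```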
...
Conclusions
Prior and Posterior
The GP itself is your prior knowledge about the model. The resulting conditional distribution is the posterior.
Model is Where the Tuning Happens
Tuning your model for the GP, i.e. your choice of $\mu$ and $k$, is where you gain control over how your inference result behaves. For example, stationary vs non-stationary kernel functions typically induce very different behaviour in different parts of the domain.
Variance Collapses Around Training Points
For simplicity, if you set $\sigma = 0$ (noise free), assume $\mu = 0$ in the model and take a single training data point $(x_b, y_b)$, then working through the posterior mean and variance equations above at the training point itself shows everything cancelling out and leaving you with just $m = y_b$ and $C = 0$. Throwing the noise in changes things a little, but you still get the dominant collapse of variance around the training points.
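Writing that special case out (assuming the single inference point is the training point itself, so $K_{aa} = K_{ab} = K_{ba} = K_{bb} = k(x_b, x_b)$, with $\sigma = 0$ and $\mu = 0$):
\begin{align*}
m &= 0 + k(x_b, x_b)\,k(x_b, x_b)^{-1}\left(y_b - 0\right) = y_b \\
C &= k(x_b, x_b) - k(x_b, x_b)\,k(x_b, x_b)^{-1}\,k(x_b, x_b) = 0
\end{align*}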
Characteristics of the Posterior
The mean $m$ in the posterior equations above can be viewed either as 1) a linear combination of the observations $y_b$, or 2) a linear combination of the kernel functions centred on the training data points (the elements of $K_{ab}$). The variance can also be intuitively interpreted: it is simply the prior variance $K_{aa} + \sigma^2I$ with a positive semi-definite term subtracted due to the information from the observations.
Gaussian Process vs Bayesian Regression
Gaussian process regression utilises kernels, not basis functions. However, the two can be shown to be equivalent for a given choice of basis functions/kernels. I rather like Gaussian processes for the ease of implementation.
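One way to see the equivalence (the standard weight-space view, not derived in these notes): if Bayesian regression puts a Gaussian prior on the weights of some basis functions $\phi$, the resulting function values form a GP whose kernel is determined by those basis functions:
\begin{align*}
Z_x &= w^\top \phi(x), \quad w \sim \mathcal{N}\left(0, \Sigma_p\right) \\
\Rightarrow \quad k(x, x') &= \operatorname{Cov}\left(Z_x, Z_{x'}\right) = \phi(x)^\top \Sigma_p\, \phi(x')
\end{align*}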