Linear Regression
Resources
- Predictive distribution for linear regression - mathematicalmonk on YouTube, breaking down the equation for the predictive distribution (also part II, part III and part IV).
Problem Definition
The setup is basically the same as in the maximum-likelihood (ML) version of the linear regression problem, except that we introduce prior knowledge about the weights.
- Introduce a prior belief about the weights, $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$ (this can easily be generalised to a non-zero mean).
- Use Bayes' rule to obtain the posterior over the weights and use this to determine the predictive distribution.
Likelihood
As before, for training data $\{(\mathbf{x}_n, t_n)\}_{n=1}^N$, we have the likelihood given by:

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \mid \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_n),\ \beta^{-1}\right)$$

where $\boldsymbol{\phi}$ is the vector of basis functions and $\beta$ is the noise precision.
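As a concrete illustration, here is a minimal NumPy sketch of evaluating this log-likelihood. The polynomial basis, the value of `beta` and the synthetic 1-D data are all illustrative assumptions, not part of the original notes:

```python
import numpy as np

# Hypothetical 1-D training data (not from the original notes).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=20)
t = np.sin(np.pi * X) + rng.normal(scale=0.2, size=20)

beta = 25.0  # assumed noise precision (1 / sigma^2)

def phi(x, degree=3):
    """Assumed polynomial basis: [1, x, x^2, x^3]."""
    return np.power(np.asarray(x)[..., None], np.arange(degree + 1))

def log_likelihood(w, X, t, beta):
    """log p(t | X, w) = sum_n log N(t_n | w^T phi(x_n), 1/beta)."""
    mean = phi(X) @ w
    return np.sum(-0.5 * beta * (t - mean) ** 2
                  + 0.5 * np.log(beta / (2 * np.pi)))
```

For a fixed `beta`, the least-squares weights maximise this quantity over `w`, which is the ML solution the notes refer to.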
Posterior Distribution
The goal here is to find the Gaussian distribution that corresponds to the posterior

$$p(\mathbf{w} \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{t})},$$

so we substitute the expressions above and move out constants, worrying only about whether the result is proportional to the required Gaussian exponential form. Firstly, the denominator $p(\mathbf{t})$ can be moved out as it is independent of $\mathbf{w}$. This leads to:

$$p(\mathbf{w} \mid \mathbf{t}) \propto \exp\!\left(-\frac{\beta}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_n)\right)^2\right)\exp\!\left(-\frac{\alpha}{2}\mathbf{w}^{\mathsf T}\mathbf{w}\right) \propto \exp\!\left(-\frac{1}{2}(\mathbf{w}-\mathbf{m}_N)^{\mathsf T}\mathbf{S}_N^{-1}(\mathbf{w}-\mathbf{m}_N)\right)$$

where we've completed the square to get the required form and moved non-$\mathbf{w}$-dependent terms out (proportionality). Here $\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathsf T}\boldsymbol{\Phi}$ and $\mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^{\mathsf T}\mathbf{t}$, with $\boldsymbol{\Phi}$ the design matrix whose $n$-th row is $\boldsymbol{\phi}(\mathbf{x}_n)^{\mathsf T}$.
Hence the posterior distribution is Gaussian:

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N) \qquad (1)$$
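The posterior parameters can be computed directly from the definitions above. A minimal NumPy sketch (the function name, and the use of an explicit matrix inverse rather than a solver, are illustrative choices):

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Return m_N, S_N for p(w | t) = N(w | m_N, S_N), where
    S_N^{-1} = alpha*I + beta*Phi^T Phi and m_N = beta * S_N Phi^T t.

    Phi: (N, M) design matrix, t: (N,) targets,
    alpha: prior precision, beta: noise precision (both assumed known).
    """
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N
```

As a sanity check, letting the prior precision `alpha` go to zero recovers the ordinary least-squares solution, since the prior then carries no information.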
Predictive Distribution
The predictive distribution can be computed by averaging over the weights distribution:

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \int p(t^* \mid \mathbf{x}^*, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t})\, d\mathbf{w}$$
Note that some simplifications to the conditioning were made here: 1) the new data point is independent of the previous data points given the weights, and 2) the weights distribution is independent of any new data (it is estimated, as above, on the training data only).
Representing the probabilistic functions by their Gaussian expressions:

$$p(t^* \mid \mathbf{x}^*, \mathbf{w}) = \mathcal{N}\!\left(t^* \mid \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}^*),\ \beta^{-1}\right), \qquad p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$
Here the first expression comes from our model of the system - it is simply the distribution of $t$ around any point $\mathbf{x}$ (training data or otherwise). The second expression is simply the posterior distribution calculated above.
Goal
At this point our goal is to substitute the exponential terms and rearrange things so that we have

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = f(t^*) \int \mathcal{N}(\mathbf{w} \mid \cdot,\cdot)\, d\mathbf{w} = f(t^*)$$

where we have used the fact that the integral of a Gaussian is always 1 in the second step, and established that the function $f(t^*)$ is itself Gaussian.
Details
This gets really messy, but basically just involves completing the square and using some linear-algebraic tricks to get the solution. In almost every text, this really tedious process is skipped. A great reference for it, however, is given by mathematicalmonk over several YouTube videos (part I, part II, part III and part IV).
Result
The resulting predictive distribution is given by

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \mathcal{N}\!\left(t^* \mid \mathbf{m}_N^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}^*),\ \sigma_N^2(\mathbf{x}^*)\right), \qquad \sigma_N^2(\mathbf{x}^*) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x}^*)^{\mathsf T}\mathbf{S}_N\,\boldsymbol{\phi}(\mathbf{x}^*) \qquad (2)$$
This is quite intuitive. The mean of the predictive distribution is just $\boldsymbol{\phi}(\mathbf{x}^*)$ multiplied by the mean $\mathbf{m}_N$ of the posterior distribution for the weights. The variance has two parts: the inherent observation noise $1/\beta$, and a term capturing the remaining uncertainty in the weights, which shrinks as more training data is observed.
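The predictive mean and variance in (2) can be sketched as a small NumPy function, assuming the posterior parameters `m_N` and `S_N` have already been computed and `phi_star` denotes $\boldsymbol{\phi}(\mathbf{x}^*)$ (all names are illustrative):

```python
import numpy as np

def predictive(phi_star, m_N, S_N, beta):
    """Mean and variance of p(t* | x*, t) from equation (2):
    mean = m_N^T phi(x*),  var = 1/beta + phi(x*)^T S_N phi(x*)."""
    mean = m_N @ phi_star
    var = 1.0 / beta + phi_star @ S_N @ phi_star
    return mean, var
```

Note that the variance can never drop below the noise floor $1/\beta$: even with infinite training data, the observation noise remains.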