Linear Regression



Resources



Problem Definition

The setup is basically the same as in the maximum-likelihood (ML) version of the linear regression problem, except that we introduce knowledge about the weights (i.e. a prior).

  • Introduce a prior belief about the weights, $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$ (can easily be generalised to non-zero mean).
  • Use MAP to generate the posterior and use this to determine the predictive distribution.



Likelihood

As before, for training data $\mathbf{t} = (t_1, \ldots, t_N)^\top$ with inputs $\mathbf{x}_1, \ldots, \mathbf{x}_N$, we have the likelihood given by:

$$p(\mathbf{t} \mid \mathbf{w}) = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \mid \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}_n),\ \beta^{-1}\right)$$
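As a quick numerical illustration of this factorised Gaussian likelihood, the sketch below evaluates the log-likelihood on toy data. The data, the noise precision $\beta$ and the simple basis $\boldsymbol{\phi}(x) = (1, x)^\top$ are assumptions made for the example only.

```python
import numpy as np

# Toy data for illustration only: t = 0.5 + 2x + noise.
rng = np.random.default_rng(0)
beta = 25.0                                  # assumed noise precision (sigma = 0.2)
x = rng.uniform(-1.0, 1.0, size=20)
t = 0.5 + 2.0 * x + rng.normal(0.0, 0.2, size=20)
Phi = np.column_stack([np.ones_like(x), x])  # design matrix, rows phi(x_n) = (1, x_n)

def log_likelihood(w, Phi, t, beta):
    """Sum over n of log N(t_n | w^T phi(x_n), beta^{-1})."""
    resid = t - Phi @ w
    return np.sum(0.5 * np.log(beta / (2.0 * np.pi)) - 0.5 * beta * resid**2)
```

The log-likelihood is (unsurprisingly) much higher at the generating weights than at, say, the zero vector, since the residuals then contain the full signal.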

Posterior Distribution

The goal here is to find the Gaussian distribution that corresponds to the posterior, so we substitute the expressions above and move out constants, worrying only about whether the result is proportional to the required Gaussian exponential form. Firstly, the denominator $p(\mathbf{t})$ can be moved out as it is independent of $\mathbf{w}$. This leads to:

$$p(\mathbf{w} \mid \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w}) \propto \exp\!\left(-\tfrac{1}{2}(\mathbf{w} - \mathbf{m}_N)^\top \mathbf{S}_N^{-1} (\mathbf{w} - \mathbf{m}_N)\right)$$

where we've completed the square to get the required form and moved non-$\mathbf{w}$-dependent terms out (proportionality). Here $\mathbf{m}_N = \beta\, \mathbf{S}_N \boldsymbol{\Phi}^\top \mathbf{t}$ and $\mathbf{S}_N^{-1} = \alpha \mathbf{I} + \beta\, \boldsymbol{\Phi}^\top \boldsymbol{\Phi}$.
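The completion of squares can be sketched in a few lines, writing $\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^\top\boldsymbol{\Phi}$ and $\mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^\top\mathbf{t}$ (the last step can be verified by expanding the quadratic and checking that $\mathbf{S}_N^{-1}\mathbf{m}_N = \beta\,\boldsymbol{\Phi}^\top\mathbf{t}$):

```latex
\begin{align*}
p(\mathbf{w}\mid\mathbf{t})
 &\propto \exp\!\Big(-\tfrac{\beta}{2}\,
      \|\mathbf{t}-\boldsymbol{\Phi}\mathbf{w}\|^{2}\Big)
    \exp\!\Big(-\tfrac{\alpha}{2}\,\mathbf{w}^{\top}\mathbf{w}\Big) \\
 &\propto \exp\!\Big(-\tfrac{1}{2}\,\mathbf{w}^{\top}
      \big(\alpha\mathbf{I}+\beta\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}\big)\mathbf{w}
      + \beta\,\mathbf{w}^{\top}\boldsymbol{\Phi}^{\top}\mathbf{t}\Big) \\
 &\propto \exp\!\Big(-\tfrac{1}{2}\,
      (\mathbf{w}-\mathbf{m}_N)^{\top}\mathbf{S}_N^{-1}(\mathbf{w}-\mathbf{m}_N)\Big)
\end{align*}
```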

Hence the posterior distribution is Gaussian:

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N) \qquad (1)$$
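The posterior update (1) is two lines of linear algebra. In the sketch below, the values of $\alpha$, $\beta$ and the toy data are assumptions for the example; the weights are recovered close to their generating values $(0.5, 2.0)$, with the prior shrinking them slightly towards zero.

```python
import numpy as np

# Toy data for illustration only: t = 0.5 + 2x + noise.
rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0                      # assumed prior / noise precisions
x = rng.uniform(-1.0, 1.0, size=20)
t = 0.5 + 2.0 * x + rng.normal(0.0, 0.2, size=20)
Phi = np.column_stack([np.ones_like(x), x])  # rows phi(x_n) = (1, x_n)

# S_N^{-1} = alpha*I + beta*Phi^T Phi,  m_N = beta * S_N * Phi^T t
S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t                 # posterior mean of the weights
```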



Predictive Distribution

The predictive distribution can be computed by averaging over the weight distribution:

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \int p(t^* \mid \mathbf{x}^*, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t})\, d\mathbf{w}$$

Note that some simplifications in the notation were made here: 1) the new data point $(\mathbf{x}^*, t^*)$ is independent of the previous data points, and 2) the weight distribution is independent of any new data (it is estimated, as above, on the training data only).

Representing the probabilistic functions by their Gaussian expressions:

$$p(t^* \mid \mathbf{x}^*, \mathbf{w}) = \mathcal{N}\!\left(t^* \mid \mathbf{w}^\top \boldsymbol{\phi}(\mathbf{x}^*),\ \beta^{-1}\right), \qquad p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$

Here the first expression comes from our model of the system: it is simply a representation of the distribution of $t$ around any point $\mathbf{x}$ (training data or otherwise). The second expression is simply the posterior distribution calculated above.

Goal

At this point our goal is to substitute the exponential terms and rearrange things so that we have

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \int f(t^*)\, \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\mathbf{w} = f(t^*) \int \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})\, d\mathbf{w} = f(t^*)$$

where we have used the fact that the integral of a Gaussian density is always $1$ in the last step, and we must establish that the function $f(t^*)$ is itself Gaussian.
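A quick numerical sanity check of this marginalisation: draw weight samples from the posterior, generate $t^*$ values from the conditional model, and compare the sample mean and variance against the closed-form Gaussian they should match. All numbers here (data, $\alpha$, $\beta$, the query point) are toy assumptions.

```python
import numpy as np

# Toy data and posterior, as in the earlier sections (assumed values).
rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0
x = rng.uniform(-1.0, 1.0, size=20)
t = 0.5 + 2.0 * x + rng.normal(0.0, 0.2, size=20)
Phi = np.column_stack([np.ones_like(x), x])
S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

phi_star = np.array([1.0, 0.3])              # basis vector at x* = 0.3
n = 200_000

# Monte Carlo: t* = w^T phi(x*) + noise, with w ~ N(m_N, S_N).
w_samples = rng.multivariate_normal(m_N, S_N, size=n)
t_star = w_samples @ phi_star + rng.normal(0.0, beta**-0.5, size=n)
mc_mean, mc_var = t_star.mean(), t_star.var()

# Closed form the integral should produce (derived below in the Result).
closed_mean = m_N @ phi_star
closed_var = 1.0 / beta + phi_star @ S_N @ phi_star
```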

Details

This gets really messy, but basically just involves completing the square and using some linear-algebra tricks to get the solution. In almost every write-up, this tedious process is skipped. A great reference for it, however, is given by mathematicalmonk over several YouTube videos (part I, part II, part III and part IV).

Result

The resulting predictive distribution is given by

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \mathcal{N}\!\left(t^* \mid \mathbf{m}_N^\top \boldsymbol{\phi}(\mathbf{x}^*),\ \sigma_N^2(\mathbf{x}^*)\right), \qquad \sigma_N^2(\mathbf{x}^*) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x}^*)^\top \mathbf{S}_N\, \boldsymbol{\phi}(\mathbf{x}^*) \qquad (2)$$

This is quite intuitive. The mean of the predictive distribution is just $\boldsymbol{\phi}(\mathbf{x}^*)$ multiplied by the mean of the posterior distribution for the weights, $\mathbf{m}_N$, while the variance combines the observation noise $\beta^{-1}$ with the remaining uncertainty in the weights.
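Result (2) translates directly into code. The sketch below (same toy data and assumed $\alpha$, $\beta$ as before, with basis $\boldsymbol{\phi}(x) = (1, x)^\top$) also shows the intuitive behaviour of the variance term: the predictive uncertainty is never below the noise floor $\beta^{-1}$, and it grows as the query point moves away from the training inputs.

```python
import numpy as np

# Toy data and posterior, as in the earlier sections (assumed values).
rng = np.random.default_rng(0)
alpha, beta = 2.0, 25.0
x = rng.uniform(-1.0, 1.0, size=20)
t = 0.5 + 2.0 * x + rng.normal(0.0, 0.2, size=20)
Phi = np.column_stack([np.ones_like(x), x])
S_N = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)
m_N = beta * S_N @ Phi.T @ t

def predict(x_star):
    """Predictive mean m_N^T phi(x*) and variance 1/beta + phi^T S_N phi, per (2)."""
    phi = np.array([1.0, x_star])
    return m_N @ phi, 1.0 / beta + phi @ S_N @ phi

mean_in, var_in = predict(0.0)    # inside the training range [-1, 1]
mean_out, var_out = predict(5.0)  # far outside it: larger variance
```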