Linear Regression
Resources
- Predictive distribution for linear regression - mathematicalmonk on YouTube, breaking down the equation for the predictive distribution (also part II, part III and part IV).
Problem Definition
The setup is basically the same as in the maximum-likelihood (ML) version of the linear regression problem, except that we introduce prior knowledge about the weights.
- Introduce a prior belief about the weights, $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$ (this can easily be generalised to a non-zero mean).
- Use Bayes' rule to obtain the posterior over the weights and use this to determine the predictive distribution.
Likelihood
As before, for training data $\{(\mathbf{x}_n, t_n)\}_{n=1}^N$, we have the likelihood given by:

$$p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}) = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \mid \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_n),\ \beta^{-1}\right)$$

where $\boldsymbol{\phi}$ is the vector of basis functions and $\beta$ is the noise precision.
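As a concrete illustration, here is a minimal NumPy sketch of evaluating this log-likelihood. The polynomial basis, the value of `beta` and the synthetic 1-D data are all illustrative assumptions, not part of the original notes:

```python
import numpy as np

# Hypothetical 1-D training data (not from the original notes).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=20)
t = np.sin(np.pi * X) + rng.normal(scale=0.2, size=20)

beta = 25.0  # assumed noise precision (1 / sigma^2)

def phi(x, degree=3):
    """Assumed polynomial basis: [1, x, x^2, x^3]."""
    return np.power(np.asarray(x)[..., None], np.arange(degree + 1))

def log_likelihood(w, X, t, beta):
    """log p(t | X, w) = sum_n log N(t_n | w^T phi(x_n), 1/beta)."""
    mean = phi(X) @ w
    return np.sum(-0.5 * beta * (t - mean) ** 2
                  + 0.5 * np.log(beta / (2 * np.pi)))
```

For a fixed `beta`, the least-squares weights maximise this quantity over `w`, which is the ML solution the notes refer to.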
Posterior Distribution
The goal here is to find the Gaussian distribution that corresponds to the posterior

$$p(\mathbf{w} \mid \mathbf{t}) = \frac{p(\mathbf{t} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathbf{t})},$$

so we substitute the expressions above and move out constants, worrying only about whether the result is proportional to the required Gaussian exponential form. Firstly, the denominator $p(\mathbf{t})$ can be moved out as it is independent of $\mathbf{w}$. This leads to:

$$p(\mathbf{w} \mid \mathbf{t}) \propto \exp\!\left(-\frac{\beta}{2}\sum_{n=1}^{N}\left(t_n - \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}_n)\right)^2\right)\exp\!\left(-\frac{\alpha}{2}\mathbf{w}^{\mathsf T}\mathbf{w}\right) \propto \exp\!\left(-\frac{1}{2}(\mathbf{w}-\mathbf{m}_N)^{\mathsf T}\mathbf{S}_N^{-1}(\mathbf{w}-\mathbf{m}_N)\right)$$

where we've completed the square to get the required form and moved non-$\mathbf{w}$-dependent terms out (proportionality). Here $\mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\boldsymbol{\Phi}^{\mathsf T}\boldsymbol{\Phi}$ and $\mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^{\mathsf T}\mathbf{t}$, with $\boldsymbol{\Phi}$ the design matrix whose $n$-th row is $\boldsymbol{\phi}(\mathbf{x}_n)^{\mathsf T}$.
Hence the posterior distribution is Gaussian:

$$p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N) \qquad (1)$$
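The posterior parameters can be computed directly from the definitions above. A minimal NumPy sketch (the function name, and the use of an explicit matrix inverse rather than a solver, are illustrative choices):

```python
import numpy as np

def posterior(Phi, t, alpha, beta):
    """Return m_N, S_N for p(w | t) = N(w | m_N, S_N), where
    S_N^{-1} = alpha*I + beta*Phi^T Phi and m_N = beta * S_N Phi^T t.

    Phi: (N, M) design matrix, t: (N,) targets,
    alpha: prior precision, beta: noise precision (both assumed known).
    """
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S_N = np.linalg.inv(S_N_inv)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N
```

As a sanity check, letting the prior precision `alpha` go to zero recovers the ordinary least-squares solution, since the prior then carries no information.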
Predictive Distribution
The predictive distribution can be computed by averaging over the weights distribution:

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \int p(t^* \mid \mathbf{x}^*, \mathbf{w})\, p(\mathbf{w} \mid \mathbf{t})\, d\mathbf{w}$$
Note that some simplifications to the conditioning were made here: 1) the new data point is independent of the previous data points given the weights, and 2) the weights distribution is independent of any new data (it is estimated, as above, on the training data only).
Representing the probabilistic functions by their Gaussian expressions:

$$p(t^* \mid \mathbf{x}^*, \mathbf{w}) = \mathcal{N}\!\left(t^* \mid \mathbf{w}^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}^*),\ \beta^{-1}\right), \qquad p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N)$$
Here the first expression comes from our model of the system - it is simply the distribution of $t$ around any point $\mathbf{x}$ (training data or otherwise). The second expression is simply the posterior distribution calculated above.
Goal
At this point our goal is to substitute the exponential terms and rearrange things so that we have

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = f(t^*) \int \mathcal{N}(\mathbf{w} \mid \cdot,\cdot)\, d\mathbf{w} = f(t^*)$$

where we have used the fact that the integral of a Gaussian is always 1 in the second step, and established that the function $f(t^*)$ is itself Gaussian.
Details
This gets really messy, but basically just involves completing the square and using some linear-algebraic tricks to get the solution. In almost every text, this really tedious process is skipped. A great reference for it, however, is given by mathematicalmonk over several YouTube videos (part I, part II, part III and part IV).
Result
The resulting predictive distribution is given by

$$p(t^* \mid \mathbf{x}^*, \mathbf{t}) = \mathcal{N}\!\left(t^* \mid \mathbf{m}_N^{\mathsf T}\boldsymbol{\phi}(\mathbf{x}^*),\ \sigma_N^2(\mathbf{x}^*)\right), \qquad \sigma_N^2(\mathbf{x}^*) = \frac{1}{\beta} + \boldsymbol{\phi}(\mathbf{x}^*)^{\mathsf T}\mathbf{S}_N\,\boldsymbol{\phi}(\mathbf{x}^*) \qquad (2)$$
This is quite intuitive. The mean of the predictive distribution is just $\boldsymbol{\phi}(\mathbf{x}^*)$ multiplied by the mean $\mathbf{m}_N$ of the posterior distribution for the weights. The variance has two parts: the inherent observation noise $1/\beta$, and a term capturing the remaining uncertainty in the weights, which shrinks as more training data is observed.
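The predictive mean and variance in (2) can be sketched as a small NumPy function, assuming the posterior parameters `m_N` and `S_N` have already been computed and `phi_star` denotes $\boldsymbol{\phi}(\mathbf{x}^*)$ (all names are illustrative):

```python
import numpy as np

def predictive(phi_star, m_N, S_N, beta):
    """Mean and variance of p(t* | x*, t) from equation (2):
    mean = m_N^T phi(x*),  var = 1/beta + phi(x*)^T S_N phi(x*)."""
    mean = m_N @ phi_star
    var = 1.0 / beta + phi_star @ S_N @ phi_star
    return mean, var
```

Note that the variance can never drop below the noise floor $1/\beta$: even with infinite training data, the observation noise remains.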