See the slides for more examples.

Getting a Handle

NB: Each action has a transition probability matrix of its own, $\mathcal{P}^a_{ss'}$. This means a single action can probabilistically take you to more than one other state. See the result of the 'Pub' action below.
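The idea of one transition matrix per action can be sketched in a few lines of Python. This is a minimal illustration, not the example from the slides: the state names, the "study" action and all the probabilities are assumptions chosen so that one action is deterministic and the other fans out to several successor states.

```python
import random

# Hypothetical 3-state example; names and numbers are illustrative assumptions.
STATES = ["class1", "class2", "class3"]

# One matrix per action: P[a][s][s2] = Pr(next state = s2 | state = s, action = a).
P = {
    "study": [
        [0.0, 1.0, 0.0],
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 1.0],
    ],
    "pub": [
        [1.0, 0.0, 0.0],
        [0.0, 1.0, 0.0],
        [0.2, 0.4, 0.4],  # from class3, "pub" can land in any of the three states
    ],
}


def check_stochastic(matrix, tol=1e-9):
    """Return True if every row is a probability distribution."""
    return all(
        abs(sum(row) - 1.0) < tol and all(p >= 0 for p in row) for row in matrix
    )


def successors(action, state):
    """Map each reachable successor state to its probability."""
    row = P[action][STATES.index(state)]
    return {s: p for s, p in zip(STATES, row) if p > 0}


def step(action, state, rng=random):
    """Sample one successor state from the row for (action, state)."""
    row = P[action][STATES.index(state)]
    return rng.choices(STATES, weights=row, k=1)[0]
```

Calling `successors("pub", "class3")` lists every state the action can reach from `class3`, while `step("pub", "class3")` draws a single one of them at random.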

The Optimal State Value Function is the maximum state value function over all policies, i.e. $v_*(s) = \max_{\pi} v_{\pi}(s)$. The optimal action value function is defined analogously: $q_*(s, a) = \max_{\pi} q_{\pi}(s, a)$.
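To make the "maximum over all policies" concrete, here is a brute-force sketch on a tiny MDP. Everything in it is an assumption for illustration: the two states, the "stay" and "move" actions, the rewards and the discount factor 0.9. It enumerates only deterministic policies, which is enough to attain the maximum in a finite MDP, evaluates each one by repeated Bellman expectation backups, and then takes the per-state maximum.

```python
from itertools import product

GAMMA = 0.9  # assumed discount factor
STATES = [0, 1]
ACTIONS = ["stay", "move"]

# Hypothetical dynamics: P[a][s][s2] is the transition probability,
# R[a][s] the expected immediate reward for taking action a in state s.
P = {
    "stay": [[1.0, 0.0], [0.0, 1.0]],
    "move": [[0.0, 1.0], [1.0, 0.0]],
}
R = {
    "stay": [0.0, 1.0],
    "move": [0.0, 0.0],
}


def evaluate(policy, sweeps=1000):
    """Approximate v_pi by iterating the Bellman expectation backup."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        v = [
            R[policy[s]][s]
            + GAMMA * sum(P[policy[s]][s][t] * v[t] for t in STATES)
            for s in STATES
        ]
    return v


def optimal_values():
    """Return max over deterministic policies of v_pi(s), for each state s."""
    policies = [
        dict(zip(STATES, choice))
        for choice in product(ACTIONS, repeat=len(STATES))
    ]
    values = [evaluate(pi) for pi in policies]
    return [max(v[s] for v in values) for s in STATES]
```

With these made-up numbers the best behaviour is to move from state 0 to state 1 and then stay there, so `optimal_values()` should come out close to 9 for state 0 and 10 for state 1. Enumerating policies grows exponentially with the number of states, so this only serves to illustrate the definition.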
