...

See the slides for more examples.

Getting a Handle

NB: Each action a has its own transition probability matrix, $\mathcal{P}^a_{ss'} = \mathbb{P}[S_{t+1} = s' \mid S_t = s, A_t = a]$. This means a single action can probabilistically take you to more than one successor state; see the result of the 'Pub' action below.
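A minimal sketch of "one transition matrix per action", assuming a hypothetical toy MDP: the state names (`s1`, `s2`, `s3`), action names (`advance`, `gamble`) and all probabilities are invented for illustration and are not taken from the slides. Each action maps a state to a distribution over successors, so a stochastic action like `gamble` can land in several different states.

```python
import random

# Hypothetical toy MDP (names and numbers are illustrative only).
# P[a][s][s2] holds P^a_{ss'} = Pr(S_{t+1} = s2 | S_t = s, A_t = a).
P = {
    "advance": {
        "s1": {"s2": 1.0},  # deterministic: always moves s1 -> s2
        "s2": {"s3": 1.0},
    },
    "gamble": {
        # One action, several possible successors: the outcome is stochastic.
        "s3": {"s1": 0.2, "s2": 0.4, "s3": 0.4},
    },
}

def check_rows(P):
    """Each row of each per-action matrix must be a probability distribution."""
    for a, rows in P.items():
        for s, row in rows.items():
            assert abs(sum(row.values()) - 1.0) < 1e-9, (a, s)

def step(P, s, a, rng):
    """Sample a successor state s' from the row P^a_{s.}."""
    row = P[a][s]
    successors = list(row)
    weights = [row[s2] for s2 in successors]
    return rng.choices(successors, weights=weights, k=1)[0]

check_rows(P)
rng = random.Random(0)
counts = {}
for _ in range(10_000):
    s2 = step(P, "s3", "gamble", rng)
    counts[s2] = counts.get(s2, 0) + 1
print(counts)  # empirical frequencies approach 0.2 / 0.4 / 0.4
```

Storing only the non-zero entries of each row keeps the per-action matrices sparse, which matters once most actions only reach a handful of states.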

The Optimal State Value Function is the maximum state value function over all policies, i.e. $v_*(s) = \max_{\pi} v_{\pi}(s)$. The optimal action value function is defined the same way: $q_*(s, a) = \max_{\pi} q_{\pi}(s, a)$.
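The definition can be checked literally on a tiny example. This is a sketch under stated assumptions: the two-state, two-action MDP, its rewards `R[a][s]` (expected immediate reward for taking `a` in `s`) and the discount `GAMMA = 0.9` are all hypothetical. It evaluates every deterministic policy with iterative policy evaluation and takes the per-state maximum; restricting to deterministic policies is enough for a finite MDP, because some deterministic policy is optimal in every state at once.

```python
import itertools

# Hypothetical 2-state, 2-action MDP (numbers invented for illustration).
STATES = ["s0", "s1"]
ACTIONS = ["a0", "a1"]
GAMMA = 0.9
# P[a][s][s2] = P^a_{ss'};  R[a][s] = expected immediate reward R^a_s.
P = {
    "a0": {"s0": {"s0": 1.0}, "s1": {"s0": 1.0}},
    "a1": {"s0": {"s0": 0.5, "s1": 0.5}, "s1": {"s1": 1.0}},
}
R = {
    "a0": {"s0": 0.0, "s1": 2.0},
    "a1": {"s0": 1.0, "s1": 0.5},
}

def q_from_v(v, s, a):
    """One-step lookahead: R^a_s + gamma * sum_{s'} P^a_{ss'} v(s')."""
    return R[a][s] + GAMMA * sum(p * v[s2] for s2, p in P[a][s].items())

def evaluate(policy, tol=1e-10):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    v = {s: 0.0 for s in STATES}
    while True:
        new_v = {s: q_from_v(v, s, policy[s]) for s in STATES}
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v
        v = new_v

# v_*(s) = max over policies of v_pi(s): enumerate all |A|^|S| deterministic policies.
policies = [dict(zip(STATES, acts))
            for acts in itertools.product(ACTIONS, repeat=len(STATES))]
values = [evaluate(pi) for pi in policies]
v_star = {s: max(v[s] for v in values) for s in STATES}

# q_*(s, a) = max over policies of q_pi(s, a), which equals a one-step lookahead on v_*.
q_star = {(s, a): q_from_v(v_star, s, a) for s in STATES for a in ACTIONS}
print(v_star)
print(q_star)
```

Enumeration grows as $|\mathcal{A}|^{|\mathcal{S}|}$, so it is only a way to see the definition at work; the last line also shows that $v_*(s) = \max_a q_*(s, a)$ on this example.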