...

  • Computationally explosive - $O(n^3)$; resort to dynamic programming and other methods to solve large MDPs
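Assuming the $O(n^3)$ cost above refers to solving the Bellman equation $v = \mathcal{R} + \gamma \mathcal{P} v$ directly as the linear system $(I - \gamma \mathcal{P}) v = \mathcal{R}$, the sketch below shows where the cubic cost comes from: Gaussian elimination has three nested loops over the $n$ states. The two-state example, its numbers and the name `solve_mrp` are hypothetical and not taken from these notes.

```python
def solve_mrp(P, R, gamma):
    """Solve (I - gamma * P) v = R by Gaussian elimination with partial pivoting."""
    n = len(R)
    # Augmented matrix [I - gamma * P | R].
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [R[i]]
         for i in range(n)]
    for col in range(n):                      # loop 1 over columns
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):         # loop 2 over rows below the pivot
            factor = A[row][col] / A[col][col]
            for k in range(col, n + 1):       # loop 3 over the remaining entries
                A[row][k] -= factor * A[col][k]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (A[i][n] - tail) / A[i][i]
    return v


# Hypothetical two-state process: state 1 is absorbing with zero reward.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
v = solve_mrp(P, R, gamma=0.9)
```

For a handful of states this is instant, but the work grows with the cube of the state count, which is why iterative methods are preferred for large problems.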

Markov Decision Processes

Markov Decision Process $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ - includes actions, which condition the probability transition matrix and reward function.

NB: Any specific action has a probability transition matrix of its own, $\mathcal{P}^a_{ss'}$ - this means an action can probabilistically get you to more than one other state. See the result of the 'Pub' action below.

[Image: MDP example showing the 'Pub' action]
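One way to picture a per-action transition matrix $\mathcal{P}^a_{ss'}$ in code is a nested mapping from action to state to a distribution over next states. The sketch below is only illustrative: the state names, the 'Study' action and all probabilities are assumptions, not values read from the figure.

```python
import random

# P[action][state] maps each possible next state to its probability.
# Illustrative numbers only (assumed, not taken from the figure).
P = {
    "Pub": {
        "Class 3": {"Class 1": 0.2, "Class 2": 0.4, "Class 3": 0.4},
    },
    "Study": {
        "Class 1": {"Class 2": 1.0},
        "Class 2": {"Class 3": 1.0},
    },
}


def step(state, action, rng):
    """Sample the next state s' from the row P^a_{s.} of the chosen action."""
    dist = P[action][state]
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


rng = random.Random(0)
samples = [step("Class 3", "Pub", rng) for _ in range(1000)]
counts = {s: samples.count(s) for s in P["Pub"]["Class 3"]}
```

Taking the same action from the same state repeatedly lands in several different next states, which is the point of the NB above.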

Policies and Value Functions

$\pi(a|s) = \mathbb{P}(A_t = a | S_t = s)$
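As a minimal sketch of the definition above, a stochastic policy can be stored as a table of action probabilities per state and sampled from. The states, actions, probabilities and function names here are hypothetical.

```python
import random

# pi[state][action] = probability of choosing that action in that state.
# Hypothetical states and actions, for illustration only.
pi = {
    "Class 1": {"Study": 0.5, "Facebook": 0.5},
    "Class 3": {"Study": 0.5, "Pub": 0.5},
}


def policy_prob(a, s):
    """Return pi(a|s); actions not listed for a state have probability 0."""
    return pi[s].get(a, 0.0)


def sample_action(s, rng):
    """Draw A_t from pi(. | S_t = s)."""
    actions = list(pi[s])
    weights = [pi[s][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
a = sample_action("Class 3", rng)
```

The policy depends only on the current state, not on the time step or the history, so a single table covers every step of an episode.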

...

See the slides for more examples.

Getting a Handle

...

The Optimal State Value Function is the maximum state value function over all policies, i.e. $v_*(s) = \max_{\pi} v_{\pi}(s)$. Similar definition for action value functions.
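The maximisation over policies can be made concrete by brute force on a tiny problem. The sketch below enumerates every deterministic policy of a hypothetical two-state MDP (all names and numbers are assumptions), evaluates each with iterative policy evaluation, and takes the maximum per state. Restricting to deterministic policies is enough here because a finite MDP always has a deterministic optimal policy. This is not how large MDPs are solved, since the number of policies grows exponentially with the number of states.

```python
from itertools import product

# Hypothetical MDP (not from the notes): P[s][a] is a list of
# (probability, next_state) pairs, R[s][a] is the immediate reward.
STATES = [0, 1]
ACTIONS = ["stay", "move"]
P = {
    0: {"stay": [(1.0, 0)], "move": [(0.8, 1), (0.2, 0)]},
    1: {"stay": [(1.0, 1)], "move": [(1.0, 0)]},
}
R = {
    0: {"stay": 0.0, "move": -1.0},
    1: {"stay": 2.0, "move": 0.0},
}
GAMMA = 0.9


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation of a deterministic policy {state: action}."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        # Synchronous backup: the new v is built entirely from the old v.
        v = {s: R[s][policy[s]]
                + GAMMA * sum(p * v[s2] for p, s2 in P[s][policy[s]])
             for s in STATES}
    return v


# Enumerate every deterministic policy and take the maximum value per state.
policies = [dict(zip(STATES, choice))
            for choice in product(ACTIONS, repeat=len(STATES))]
values = [evaluate(policy) for policy in policies]
v_star = {s: max(v[s] for v in values) for s in STATES}
```

In this example a single policy (move from state 0, stay in state 1) attains the maximum in both states at once.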