...

  • Computationally explosive, $O(n^3)$; resort to dynamic programming and other methods to solve large MDPs
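To make the cubic cost concrete, here is a minimal sketch. It assumes the bullet refers to solving the Bellman expectation equation $v = \mathcal{R} + \gamma \mathcal{P} v$ directly as a linear system for a fixed transition matrix and reward vector. The three nested loops in the elimination are where the $O(n^3)$ comes from. The function names and the 3-state chain are hypothetical, chosen only for illustration.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):                      # loop 1: one pivot per column
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):                  # loop 2: every other row
            if row == col:
                continue
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):       # loop 3: every remaining column
                m[row][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate_directly(P, R, gamma):
    """Solve (I - gamma * P) v = R for the state values v."""
    n = len(R)
    a = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)]
         for i in range(n)]
    return solve_linear_system(a, R)


# Hypothetical 3-state chain; state 2 is absorbing with zero reward.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
R = [-1.0, -2.0, 0.0]
GAMMA = 0.9
v = evaluate_directly(P, R, GAMMA)
```

Doubling the number of states multiplies the work by roughly eight, which is why iterative methods are preferred once the state space grows.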

Markov Decision Processes

Markov Decision Process $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ - includes actions, which condition the probability transition matrix and reward function.

NB: Any specific action has a probability transition matrix of its own, $\mathcal{P}^a_{ss'}$ - this means an action can probabilistically get you to more than one other state. See the result of the 'Pub' action below.

[Image: result of the 'Pub' action]
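The idea of one transition matrix per action can be sketched in a few lines. The state names, action names, and probabilities below are hypothetical stand-ins, not values taken from the slides. The point is only that a row of $\mathcal{P}^a$ may spread probability over several successor states.

```python
import random

# Hypothetical states and numbers, chosen only for illustration.
STATES = ["class1", "class2", "class3"]

# One transition matrix per action: P[a][s][s'] = probability of landing in s'.
P = {
    "study": [[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]],
    "pub":   [[0.2, 0.4, 0.4],
              [0.2, 0.4, 0.4],
              [0.2, 0.4, 0.4]],
}


def reachable(action, state):
    """Return the states that `action` can lead to from `state`."""
    row = P[action][STATES.index(state)]
    return [s for s, p in zip(STATES, row) if p > 0]


def step(action, state, rng=random):
    """Sample a successor state from the row of P[action] for `state`."""
    row = P[action][STATES.index(state)]
    return rng.choices(STATES, weights=row, k=1)[0]
```

Here `reachable("study", "class1")` gives a single successor, while `reachable("pub", "class3")` gives all three states, so repeated calls to `step("pub", "class3")` can land in different places.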

Policies and Value Functions

$\pi(a|s) = \mathbb{P}(A_t = a \mid S_t = s)$
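As a rough illustration of the definition above, a stochastic policy can be stored as a table of conditional probabilities and sampled from. The states, actions, and numbers here are hypothetical.

```python
import random

# Hypothetical policy table: pi[s][a] = P(A_t = a | S_t = s).
pi = {
    "class1": {"study": 0.5, "facebook": 0.5},
    "class2": {"study": 0.8, "sleep": 0.2},
}


def action_probability(state, action):
    """Look up pi(a|s); actions not listed for a state have probability 0."""
    return pi[state].get(action, 0.0)


def sample_action(state, rng=random):
    """Draw an action A_t given S_t = state, according to pi(.|state)."""
    actions = list(pi[state])
    weights = [pi[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

Each row of the table is a probability distribution over actions, so it must sum to 1. The policy depends only on the current state, not on the time step or the history.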

...

See the slides for more examples.

The Optimal State Value Function is the maximum state value function over all policies, i.e. $v_*(s) = \max_{\pi} v_{\pi}(s)$. The optimal action value function is defined similarly.
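The maximum over policies can be illustrated by brute force on a tiny MDP. This is a sketch under stated assumptions: the two-state MDP is hypothetical, only deterministic policies are enumerated (which is enough to attain the maximum in a finite MDP), and each policy is evaluated by repeated Bellman backups rather than by any method from the slides.

```python
from itertools import product

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = ["stay", "move"]

# Hypothetical MDP: P[a][s][s'] is a transition probability, R[a][s] a reward.
P = {
    "stay": [[1.0, 0.0], [0.0, 1.0]],
    "move": [[0.1, 0.9], [0.9, 0.1]],
}
R = {
    "stay": [0.0, 1.0],
    "move": [-0.5, -0.5],
}


def evaluate(policy, sweeps=500):
    """Iterative policy evaluation for a deterministic policy {state: action}."""
    v = [0.0 for _ in STATES]
    for _ in range(sweeps):
        # One synchronous Bellman expectation backup over all states.
        v = [R[policy[s]][s]
             + GAMMA * sum(P[policy[s]][s][t] * v[t] for t in STATES)
             for s in STATES]
    return v


# Enumerate every deterministic policy and evaluate each one.
policies = [dict(zip(STATES, choice))
            for choice in product(ACTIONS, repeat=len(STATES))]
values = {tuple(p.values()): evaluate(p) for p in policies}

# v_*(s) is the best value any policy achieves in state s.
v_star = [max(v[s] for v in values.values()) for s in STATES]
```

Enumeration grows as the number of actions raised to the number of states, so it is only useful for checking the definition on toy problems.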