Computationally explosive, $O(n^3)$, so we resort to dynamic programming and other methods to solve large MDPs.
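The notes do not say here which computation is cubic. A common reading, assumed in this sketch, is that it refers to solving the linear Bellman equation $v = \mathcal{R} + \gamma \mathcal{P} v$ directly, i.e. solving $(I - \gamma \mathcal{P}) v = \mathcal{R}$ for $n$ states. The two-state reward process, its numbers, and the helper `solve_linear` are made up for illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (O(n^3) arithmetic)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

# Hypothetical 2-state Markov reward process (not taken from the notes).
gamma = 0.9
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]

n = len(R)
# Build (I - gamma * P) and solve (I - gamma * P) v = R.
A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] for i in range(n)]
v = solve_linear(A, R)
print(v)
```

The three nested loops in the elimination step are where the cubic cost comes from, which is why a direct solve is only practical for small state spaces.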

Markov Decision Processes

A Markov Decision Process $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ includes actions, which condition the probability transition matrix and the reward function.

NB: Any specific action has a probability transition matrix of its own, $\mathcal{P}^a_{ss'}$. This means an action can probabilistically take you to more than one other state. See the result of the 'Pub' action below.

[Image]
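As a rough sketch of what an action-conditioned transition matrix looks like in code, the table below stores one distribution over successor states per (action, state) pair, so `P[a][s][s2]` plays the role of $\mathcal{P}^a_{ss'}$. The state names "A", "B", "C", the 'Study' action, and all probabilities are hypothetical and are not read off the figure.

```python
import random

# Hypothetical numbers: P[action][state] maps each successor state to its probability.
P = {
    "Study": {"C": {"A": 1.0}},
    "Pub":   {"C": {"A": 0.2, "B": 0.4, "C": 0.4}},
}

def step(state, action, rng):
    """Sample a successor state s' from the distribution P[action][state]."""
    successors = P[action][state]
    states = list(successors)
    weights = [successors[s] for s in states]
    return rng.choices(states, weights=weights, k=1)[0]

rng = random.Random(0)
samples = [step("C", "Pub", rng) for _ in range(1000)]
# Empirical counts of where the 'Pub' action led from state "C".
print({s: samples.count(s) for s in P["Pub"]["C"]})
```

Taking the same action from the same state can land in different successor states, whereas the hypothetical 'Study' action here is deterministic.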

Policies and Value Functions

$\pi(a \mid s) = \mathbb{P}(A_t = a \mid S_t = s)$
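One way to picture the policy definition above is as a lookup table of conditional probabilities that can also be sampled from. This is a minimal sketch: the state names, action names, and probabilities are invented for illustration, and `pi` and `sample_action` are hypothetical helper names.

```python
import random

# Hypothetical policy table: policy[s][a] = pi(a | s).
policy = {
    "s0": {"Study": 0.5, "Pub": 0.5},
    "s1": {"Study": 0.9, "Pub": 0.1},
}

def pi(a, s):
    """Return pi(a | s); zero for actions the policy never takes in state s."""
    return policy[s].get(a, 0.0)

def sample_action(s, rng):
    """Draw A_t ~ pi(. | S_t = s)."""
    actions = list(policy[s])
    weights = [policy[s][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]

# Each row of the table must be a probability distribution over actions.
for s, dist in policy.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

rng = random.Random(0)
print(pi("Pub", "s1"), sample_action("s0", rng))
```

The table depends only on the current state, not on the time step or the history, which matches the form of the definition.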

...

See the slides for more examples.

...

The optimal state-value function is the maximum state-value function over all policies, i.e. $v_*(s) = \max_{\pi} v_{\pi}(s)$. The optimal action-value function is defined similarly: $q_*(s, a) = \max_{\pi} q_{\pi}(s, a)$.
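The maximum over policies can be made concrete by brute force on a tiny example: evaluate every policy and take the state-wise maximum. This sketch makes several assumptions that are not in the notes: the two-state MDP and its numbers are invented, only deterministic policies are enumerated (enough for a finite MDP, where some deterministic policy is optimal), and each policy is evaluated by repeated Bellman expectation backups.

```python
from itertools import product

GAMMA = 0.9
STATES = ["s0", "s1"]
ACTIONS = ["stay", "move"]

# Hypothetical toy MDP: T[s][a] lists (probability, next_state); R[s][a] is the expected reward.
T = {
    "s0": {"stay": [(1.0, "s0")], "move": [(0.8, "s1"), (0.2, "s0")]},
    "s1": {"stay": [(1.0, "s1")], "move": [(1.0, "s0")]},
}
R = {
    "s0": {"stay": 0.0, "move": -1.0},
    "s1": {"stay": 2.0, "move": 0.0},
}

def evaluate(policy, sweeps=2000):
    """Approximate v_pi for a deterministic policy {state: action} by repeated backups."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {
            s: R[s][policy[s]] + GAMMA * sum(p * v[s2] for p, s2 in T[s][policy[s]])
            for s in STATES
        }
    return v

# Enumerate every deterministic policy, evaluate it, and take the state-wise maximum.
policies = [dict(zip(STATES, choice)) for choice in product(ACTIONS, repeat=len(STATES))]
values = [evaluate(pol) for pol in policies]
v_star = {s: max(v[s] for v in values) for s in STATES}
print(v_star)
```

Enumeration grows as $|\mathcal{A}|^{|\mathcal{S}|}$, so this is only a way to see the definition at work, not a practical method for solving an MDP.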