...
- Computationally explosive - $O(n^3)$ in the number of states, so resort to dynamic programming and other methods to solve large MDPs
Markov Decision Processes
Markov Decision Process $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ - includes actions, which condition the probability transition matrix and reward function.
NB: Any specific action has a probability transition matrix of its own, $\mathcal{P}^a_{ss'}$ - this means an action can probabilistically get you to more than one other state. See the result of the 'Pub' action below.
[Image]
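As a rough illustration of an action-conditioned transition matrix, the sketch below stores one row of $\mathcal{P}^a_{ss'}$ per (action, state) pair and samples a successor state from it. The dictionary layout, the `step` and `is_stochastic` helpers, and the state names and probabilities are assumptions made for this example (loosely modelled on the student MDP from the slides), not something defined in these notes.

```python
import random

# Hypothetical encoding: P[action][state] -> {next_state: probability}.
# The numbers are illustrative only.
P = {
    "Pub": {"Class 3": {"Class 1": 0.2, "Class 2": 0.4, "Class 3": 0.4}},
    "Study": {"Class 1": {"Class 2": 1.0}, "Class 2": {"Class 3": 1.0}},
}

def is_stochastic(dist, tol=1e-9):
    """Check that one row of P^a sums to 1."""
    return abs(sum(dist.values()) - 1.0) < tol

def step(state, action, rng=random):
    """Sample a successor state s' from the row P^a_{s.}."""
    dist = P[action][state]
    next_states = list(dist)
    weights = [dist[s] for s in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]
```

Calling `step("Class 3", "Pub")` repeatedly can return any of the three class states, whereas `step("Class 1", "Study")` always returns `"Class 2"`, because that row puts all of its probability on a single successor.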
Policies and Value Functions
$\pi(a \mid s) = \mathbb{P}(A_t = a \mid S_t = s)$
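One way to make the policy definition concrete is to store $\pi(a \mid s)$ as a table of action probabilities per state and sample from it. This is a minimal sketch: the state and action names, the probabilities, and the `sample_action` helper are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical stochastic policy: pi[state][action] = P(A_t = a | S_t = s).
pi = {
    "Class 1": {"Study": 0.5, "Facebook": 0.5},
    "Class 3": {"Study": 0.5, "Pub": 0.5},
}

# A deterministic policy is the special case with all mass on one action.
greedy = {"Class 1": {"Study": 1.0}}

def sample_action(policy, state, rng=random):
    """Draw an action a with probability policy[state][a]."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```

The policy depends only on the current state, not on the time step or the history, which is why a plain per-state lookup table is enough here.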
...
See the slides for more examples.
The Optimal State Value Function is the maximum state value function over all policies, i.e. $v_*(s) = \max_{\pi} v_{\pi}(s)$. The optimal action value function is defined similarly: $q_*(s, a) = \max_{\pi} q_{\pi}(s, a)$.
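To see the "maximum over all policies" definition in action, the sketch below brute-forces a tiny MDP: it enumerates every deterministic policy, evaluates each one by repeated Bellman expectation backups, and keeps the best value per state. The two-state MDP, the discount of 0.9, and the function names are assumptions invented for this example. Restricting the search to deterministic policies relies on the standard result that a finite MDP always has a deterministic optimal policy. This is only practical for toy problems, since the number of policies grows exponentially with the number of states.

```python
from itertools import product

# Hypothetical two-state MDP, for illustration only.
# P[state][action] = list of (probability, next_state, reward).
GAMMA = 0.9
P = {
    "A": {"stay": [(1.0, "A", 1.0)], "go": [(1.0, "B", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}

def evaluate(policy, sweeps=1000):
    """Approximate v_pi for a deterministic policy {state: action}."""
    v = {s: 0.0 for s in P}
    for _ in range(sweeps):
        # Synchronous backup: every state uses the previous sweep's values.
        v = {
            s: sum(p * (r + GAMMA * v[s2]) for p, s2, r in P[s][policy[s]])
            for s in P
        }
    return v

def optimal_state_values():
    """v_*(s) = max over deterministic policies of v_pi(s)."""
    states = list(P)
    best = {s: float("-inf") for s in states}
    for choice in product(*(P[s] for s in states)):
        v_pi = evaluate(dict(zip(states, choice)))
        for s in states:
            best[s] = max(best[s], v_pi[s])
    return best
```

For this toy MDP the result is approximately 18 for state `A` (go to `B`, then stay there) and 20 for state `B` (stay), and a single policy attains the maximum in both states at once.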