...
- Computationally explosive, $O(n^3)$ in the number of states $n$, so resort to dynamic programming and other methods to solve large MDPs
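As a rough illustration of the iterative alternative, the sketch below assumes the $O(n^3)$ cost refers to solving the Bellman equation $v = \mathcal{R} + \gamma \mathcal{P} v$ directly by matrix inversion. Instead of inverting, it repeatedly applies the right-hand side until the values stop changing; each sweep costs $O(n^2)$ for a dense matrix. The two-state process, its rewards, the discount, and the function name `evaluate_mrp` are all made up for the example.

```python
def evaluate_mrp(P, R, gamma, tol=1e-10, max_sweeps=10_000):
    """Iteratively solve v = R + gamma * P v for a Markov reward process.

    P is a row-stochastic matrix (list of lists), R a list of rewards.
    """
    n = len(R)
    v = [0.0] * n
    for _ in range(max_sweeps):
        # One Bellman backup for every state.
        new_v = [
            R[s] + gamma * sum(P[s][s2] * v[s2] for s2 in range(n))
            for s in range(n)
        ]
        # Stop once the largest change in any state is below the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state process: state 1 is absorbing with zero reward.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
gamma = 0.9

v = evaluate_mrp(P, R, gamma)
print(v)
```

For this toy case the answer can be checked by hand: $v_1 = 0$ and $v_0 = 1 + 0.45\,v_0$, so $v_0 = 1/0.55$.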
Markov Decision Processes
Markov Decision Process $\langle \mathcal{S}, \mathcal{A}, \mathcal{P}, \mathcal{R}, \gamma \rangle$ - includes actions, which condition the probability transition matrix and reward function.
NB: Each action has a probability transition matrix of its own, $\mathcal{P}^a_{ss'}$. This means an action can probabilistically take you to more than one other state. See the result of the 'Pub' action below.
[Image]
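One way to picture $\mathcal{P}^a_{ss'}$ in code is a nested lookup: state, then action, then a distribution over next states. The sketch below is only illustrative. The state names, the 'Study' action, and every probability are assumptions made for the example, not values taken from the image, so check them against the slides before relying on them.

```python
import random

# P[s][a] maps each next state s' to its probability P^a_{ss'}.
# All names and numbers here are illustrative assumptions.
P = {
    "Class 3": {
        "Study": {"Sleep": 1.0},  # deterministic: one possible next state
        "Pub": {"Class 1": 0.2, "Class 2": 0.4, "Class 3": 0.4},  # stochastic
    },
}


def step(state, action, rng=random):
    """Sample a next state s' from the distribution P[state][action]."""
    dist = P[state][action]
    next_states = list(dist)
    weights = [dist[s2] for s2 in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded only so repeated runs give the same draws
samples = [step("Class 3", "Pub", rng) for _ in range(1000)]
print({s: samples.count(s) for s in sorted(set(samples))})
```

Taking the same action from the same state repeatedly lands in different states, which is exactly what a per-action transition matrix with more than one non-zero entry per row expresses.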
Policies and Value Functions
$\pi(a \mid s) = \mathbb{P}(A_t = a \mid S_t = s)$
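Fixing a policy turns the MDP back into a Markov reward process by averaging over actions: $\mathcal{P}^{\pi}_{ss'} = \sum_a \pi(a \mid s)\,\mathcal{P}^a_{ss'}$ and $\mathcal{R}^{\pi}_s = \sum_a \pi(a \mid s)\,\mathcal{R}^a_s$. The sketch below works this through on a made-up two-state, two-action MDP; the action names, numbers, and the helper `induced_mrp` are assumptions for illustration only.

```python
# Hypothetical MDP: P[a][s][s2] is P^a_{ss'}, R[a][s] is R^a_s.
P = {
    "stay": [[1.0, 0.0], [0.0, 1.0]],
    "move": [[0.2, 0.8], [0.8, 0.2]],
}
R = {"stay": [0.0, 1.0], "move": [-1.0, -1.0]}

# Stochastic policy: pi[s][a] is the probability of action a in state s.
pi = [
    {"stay": 0.5, "move": 0.5},
    {"stay": 0.9, "move": 0.1},
]


def induced_mrp(P, R, pi):
    """Average transitions and rewards over the policy's action choices."""
    n = len(pi)
    P_pi = [
        [sum(pi[s][a] * P[a][s][s2] for a in pi[s]) for s2 in range(n)]
        for s in range(n)
    ]
    R_pi = [sum(pi[s][a] * R[a][s] for a in pi[s]) for s in range(n)]
    return P_pi, R_pi


P_pi, R_pi = induced_mrp(P, R, pi)
print(P_pi)
print(R_pi)
```

Each row of `P_pi` still sums to 1, since it is a convex combination of rows that each sum to 1.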
...
See the slides for more examples.
Getting a Handle
...
The Optimal State Value Function is the maximum state value function over all policies, i.e.
...
$v_*(s) = \max_{\pi} v_{\pi}(s)$. The optimal action value function is defined similarly: $q_*(s, a) = \max_{\pi} q_{\pi}(s, a)$.
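To make the "maximum over all policies" definition concrete, the sketch below brute-forces it on a made-up two-state, two-action MDP: it evaluates every deterministic policy and takes the best value in each state. This assumes a finite MDP, where restricting to deterministic policies does not lose the maximum. The MDP, the discount, and the helper `evaluate` are hypothetical, and enumeration is only feasible for tiny problems since the number of policies grows exponentially with the number of states.

```python
from itertools import product

# Hypothetical MDP: P[a][s][s2] is P^a_{ss'}, R[a][s] is R^a_s.
ACTIONS = ("stay", "move")
P = {
    "stay": [[1.0, 0.0], [0.0, 1.0]],
    "move": [[0.2, 0.8], [0.8, 0.2]],
}
R = {"stay": [0.0, 1.0], "move": [-1.0, -1.0]}
GAMMA = 0.9
N_STATES = 2


def evaluate(policy, tol=1e-10):
    """Return v_pi for a deterministic policy (one action per state)."""
    v = [0.0] * N_STATES
    while True:
        new_v = [
            R[policy[s]][s]
            + GAMMA * sum(P[policy[s]][s][s2] * v[s2] for s2 in range(N_STATES))
            for s in range(N_STATES)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Every assignment of one action to each state is a deterministic policy.
policies = list(product(ACTIONS, repeat=N_STATES))
values = {policy: evaluate(policy) for policy in policies}

# v_*(s) is the best value any policy achieves in state s.
v_star = [max(values[policy][s] for policy in policies) for s in range(N_STATES)]
print(v_star)
```

In this toy case the maximum is attained by moving in state 0 and staying in state 1, which gives $v_*(1) = 1/(1 - 0.9) = 10$ and $v_*(0) = 6.2/0.82$.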