Problem 2. (20 points) Consider the following Markov Decision Process (MDP) with discount factor γ = 0.5, shown in Figure 2. Upper-case letters A, B, C represent states; arcs represent state transitions; lower-case labels ab, ba, bc, ca, cb represent actions; signed integers represent rewards; and fractions represent transition probabilities.

1. (5 points) Define the state-value function v_π(s) for a discounted MDP.
2. (5 points) Write down the Bellman expectation equation for state-value functions.
3. (5 points) Consider the uniform random policy π_1(s, a) that takes every action available in state s with equal probability. Starting from the initial value function v_1(A) = v_1(B) = v_1(C) = 2, apply one synchronous iteration of iterative policy evaluation (i.e., one backup for each state) to compute a new value function v_2(s).
4. (5 points) Apply one iteration of greedy policy improvement to compute a new, deterministic policy π_2(s).

Figure 2: The MDP for Problem 2 (diagram not reproduced here).
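Parts 1 and 2 call for standard definitions. The problem's own answer is not shown here; the following is a sketch in the usual textbook notation, where G_t, R_{t+1}, and p(s', r | s, a) denote the return, reward, and transition kernel (these symbols are assumed, not given in the problem text):

```latex
% Part 1: state-value function of a discounted MDP under policy \pi
v_\pi(s) = \mathbb{E}_\pi\!\left[ G_t \mid S_t = s \right]
         = \mathbb{E}_\pi\!\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \;\middle|\; S_t = s \right]

% Part 2: Bellman expectation equation for state-value functions
v_\pi(s) = \sum_{a} \pi(a \mid s) \sum_{s',\, r} p(s', r \mid s, a)
           \bigl[ r + \gamma\, v_\pi(s') \bigr]
```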
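For parts 3 and 4, the arithmetic depends on the transition probabilities and rewards in Figure 2, which are not reproduced in the text. The sketch below only illustrates the mechanics: one synchronous backup of iterative policy evaluation under the uniform random policy, followed by one step of greedy policy improvement with respect to the updated values. The state names, action labels, initial values v_1 = 2, and γ = 0.5 come from the problem; the contents of the `mdp` dictionary are placeholder assumptions to be replaced with the values read off the figure.

```python
GAMMA = 0.5

# mdp[state][action] = list of (probability, next_state, reward) triples.
# PLACEHOLDER numbers -- substitute the probabilities and rewards from Figure 2.
mdp = {
    "A": {"ab": [(1.0, "B", +1)]},
    "B": {"ba": [(1.0, "A", -1)],
          "bc": [(1.0, "C", +2)]},
    "C": {"ca": [(0.5, "A", 0), (0.5, "B", 0)],   # example of a stochastic action
          "cb": [(1.0, "B", -2)]},
}

# Initial value function given in the problem.
v1 = {"A": 2.0, "B": 2.0, "C": 2.0}

def q_value(state, action, v):
    """Action value: sum over outcomes of p * (r + gamma * v(next_state))."""
    return sum(p * (r + GAMMA * v[s2]) for p, s2, r in mdp[state][action])

# Part 3: one synchronous iteration of iterative policy evaluation.
# Under the uniform random policy, v2(s) is the average of the action values,
# all computed from the old value function v1 (synchronous backup).
v2 = {s: sum(q_value(s, a, v1) for a in acts) / len(acts)
      for s, acts in mdp.items()}

# Part 4: one iteration of greedy policy improvement.
# pi2(s) picks the action with the highest action value under v2.
pi2 = {s: max(acts, key=lambda a: q_value(s, a, v2)) for s, acts in mdp.items()}

print("v2 =", v2)
print("pi2 =", pi2)
```

With the actual figure values plugged into `mdp`, running the script prints the backed-up value function v_2(s) and the greedy deterministic policy π_2(s) asked for in parts 3 and 4.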