Stanford CS234 Lecture 3
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3

Recap: MDP policy evaluation with dynamic programming

Dynamic programming covers the case where we know the exact model (not model-free).

Initialize $V^\pi_0(s)=0$ for all $s$
for k = 1 until convergence
    for all $s$ in $S$
        $V^\pi_k(s)=r(s,\pi(s)) + \gamma \sum_{s' \in S} P(s'|s,\pi(s))V^\pi_{k-1}(s')$

and we iterate until it converges → ||..

2022. 8. 5.
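The update above can be sketched in a few lines of plain Python. This is only a minimal illustration, not the lecture's code: the function name `policy_evaluation`, the nested-list layout of `P` and `r`, the toy two-state MDP, and the stopping rule (largest change between sweeps below a tolerance `tol`) are all assumptions, since the excerpt is cut off before it states the convergence test.

```python
def policy_evaluation(P, r, policy, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterative policy evaluation for a deterministic policy (hypothetical helper).

    P[s][a][s2] : probability of moving to state s2 from state s under action a
    r[s][a]     : reward for taking action a in state s
    policy[s]   : action chosen in state s, i.e. pi(s)
    """
    n_states = len(P)
    V = [0.0] * n_states  # V_0(s) = 0 for all s
    for _ in range(max_iters):
        V_new = []
        for s in range(n_states):
            a = policy[s]
            # sum over s' of P(s'|s, pi(s)) * V_{k-1}(s')
            expected_next = sum(P[s][a][s2] * V[s2] for s2 in range(n_states))
            V_new.append(r[s][a] + gamma * expected_next)
        # Assumed stopping rule: largest per-state change is below tol.
        delta = max(abs(new - old) for new, old in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    return V


# Toy MDP (made up for illustration): two states, one action.
# State 0 gives reward 1 and stays or moves to state 1 with equal probability;
# state 1 is absorbing with reward 0.
P = [[[0.5, 0.5]], [[0.0, 1.0]]]
r = [[1.0], [0.0]]
policy = [0, 0]
V = policy_evaluation(P, r, policy, gamma=0.5)
print(V)
```

For this toy problem the fixed point can be checked by hand: $V^\pi(1)=0$ and $V^\pi(0)=1+0.5\cdot 0.5\,V^\pi(0)$, so $V^\pi(0)=4/3$, which the loop approaches to within the tolerance.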