Reinforcement Learning (11)

Stanford CS234 Lecture 5
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 5
We need to be able to generalize from our experience to make “good decisions”. Value Function Approximation (VFA): from now on, we will represent the $(s,a)$ value function with a parameterized function. The input is a state or a state-action pair; the output is a value of some kind. The parameter $w$ here would be a vector in simple terms, such as DN..
2022. 8. 8.

Stanford CS234 Lecture 4
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 4
→ We evaluated a policy in the model-free setting last time. How can an agent start making good decisions when it doesn’t know how the world works? How do we make a “good decision”? Learning to control involves... Optimization: we want maximal expected rewards. Delayed consequences: it may take time to realize whether a previous action was goo..
2022. 8. 5.

Stanford CS234 Lecture 3
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3
Recap: MDP evaluation with dynamic programming. Dynamic programming: the case where we know the exact model (not model-free).
Initialize $V_0(s)=0$ for all $s$
For $k = 1$ until convergence:
    For all $s$ in $S$:
        $V^\pi_k(s)=r(s,\pi(s)) + \gamma \sum_{s' \in S} P(s'|s,\pi(s))V^\pi_{k-1}(s')$
and we iterate until it converges → $||..
2022. 8. 5.

Stanford CS234 Lecture 2
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2
Given the model of the world. Markov property → a stochastic process evolving over time (whether or not I invest in stocks, the stock market changes). Markov chain: a sequence of random states with the Markov property; no rewards, no actions. Let $S$ be the set of states ($s \in S$) and $P$ a transition model that specifies $P(s_{t+1}=s'|s_t=s)$ for finit..
2022. 8. 5.

Stanford CS234 Lecture 1
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1
What is Reinforcement Learning (RL)? How an intelligent agent learns to make good sequences of decisions through repeated interactions with the world. Key aspects of RL: Optimization → the goal is to find an optimal way to make decisions! Delayed consequences → decisions now can impact future situations... Exploration → only get censored d..
2022. 8. 4.
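
The policy evaluation loop quoted in the Lecture 3 excerpt can be sketched in a few lines of plain Python. This is only a minimal illustration, not the course's code: the function name `policy_evaluation`, the nested-list layout of `P` and `r`, and the tiny two-state MDP are all hypothetical. The excerpt is cut off before it states the convergence test, so the stopping rule used here (largest change between sweeps below `tol`) is an assumption.

```python
def policy_evaluation(P, r, policy, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterative policy evaluation for a known, tabular MDP.

    P[s][a]   : list of next-state probabilities P(s' | s, a)
    r[s][a]   : immediate reward r(s, a)
    policy[s] : action chosen in state s (deterministic policy)
    """
    n_states = len(P)
    V = [0.0] * n_states  # V_0(s) = 0 for all s
    for _ in range(max_iters):
        V_new = []
        for s in range(n_states):
            a = policy[s]
            # Expected value of the next state under the previous estimate
            expected_next = sum(p * V[s2] for s2, p in enumerate(P[s][a]))
            V_new.append(r[s][a] + gamma * expected_next)
        # Assumed stopping rule: largest per-state change is below tol
        delta = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    return V


# Hypothetical two-state MDP with a single action:
# state 0 gives reward 1 and moves to state 1; state 1 is absorbing with reward 0.
P = [[[0.0, 1.0]], [[0.0, 1.0]]]
r = [[1.0], [0.0]]
policy = [0, 0]
print(policy_evaluation(P, r, policy, gamma=0.9))  # prints [1.0, 0.0]
```

Each sweep builds a fresh `V_new` from the previous `V`, which matches the $V^\pi_k$ / $V^\pi_{k-1}$ indexing in the update rule; an in-place update would also converge but would not follow that indexing exactly.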