Reinforcement Learning (10)

Stanford CS234 Lecture 4
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 4 → Last time we evaluated a policy in the model-free setting. How can an agent start making good decisions when it doesn't know how the world works? How do we make a "good decision"? Learning to control involves: Optimization: we want maximal expected rewards. Delayed consequences: it may take time to realize whether a previous action was goo..
2022. 8. 5.

Stanford CS234 Lecture 3
Lecture 3 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3. Recap: MDP policy evaluation with dynamic programming. Dynamic programming covers the case where we know the exact model (not model-free). Initialize $V_0(s) = 0$ for all $s$. For $k = 1$ until convergence, for all $s \in S$: $V_k(s) = r(s, \pi(s)) + \gamma \sum_{s' \in S} p(s' \mid s, \pi(s)) \, V_{k-1}(s')$, and we iterate until it converges → $||..
2022. 8. 5.

Stanford CS234 Lecture2
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2. Given the model of the world. Markov property → a stochastic process evolving over time (whether or not I invest in stocks, the stock market changes). Markov chain: a sequence of random states with the Markov property; no rewards, no actions. Let $S$ be a set of states ($s \in S$) and $P$ a transition model that specifies $P(s_{t+1} = s' \mid s_t = s)$ for finit..
2022. 8. 5.

Stanford CS234 Lecture 1
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1. What is Reinforcement Learning (RL)? How an intelligent agent learns to make good sequences of decisions through repeated interactions with the world. Key aspects of RL: Optimization → the goal is to find an optimal way to make decisions! Delayed consequences → decisions now can impact future situations... Exploration → only get censored d..
2022. 8. 4.
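
The dynamic programming loop quoted in the Lecture 3 excerpt (initialize $V_0(s) = 0$, then sweep over all states until convergence) can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the lecture's own code. It assumes the standard Bellman expectation backup for a fixed policy, and a max-norm stopping test, since the excerpt is cut off before the convergence criterion. The function name `policy_evaluation`, the array layout of `P` and `R`, and the two-state example are all hypothetical.

```python
import numpy as np


def policy_evaluation(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterative policy evaluation for a fixed policy with a known model.

    P: (n, n) array; P[s, s2] is the probability of moving from s to s2
       when following the policy (assumed layout).
    R: (n,) array; expected immediate reward in each state under the policy.
    """
    V = np.zeros(len(R))  # V_0(s) = 0 for all s
    for _ in range(max_iters):
        # Bellman expectation backup applied to every state at once
        V_new = R + gamma * (P @ V)
        # Assumed stopping rule: max-norm change below the tolerance
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state chain: state 1 is absorbing and gives no reward.
P = np.array([[0.5, 0.5],
              [0.0, 1.0]])
R = np.array([1.0, 0.0])
V = policy_evaluation(P, R, gamma=0.9)

# Closed-form check: V = (I - gamma * P)^(-1) R
V_exact = np.linalg.solve(np.eye(2) - 0.9 * P, R)
```

Because the model is known, the iterative result can be checked against the linear-system solution `V_exact`. The two should agree up to the tolerance whenever `gamma < 1`.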