
Reinforcement Learning (10)

Stanford CS234 Lecture 4 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 4 → Last time we evaluated a policy in the model-free setting. How can an agent start making good decisions when it doesn't know how the world works? How do we make a "good decision"? Learning to control involves... Optimization: we want maximal expected rewards. Delayed consequences: it may take time to realize whether a previous action was goo.. 2022. 8. 5.

Stanford CS234 Lecture 3 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3 Recap: MDP policy evaluation with dynamic programming. Dynamic programming covers the case where we know the exact model (not model-free). Initialize $V_{0}(s) = 0$ for all $s$; for $k = 1$ until convergence, for all $s \in S$: [update equation not rendered in the source] and we iterate until it converges → $||.. 2022. 8. 5.
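
The Lecture 3 excerpt describes iterative policy evaluation by dynamic programming with a known model. The following is a minimal sketch, not the post's own code. It assumes the fixed policy is already folded into a transition matrix `P` and a reward vector `R` (both hypothetical names), uses the standard Bellman expectation backup, and stops on a max-norm tolerance `tol`, since the excerpt's own update equation and convergence criterion are cut off.

```python
def policy_evaluation(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iteratively evaluate a fixed policy when the model is known.

    P[s][s2]: probability of moving from s to s2 under the policy (assumed input).
    R[s]:     expected immediate reward in s under the policy (assumed input).
    """
    n = len(R)
    V = [0.0] * n  # V_0(s) = 0 for all s
    for _ in range(max_iters):
        # Bellman expectation backup for every state s in S
        V_new = [
            R[s] + gamma * sum(P[s][s2] * V[s2] for s2 in range(n))
            for s in range(n)
        ]
        # Assumed stopping rule: largest change across states below tol
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state example: state 0 always moves to absorbing state 1.
P_example = [[0.0, 1.0], [0.0, 1.0]]
R_example = [1.0, 0.0]
V_example = policy_evaluation(P_example, R_example, gamma=0.5)
```

With `gamma < 1` the backup is a contraction, so the loop should settle regardless of the initial values; starting from zero simply matches the excerpt.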

Stanford CS234 Lecture2 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 2 Given the model of the world. Markov property → a stochastic process evolving over time (whether or not I invest in stocks, the stock market changes). Markov chain: a sequence of random states with the Markov property; no rewards, no actions. Let $S$ be the set of states ($s \in S$) and $P$ a transition model that specifies $P(s_{t+1} = s' \mid s_{t} = s)$ for finit.. 2022. 8. 5.
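
The Lecture 2 excerpt defines a Markov chain by its state set and transition model. As a small illustration, assuming a finite state set and a row-stochastic matrix `P` where `P[s][s2]` stands for the probability of moving from `s` to `s2` (the matrix values and function name below are made up for the example), one step of the chain pushes a distribution over states forward like this:

```python
def step_distribution(dist, P):
    """Advance a distribution over states by one step of the Markov chain.

    dist[s]:  probability of currently being in state s.
    P[s][s2]: P(s_{t+1} = s2 | s_t = s); each row is assumed to sum to 1.
    """
    n = len(P)
    return [sum(dist[s] * P[s][s2] for s in range(n)) for s2 in range(n)]


# Hypothetical two-state chain, starting in state 0 with certainty.
P_chain = [[0.9, 0.1], [0.5, 0.5]]
d0 = [1.0, 0.0]
d1 = step_distribution(d0, P_chain)
d2 = step_distribution(d1, P_chain)
```

No rewards or actions appear here, matching the excerpt: the next-state distribution depends only on the current one.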

Stanford CS234 Lecture 1 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1 What is Reinforcement Learning (RL)? How an intelligent agent learns to make good sequences of decisions through repeated interactions with the world. Key aspects of RL: Optimization → the goal is to find an optimal way to make decisions! Delayed consequences → decisions now can impact future situations... Exploration → only get censored d.. 2022. 8. 4.

