
ML Study / Stanford CS234: Reinforcement Learning (10)

Stanford CS234 Lecture 10 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 10 Continuing our discussion of Updating Parameters Given the Gradient. Local Approximation: we could not evaluate the equation above because we have no knowledge of $\tilde{\pi}$. So, as an approximation, we replace that term with the previous policy: we take policy $\pi^i$, run it out, collect $D$ trajectories, and use them to estimate the distribution .. 2022. 8. 12.
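If the lecture follows the standard conservative-policy-iteration/TRPO derivation, the local approximation this excerpt refers to is the surrogate objective below; take this as a hedged reconstruction, since the excerpt is cut off, and note that $\mu_{\pi}$ (the state-visitation distribution under $\pi$) is assumed notation, not quoted from the post:

$$ L_{\pi}(\tilde{\pi}) = V(\pi) + \sum_s \mu_{\pi}(s) \sum_a \tilde{\pi}(a|s) A_{\pi}(s,a) $$

Replacing the unknown state distribution of $\tilde{\pi}$ with that of the current policy is what makes this computable: it can be estimated from the $D$ trajectories sampled under $\pi^i$, with importance weights $\tilde{\pi}(a|s)/\pi^i(a|s)$ on the advantage terms.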
Stanford CS234 Lecture 9 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 9 Continuing the discussion of gradient descent, recall that our goal is to converge to a local optimum as quickly as possible. We want each policy update to be a monotonic improvement → guaranteed convergence (empirically) → we simply don't want to get fired... Recall that last time we expressed the gradient of the value function as below; this term is unbiased but .. 2022. 8. 11.
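For context, the gradient of the value function mentioned here is presumably the likelihood-ratio (REINFORCE) estimator covered at this point in the course; the form below is the standard one and is assumed rather than quoted from the post:

$$ \nabla_\theta V(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ R(\tau) \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t|s_t) \right] $$

This estimator is unbiased, but its variance is typically high, which is what motivates baselines and the trust-region ideas of the following lectures.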
Stanford CS234 Lecture 8 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 8 Policy-Based Reinforcement Learning Recall the last lecture → where we learned to find the state value ($V$) or state-action value ($Q$) for a parameter $w$ (or $\theta$) and used it to build a (hopefully) optimal policy ($\pi$). Today we parameterize the policy $\pi^\theta$ directly: $$ \pi^\theta(s,a)=P[a|s;\theta] $$ Our goal is to find a policy .. 2022. 8. 11.
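As a concrete illustration of a parameterized policy $\pi^\theta(s,a)=P[a|s;\theta]$, here is a minimal softmax-policy sketch in Python; the feature map and all shapes are hypothetical, assumed for illustration rather than taken from the post:

```python
import numpy as np

def softmax_policy(theta, phi):
    """P[a | s; theta] for a softmax (Gibbs) policy.

    phi: array of shape (num_actions, d), one feature vector per action
         for the current state s (a hypothetical feature map).
    """
    logits = phi @ theta
    logits -= logits.max()              # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# toy usage: 3 actions, 4-dimensional features
rng = np.random.default_rng(0)
phi = rng.normal(size=(3, 4))
theta = np.zeros(4)
print(softmax_policy(theta, phi))       # uniform over actions when theta = 0
```

A softmax over action preferences is one common choice for discrete actions; the point is only that $P[a|s;\theta]$ is a differentiable function of $\theta$, so it can be improved by gradient ascent on the value.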
Stanford CS234 Lecture 7 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 7 Imitation Learning There are occasions where rewards need to be dense in time, or where each iteration is very expensive → autonomous-driving kinds of problems. So we summon an expert to demonstrate trajectories. Our problem setup: we will talk about the three methods below and their goals... Behavioral Cloning: learn directly from the teacher's policy. In.. 2022. 8. 9.
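Since the excerpt is cut off, here is a minimal sketch of what Behavioral Cloning reduces to: plain supervised learning of $\pi(a|s)$ on the expert's (state, action) pairs. The toy data and the softmax-classifier choice are assumptions for illustration only:

```python
import numpy as np

# Hypothetical expert data: 2-D states, 2 discrete actions from a pretend expert rule.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 2))
actions = (states[:, 0] > 0).astype(int)

# Fit a softmax classifier pi(a|s) by gradient descent on cross-entropy loss.
W = np.zeros((2, 2))                    # one weight column per action
lr = 0.1
for _ in range(200):
    logits = states @ W
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    grad_logits = probs.copy()          # dCE/dlogits = probs - onehot(actions)
    grad_logits[np.arange(len(actions)), actions] -= 1.0
    W -= lr * states.T @ grad_logits / len(actions)

# The cloned policy simply acts greedily under the learned classifier.
cloned_action = np.argmax(states[0] @ W)
```

The well-known weakness of this reduction is compounding error: small mistakes take the learner to states the expert never visited, which is what the other methods in the lecture try to address.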
Stanford CS234 Lecture 6 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 6 We will combine Neural Network (NN) features with RL. Basic Deep Neural Network A DNN is a neural network composed of multiple hidden layers of differentiable functional operators. The benefits of using a DNN are as follows: a DNN is a universal function approximator, and it requires fewer nodes/parameters to represent the same function. U.. 2022. 8. 9.
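To make "hidden layers of differentiable functional operators" concrete, here is a tiny forward-pass sketch of a fully connected network in plain NumPy; the layer widths and the ReLU choice are assumptions, not details from the post:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, weights, biases):
    """Each hidden layer applies a differentiable operator h -> relu(W h + b)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return weights[-1] @ h + biases[-1]     # linear output layer

# hypothetical shapes: 4-dim input, two hidden layers of width 8, scalar output
rng = np.random.default_rng(0)
dims = [4, 8, 8, 1]
weights = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i])) for i in range(3)]
biases = [np.zeros(dims[i + 1]) for i in range(3)]
print(mlp_forward(rng.normal(size=4), weights, biases))
```

Because every layer is differentiable, the whole composition is trainable by gradient descent, which is what lets it serve as the function approximator in the RL setting.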
Stanford CS234 Lecture 5 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 5 We need to be able to generalize from our experience to make "good decisions". Value Function Approximation (VFA): from now on, we will represent the $(s,a)$ value function with a parameterized function; the input is a state or a state-action pair, and the output is the corresponding value. The parameter $w$ here is, in simple terms, a vector, such as DN.. 2022. 8. 8.
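As one concrete instance of VFA with a parameter vector $w$ (linear features, covered in this part of the course, though the update the post goes on to show is cut off), here is a semi-gradient TD(0) sketch for $\hat{V}(s;w)=\phi(s)^\top w$; the feature vectors below are hypothetical:

```python
import numpy as np

def td0_linear_update(w, phi_s, reward, phi_s_next, alpha=0.05, gamma=0.99):
    """Semi-gradient TD(0) for linear VFA: V(s; w) = phi(s) . w.
    Update: w <- w + alpha * (r + gamma * V(s') - V(s)) * phi(s)."""
    td_error = reward + gamma * (phi_s_next @ w) - (phi_s @ w)
    return w + alpha * td_error * phi_s

# toy usage with 4-dimensional state features
rng = np.random.default_rng(0)
w = np.zeros(4)
w = td0_linear_update(w, rng.normal(size=4), 1.0, rng.normal(size=4))
```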