
Reinforcement Learning (10)

GNFactor Paper Review (GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields) Before diving in: this paper is a follow-up to PerAct, which I reviewed in an earlier post, so reading that one first will make this much easier to follow! https://maltese-rocks.tistory.com/63 PerAct Paper Review (PERCEIVER-ACTOR: A Multi-Task Transformer for Robotic Manipulation) Background worth knowing: the Transformer, Attention Is All You Need 💡 **Keywords:** Transformers, L.. 2024. 7. 26.
Stanford CS234 Lecture 10 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 10 Continuing our discussion of updating parameters given the gradient. Local approximation: we could not compute the objective above because we have no access to $\tilde{\pi}$, so as an approximation we replace that term with the previous policy. We take policy $\pi^i$, roll it out to collect $D$ trajectories, and use them to obtain the distribution .. 2022. 8. 12.
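For reference, the local approximation being described here is usually written (following Kakade & Langford and TRPO; this notation is my own restatement, not copied from the slides) by swapping the state distribution of the unknown $\tilde{\pi}$ for that of the current policy $\pi^i$:

$$ L_{\pi^i}(\tilde{\pi}) = J(\pi^i) + \sum_s \rho_{\pi^i}(s) \sum_a \tilde{\pi}(a \mid s)\, A_{\pi^i}(s,a) $$

where $J(\pi^i)$ is the expected return of the current policy, $\rho_{\pi^i}$ its discounted state-visitation distribution, and $A_{\pi^i}$ its advantage function; this is exactly the quantity the $D$ sampled trajectories let us estimate.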
Stanford CS234 Lecture 9 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 9 Continuing the discussion of gradient-based policy search, recall that our goal is to converge to a local optimum as fast as possible. We want each policy update to be a monotonic improvement → this guarantees convergence (at least empirically) → and we simply don't want to get fired... Recall that last time we expressed the gradient of the value function as below; this term is unbiased but .. 2022. 8. 11.
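The unbiased but high-variance estimator referred to here is the standard likelihood-ratio (REINFORCE) form of the policy gradient, restated below rather than quoted from the lecture:

$$ \nabla_\theta V(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \right] $$

with $G_t$ the return from step $t$; the variance of this estimator is what later motivates baselines and actor-critic methods.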
Stanford CS234 Lecture 8 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 8 Policy-Based Reinforcement Learning Recall the last lecture → where we learned to approximate the state value ($V$) or state-action value ($Q$) with parameters $w$ (or $\theta$) and then use those estimates to build a (hopefully) optimal policy ($\pi$). Today we take the policy $\pi^\theta$ itself as the parameterized object $$ \pi^\theta(s,a)=P[a|s;\theta] $$ our goal is to find a policy .. 2022. 8. 11.
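As a concrete illustration of such a parameterized policy, here is a minimal softmax-over-linear-features sketch in Python; the feature map `phi`, the parameter vector `theta`, and the discrete action list are hypothetical names introduced for this example, not objects from the lecture.

```python
import numpy as np

def softmax_policy(theta, phi, state, actions):
    """pi^theta(a | s) proportional to exp(theta . phi(s, a)) over a discrete action set."""
    # phi(state, action) is an assumed feature map returning a vector the same size as theta
    logits = np.array([theta @ phi(state, a) for a in actions])
    logits -= logits.max()                  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()              # probability distribution over `actions`

# sampling an action from the parameterized policy:
# a_idx = np.random.choice(len(actions), p=softmax_policy(theta, phi, state, actions))
```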
Stanford CS234 Lecture 7 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 7 Imitation Learning There are settings where we would need rewards that are dense in time to guide the agent, or where each iteration is extremely expensive → autonomous-driving kind of stuff. So we summon an expert to demonstrate trajectories. Our problem setup: we will talk about the three methods below and their goals... Behavioral Cloning: learn directly from the teacher's policy In.. 2022. 8. 9.
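Since behavioral cloning is just supervised learning on expert (state, action) pairs, a minimal sketch looks like the following; the linear softmax classifier, the learning rate, and the arrays `expert_states` / `expert_actions` are assumptions made for illustration only.

```python
import numpy as np

def behavioral_cloning(expert_states, expert_actions, n_actions, lr=0.1, epochs=200):
    """Fit a linear softmax policy to expert demonstrations via cross-entropy."""
    n, d = expert_states.shape
    W = np.zeros((d, n_actions))                        # policy parameters
    targets = np.eye(n_actions)[expert_actions]         # one-hot expert actions
    for _ in range(epochs):
        logits = expert_states @ W
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = expert_states.T @ (probs - targets) / n  # cross-entropy gradient
        W -= lr * grad
    return W                                            # pi(a | s) = softmax(s @ W)
```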
Stanford CS234 Lecture 5 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 5 We need to be able to generalize from our experience to make “good decisions”. Value Function Approximation (VFA): from now on, we represent the state or state-action value function with a parameterized function; the input is a state or a state-action pair, and the output is the corresponding value. The parameter $w$ here would be a vector in simple cases, such as a DN.. 2022. 8. 8.
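To make the parameterized value function concrete, here is a minimal semi-gradient TD(0) update for a linear VFA, $\hat{V}(s;w) = w^\top \phi(s)$; the feature vectors, step size, and discount factor below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def td0_linear_update(w, phi_s, reward, phi_s_next, done, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step for V(s) ~= w . phi(s)."""
    v_s = w @ phi_s
    v_next = 0.0 if done else w @ phi_s_next
    td_error = reward + gamma * v_next - v_s   # TD target minus current estimate
    return w + alpha * td_error * phi_s        # gradient of the linear VFA is phi(s)
```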