
Reinforcement Learning (10)

GNFactor Paper Review (GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields) Before diving in: this paper is a follow-up to PerAct, which I reviewed in an earlier post, so reading that one first will make this much easier to follow! https://maltese-rocks.tistory.com/63 PerAct Paper Review (PERCEIVER-ACTOR: A Multi-Task Transformer for Robotic Manipulation) Background worth knowing: the Transformer, Attention Is All You Need 💡 **Keywords:** Transformers, L.. 2024. 7. 26.
Stanford CS234 Lecture 10 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 10 Continuing our discussion of updating parameters given the gradient. Local approximation: we could not compute the objective above because we have no access to $\tilde{\pi}$, so as an approximation we replace that term with the previous policy. We take policy $\pi^i$, roll it out to collect $D$ trajectories, and use them to obtain the distribution .. 2022. 8. 12.
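For reference, the local approximation being described here is usually written (following Kakade & Langford and TRPO; this notation is my own restatement, not copied from the slides) by swapping the state distribution of the unknown $\tilde{\pi}$ for that of the current policy $\pi^i$:

$$ L_{\pi^i}(\tilde{\pi}) = J(\pi^i) + \sum_s \rho_{\pi^i}(s) \sum_a \tilde{\pi}(a \mid s)\, A_{\pi^i}(s,a) $$

where $J(\pi^i)$ is the expected return of the current policy, $\rho_{\pi^i}$ its discounted state-visitation distribution, and $A_{\pi^i}$ its advantage function; this is exactly the quantity the $D$ sampled trajectories let us estimate.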
Stanford CS234 Lecture 9 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 9 Continuing the discussion of gradient-based policy search, recall that our goal is to converge to a local optimum as fast as possible. We want each policy update to be a monotonic improvement → this guarantees convergence (at least empirically) → and we simply don't want to get fired... Recall that last time we expressed the gradient of the value function as below; this term is unbiased but .. 2022. 8. 11.
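The unbiased but high-variance estimator referred to here is the standard likelihood-ratio (REINFORCE) form of the policy gradient, restated below rather than quoted from the lecture:

$$ \nabla_\theta V(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, G_t \right] $$

with $G_t$ the return from step $t$; the variance of this estimator is what later motivates baselines and actor-critic methods.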
Stanford CS234 Lecture 8 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 8 Policy-Based Reinforcement Learning Recall the last lecture → where we learned to approximate the state value ($V$) or state-action value ($Q$) with parameters $w$ (or $\theta$) and then use those estimates to build a (hopefully) optimal policy ($\pi$). Today we take the policy $\pi^\theta$ itself as the parameterized object $$ \pi^\theta(s,a)=P[a|s;\theta] $$ our goal is to find a policy .. 2022. 8. 11.
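As a concrete illustration of such a parameterized policy, here is a minimal softmax-over-linear-features sketch in Python; the feature map `phi`, the parameter vector `theta`, and the discrete action list are hypothetical names introduced for this example, not objects from the lecture.

```python
import numpy as np

def softmax_policy(theta, phi, state, actions):
    """pi^theta(a | s) proportional to exp(theta . phi(s, a)) over a discrete action set."""
    # phi(state, action) is an assumed feature map returning a vector the same size as theta
    logits = np.array([theta @ phi(state, a) for a in actions])
    logits -= logits.max()                  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()              # probability distribution over `actions`

# sampling an action from the parameterized policy:
# a_idx = np.random.choice(len(actions), p=softmax_policy(theta, phi, state, actions))
```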
Stanford CS234 Lecture 7 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 7 Imitation Learning There are settings where we would need rewards that are dense in time to guide the agent, or where each iteration is extremely expensive → autonomous-driving kind of stuff. So we summon an expert to demonstrate trajectories. Our problem setup: we will talk about the three methods below and their goals... Behavioral Cloning: learn directly from the teacher's policy In.. 2022. 8. 9.
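Since behavioral cloning is just supervised learning on expert (state, action) pairs, a minimal sketch looks like the following; the linear softmax classifier, the learning rate, and the arrays `expert_states` / `expert_actions` are assumptions made for illustration only.

```python
import numpy as np

def behavioral_cloning(expert_states, expert_actions, n_actions, lr=0.1, epochs=200):
    """Fit a linear softmax policy to expert demonstrations via cross-entropy."""
    n, d = expert_states.shape
    W = np.zeros((d, n_actions))                        # policy parameters
    targets = np.eye(n_actions)[expert_actions]         # one-hot expert actions
    for _ in range(epochs):
        logits = expert_states @ W
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = expert_states.T @ (probs - targets) / n  # cross-entropy gradient
        W -= lr * grad
    return W                                            # pi(a | s) = softmax(s @ W)
```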
Stanford CS234 Lecture 5 Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 5 We need to be able to generalize from our experience to make “good decisions”. Value Function Approximation (VFA): from now on, we represent the state or state-action value function with a parameterized function; the input is a state or a state-action pair, and the output is the corresponding value. The parameter $w$ here would be a vector in simple cases, such as a DN.. 2022. 8. 8.
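To make the parameterized value function concrete, here is a minimal semi-gradient TD(0) update for a linear VFA, $\hat{V}(s;w) = w^\top \phi(s)$; the feature vectors, step size, and discount factor below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

def td0_linear_update(w, phi_s, reward, phi_s_next, done, alpha=0.1, gamma=0.99):
    """One semi-gradient TD(0) step for V(s) ~= w . phi(s)."""
    v_s = w @ phi_s
    v_next = 0.0 if done else w @ phi_s_next
    td_error = reward + gamma * v_next - v_s   # TD target minus current estimate
    return w + alpha * td_error * phi_s        # gradient of the linear VFA is phi(s)
```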