
ML Study (17)

PerAct Paper Review (PERCEIVER-ACTOR: A Multi-Task Transformer for Robotic Manipulation) Background that helps with understanding: the Transformer (Attention Is All You Need). 💡 **Keywords:** Transformers, Language Grounding, Manipulation, Behavior Cloning. Abstract 💡 “A language-conditioned BC agent that can learn to imitate a wide variety of 6-DoF manipulation tasks with just a few .. 2024. 7. 23.
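The core idea in that abstract, a language-conditioned behavior-cloning agent, reduces to supervised learning on (observation, instruction) → action pairs. Below is a minimal illustrative sketch in PyTorch; the class name, dimensions, and the fusion-by-concatenation design are hypothetical placeholders of mine, not PerAct's actual architecture (which applies a Perceiver transformer to voxel grids).

```python
import torch
import torch.nn as nn

# Hypothetical minimal language-conditioned BC agent (NOT PerAct's real
# Perceiver architecture): fuse an observation embedding with a language
# embedding and classify the next discretized action.
class LangCondBCAgent(nn.Module):
    def __init__(self, obs_dim=512, lang_dim=512, n_actions=360):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(obs_dim + lang_dim, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # logits over discretized 6-DoF actions
        )

    def forward(self, obs_emb, lang_emb):
        return self.fuse(torch.cat([obs_emb, lang_emb], dim=-1))

agent = LangCondBCAgent()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(agent.parameters(), lr=1e-4)

# One behavior-cloning step on a fake batch standing in for expert demos.
obs_emb = torch.randn(8, 512)            # stand-in for encoded observations
lang_emb = torch.randn(8, 512)           # stand-in for encoded instructions
expert_action = torch.randint(0, 360, (8,))
loss = loss_fn(agent(obs_emb, lang_emb), expert_action)
opt.zero_grad(); loss.backward(); opt.step()
```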
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 10 Continuing our discussion of Updating Parameters Given the Gradient. Local Approximation: we could not evaluate the equation above because we had no clue what $\tilde{\pi}$ is. So, as an approximation, we replace that term with the previous policy: we take policy $\pi^i$, run it out to collect $D$ trajectories, and use them to obtain the distribution .. 2022. 8. 12.
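For reference, the approximation being described matches the importance-sampled surrogate objective from the TRPO family; this is my reconstruction in standard notation, not a copy of the lecture slides:

$$ L_{\pi^i}(\tilde{\pi}) = J(\pi^i) + \mathbb{E}_{s \sim d^{\pi^i},\, a \sim \pi^i}\left[ \frac{\tilde{\pi}(a|s)}{\pi^i(a|s)}\, A^{\pi^i}(s,a) \right] $$

Since we cannot sample from the new policy $\tilde{\pi}$, both states and actions come from the old policy $\pi^i$ (the $D$ trajectories above), and the importance ratio corrects for the mismatch.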
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 9 Continuing the discussion of gradient methods, recall that our goal is to converge as fast as possible to a local optimum. We want each policy update to be a monotonic improvement: it guarantees convergence (empirically), and we simply don't want to get fired... Recall that last time we expressed the gradient of the value function as below; this term is unbiased but .. 2022. 8. 11.
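The unbiased-but-high-variance term referenced here is presumably the likelihood-ratio (REINFORCE) policy gradient; in standard notation:

$$ \nabla_\theta V(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[ \left( \sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t) \right) R(\tau) \right] $$

It is unbiased because the expectation equals the true gradient, but its variance grows with trajectory length, which is what motivates baselines and the monotonic-improvement machinery above.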
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 8 Policy-Based Reinforcement Learning. Recall last lecture, where we learned to find the state value ($V$) or state-action value ($Q$) for a parameter $w$ (or $\theta$) and used it to build a (hopefully) optimal policy ($\pi$). Today we take the policy $\pi^\theta$ itself as the parameterized object, $$ \pi^\theta(s,a)=P[a|s;\theta] $$ and our goal is to find a policy .. 2022. 8. 11.
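As one concrete way to parameterize $\pi^\theta(s,a)=P[a|s;\theta]$ directly, here is a minimal softmax policy in PyTorch; this is my own sketch (the network sizes are arbitrary), not code from the lecture:

```python
import torch
import torch.nn as nn

# Softmax policy pi_theta(a|s): a small network maps a state to action
# logits, and softmax turns the logits into a distribution over actions.
class SoftmaxPolicy(nn.Module):
    def __init__(self, state_dim=4, n_actions=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.Tanh(),
            nn.Linear(64, n_actions),
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

policy = SoftmaxPolicy()
state = torch.randn(1, 4)
dist = policy(state)
action = dist.sample()              # a ~ pi_theta(.|s)
log_prob = dist.log_prob(action)    # reused later for the policy gradient
```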
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 7 Imitation Learning. There are occasions where rewards dense in time are unavailable, or where each iteration is extremely expensive (autonomous driving is the classic example). So we summon an expert to demonstrate trajectories. Our problem setup: we will talk about the three methods below, whose goals are... Behavioral Cloning: learn directly from the teacher's policy. In.. 2022. 8. 9.
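Behavioral cloning, the first of the three methods, reduces imitation to supervised maximum likelihood on the expert's state-action pairs; in standard notation (my own formulation, not the slide's):

$$ \hat{\theta} = \arg\max_{\theta} \sum_{(s,\, a^*) \in \mathcal{D}_{\text{expert}}} \log \pi_\theta(a^* \mid s) $$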
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 6 We will combine Neural Network (NN) function approximation with RL. Basic Deep Neural Network: a DNN is a layered neural network built from more than three hidden layers of differentiable functional operators. Benefits of using a DNN: a DNN is a universal function approximator, and it requires fewer nodes/parameters to represent the same function. U.. 2022. 8. 9.
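To make "composition of differentiable functional operators" concrete, here is a minimal PyTorch sketch of a three-hidden-layer network of the kind used for value-function approximation; the sizes and the Q-network framing are my illustrative choices, not from the lecture:

```python
import torch
import torch.nn as nn

# A DNN as a composition of differentiable operators: stacking several
# hidden layers can represent the same function with fewer parameters
# than a single very wide layer.
q_network = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(),   # hidden layer 1
    nn.Linear(64, 64), nn.ReLU(),  # hidden layer 2
    nn.Linear(64, 64), nn.ReLU(),  # hidden layer 3
    nn.Linear(64, 2),              # e.g., Q-values for 2 actions
)

state = torch.randn(1, 4)
q_values = q_network(state)        # differentiable end to end
q_values.sum().backward()          # gradients flow through every layer
```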