Imitation learning empowers artificial agents to mimic behavior by learning from demonstrations. Recently, diffusion models, which can model high-dimensional and multimodal distributions, have shown impressive performance on imitation learning tasks. These models learn to shape a policy by diffusing actions (or states) from standard Gaussian noise (Fig. 1). However, the target policy to be learned is often significantly different from a Gaussian, and this mismatch can result in poor performance when the number of diffusion steps is small (to speed up inference) and when data is limited.
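
To make the standard setup concrete, here is a minimal sketch (not the authors' code) of how a conventional diffusion policy generates an action: the initial action is drawn from a unit Gaussian and repeatedly refined by a learned reverse-diffusion step conditioned on the observation. The `reverse_step` network and its signature are hypothetical placeholders, and noise-schedule details are omitted.

```python
import torch

def sample_diffusion_policy(reverse_step, obs, action_dim, n_steps=100):
    """Illustrative sketch of a standard diffusion-policy sampler.

    `reverse_step(a, t, obs)` is a hypothetical stand-in for one learned
    reverse-diffusion update (noise schedule details omitted).
    """
    a = torch.randn(1, action_dim)          # start from standard Gaussian noise
    for k in reversed(range(n_steps)):      # many steps -> slow inference
        t = torch.full((1,), k, dtype=torch.long)
        a = reverse_step(a, t, obs)         # one denoising update toward the policy
    return a
```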

Fig 1. Examples of current diffusion-based imitation learning, which generates actions from random Gaussian noise [1].


Fig 2. Overview of action generation with BRIDGER. Using the trained velocity function b and score function s, BRIDGER transports actions from the source distribution to the target distribution via the forward SDE.


The key idea in this work is that initiating from a more informative source than a Gaussian enables diffusion methods to mitigate the above limitations. We contribute theoretical results, a new method, and empirical findings that show the benefits of using an informative source policy. Our method, which we call BRIDGER, leverages the stochastic interpolants framework to bridge arbitrary policies (Fig. 2), enabling a flexible approach to imitation learning. It generalizes prior work: a standard Gaussian can still be used as the source, but more informative source policies can be used when available.
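
As a rough illustration of the idea (a sketch under our own assumptions, not the paper's implementation), interpolant-based generation can be viewed as drawing an initial action from the source policy and integrating a forward SDE whose drift combines the learned velocity b and score s. The function names, the constant noise scale `eps`, and the Euler-Maruyama discretization below are simplifications; the actual noise schedule and conditioning may differ.

```python
import torch

def bridger_style_sample(source_policy, velocity_b, score_s, obs,
                         n_steps=20, eps=1.0):
    """Hedged sketch of interpolant-based action generation.

    Integrates dA_t = [b(t, A_t, obs) + eps * s(t, A_t, obs)] dt + sqrt(2*eps) dW_t
    from t = 0 (source policy sample) to t = 1 (target policy) with
    Euler-Maruyama. All names and the constant eps are illustrative.
    """
    a = source_policy(obs)                  # informative source instead of N(0, I)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((a.shape[0], 1), k * dt)
        drift = velocity_b(t, a, obs) + eps * score_s(t, a, obs)
        a = a + drift * dt + torch.randn_like(a) * (2.0 * eps * dt) ** 0.5
    return a
```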

Fig 3. Experimental Domains. (A)-(D) Adroit: controlling a 24-degree-of-freedom robot hand to accomplish four tasks: (A) Door: opening a door, (B) Relocate: moving a ball to a target position, (C) Hammer: driving a nail into a board, and (D) Pen: aligning a pen with a target orientation. (E) Franka Kitchen: 7 objects are available for interaction, and the aim is to accomplish 4 subtasks in arbitrary order: opening the microwave, relocating the kettle, flipping the light switch, and sliding open the cabinet door. (F) 6-DoF Grasp-Pose Generation: the goal is to generate grasp poses capable of successfully picking up an object.


In experiments on challenging simulation benchmarks (Fig. 3), BRIDGER outperforms state-of-the-art diffusion policies (Figs. 4 and 5).

Fig 4. Average task performance on Adroit (success rate) and Franka Kitchen (number of successful sub-tasks). Best scores in bold. We compare BRIDGER against state-of-the-art methods under different numbers of diffusion steps when trained with the Large dataset. BRIDGER with $k = 0$ indicates the source policy. BRIDGER generally outperforms the competing methods.


Fig 5. Success rate (averaged over 100 grasps on ten test objects). BRIDGER significantly outperforms DDIM and Residual Policy across all numbers of diffusion steps. Compared to SE3, BRIDGER achieves a higher success rate when the number of diffusion steps is small.


Fig 6. (Left A) Real-world Grasping using a Panda arm with a two-finger gripper. Observations were point clouds obtained from the RealSense camera on the robot. (Left B) Test objects used in our experiments (unseen during training). (Left C) Grasp samples from the competing models on three objects (20 diffusion steps). (Right A) Real-world Synthetic Wound Cleaning using a UR5e with a Shadow Dexterous Hand Lite. (Right B) Demonstrations consisted of moving the hand to the sponge from an initial position, grasping it, and then manipulating the sponge to wipe off the mark. The robot had to learn an action policy conditioned on RGB images from the RealSense camera and its joint angles (both arm and hand).


Similar positive results were observed in real-world experiments using two robots: a Franka Emika Panda arm with a two-finger gripper for stable grasping, and a UR5e equipped with a Shadow Dexterous Hand Lite for synthetic wound cleaning. These tasks involved high-dimensional real-world observations (e.g., point clouds and images) and complex actions (22 action dimensions per time-step for the wound cleaning task) (Fig. 6). Compared to a state-of-the-art diffusion policy, BRIDGER generates smoother action trajectories (see the video at the top) and achieves higher task success rates (Figs. 7 and 8).

Fig 7. Real-World Grasping success rate (averaged over 10 grasps on 10 test objects).


Fig 8. Normalized cleaned area (averaged over 9 positions) for the Cleaning Task.


Code

Code for reproducing the experiments presented in this paper can be found at this GitHub repo.

Citation

If you find our code or the ideas presented in our paper useful for your research, please consider citing it.

Kaiqi Chen★, Eugene Lim★, Kelvin Lin★, Yiyang Chen★, and Harold Soh★. “Don’t Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion.” Robotics: Science and Systems (RSS), 2024.

@article{chen2024behavioral,
  title={Don’t Start from Scratch: Behavioral Refinement via Interpolant-based Policy Diffusion},
  author={Chen, Kaiqi and Lim, Eugene and Lin, Kelvin and Chen, Yiyang and Soh, Harold},
  journal={arXiv preprint arXiv:2402.16075},
  year={2024}
}

Contact

If you have questions or comments, please contact Kaiqi Chen or Harold Soh.

References

[1] Cheng Chi, Siyuan Feng, Yilun Du, Zhenjia Xu, Eric Cousineau, Benjamin Burchfiel, and Shuran Song. Diffusion policy: Visuomotor policy learning via action diffusion. arXiv preprint arXiv:2303.04137, 2023.

Written by

Kaiqi Chen