DISCO: Language-Guided Manipulation with Diffusion Policies and Constrained Inpainting
DISCO uses vision-language models to guide diffusion policies with optimized keyframe inpainting, enabling superior zero-shot open-vocabulary robotic manipulation.