Action Hallucination in Generative Vision-Language-Action Models

A theoretical analysis of why generative VLAs produce physically infeasible actions, identifying topological, precision, and horizon barriers that impose unavoidable tradeoffs.

Action Hallucination in Generative Vision-Language-Action Models, Harold Soh★, Eugene Lim★, arXiv preprint
Links: Paper


Robot Foundation Models, such as VLAs, promise end-to-end generative robot policies with broad generalization. Yet it remains unclear whether they fundamentally resolve the core problem of action generation in embodied settings, or overcome the long-standing challenges of robotics. We address this question by analyzing action hallucinations that violate physical constraints and their extension to plan-level failures.

Focusing on latent-variable generative policies, we show that hallucinations can arise from structural mismatches between feasible robot behavior and common model architectures. We study three such barriers—topological, precision, and horizon—and show how they impose unavoidable tradeoffs. Our analysis provides mechanistic explanations for reported empirical failures of generative robot policies and suggests principled directions for improving reliability and trustworthiness, without abandoning their expressive power.
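For intuition about the topological barrier, here is a toy sketch. It is an illustration only and is not taken from the paper: the 1-D feasible set, the `tanh` generator, and the sharpness parameter `k` are all hypothetical. It relies on a standard fact: a continuous map sends a connected latent space to a connected set, so if the feasible actions form two disconnected pieces and the generator reaches both, some latent samples must land in the gap between them. Sharpening the map shrinks that infeasible mass but never removes it, and the map gets steeper as it shrinks.

```python
import math

# Hypothetical 1-D setup (not from the paper): feasible actions lie in
# [-2, -1] or [1, 2]; the gap (-1, 1) stands in for an obstacle.
def is_feasible(a: float) -> bool:
    return 1.0 <= abs(a) <= 2.0

def generator(z: float, k: float) -> float:
    """Continuous map from latent z to an action; k controls sharpness."""
    return 1.5 * math.tanh(k * z)

def infeasible_mass(k: float) -> float:
    """Exact P(generator(z, k) is infeasible) for z ~ N(0, 1)."""
    # |1.5 * tanh(k z)| < 1  <=>  |z| < atanh(2/3) / k
    z_star = math.atanh(2.0 / 3.0) / k
    # For a standard normal, P(|z| < c) = erf(c / sqrt(2)).
    return math.erf(z_star / math.sqrt(2.0))

def max_slope(k: float) -> float:
    """Largest slope of the generator, attained at z = 0."""
    return 1.5 * k

if __name__ == "__main__":
    for k in (1.0, 10.0, 100.0):
        print(f"k={k:6.1f}  infeasible mass={infeasible_mass(k):.4f}  "
              f"max slope={max_slope(k):.1f}")
```

In this toy, the infeasible mass stays strictly positive for every finite `k`, while the maximum slope grows linearly with `k`.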

Resources

You can find our paper on arXiv: https://arxiv.org/abs/2602.06339

Citation

Please consider citing our paper if you build upon our results and ideas.

Harold Soh★, Eugene Lim★, “Action Hallucination in Generative Vision-Language-Action Models”, arXiv preprint

@misc{soh2026action,
  title={Action Hallucination in Generative Vision-Language-Action Models},
  author={Soh, Harold and Lim, Eugene},
  year={2026},
  eprint={2602.06339},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2602.06339}
}

Contact

If you have questions or comments, please contact Harold or Eugene.

Guided Streaming Stochastic Interpolant Policy

A principled inference-time guidance framework for streaming generative robot policies, enabling fast, reactive obstacle avoidance within the action chunk.

Oscar Puming Jiang 13 Jul 2026

learn VLA manipulation R:SS

SkillVLA: Tackling Combinatorial Diversity in Dual-Arm Manipulation via Skill Reuse

A dual-arm manipulation framework that enables skill reuse—recomposing learned single-arm skills into novel left–right pairings to tackle combinatorial diversity.

Ce Hao 13 Jul 2026

learn generative ICML

CAR Guidance: Staying On-Manifold under Compositional Rewards

We introduce a plug-and-play module that corrects off-manifold drift when guiding flow models with multiple rewards at inference time.

Xuehui Yu 06 Jul 2026
