OpenVLA 7B
Vision-language-action model for robot control via images and text
... supports real-world robotics tasks, performing most reliably on robot setups and environments seen during pretraining. Its actions are delta values for end-effector position and orientation plus a gripper command, predicted in normalized form and un-normalized with robot-specific statistics. OpenVLA is MIT-licensed, fully open-source, and developed collaboratively by Stanford, UC Berkeley, Google DeepMind, and TRI. Deployment is supported through Python and the Hugging Face ecosystem, with Flash Attention available for efficient inference.
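
A minimal inference sketch of that deployment path is shown below, assuming the published openvla/openvla-7b checkpoint on Hugging Face, its custom predict_action method loaded via trust_remote_code, and an example unnorm_key such as "bridge_orig"; the exact prompt wording and statistics key depend on the target robot:

```python
# Minimal inference sketch for OpenVLA 7B via Hugging Face transformers.
# Assumptions: the "openvla/openvla-7b" checkpoint, its custom predict_action()
# method (loaded with trust_remote_code=True), and "bridge_orig" as an example
# unnorm_key for un-normalizing actions with robot-specific statistics.
import torch
from PIL import Image
from transformers import AutoModelForVision2Seq, AutoProcessor

processor = AutoProcessor.from_pretrained("openvla/openvla-7b", trust_remote_code=True)
vla = AutoModelForVision2Seq.from_pretrained(
    "openvla/openvla-7b",
    attn_implementation="flash_attention_2",  # requires flash-attn to be installed
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).to("cuda:0")

image = Image.open("workspace.png")  # camera view of the robot workspace
prompt = "In: What action should the robot take to pick up the red block?\nOut:"

inputs = processor(prompt, image).to("cuda:0", dtype=torch.bfloat16)
# Returns a 7-D action: delta position (x, y, z), delta orientation (roll,
# pitch, yaw), and a gripper command, un-normalized with the chosen statistics.
action = vla.predict_action(**inputs, unnorm_key="bridge_orig", do_sample=False)
```

The resulting action vector can then be passed to the robot's controller; choosing the unnorm_key that matches the target robot's pretraining statistics is what maps the model's normalized outputs back to physical units.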