VLA Systems References

This page contains references and resources for the Vision-Language-Action (VLA) systems covered in Module 4.

Academic Papers

  • Do As I Can, Not As I Say: Grounding Language in Robotic Affordances - Google's SayCan paper on grounding language-model plans in robot affordances
  • RT-1: Robotics Transformer for Real-World Control at Scale - Google Robotics paper on transformer-based robotic control
  • SayTap: Language-guided Quadrupedal Locomotion - Research on translating language commands into quadruped locomotion
  • PaLM-E: An Embodied Multimodal Language Model - Google's work on embodied language models
  • CoRAL: Compositional Robot Ability Learning from Large-Scale Human Demonstrations - Paper on learning robot abilities from human demonstrations
  • Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - Research on using LLMs for robotic planning
  • Manipulation with Language Models - Study on language-guided manipulation tasks

Books and Surveys

  • Robotics and AI: A Survey of Current Techniques - Comprehensive overview of modern robotics
  • Embodied Intelligence: From Action to Cognition - Book on embodied AI principles
  • Human-Robot Interaction: A Survey - Academic survey of HRI research

Technical Documentation

Online Resources

Tutorials and Courses

  • MIT Introduction to Robotics - Course materials on robot perception and control
  • Stanford CS320: Robotic Manipulation - Course on manipulation with deep learning
  • Berkeley CS287: Advanced Robotics - Graduate course on advanced robotics topics
  • ETH Zurich Robotic Systems Lab - Educational materials on legged locomotion

Tools and Frameworks