Dexterous Manipulation with Touch Sensing
High-precision peg assembly using multimodal feedback
Stanford IPRL Lab | May 2025 - Present | Advisor: Professor Jeannette Bohg
Overview
This project focuses on enhancing vision-language-action (VLA) models with tactile and force-torque sensing to enable high-precision manipulation tasks, specifically targeting the challenging NIST board assembly and disassembly tasks.
Key Contributions
- Residual Policy with Multimodal Feedback: Augmenting VLA models with a residual policy that leverages force-torque and tactile feedback to achieve the sub-millimeter precision required for peg insertion tasks (see the first sketch after this list).
- Slow-Fast Multimodal Networks: Developing architectures that process data streams at different frequencies, with vision at lower rates and tactile/force feedback at higher rates, to improve real-time reactivity while maintaining computational efficiency (see the second sketch after this list).
- Precision Assembly Tasks: Applying the approach to NIST board tasks, which require precise alignment and insertion and serve as a standard benchmark for contact-rich manipulation.
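To make the residual-policy idea concrete, here is a minimal sketch of how such a policy might be wired, assuming the VLA outputs a 6-DoF delta-pose action that a small residual network corrects using force-torque and tactile input. All names here (ResidualPolicy, residual_scale, the lambda stand-ins) are hypothetical illustrations, not the project's actual code.

```python
import numpy as np

class ResidualPolicy:
    """Illustrative sketch: a base VLA action plus a bounded correction
    from a residual network conditioned on force-torque and tactile data."""

    def __init__(self, base_policy, residual_net, residual_scale=1e-3):
        self.base_policy = base_policy        # VLA: (image, instruction) -> 6-DoF delta pose
        self.residual_net = residual_net      # small net: fused features -> 6-DoF correction
        self.residual_scale = residual_scale  # caps corrections at sub-millimeter scale (meters)

    def act(self, image, instruction, ft_wrench, tactile):
        base_action = self.base_policy(image, instruction)
        features = np.concatenate([ft_wrench, tactile, base_action])
        correction = self.residual_net(features)
        # tanh bounds the residual so it can refine, but never override, the base action
        return base_action + self.residual_scale * np.tanh(correction)

# Toy stand-ins for the learned components, just to show the call pattern.
base = lambda image, instruction: np.zeros(6)
residual = lambda features: np.ones(6)
policy = ResidualPolicy(base, residual)
action = policy.act(np.zeros((224, 224, 3)), "insert the peg",
                    ft_wrench=np.zeros(6), tactile=np.zeros(16))
```

Bounding the residual is the key design choice in this sketch: the VLA handles coarse, semantically guided motion, while the residual contributes only small contact-driven corrections.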
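And a minimal sketch of the slow-fast idea as a control loop, assuming a 100 Hz controller whose vision embedding is refreshed at only 10 Hz while tactile/force-torque features are read every tick. The function slow_fast_loop, the fast_head network, and the chosen rates are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

def slow_fast_loop(vla_encode, fast_head, get_image, get_ft, get_tactile,
                   control_hz=100, vision_hz=10, steps=200):
    """Sketch: fuse a slowly refreshed vision embedding with high-rate
    tactile/force-torque features inside a fast control loop."""
    refresh_every = control_hz // vision_hz        # re-encode vision every N ticks
    vision_feat = vla_encode(get_image())          # expensive slow pathway
    for t in range(steps):
        if t % refresh_every == 0:
            vision_feat = vla_encode(get_image())  # slow pathway: ~vision_hz
        ft, tactile = get_ft(), get_tactile()      # fast pathway: every tick
        action = fast_head(np.concatenate([vision_feat, ft, tactile]))
        yield action                               # dispatched at control_hz

# Toy usage with placeholder sensors and networks.
actions = slow_fast_loop(
    vla_encode=lambda img: np.zeros(128),
    fast_head=lambda feat: np.zeros(6),
    get_image=lambda: np.zeros((224, 224, 3)),
    get_ft=lambda: np.zeros(6),
    get_tactile=lambda: np.zeros(16),
)
first_action = next(actions)
```

The point of the structure is that the expensive vision encoder runs off the critical path, so contact feedback can drive reactive corrections at the full control rate.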
Impact
This research addresses a critical gap in robotic manipulation: the integration of multiple sensory modalities at appropriate temporal resolutions. The insights from this work can enable robots to perform delicate assembly tasks in manufacturing and other precision-demanding applications.