Dexterous Manipulation with Touch Sensing
High-precision peg assembly using multimodal feedback
Stanford IPRL Lab | May 2025 - Present | Advisor: Professor Jeannette Bohg
Overview
This project focuses on enhancing vision-language-action (VLA) models with tactile and force-torque sensing to enable high-precision manipulation tasks, specifically targeting the challenging NIST board assembly and disassembly tasks.
Key Contributions
- Residual Policy with Multimodal Feedback: Augmenting VLA models with a residual policy that leverages force-torque and tactile feedback to achieve the sub-millimeter precision required for peg insertion tasks (see the first sketch after this list).
- Slow-Fast Multimodal Networks: Developing architectures that process data streams at different frequencies, with vision at lower rates and tactile/force feedback at higher rates, to improve real-time reactivity while maintaining computational efficiency (see the second sketch after this list).
- Precision Assembly Tasks: Applying the approach to NIST board tasks, which require precise alignment and insertion and serve as a standard benchmark for contact-rich manipulation.
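To make the residual-policy idea concrete, here is a minimal sketch of how such a policy might be wired, assuming the VLA outputs a 6-DoF delta-pose action that a small residual network corrects using force-torque and tactile input. All names here (ResidualPolicy, residual_scale, the lambda stand-ins) are hypothetical illustrations, not the project's actual code.

```python
import numpy as np

class ResidualPolicy:
    """Illustrative sketch: a base VLA action plus a bounded correction
    from a residual network conditioned on force-torque and tactile data."""

    def __init__(self, base_policy, residual_net, residual_scale=1e-3):
        self.base_policy = base_policy        # VLA: (image, instruction) -> 6-DoF delta pose
        self.residual_net = residual_net      # small net: fused features -> 6-DoF correction
        self.residual_scale = residual_scale  # caps corrections at sub-millimeter scale (meters)

    def act(self, image, instruction, ft_wrench, tactile):
        base_action = self.base_policy(image, instruction)
        features = np.concatenate([ft_wrench, tactile, base_action])
        correction = self.residual_net(features)
        # tanh bounds the residual so it can refine, but never override, the base action
        return base_action + self.residual_scale * np.tanh(correction)

# Toy stand-ins for the learned components, just to show the call pattern.
base = lambda image, instruction: np.zeros(6)
residual = lambda features: np.ones(6)
policy = ResidualPolicy(base, residual)
action = policy.act(np.zeros((224, 224, 3)), "insert the peg",
                    ft_wrench=np.zeros(6), tactile=np.zeros(16))
```

Bounding the residual is the key design choice in this sketch: the VLA handles coarse, semantically guided motion, while the residual contributes only small contact-driven corrections.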
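And a minimal sketch of the slow-fast idea as a control loop, assuming a 100 Hz controller whose vision embedding is refreshed at only 10 Hz while tactile/force-torque features are read every tick. The function slow_fast_loop, the fast_head network, and the chosen rates are illustrative assumptions, not the project's actual architecture.

```python
import numpy as np

def slow_fast_loop(vla_encode, fast_head, get_image, get_ft, get_tactile,
                   control_hz=100, vision_hz=10, steps=200):
    """Sketch: fuse a slowly refreshed vision embedding with high-rate
    tactile/force-torque features inside a fast control loop."""
    refresh_every = control_hz // vision_hz        # re-encode vision every N ticks
    vision_feat = vla_encode(get_image())          # expensive slow pathway
    for t in range(steps):
        if t % refresh_every == 0:
            vision_feat = vla_encode(get_image())  # slow pathway: ~vision_hz
        ft, tactile = get_ft(), get_tactile()      # fast pathway: every tick
        action = fast_head(np.concatenate([vision_feat, ft, tactile]))
        yield action                               # dispatched at control_hz

# Toy usage with placeholder sensors and networks.
actions = slow_fast_loop(
    vla_encode=lambda img: np.zeros(128),
    fast_head=lambda feat: np.zeros(6),
    get_image=lambda: np.zeros((224, 224, 3)),
    get_ft=lambda: np.zeros(6),
    get_tactile=lambda: np.zeros(16),
)
first_action = next(actions)
```

The point of the structure is that the expensive vision encoder runs off the critical path, so contact feedback can drive reactive corrections at the full control rate.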
Impact
This research addresses a critical gap in robotic manipulation: the integration of multiple sensory modalities at appropriate temporal resolutions. The insights from this work can enable robots to perform delicate assembly tasks in manufacturing and other precision-demanding applications.