Actron3D: Learning Actionable Neural Functions from Videos for Transferable Robotic Manipulation

arXiv 2025

* Equal contribution. 1 ETH Zürich, 2 Technical University of Munich

Actron3D acquires transferable manipulation skills from a few RGB-only human videos using Neural Affordance Functions, enabling zero-shot generalization to novel instances and viewpoints with high sample efficiency.

Abstract

We present Actron3D, a framework that enables robots to acquire transferable 6-DoF manipulation skills from just a few monocular, uncalibrated, RGB-only human videos. At its core lies the Neural Affordance Function, a compact object-centric representation that distills actionable cues (geometry, visual appearance, and affordance) from diverse uncalibrated videos into a lightweight neural network, forming a memory bank of manipulation skills. During deployment, we adopt a pipeline that retrieves relevant affordance functions and transfers precise 6-DoF manipulation policies via coarse-to-fine optimization, enabled by continuous queries to the multimodal features encoded in the neural functions. Experiments in both simulation and the real world demonstrate that Actron3D significantly outperforms prior methods, achieving a 14.9 percentage point improvement in average success rate across 13 tasks while requiring only 2-3 demonstration videos per task.
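To make the retrieval step concrete, the following is a minimal sketch of looking up entries in a memory bank by cosine similarity. It is an illustration under stated assumptions, not the Actron3D implementation: the abstract does not specify how affordance functions are indexed or matched, so the descriptor vectors, the `retrieve_top_k` helper, and the skill names are all hypothetical.

```python
import numpy as np

def retrieve_top_k(query, bank_keys, k=1):
    """Return indices and similarities of the k bank entries closest to `query`.

    Similarity is cosine similarity; this matching rule is an assumption made
    for illustration, not a detail confirmed by the source.
    """
    q = query / np.linalg.norm(query)
    keys = bank_keys / np.linalg.norm(bank_keys, axis=1, keepdims=True)
    sims = keys @ q                      # one cosine similarity per bank entry
    order = np.argsort(-sims)[:k]        # highest similarity first
    return order, sims[order]

# Hypothetical memory bank: one descriptor per stored affordance function.
bank_keys = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
])
skill_names = ["open_drawer", "pour_water", "pick_mug"]  # hypothetical labels

# Hypothetical descriptor of the object observed at deployment time.
query = np.array([0.9, 0.1, 0.0])

idx, sims = retrieve_top_k(query, bank_keys, k=2)
retrieved = [skill_names[i] for i in idx]
print(retrieved, np.round(sims, 3))
```

In a real system each bank entry would point to a trained neural function rather than a string label, and the descriptor would come from a visual encoder rather than a hand-written vector.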

Video

Experiment Results

Quantitative and Qualitative Results in Real-world Settings

When evaluated in real-world settings, our method not only outperforms baseline approaches by a significant margin but also learns more complex and dexterous manipulation policies.

Real-world Experiment Results


Zero-shot Deployment in Unseen Environments

Our distilled Neural Affordance Functions can be seamlessly transferred to unseen instances and viewpoints for daily manipulation tasks. We first demonstrate the transfer process in detail.
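As a rough intuition for the coarse-to-fine optimization mentioned in the abstract, the sketch below maximizes a continuously queryable score by sampling a grid, keeping the best sample, and shrinking the search box around it. This is a toy stand-in under loud assumptions: it searches only a 3-D translation rather than a full 6-DoF pose, uses a plain shrinking grid search rather than whatever optimizer the authors use, and the `toy_score` function and `target` value are hypothetical placeholders for queries to a Neural Affordance Function.

```python
import numpy as np

def coarse_to_fine(score, lo, hi, n_samples=5, n_rounds=6, shrink=0.5):
    """Maximize `score` over an axis-aligned box with a shrinking grid search.

    Each round evaluates an n_samples^d grid, keeps the best point, and
    re-centres a box `shrink` times as wide around it.
    """
    lo = np.asarray(lo, dtype=float)
    hi = np.asarray(hi, dtype=float)
    best = (lo + hi) / 2
    for _ in range(n_rounds):
        axes = [np.linspace(l, h, n_samples) for l, h in zip(lo, hi)]
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
        grid = grid.reshape(-1, lo.size)
        scores = np.array([score(p) for p in grid])
        best = grid[np.argmax(scores)]
        half = (hi - lo) * shrink / 2    # half-width of the next, smaller box
        lo, hi = best - half, best + half
    return best

# Hypothetical ground-truth translation that the toy score peaks at.
target = np.array([0.3, -0.2, 0.5])

def toy_score(p):
    """Placeholder for a continuous feature-matching score; higher is better."""
    return -np.sum((p - target) ** 2)

estimate = coarse_to_fine(toy_score, lo=[-1, -1, -1], hi=[1, 1, 1])
print(np.round(estimate, 3))
```

The coarse rounds locate the right region cheaply, and the later rounds refine the estimate; this only works because the score can be queried at arbitrary continuous locations, which is the property the neural functions are described as providing.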

Additional real-world manipulation experiments.

Long-horizon Tasks

Our pipeline can be seamlessly integrated into LLM-based task planning frameworks for long-horizon tasks.

BibTeX


@misc{zhang2025actron3d,
  title={Actron3D: Learning Actionable Neural Functions from Videos for Transferable Robotic Manipulation}, 
  author={Anran Zhang and Hanzhi Chen and Yannick Burkhardt and Yao Zhong and Johannes Betz and Helen Oleynikova and Stefan Leutenegger},
  year={2025},
  eprint={2510.12971},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  note={arXiv:2510.12971}
}