Video 2 Robot

Motion Imitation from Videos for Humanoid Robots

💼 Work done at Infosys

Abstract

This project implements an end-to-end pipeline for converting video sequences of human motions into executable trajectories for humanoid robots. Inspired by SLoMo and VideoMimic, the system extracts 3D human pose from videos, retargets the motion to robot joints using motion imitation techniques, and deploys policies on real hardware using RoboJuDo. The motion imitation training is performed in mjlab (Isaac Lab API with MuJoCo-Warp), enabling efficient GPU-accelerated policy learning.

End-to-End Pipeline

Video → 3D Pose → Motion Retargeting → Policy Training → Hardware Deployment

Multi-Motion Support

Demonstrated on Bharatnatyam dance, pick & place, and painting tasks

Real-World Deployment

Deployed on Unitree G1 humanoid using RoboJuDo framework

Demonstration Videos

Below are demonstrations of the motion imitation system on various tasks. The top row shows reference videos from which motions are extracted, and the bottom row shows the resulting robot motions in simulation.

Bharatnatyam Dance - Reference Motion
Bharatnatyam Dance - Robot Execution (mjlab)
Pick & Place Motion
Painting Motion

Method

The pipeline consists of five main stages:

1. Video Input
2. Pose Estimation
3. Motion Retargeting
4. Policy Training (mjlab)
5. Deployment (RoboJuDo)
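The retargeting stage (step 3) maps human joint angles onto the robot's joints while respecting its joint limits. A minimal sketch of this idea, with an illustrative joint mapping and limits (not the real Unitree G1 specification):

```python
import numpy as np

# Hypothetical mapping from robot joints to human skeleton joints,
# and illustrative robot joint limits in radians.
JOINT_MAP = {
    "left_shoulder_pitch": "left_shoulder",
    "left_elbow": "left_elbow",
}
ROBOT_LIMITS = {
    "left_shoulder_pitch": (-2.9, 2.9),
    "left_elbow": (-1.0, 2.6),
}

def retarget_frame(human_angles: dict) -> dict:
    """Map one frame of human joint angles onto robot joints, clamped to limits."""
    robot_angles = {}
    for robot_joint, human_joint in JOINT_MAP.items():
        lo, hi = ROBOT_LIMITS[robot_joint]
        robot_angles[robot_joint] = float(np.clip(human_angles[human_joint], lo, hi))
    return robot_angles
```

Applied per frame of the extracted pose sequence, this yields a robot-space reference trajectory; real retargeting additionally handles differing link lengths and kinematic structure.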

Pose Extraction

Extract 3D human pose from monocular videos using state-of-the-art pose estimation models (following approaches such as PromptHMR and VideoMimic).
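Frame-by-frame monocular pose estimates are typically jittery, so a common post-processing step before retargeting is temporal smoothing of the per-frame 3D joint positions. A sketch, assuming the estimator outputs a `(T, J, 3)` array of joint positions:

```python
import numpy as np

def smooth_pose_sequence(joints_3d: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing over time for a (T, J, 3) joint sequence.

    Reduces frame-to-frame jitter in monocular pose estimates
    before the motion is retargeted to the robot.
    """
    T = joints_3d.shape[0]
    half = window // 2
    out = np.empty_like(joints_3d, dtype=float)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        out[t] = joints_3d[lo:hi].mean(axis=0)
    return out
```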

mjlab Training

Use mjlab's motion imitation framework to train tracking policies. Reference motions are preprocessed and converted to NPZ format for GPU-accelerated training.

RoboJuDo Deployment

Export trained policies to ONNX and deploy them with RoboJuDo's motion tracking controller for execution on the real robot.
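At deployment, the controller runs a fixed-rate observation→action loop around the exported network. A minimal sketch of that loop; the `policy` callable stands in for ONNX inference, and the function names are illustrative, not RoboJuDo's actual interface:

```python
import numpy as np

def tracking_control_loop(policy, read_state, send_targets, num_steps):
    """Minimal observation -> action loop, as a tracking controller might run it.

    policy:       callable mapping an observation vector to joint targets
                  (stands in for the exported ONNX network)
    read_state:   callable returning the current observation vector
    send_targets: callable forwarding joint targets to the low-level PD controller
    """
    history = []
    for _ in range(num_steps):
        obs = read_state()          # e.g. joint positions, velocities, phase
        action = policy(obs)        # joint position targets from the network
        send_targets(action)        # handed to the robot's low-level control
        history.append(action)
    return history
```

In practice the loop runs at the control frequency the policy was trained at, with the observation built to match the training-time layout exactly.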

References

video2robot

End-to-end pipeline converting generative videos (Veo, Sora) into humanoid robot motions via pose extraction and retargeting.

VideoMimic

CoRL 2025 Best Student Paper. Visual Imitation Enables Contextual Humanoid Control. Real-to-sim pipeline for motion capture and humanoid policy training.

SLoMo

A General System for Legged Robot Motion Imitation from Casual Videos. Converts in-the-wild videos to robot motion primitives.

mjlab

Isaac Lab API powered by MuJoCo-Warp for RL and robotics research. Provides GPU-accelerated motion imitation training pipeline.

RoboJuDo

Plug-and-play deployment framework for humanoid robots. Supports multiple policies and enables real hardware execution.