Single Depth View Based Real-time Reconstruction of Hand-object Interactions

Published in ToG | SIGGRAPH, 2021

Hao Zhang   Yuxiao Zhou   Yifei Tian   Jun-Hai Yong   Feng Xu  
Tsinghua University

Reconstructing hand-object interactions is a challenging task due to strong occlusions and complex motions. This paper proposes a real-time system that uses a single depth stream to simultaneously reconstruct hand poses, object shape, and rigid/non-rigid motions. To achieve this, we first train a joint learning network to segment the hand and object in a depth image and to predict the 3D keypoints of the hand. With most layers shared between the two tasks, computation cost is reduced to enable real-time performance. A hybrid dataset is constructed to train the network, combining real data (to learn real-world distributions) with synthetic data (to cover variations of objects, motions, and viewpoints). Next, the depth of the two targets and the keypoints are used in a unified optimization to reconstruct the interacting motions. Benefiting from a novel tangential contact constraint, the system not only resolves the remaining ambiguities but also maintains real-time performance. Experiments show that our system handles different hand and object shapes, various interactive motions, and a moving camera.
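The tangential contact constraint can be illustrated with a minimal sketch (an assumption for exposition, not the paper's exact energy term): at a detected contact, the hand point is penalized only along the object's surface normal, so the fingertip remains free to slide tangentially across the surface during optimization. The function name `contact_residual` below is hypothetical.

```python
import numpy as np

def contact_residual(hand_pt, surf_pt, normal):
    """Signed distance from a hand contact point to the object surface,
    measured only along the surface normal; tangential motion is free."""
    n = normal / np.linalg.norm(normal)          # unit surface normal
    return float(np.dot(hand_pt - surf_pt, n))   # normal-direction gap

# A fingertip 5 mm above a horizontal surface patch:
r1 = contact_residual(np.array([0.0, 0.0, 0.005]),
                      np.array([0.0, 0.0, 0.0]),
                      np.array([0.0, 0.0, 1.0]))
# Sliding tangentially (in x/y) leaves the residual unchanged,
# unlike a full 3D point-to-point constraint:
r2 = contact_residual(np.array([0.1, 0.2, 0.005]),
                      np.array([0.0, 0.0, 0.0]),
                      np.array([0.0, 0.0, 1.0]))
```

This invariance to tangential displacement is what allows contacts to persist while fingers slide over the object, rather than pinning them to fixed surface points.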

[paper] [video]