Monocular Real-time Full Body Capture with Inter-part Correlations

Published in CVPR, 2021

Yuxiao Zhou¹   Marc Habermann²   Ikhsanul Habibie²   Ayush Tewari²   Christian Theobalt²   Feng Xu¹  
¹Tsinghua University   ²Max Planck Institute for Informatics, Saarland Informatics Campus  

We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image. Our approach uses a new neural network architecture that exploits correlations between body and hands at high computational efficiency. Unlike previous works, our approach is jointly trained on multiple datasets focusing on hand, body or face separately, without requiring data where all the parts are annotated at the same time, which is much more difficult to create at sufficient variety. The possibility of such multi-dataset training enables superior generalization ability. In contrast to earlier monocular full body methods, our approach captures more expressive 3D face geometry and color by estimating the shape, expression, albedo and illumination parameters of a statistical face model. Our method achieves competitive accuracy on public benchmarks, while being significantly faster and providing more complete face reconstructions.
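
The multi-dataset training idea can be illustrated with a per-part masked loss: each training sample contributes loss terms only for the parts its source dataset annotates. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The part names, the `masked_part_loss` helper, the use of `None` for missing annotations, and the mean squared error are all hypothetical choices made for the example.

```python
from typing import Dict, List, Optional

# Hypothetical part names; the paper's actual parameterization may differ.
PARTS = ("body", "hand", "face")


def masked_part_loss(
    pred: Dict[str, List[float]],
    target: Dict[str, Optional[List[float]]],
) -> float:
    """Sum a mean-squared-error term over the parts that have ground truth."""
    total = 0.0
    for part in PARTS:
        gt = target.get(part)
        if gt is None:
            # The source dataset does not annotate this part: skip its term.
            continue
        sq_errors = [(p - g) ** 2 for p, g in zip(pred[part], gt)]
        total += sum(sq_errors) / len(sq_errors)
    return total


# Network output for one sample (toy values).
pred = {"body": [0.1, 0.2], "hand": [1.0, 2.0], "face": [0.5]}

# A sample from a hypothetical hand-only dataset: body and face are unlabelled.
hand_only = {"body": None, "hand": [1.0, 4.0], "face": None}
loss = masked_part_loss(pred, hand_only)
```

With this kind of masking, batches can mix samples from hand-only, body-only, and face-only datasets, and each sample supervises only the outputs it has labels for.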

[paper] [supplementary document] [code to be released]


  author = {Zhou, Yuxiao and Habermann, Marc and Habibie, Ikhsanul and Tewari, Ayush and Theobalt, Christian and Xu, Feng},
  title = {Monocular Real-time Full Body Capture with Inter-part Correlations},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2021}