Three-dimensional human pose and shape estimation aims to recover a full 3D human mesh from a single image. Feature contamination caused by occlusion usually degrades performance significantly. Recent work in this field has typically addressed the occlusion problem implicitly. By contrast, in this paper we address it explicitly with a simple yet effective de-occlusion multi-task learning network. Our key insight is that the features used for mesh parameter regression should be noiseless. Thus, in the feature space, our method disentangles the occludee, which represents the noiseless human feature, from the occluder. Specifically, a spatial regularization and an attention mechanism are imposed in the backbone of our network to disentangle the features into different channels. Furthermore, two segmentation tasks are proposed to supervise the de-occlusion process. The final mesh model is regressed from the disentangled occlusion-aware features. Experiments are conducted on both occlusion and non-occlusion datasets, and the results show that our method outperforms state-of-the-art methods on two occlusion datasets while achieving competitive performance on a non-occlusion dataset. We also demonstrate that the proposed de-occlusion strategy is the main factor in improving robustness against occlusion. The code is available at https://github.com/qihangran/De-occlusion_MTL_HMR. © 2023
Funding: This work is supported by the National Natural Science Foundation of China (Grant No. 61901436).
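As a rough illustration of the idea summarized in the abstract (features disentangled into occludee and occluder channels, two segmentation tasks supervising the split, and the mesh regressed from the occludee side), the following is a minimal, dependency-free sketch. It is not the authors' implementation: the half-and-half channel split, the mean-plus-sigmoid segmentation head, the weight `w_seg`, and all function names are assumptions made only for this example.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two flat lists of values in [0, 1]."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def split_channels(features):
    """Assumed layout: first half of the channels = occludee, second half = occluder."""
    half = len(features) // 2
    return features[:half], features[half:]


def channel_mean(channels):
    """Average a group of channels pixel-wise into a single flat map."""
    n = len(channels)
    return [sum(vals) / n for vals in zip(*channels)]


def occludee_descriptor(features):
    """Spatially pooled occludee channels: the vector a mesh regressor would consume."""
    occludee, _ = split_channels(features)
    return [sum(ch) / len(ch) for ch in occludee]


def multitask_loss(features, human_mask, occluder_mask, mesh_pred, mesh_gt, w_seg=1.0):
    """Mesh regression loss plus two segmentation losses (hypothetical weighting)."""
    occludee, occluder = split_channels(features)
    # Toy segmentation heads: one mask predicted from each channel group.
    human_prob = [sigmoid(v) for v in channel_mean(occludee)]
    occluder_prob = [sigmoid(v) for v in channel_mean(occluder)]
    seg_loss = bce(human_prob, human_mask) + bce(occluder_prob, occluder_mask)
    # Mean squared error on the mesh parameters.
    mesh_loss = sum((a - b) ** 2 for a, b in zip(mesh_pred, mesh_gt)) / len(mesh_gt)
    return mesh_loss + w_seg * seg_loss
```

In this toy setup, features whose occludee channels respond on human pixels and whose occluder channels respond on occluder pixels yield a lower total loss than features with the two groups swapped, which is the kind of pressure the segmentation supervision is meant to exert.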