HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu1    Xiaohang Zhan2    Jiaxiang Tang3   
Ying Shan2    Gang Zeng3    Dahua Lin1    Xihui Liu4    Ziwei Liu5
1CUHK    2Tencent AI Lab    3PKU    4HKU    5NTU
Avatar Gallery
Demo Video
Abstract
Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios.
We propose HumanGaussian, an efficient yet effective framework that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our method adapts 3D Gaussian Splatting to text-driven 3D human generation with novel designs.
Framework Overview
Overview of the proposed HumanGaussian Framework. We generate high-quality 3D humans from text prompts with the neural representation of 3D Gaussian Splatting (3DGS). In Structure-Aware SDS, we start from the SMPL-X prior, densely sampling Gaussians on the human mesh surface as initial center positions. Then, a Texture-Structure Joint Model is trained to simultaneously denoise the image x and depth d conditioned on pose skeleton p. Based on this, we design a dual-branch SDS to jointly optimize human appearance and geometry, where the 3DGS density is adaptively controlled by distilling from both the RGB and depth space. In Annealed Negative Prompt Guidance, we use the cleaner classifier score with an annealed negative score to regularize the high-variance stochastic SDS gradient. Floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness.
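To make the guidance design above concrete, here is a minimal numpy sketch of the score decomposition. With classifier-free guidance written as ε̂ = ε_pos + (w − 1)(ε_pos − ε_neg), the SDS residual ε̂ − ε splits into a noisy generative score (ε_pos − ε) and a cleaner classifier score (ε_pos − ε_neg); the negative-prompt weight is then annealed. The function names, the weight `neg_w`, and the linear annealing schedule are our illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def decomposed_sds_score(eps_pos, eps_neg, eps, w=7.5, neg_w=1.0):
    """Sketch of the decomposed SDS score (weight names are ours).

    eps_pos: noise prediction conditioned on the positive prompt
    eps_neg: noise prediction conditioned on the negative prompt
    eps:     the noise actually injected at this timestep
    """
    generative = eps_pos - eps        # noisy generative score, high variance
    classifier = eps_pos - eps_neg    # cleaner, prompt-aligned classifier score
    return generative + neg_w * (w - 1.0) * classifier

def annealed_neg_weight(step, total_steps, w_max=1.0):
    # Linearly decay the negative-prompt weight over optimization
    # (an illustrative schedule, not necessarily the paper's).
    return w_max * (1.0 - step / total_steps)
```

Setting `neg_w=0` recovers the plain generative score, which makes the role of the annealed classifier term easy to inspect in isolation.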
Qualitative Comparisons
Visual Comparisons with Text-to-3D and 3D Human Models. We compare with recent state-of-the-art baselines on five different prompts, each shown from two camera views. Note that textural unrealism and blurriness are highlighted with yellow arrows; geometric artifacts are highlighted with green rectangles. Please zoom in for the best view and refer to the demo video for more results.
Ablation Study
Ablation Studies on HumanGaussian Module Design. We present generation results of the human frontal view under five ablation settings for clearer visualization and comparison: (A) baseline; (B) +SMPL-X, Pose-Cond.; (C) +Neg. Guidance, CFG=7.5; (D) +Dual-Branch SDS; (E) +Size-based Prune. The detailed ablation settings and result analysis are elaborated in Sec. 4.3.
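The size-based pruning of setting (E) can be sketched as a simple mask over the Gaussians: floating artifacts tend to grow into large, semi-transparent blobs, so Gaussians whose scale exceeds a threshold (or whose opacity falls below one) are dropped in the prune-only phase. The thresholds and function name below are illustrative assumptions, not the paper's exact values.

```python
import numpy as np

def prune_by_size(scales, opacities, max_scale=0.008, min_opacity=0.05):
    """Prune-only phase sketch (thresholds are illustrative, not the paper's).

    scales:    (N, 3) per-axis Gaussian scales
    opacities: (N,)   Gaussian opacities
    Returns a boolean keep-mask over the N Gaussians.
    """
    too_big = scales.max(axis=1) > max_scale   # floating blobs tend to be large
    too_faint = opacities < min_opacity        # near-transparent Gaussians
    return ~(too_big | too_faint)
```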
Zero-Shot Animation
Although the HumanGaussian framework is trained on a single body pose, the resulting avatar can be animated with unseen pose sequences in a zero-shot manner, i.e., a sequence of SMPL-X pose parameters can drive the pre-trained avatar without further finetuning.
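Because each Gaussian is anchored on the SMPL-X surface, such zero-shot animation can be sketched as linear blend skinning of the Gaussian centers: each center inherits skinning weights (e.g. from its nearest SMPL-X vertex, an assumption on our part) and is transformed by the blended per-joint rigid transforms of the target pose. This is a minimal sketch of the idea, not the exact pipeline.

```python
import numpy as np

def animate_gaussian_centers(centers, skin_weights, joint_transforms):
    """Drive canonical Gaussian centers with SMPL-X-style linear blend skinning.

    centers:          (N, 3) canonical-pose Gaussian centers
    skin_weights:     (N, J) per-Gaussian skinning weights (rows sum to 1),
                      e.g. copied from the nearest SMPL-X vertex (our assumption)
    joint_transforms: (J, 4, 4) rigid transforms for the target pose
    """
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)  # (N, 4)
    # Blend the per-joint transforms by the skinning weights: (N, 4, 4)
    blended = np.einsum('nj,jab->nab', skin_weights, joint_transforms)
    posed = np.einsum('nab,nb->na', blended, homo)                        # (N, 4)
    return posed[:, :3]
```

Applying a sequence of SMPL-X poses amounts to calling this per frame with the corresponding joint transforms, with no further optimization of the avatar.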
BibTeX
@article{liu2023humangaussian,
  title={HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting},
  author={Liu, Xian and Zhan, Xiaohang and Tang, Jiaxiang and Shan, Ying and Zeng, Gang and Lin, Dahua and Liu, Xihui and Liu, Ziwei},
  journal={arXiv preprint arXiv:2311.17061},
  year={2023}
}
Related Work
Jiaxiang Tang et al. DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation. arXiv preprint arXiv:2309.16653, 2023.
Comment: The first work to adapt Gaussian Splatting to the text-to-3D generation problem.
Xian Liu et al. HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion. arXiv preprint arXiv:2310.08579, 2023.
Comment: An in-the-wild human generation foundation model that simultaneously denoises the RGB, depth, and surface-normal to capture the joint distribution in a unified framework.