Abstract: [Objective] With the rapid development of information technology and the rise of the metaverse, demand for highly realistic, interactive 3D digital humans is growing. This paper surveys the key technologies, achievements, and open problems in drivable digital human reconstruction and interaction, providing a reference for related research. [Methods] The paper first reviews the background and value of digital human technology and its shift toward deep-learning-driven approaches. It then focuses on mainstream 3D digital human reconstruction techniques, covering both traditional methods and methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Next, it surveys driving techniques under different representations, including meshes, NeRF, and 3DGS, with particular attention to progress in speech-driven animation. Finally, it examines the key components of real-time, customizable interaction and their difficulties, summarizes the technical challenges, and outlines future directions. [Results] The analysis shows that new techniques such as NeRF and 3DGS have markedly improved the realism of digital human reconstruction and the naturalness of driving. However, many problems remain in monocular reconstruction accuracy, dynamic detail capture, emotional richness, flexibility of personalized driving, real-time interaction latency, effective multimodal fusion, and the construction of high-quality datasets. [Conclusions] Drivable digital human reconstruction and interaction technology is evolving toward higher fidelity, stronger interactivity, richer expressiveness, and lower latency. Future research should continue to tackle the remaining technical difficulties, strengthen cross-modal understanding and generation, improve personalization and emotional expressiveness, and build more complete datasets and evaluation systems, so as to promote wide application in virtual reality, new media, and other fields.
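Both NeRF-style volume rendering and 3DGS rasterization ultimately resolve each pixel by front-to-back alpha compositing of depth-ordered samples or Gaussians. The following is a minimal, illustrative sketch of that per-pixel compositing rule only; it is not code from any of the surveyed systems, and the function name and array shapes are assumptions for illustration.

```python
import numpy as np

def composite_splats(colors, alphas):
    """Front-to-back alpha compositing over depth-sorted contributions:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).

    colors: (N, 3) RGB values of the sorted samples/Gaussians at a pixel
    alphas: (N,) opacity contribution of each sample at that pixel
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = np.zeros(3)
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= (1.0 - a)   # remaining transmittance for samples behind
    return pixel

# A fully opaque front contribution hides everything behind it:
composite_splats([[1, 0, 0], [0, 1, 0]], [1.0, 0.5])  # → [1. 0. 0.]
```

Because each Gaussian's weight is its opacity times the accumulated transmittance, farther contributions are attenuated by everything in front of them, which is what makes the ordering of Gaussians matter in 3DGS.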
