Abstract: [Objective] With the rapid development of information technology and the rise of the metaverse, demand for highly realistic, interactive 3D digital humans is growing. This paper surveys the key technologies, achievements, and open problems in drivable digital human reconstruction and interaction, providing a reference for related research. [Methods] The paper first reviews the background and value of digital human technology and its shift toward deep-learning-driven approaches. It then focuses on mainstream 3D digital human reconstruction techniques, covering both traditional methods and methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). Next, it surveys driving techniques under different representations, including meshes, NeRF, and 3DGS, with particular attention to progress in speech-driven animation. Finally, it examines the key components of real-time, customizable interaction and their difficulties, summarizes the technical challenges, and outlines future directions. [Results] The analysis shows that new techniques such as NeRF and 3DGS have markedly improved the realism of digital human reconstruction and the naturalness of driving. However, many problems remain in monocular reconstruction accuracy, dynamic detail capture, emotional richness, flexibility of personalized driving, real-time interaction latency, effective multimodal fusion, and the construction of high-quality datasets. [Conclusions] Drivable digital human reconstruction and interaction technology is evolving toward higher fidelity, stronger interactivity, richer expressiveness, and lower latency. Future research should continue to tackle the remaining technical difficulties, strengthen cross-modal understanding and generation, improve personalization and emotional expressiveness, and build more complete datasets and evaluation systems, so as to promote wide application in virtual reality, new media, and other fields.
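Both NeRF-style volume rendering and 3DGS rasterization ultimately resolve each pixel by front-to-back alpha compositing of depth-ordered samples or Gaussians. The following is a minimal, illustrative sketch of that per-pixel compositing rule only; it is not code from any of the surveyed systems, and the function name and array shapes are assumptions for illustration.

```python
import numpy as np

def composite_splats(colors, alphas):
    """Front-to-back alpha compositing over depth-sorted contributions:
    C = sum_i c_i * a_i * prod_{j<i} (1 - a_j).

    colors: (N, 3) RGB values of the sorted samples/Gaussians at a pixel
    alphas: (N,) opacity contribution of each sample at that pixel
    """
    transmittance = 1.0          # fraction of light not yet absorbed
    pixel = np.zeros(3)
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= (1.0 - a)   # remaining transmittance for samples behind
    return pixel

# A fully opaque front contribution hides everything behind it:
composite_splats([[1, 0, 0], [0, 1, 0]], [1.0, 0.5])  # → [1. 0. 0.]
```

Because each Gaussian's weight is its opacity times the accumulated transmittance, farther contributions are attenuated by everything in front of them, which is what makes the ordering of Gaussians matter in 3DGS.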
