|
Biography
I am a Research Scientist at NVIDIA Research, Deep Imagination Research Group. I am also a final-year Ph.D. student at CUHK Multi-Media Lab (MMLab), supervised by Prof. Dahua Lin, Prof. Ziwei Liu, and Prof. Xihui Liu.
Before that, I received my Bachelor's degree from Zhejiang University in 2021, advised by Prof. Xiaowei Zhou.
I have been fortunate to gain extensive industrial experience during my Ph.D. studies, with internships at several leading research institutes, including NVIDIA Research, Snap Research, Tencent AI Lab, SenseTime Research, and Shanghai AI Lab.
My research interests include computer vision and generative modeling, especially pre-training and post-training of GenAI foundation models, vision-language models, multi-modal tokenizers, and their applications in digital humans and physical AI.
I am always open to discussions and collaborations. Feel free to drop me an email if you are interested :)
News
- [01/2025] Cosmos won the Best of CES, Best of AI, and Best Overall Awards in CNET 2025!
- [01/2025] We release Cosmos, a world foundation model platform for Physical AI. Models are open-sourced on GitHub and Hugging Face!
- [01/2025] Four papers are accepted to ICLR 2025.
- [12/2024] One paper is accepted to AAAI 2025.
- [11/2024] We release Cosmos-Tokenizer, a suite of SOTA image/video tokenizers, with models available on GitHub and Hugging Face!
- [09/2024] Honored to receive the ECCV 2024 Outstanding Reviewer Award. Many thanks for the recognition!
- [07/2024] Two papers are accepted to ECCV 2024.
- [05/2024] One paper is accepted to ICML 2024.
- [03/2024] Started my internship at NVIDIA Research. See you in Santa Clara!
- [03/2024] Two papers are accepted to CVPR 2024, with HumanGaussian accepted as Highlight (Top 2.8%). See you in Seattle!
- [01/2024] One paper is accepted to ICLR 2024, with HyperHuman receiving review scores of 6, 6, 8, 10 (Top 1.6%, Rank).
- [01/2024] I will intern at GenAI Team @ Meta AI Research in 2024 Fall. See you in Menlo Park!
- [11/2023] I will intern at Deep Imagination Research @ NVIDIA Research in 2024 Spring with Ming-Yu Liu. See you in Santa Clara!
- [11/2023] HumanGaussian, a high-quality 3D human generation framework, is released, with all the code and models available!
- [10/2023] HyperHuman, a hyper-realistic human generation foundation model developed in collaboration with Snap Research, is on arXiv!
- [07/2023] One paper is accepted to ICCV 2023.
- [05/2023] Started my internship at Snap Research. See you in Los Angeles!
- [03/2023] Two papers are accepted to CVPR 2023.
- [03/2023] One paper is accepted to TMLR 2023.
- [09/2022] One paper is accepted to NeurIPS 2022, with ANGIE accepted as Spotlight (Top 5%)!
- [07/2022] Three papers are accepted to ECCV 2022, with SSP-NeRF accepted as Oral (Top 2.7%)!
- [03/2022] One paper is accepted to CVPR 2022.
- [12/2021] One paper is accepted to AAAI 2022.
Industrial Research
|
Cosmos World Foundation Model Platform for Physical AI
Contributions: Auto-Regressive Foundation Model Pre-Training & Post-Training. (CES'25 Best of AI, Best Overall)
|
|
Cosmos Tokenizer: A Suite of Image and Video Neural Tokenizers
Contributions: Continuous/Discrete Image/Video Tokenizers.
|
Selected Publications [ Full List ] (* indicates equal contribution)
|
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
International Conference on Learning Representations (ICLR), 2025.
|
|
High-Quality Joint Image and Video Tokenization with Causal VAE
International Conference on Learning Representations (ICLR), 2025.
|
|
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng,
Han Shi,
Xian Liu,
Xuefei Ning,
Guohao Dai,
Yu Wang,
Zhenguo Li,
Xihui Liu.
International Conference on Learning Representations (ICLR), 2025.
|
|
EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation
International Conference on Learning Representations (ICLR), 2025.
|
|
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024. (Highlight, Top 2.8%)
|
|
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
International Conference on Learning Representations (ICLR), 2024. (Review Score 6, 6, 8, 10, Top 1.6%, Rank)
|
|
Semantic-Aware Implicit Neural Audio-Driven Video Portrait Generation
European Conference on Computer Vision (ECCV), 2022. (Oral, Top 2.7%)
|
|
Audio-Driven Co-Speech Gesture Video Generation
Advances in Neural Information Processing Systems (NeurIPS), 2022. (Spotlight, Top 5%)
|
|
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022.
|
|
Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.
|
|
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
AAAI Conference on Artificial Intelligence (AAAI), 2022.
|
|
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani*,
Xian Liu*,
Yifan Wang*,
Ivan Skorokhodov,
Victor Rong,
Ziwei Liu,
Xihui Liu,
Jeong Joon Park,
Sergey Tulyakov,
Gordon Wetzstein,
Andrea Tagliasacchi,
David B. Lindell.
European Conference on Computer Vision (ECCV), 2024.
|
|
Object-Compositional Neural Implicit Surfaces
European Conference on Computer Vision (ECCV), 2022.
|
|
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
European Conference on Computer Vision (ECCV), 2024.
|
|
TextCraftor: Your Text Encoder Can be Image Quality Controller
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
|
Experiences
|
Research Scientist.
Jun. 2024 - Now
NVIDIA Research, Deep Imagination Research Group.
|
|
Generative AI Research Intern, Deep Imagination Research, NVIDIA Research.
Mar. 2024 - Jun. 2024
Topic: Image/Video Foundation Models, Tokenizers, Multi-Modal Language Models.
|
|
Research Visiting Student, Toronto Computational Imaging Group.
Dec. 2023 - Mar. 2024
Topic: Text-to-4D Generation.
|
|
Research Intern, Tencent AI Laboratory.
Sept. 2023 - Dec. 2023
Topic: Text-Driven 3D Human Generation.
|
|
Research Intern, Creative Vision Group, Snap Research.
May 2023 - Sept. 2023
Topic: Human Generation Foundation Model.
|
|
Research Intern, Digital Content Group, Shanghai AI Laboratory.
Jul. 2021 - Feb. 2022
Topic: Digital Human, Gesture Generation.
|
|
Research Intern, Intelligent Video Group, SenseTime Research.
Aug. 2020 - Jun. 2021
Topic: Digital Human, Face Animation.
|
Invited Talks
Professional Services
- Conference Program Committee / Reviewer: CVPR, ECCV, ICCV, SIGGRAPH, SIGGRAPH Asia, NeurIPS, ICML, ICLR, AISTATS, AAAI.
- Journal Reviewer: TPAMI, IJCV, TVCG, EG, CGF, PG.
Selected Honors & Awards
- CNET 2025 Best of CES, Best of AI, and Best Overall.
2025
- ECCV Outstanding Reviewer Award.
2024
- CVPR Travel Award.
2024
- ICLR Travel Award.
2024
- National Scholarship.
2019, 2020
- Hong Kong Ph.D. Fellowship Scheme (HKPFS).
2021 - 2025
- Outstanding Graduate of Zhejiang Province.
2021
- Outstanding Bachelor Thesis Award of Zhejiang University, Top 1%.
2021
- UCLA CSST Scholarship Program.
2020
- SenseTime Scholarship.
2020
- Tang Lixin Scholarship.
2019
- First Class Scholarship for Academic Excellence.
2019, 2020
Teaching Experience
- ENGG 1120, Linear Algebra for Engineers.
Spring 2022.
- ENGG 2440, Discrete Mathematics for Engineers.
Fall 2021.
|