Tencent today unveiled “Hunyuan3D 2.0,” an AI system that transforms a single image or text description into a detailed 3D model within seconds. The system turns what would normally take a skilled artist days or weeks into a quick, automated task.
Like its predecessor, the new version of the model has been released as an open-source project on both Hugging Face and GitHub, making the technology immediately accessible to developers and researchers around the world.
“Creating high-quality 3D assets is a time-intensive process for artists, so automated generation is a long-term goal for researchers,” the team wrote in a technical report. The upgraded system builds on the foundation of the previous system with significant improvements in speed and quality.
How Hunyuan3D 2.0 converts images into 3D models
Hunyuan3D 2.0 uses two main components: Hunyuan3D-DiT generates the basic shape, and Hunyuan3D-Paint adds surface detail. The system first creates multiple 2D views of the object and then fuses them into a complete 3D model. A new guidance strategy addresses a common failure of AI-generated 3D models, inconsistency between viewpoints, by ensuring that all rendered views of an object agree with one another.
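The two-stage flow described above can be sketched in code. Note that the function names below mirror the component names in the article but are stand-ins, not the real Hunyuan3D 2.0 API:

```python
# Hypothetical sketch of the shape-then-texture pipeline.
# These functions are illustrative stand-ins, NOT Tencent's API.

def hunyuan3d_dit(image_path):
    """Stage 1 (stand-in): predict a bare, untextured 3D shape
    from a single input image."""
    return {"mesh": f"shape-from:{image_path}", "texture": None}

def hunyuan3d_paint(shape, image_path):
    """Stage 2 (stand-in): synthesize multi-view surface detail and
    bake it onto the shape so every viewpoint stays consistent."""
    shape["texture"] = f"texture-from:{image_path}"
    return shape

# Single image in, textured model out.
model = hunyuan3d_paint(hunyuan3d_dit("chair.png"), "chair.png")
```

The key design point is the split itself: geometry and appearance are produced by separate models, which lets each stage specialize.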
“We positioned the cameras at specific heights to capture the maximum visible area of each object,” the researchers explain. This approach, combined with mixing different perspectives, helps the system capture details that other models often miss, especially at the top and bottom of objects.
Faster and more accurate: What sets Hunyuan3D 2.0 apart
The technical results are impressive. Hunyuan3D 2.0 produces more accurate and visually appealing models than existing systems, according to standard industry measurements. The standard version creates a complete 3D model in about 25 seconds, while the smaller, faster version takes just 10 seconds.
What sets Hunyuan3D 2.0 apart is that it can handle both text and image input, making it more versatile than previous solutions. The system also introduces innovative features such as “adaptive classifier-free guidance” and “hybrid input” that help ensure consistency and detail in the generated 3D models.
According to published benchmarks, Hunyuan3D 2.0 achieved a CLIP score of 0.809, outperforming both open-source and proprietary alternatives, with marked gains in texture synthesis and geometric accuracy across standard industry metrics.
A key technological advance in the system is its ability to generate high-resolution models without large-scale computing power. The team developed a new method to increase detail while keeping processing requirements manageable, a constraint that frequently limits other 3D AI systems.
These developments are important to many industries. Game developers can quickly create test versions of characters and environments. Online stores can show their products in 3D. Film studios can preview special effects more efficiently.
Tencent has shared almost every part of its system through Hugging Face, the AI model-sharing platform. Developers can now use the code to create 3D models that work with standard design software, making them ready for immediate use in professional environments.
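Compatibility with standard design software typically comes down to exporting common mesh formats such as Wavefront OBJ, which tools like Blender and most game engines import directly. As a rough illustration of how little that format requires, here is a minimal OBJ writer; the single triangle is a placeholder for the vertex and face arrays a generated model would supply:

```python
def write_obj(path, vertices, faces):
    """Write a triangle mesh to Wavefront OBJ: one 'v x y z' line per
    vertex, one 'f i j k' line per face (indices are 1-based)."""
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for a, b, c in faces:
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")

# Illustrative stand-in mesh: a single triangle.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
faces = [(0, 1, 2)]
write_obj("triangle.obj", vertices, faces)
```

Because OBJ is plain text, a generated asset in this format drops straight into existing professional pipelines without conversion.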
The technology represents a significant advance in automated 3D production, but it raises questions about how artists will work in the future. Tencent sees Hunyuan3D 2.0 not as a replacement for human artists, but as a tool to handle technical tasks while creators focus on artistic decisions.
As 3D content becomes a mainstay of gaming, shopping and entertainment, tools like Hunyuan3D 2.0 point to a future where creating virtual worlds is as simple as describing them. The challenge ahead may not be creating 3D models, but deciding how to use them.