Intel Labs recently demonstrated the Latent Diffusion Model for 3D (LDM3D), developed in collaboration with Blockade Labs, at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). According to Intel, the model uses generative artificial intelligence to quickly produce immersive 360-degree views from text descriptions.
Compared to most current generative AI models, which can only produce 2D images from text descriptions, LDM3D automatically generates an image and its corresponding depth map from the same text prompt, using nearly the same number of parameters as a standard diffusion model, and thereby quickly forms immersive 3D views. The model can be used for rapid modeling and 3D scene setup, accelerating applications in industries such as architecture, design, gaming, and entertainment.
The diffusion model was trained on a dataset built from a subset of roughly 10,000 samples of the LAION-400M database, which contains over 400 million image-caption pairs. The depth annotations for the training corpus were generated with the Dense Prediction Transformer (DPT), a large-scale depth estimation model developed by Intel Labs.
The model was trained on an artificial intelligence supercomputer equipped with Intel Xeon processors and Habana Gaudi AI accelerators. The DPT depth estimation model provides highly accurate relative depth for every pixel in an image, and the generated image and depth map are combined to create 360-degree content that can be viewed from any angle. Because the diffusion process runs in a compressed latent space, memory usage during generation is reduced, which in turn minimizes computational latency.
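As an illustration of the depth-annotation step, the sketch below runs a published DPT checkpoint through the Hugging Face transformers library to estimate relative depth for every pixel of an image. The "Intel/dpt-large" checkpoint name and the input file path are assumptions for the example; whether this exact variant was used to annotate the LDM3D corpus is not confirmed by the article.

```python
# Minimal sketch: per-pixel relative depth estimation with Intel's DPT model.
# "Intel/dpt-large" is a published checkpoint; "example.jpg" is a placeholder.
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("example.jpg")  # hypothetical input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth  # relative depth per pixel

# Resize the predicted depth map back to the original image resolution.
depth = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
).squeeze()
```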
Intel has now open-sourced the model through Hugging Face, a platform that lets users share machine learning models and datasets. This allows more researchers and businesses to build applications with it and to keep improving how efficiently it can be used.
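For readers who want to try the open-sourced release, the following is a minimal sketch using the LDM3D pipeline in the Hugging Face diffusers library, which returns a paired RGB image and depth map for one prompt. The "Intel/ldm3d-4c" checkpoint name reflects the published release; the prompt text and output file names are illustrative, and exact output handling may vary by diffusers version.

```python
# Minimal sketch: generating an RGB image plus matching depth map with LDM3D.
# Requires a recent diffusers release; the prompt and file names are examples.
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
).to("cuda")

prompt = "an ancient temple courtyard at sunset"
output = pipe(prompt)

# The pipeline output carries both modalities from the same latent sample.
rgb_image, depth_image = output.rgb[0], output.depth[0]
rgb_image.save("temple_rgb.jpg")
depth_image.save("temple_depth.png")
```

The paired outputs can then be fed into tools such as Blockade Labs' Skybox-style viewers to assemble the immersive 360-degree scenes the article describes.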


