DeepSeek open-sources "Janus-Pro-7B," a multimodal AI model that combines computer vision with text understanding
Having already attracted market attention with its flagship large language model, DeepSeek has released a multimodal AI model, "Janus-Pro-7B," which combines computer vision with text understanding, as open source on GitHub. Like other multimodal models, Janus-Pro-7B analyzes image content using computer vision and combines this with text comprehension to draw deeper inferences. According to its description, Janus-Pro-7B can analyze and describe image content, identify geographical locations, recognize text within images, and answer questions about the context of an image. The images it generates are also higher in quality and more realistic in detail, and it can produce images that better match a user's intent by interpreting their input. Like the previously released DeepSeek R1 model, Janus-Pro-7B is available as open source on GitHub.

Separately, Microsoft announced earlier that the distilled "DeepSeek-R1-Distill-Qwen-1.5B" model can run on devices conforming to the "Copilot+ PC" design. It is expected to arrive first on laptops equipped with Qualcomm Snapdragon X-series processors, followed by laptops with Intel Core Ultra 200V series and AMD Ryzen AI 9 series processors. Beyond bringing DeepSeek-R1-Distill-Qwen-1.5B to Copilot+ PC devices, Microsoft plans to offer distilled versions with 7 billion and 14 billion parameters in the future. It will also make the models available to more developers and enterprises through its Azure AI Foundry cloud platform.
