Shortly after launching "QwQ-32B-Preview", an artificial intelligence model that enhances logical reasoning, Alibaba has announced "QVQ-72B-Preview", an artificial intelligence model with visual reasoning capabilities. The company emphasizes significant progress in language understanding and visual reasoning and claims the model can solve complex problems.
In addition to increasing the parameter count to 72 billion, "QVQ-72B-Preview" adds image recognition capabilities. Combined with the understanding and analysis capabilities of a large language model, this lets it infer and solve complex problems through contextual understanding, reasoning, and visual analysis.
Alibaba stated that QVQ-72B-Preview will be used for simulating the placement of large furniture in a space and for medical image analysis and diagnosis. Furthermore, QVQ-72B-Preview achieved strong results on the MathVista, MathVision, and OlympiadBench math benchmarks, with its MathVision performance approaching that of OpenAI's o1 artificial intelligence model.
However, since it is still in the preview stage, "QVQ-72B-Preview" may get stuck in recursive reasoning loops during execution, and its response time increases when different languages are mixed.
Currently, "QVQ-72B-Preview" is hosted on the Hugging Face platform and is available to everyone as an open source model.



