Baidu launches ERNIE 4.5 and ERNIE X1 models for multimodal operation, touting lower cost than DeepSeek.
To compete with DeepSeek R1, which has seen widespread adoption, Baidu recently announced updates to its multimodal base model, ERNIE 4.5 (Wenxin 4.5), and its deep-reasoning model, ERNIE X1, claiming the latter costs only half as much as DeepSeek R1. Baidu also upgraded its chatbot, Wenxin Yiyan (ERNIE Bot), to support both models.

ERNIE 4.5 is claimed to understand text, images, audio, and video simultaneously, along with the context in which content is described; it can even interpret internet memes and satirical cartoons. Baidu says it outperforms OpenAI's GPT-4.5 in multimodal tasks and surpasses GPT-4.5, GPT-4o, and DeepSeek V3 Chat in text comprehension. On pricing, Baidu claims ERNIE 4.5 costs only RMB 0.004 per 1,000 input tokens, roughly 1% of GPT-4.5's price, with output priced at RMB 0.016 per 1,000 output tokens.

ERNIE X1 excels at Chinese knowledge question answering, literary creation, writing, dialogue, logical reasoning, and complex calculation. It also supports advanced search, document-based question answering, image recognition, image generation, code interpretation, webpage reading, and tree-diagram (mind map) generation, and it can serve as the engine behind Baidu's academic search and business-information search applications. Baidu states that ERNIE X1 starts at RMB 0.002 per 1,000 input tokens, about half the cost of DeepSeek R1, with output priced at RMB 0.008 per 1,000 output tokens.
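Taking the quoted prices at face value, a request's cost can be estimated from its token counts. The sketch below is illustrative only: the per-1,000-token rates come from Baidu's announcement above, while the model names as dictionary keys and the example request sizes are hypothetical.

```python
# Illustrative cost estimate using the prices quoted in the article.
# Rates are RMB per 1,000 tokens: (input rate, output rate).
PRICES = {
    "ERNIE 4.5": (0.004, 0.016),
    "ERNIE X1": (0.002, 0.008),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the RMB cost of one request at the quoted per-1,000-token rates."""
    in_rate, out_rate = PRICES[model]
    return input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate

# Hypothetical example: a 2,000-token prompt with a 500-token reply on ERNIE X1
print(round(request_cost("ERNIE X1", 2000, 500), 6))  # 0.008
```

At these rates, a fairly long request on ERNIE X1 costs well under RMB 0.01, which is the basis of Baidu's half-the-cost-of-DeepSeek-R1 claim.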








