As on-device AI applications grow in popularity, workloads ranging from photography, voice, and image processing to AI-assistant interaction are becoming increasingly demanding. Arm has announced the Scalable Matrix Extension 2 (SME2) instruction set, which strengthens the AI computing capabilities of its mobile computing platform. It allows developers to improve AI application performance without additional source code changes, while delivering a low-power, high-efficiency application experience.
SME2 is a high-performance CPU instruction set upgrade within the Armv9 instruction set architecture, designed specifically to accelerate AI workloads such as computer vision, speech recognition, and natural language processing. Because these workloads run directly on the mobile device's CPU, there is no need for additional cloud computing resources, mitigating the latency and energy-consumption bottlenecks of on-device AI applications.
More importantly, through Arm's KleidiAI software acceleration layer, developers can automatically benefit from the improved execution performance of SME2 without modifying their models or applications. SME2 has already been integrated into several mainstream AI frameworks and libraries, including Google's XNNPACK, LiteRT, and MediaPipe, Alibaba's MNN, Microsoft ONNX Runtime, and even llama.cpp. This means that SME2 has been fully integrated into the existing Android AI software stack, ensuring that developers can immediately benefit from it within their existing architectures.
Iliyan Malchev, a senior engineer for Android at Google, stated that SME2 technology enables advanced language models like Gemma 3 to run smoothly on local devices, eliminating the need for cloud computing. Compared to equivalent hardware without the extension, the SME2-enabled Gemma 3 model is six times faster at text-summarization tasks, even when running on just a single CPU core, highlighting the substantial improvement SME2 brings to mobile AI performance.
Android devices equipped with SME2 are about to hit the market, while iOS devices already support the SME2 instruction set. Combined with KleidiAI's automatic acceleration of matrix operations, developers can move to the next generation of AI applications simply by building on supported platforms, providing users with a more immediate, low-latency, and high-performance interactive experience.
According to Arm statistics, over 900 million applications and services currently run on Arm-based computing platforms, with over 22 million developers actively building on them. As SME2 technology becomes more widespread, more applications will be able to perform complex AI calculations on mobile devices, raising the bar for user experience and performance expectations.