Project Trillium design platform: designed for the needs of device-side learning applications

In an interview at MWC 2018, Jem Davies, general manager of ARM's Machine Learning Group, stated that on-device machine learning will be a major development trend. Applications will range from small devices such as IoT products and mobile phones to large-scale "equipment" such as autonomous vehicles and robots, as well as data centers and smart cities. Performing learning computation on the device is expected to optimize data transmission and reduce latency, bringing advantages such as lower implementation costs and enhanced security.
Within the Project Trillium design platform recently announced by ARM, processors built on the ARM ML architecture deliver faster machine learning performance, while processors built on the ARM OD (object detection) architecture can more quickly perceive and identify objects and changes in them. For example, they can accurately judge movements and gestures beyond faces, and even recognize objects worn on the body. Software called ARM NN then bridges these processors to learning frameworks such as TensorFlow, Caffe, or Android NN, so the hardware's computing performance can be used to accelerate learning workloads.
Unlike previous approaches that achieved machine learning performance through the CPU and GPU working together, the Project Trillium platform can deliver roughly 2-4 times higher learning data throughput. Images can also be captured at 1080p/60fps, allowing end devices to learn to correctly recognize faces, and further to detect changes in facial expressions, gestures, and other body movements, as well as to recognize accessories worn beyond the face itself.
According to Jem Davies, processors designed on the ARM ML architecture can deliver up to 4.6 trillion operations per second (TOPS) on mobile devices, at a power efficiency of around 3 TOPS per watt, and can provide processing speeds up to 80 times faster than conventional digital signal processing components. Processors designed on the ARM OD architecture can likewise handle industrial-grade object detection workloads.
Within the Project Trillium design platform, ARM also offers technical solutions for developers who want to leverage existing hardware and software frameworks to create endpoint learning computing models. In other words, ARM believes that both standalone learning acceleration components and existing CPUs, GPUs, and other components working in conjunction with software learning frameworks can serve as successful endpoint learning application models; ultimately, the actual application determines which model is most appropriate for each use case.
For example, if faster real-time recognition is required, a dedicated learning acceleration processor may be the better choice. If the operating mode needs to change flexibly across different usage scenarios, running existing CPU, GPU, and other components together with a learning framework, or on programmable hardware such as FPGAs, can offer greater computing flexibility.
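The trade-off described above resembles the backend-preference pattern common in on-device inference stacks: try the dedicated accelerator first, then fall back to the GPU or CPU. The sketch below illustrates that idea in plain Python; the backend names and the select_backend helper are hypothetical illustrations, not part of any real ARM NN API.

```python
# Hypothetical sketch of compute-backend selection for on-device
# inference. Backend names and availability checks are illustrative
# only, not the real ARM NN API.

def select_backend(preferred, available):
    """Return the first preferred backend the device supports,
    mirroring how a runtime might fall back from a dedicated
    NPU to the GPU, then the CPU."""
    for backend in preferred:
        if backend in available:
            return backend
    raise RuntimeError("no supported compute backend found")

# A device with a dedicated ML processor picks it first...
print(select_backend(["NpuAcc", "GpuAcc", "CpuAcc"],
                     {"NpuAcc", "GpuAcc", "CpuAcc"}))  # NpuAcc

# ...while a device without one falls back to the GPU.
print(select_backend(["NpuAcc", "GpuAcc", "CpuAcc"],
                     {"GpuAcc", "CpuAcc"}))  # GpuAcc
```

The preference list encodes the article's point: the same model can run everywhere, but the runtime favors whichever hardware gives the best acceleration on that particular device.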
On-device learning applications will drive new computing models
At present, manufacturers including Huawei and Apple have adopted dedicated learning computing components to assist acceleration, reducing the lag caused by traditional computing approaches. MediaTek's earlier-announced Helio P60 likewise adds a dedicated APU as an independent computing unit for image recognition, achieving faster facial recognition results.
Qualcomm and other manufacturers believe that while acceleration through dedicated learning computing components is important, most hardware designs must operate within the same architectural model, so independent acceleration components will inevitably face limitations as software and hardware are updated. This can lead to incompatible application services or suboptimal efficiency, and may even add cost during product design. They therefore argue that combining existing hardware and software computing methods to achieve similar or even greater learning acceleration is the more cost-effective approach.
From ARM's perspective, these differing approaches in the market generally originate from its partners. ARM will therefore provide more convenient design and application reference solutions to meet such needs, allowing more partners to quickly build application products on these design platforms, or to further adapt them and create new technologies. In this development, ARM primarily plays the role of a technology supplier.
Jem Davies believes that device-side learning acceleration is not strictly tied to the development of 5G networking technology. Rather, growing demands for computing efficiency, privacy and security, and personalization on the device side have driven a shift away from relying on cloud-based collaborative computing: early-stage computation is completed on the device through learning acceleration, while cloud services handle larger-scale data computing, thereby reducing the latency of device-to-cloud collaboration.
At the same time, Jem Davies expects that device-side learning acceleration will change traditional computing models, not only in mobile devices, monitoring equipment, and the broader IoT application markets where ARM is currently strong, but potentially also in how PC devices are used in the future.