Microsoft’s end-to-end Arm development environment is here at last (2024)

Enterprise Microsoft

By Simon Bisson, Contributor, InfoWorld

With Arm developer hardware now shipping, Windows is ready to embrace an NPU-powered AI future.

Accelerate AI with neural processing

NPUs are new to Windows, but these specialized accelerators are a key feature of the hyperscale cloud. An Arm system on a chip with its own NPUs is an important part of Microsoft CEO Satya Nadella’s “intelligent edge.” Microsoft has already delivered an example of this in its Azure Percept IoT appliances, which use Intel NPU hardware and NXP Arm processors. In this approach, machine learning models are built and trained in the cloud before being exported into standard formats and executed on local runtimes that leverage the NPU hardware.

That’s all very well for Microsoft’s own models, but how do we build our own? There are still questions to be answered, but much of the picture has finally come into focus.

First, we need to start with affordable developer hardware. The Arm-powered Surface Pro 9 and its Surface Pro X predecessors are lovely pieces of hardware, but they’re high-end personal devices and unlikely to be used as developer hardware. Microsoft is well aware of this, announcing its “Project Volterra” Arm-based developer hardware at Build earlier this year.

After a series of supply chain–related delays, the hardware is now shipping as Windows Dev Kit 2023, an affordable $599 desktop Arm PC based on the Snapdragon 8cx Gen 3 processor, with 32GB of RAM and 512GB of storage. You can think of it as an Arm-powered NUC, using a variant of the mobile hardware used in Arm-based Windows devices, though without the 5G connectivity.

NPUs in Windows

Microsoft is presenting the NPU as the future of Windows, so it needs an NPU-centric development platform like these Arm-powered boxes. Qualcomm claims 29 TOPS for its AI Engine NPU, which it states supports “cloud-class models running on thin and light laptops.” Offloading these models from the CPU to the NPU should allow applications to remain responsive, leaving the eight Arm cores and the GPU free to consume and render the NPU’s output. The sample applications Microsoft demonstrated at its recent Surface launch show this approach in action: The SQ3’s NPU manages complex audio and video tasks, with the results composited and displayed via the existing camera application into tools such as Teams.

At the heart of Microsoft’s NPU support is the ONNX (Open Neural Network Exchange) portable neural network format. This allows you to take advantage of the compute capabilities of Azure’s Machine Learning platform to build and train models before exporting them to run locally on a Windows device through either the Windows ML or ML.NET APIs or directly using the Qualcomm Neural Processing SDK.

For now, Windows Arm devices will need to use the Qualcomm tool to access their NPUs. Windows ML and ML.NET support is likely if Microsoft and Qualcomm collaborate on a DirectCompute wrapper for the AI Engine APIs, but until then it looks as though you have to build separate versions of your applications if you want to run ONNX on Qualcomm-based Arm devices or on Intel hardware. As there’s already a Qualcomm-optimized version of the .NET ONNX Runtime library, this should be relatively easy for higher-level tools to implement.

Microsoft and Qualcomm provide a complete toolchain for building NPU applications on the Windows Dev Kit 2023 hardware, with an Arm build of Visual Studio 2022 available as a preview, along with an Arm-optimized release of the upcoming .NET 7. Alongside these, you’ll need to sign up for the closed Windows release of Qualcomm’s Neural Processing SDK for AI, often referred to by its old name: the Snapdragon Neural Processing Engine (SNPE). This comes with an SNPE runtime for ONNX models.

Build an NPU application on Windows Dev Kit hardware

Getting a model running on the 8cx AI Engine NPU isn’t entirely straightforward. It requires both the Linux and Windows versions of the Qualcomm NPU tool to create an optimized ONNX file from a pretrained model. As Windows 11 and WSL2 support Arm binaries, you can do this all on the Dev Kit system, first setting up an Ubuntu WSL environment and then installing Qualcomm’s tools and configuring your Linux environment to use them. You can, of course, use any Ubuntu system; for complex models, you may prefer Azure Arm instances to process models.

Start by installing Python 3 in your Linux system, then use pip to install the ONNX tools, following the instructions on GitHub. You can now install the SNPE tools, first unzipping the distribution file into a directory and then running SNPE’s dependency checker to ensure you have everything needed to run the SDK. Once all the prerequisites are in place, use its configuration script to set environment variables for use with ONNX.
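
Once the configuration script has run, a few lines of Python can confirm that the environment looks sane before you go any further. This is only a minimal sketch: the `SNPE_ROOT` variable name and the Python 3.6 minimum are assumptions made for illustration, so substitute whatever your SDK release’s documentation specifies.

```python
import os
import sys

# Assumed names: adjust to match what your SNPE release's setup script exports.
REQUIRED_VARS = ("SNPE_ROOT",)
MIN_PYTHON = (3, 6)  # assumed minimum; check the SDK release notes


def check_environment(env, version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(version) < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, need %d.%d or later" % (tuple(version) + MIN_PYTHON)
        )
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif not os.path.isdir(value):
            problems.append("%s points to a missing directory: %s" % (name, value))
    return problems


if __name__ == "__main__":
    # Report anything that looks wrong in the current shell environment.
    for problem in check_environment(os.environ):
        print("WARNING:", problem)
```

Run it from the same shell session in which you sourced the configuration script, as the variables it sets won’t be visible elsewhere.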

You’re now ready to process an existing ONNX model for use with a Qualcomm NPU. Download your model and a sample file along with its label data. The example Microsoft uses is an image recognizer, so you’ll need the SNPE tools to preprocess the sample image before converting the ONNX model into SNPE’s internal DLC format. Once that process is complete, use SNPE to quantize the model before exporting an ONNX-wrapped DLC file ready for use in your code.
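
The convert-then-quantize part of that flow is easy to script. The sketch below only builds the command lines for SNPE’s `snpe-onnx-to-dlc` and `snpe-dlc-quantize` tools rather than running them. The flag names should be checked against each tool’s `--help` output for your SDK release, and the file names (`model.onnx`, `input_list.txt`) are hypothetical placeholders. Image preprocessing and the final ONNX-wrapping step are not covered here.

```python
import shlex
from pathlib import Path


def build_snpe_commands(onnx_model, input_list, work_dir="."):
    """Return the convert and quantize commands as argument lists.

    Flag names are assumptions to verify against your SNPE release.
    """
    work = Path(work_dir)
    stem = Path(onnx_model).stem
    dlc = work / (stem + ".dlc")
    quantized = work / (stem + "_quantized.dlc")

    # Step 1: convert the ONNX model into SNPE's DLC format.
    convert = [
        "snpe-onnx-to-dlc",
        "--input_network", str(onnx_model),
        "--output_path", str(dlc),
    ]
    # Step 2: quantize the DLC using a list of preprocessed sample inputs.
    quantize = [
        "snpe-dlc-quantize",
        "--input_dlc", str(dlc),
        "--input_list", str(input_list),
        "--output_dlc", str(quantized),
    ]
    return [convert, quantize]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before being run.
    for cmd in build_snpe_commands("model.onnx", "input_list.txt"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists means that, once you’re happy with them, each one can be passed straight to `subprocess.run(cmd, check=True)` inside your WSL environment.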

Copy the ONNX file from WSL into Windows. You can now install the Microsoft.ML.OnnxRuntime.Snpe package from NuGet, ready to use in your applications. This is an optimized version of Microsoft’s existing ONNX Runtime tooling, so it should be relatively simple to add to existing code or build into a new app. If you need hints, the sample C# code in the example Windows SNPE repository will help you use the sample ONNX model in a basic console application.
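
Microsoft’s sample is written in C#, but ONNX Runtime uses the same execution provider model in its other language bindings. To stay consistent with the earlier snippets, here is a hedged Python sketch of the idea: prefer the SNPE provider when the runtime reports it and fall back to the CPU otherwise. It assumes an `onnxruntime` Python build that includes the SNPE provider, which you may have to build yourself. Treat the provider name and the `runtime` option as assumptions to check against the ONNX Runtime documentation.

```python
SNPE_PROVIDER = "SNPEExecutionProvider"  # assumed name; verify for your build
CPU_PROVIDER = "CPUExecutionProvider"


def choose_providers(available, snpe_options=None):
    """Prefer the SNPE provider when present; always keep CPU as a fallback."""
    providers = []
    if SNPE_PROVIDER in available:
        # The "runtime" option and its "DSP" value are assumptions.
        options = dict(snpe_options or {"runtime": "DSP"})
        providers.append((SNPE_PROVIDER, options))
    providers.append(CPU_PROVIDER)
    return providers


def make_session(model_path):
    """Create an inference session; requires the onnxruntime package."""
    # Imported lazily so choose_providers() has no third-party dependencies.
    import onnxruntime as ort

    return ort.InferenceSession(
        model_path,
        providers=choose_providers(ort.get_available_providers()),
    )
```

With the CPU fallback in place, the same code runs on a development machine without an NPU, which goes some way toward avoiding separate application builds for Qualcomm and Intel hardware.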

There’s enough in the combination of Qualcomm’s machine learning tool and Microsoft’s Arm platform to get you building code for this first generation of Windows NPU hardware. Microsoft’s own NPU-powered video and audio features in Windows 11 should help inspire your own code, but there’s a lot more that can be done with hardware neural network accelerators. For example, you could use them to speed up image processing like that in Adobe’s creative tools, or run Windows 11 IoT Enterprise on NPU-accelerated Arm hardware at the edge of your network to preprocess data before delivering it to Azure IoT Hubs.

This is the early stage of a new direction for Windows, and while Microsoft has been using these tools internally for some time, they’re now available to us all—ready to take advantage of a new generation of Arm-based Windows systems on our desks, on the edge, and in the cloud.

Author of InfoWorld's Enterprise Microsoft blog, Simon Bisson has worked in academic and telecoms research, been the CTO of a startup, run the technical side of UK Online, and done consultancy and technology strategy.