Microsoft* Turbocharges AI with Intel FPGAs. You Can, Too.

Today, Microsoft* announced a public preview of Azure Machine Learning Hardware Accelerated Models powered by Project Brainwave*, a new AI inferencing service. The service uses Intel® Arria® 10 FPGAs, configured as “soft DNN processing units” highly tuned to the ResNet-50 image-recognition model, to deliver extraordinary throughput. Microsoft calls it “real-time AI.”


One year ago, Microsoft Azure CTO Mark Russinovich described Microsoft's plan to build the Azure Cloud Services infrastructure with an FPGA in every node. Instead of creating node pools with specialized hardware accelerators for the wide-ranging workloads deployed in Azure, the Microsoft team chose the flexibility of FPGAs, which can be reconfigured to provide hardware acceleration precisely aligned to nearly any task. So far, Microsoft has explored acceleration scenarios including network packet processing, Bing* Intelligent Search, Cognitive Services, and now, AI inferencing.

This announcement highlights how much more accessible FPGAs have become. In the past, FPGAs were notoriously challenging for developers to program, so we embarked on a fast-paced program to make them easier and more accessible to developers and data scientists. Intel FPGAs in Azure Cloud Services let you take advantage of acceleration for specific use cases without any FPGA programming knowledge. Using Python* and TensorFlow*, you can bring existing models into Project Brainwave, or work with Microsoft to onboard new models.
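To make the client-side flow concrete, here is a minimal sketch of preparing an image batch for a ResNet-50 inference request. Only the preprocessing conventions shown (224x224 RGB input, per-channel ImageNet mean subtraction) are standard ResNet-50 practice; the exact service call you would make afterward depends on the Azure ML SDK and is not shown here.

```python
import numpy as np

# Standard ResNet-50 input conventions: 224x224 RGB images,
# per-channel ImageNet means subtracted before inference.
RESNET50_INPUT_SHAPE = (224, 224, 3)
IMAGENET_MEAN = np.array([123.68, 116.78, 103.94], dtype=np.float32)  # R, G, B

def preprocess(image: np.ndarray) -> np.ndarray:
    """Subtract channel means and add a batch dimension.

    Assumes the image has already been resized/cropped to 224x224x3.
    """
    if image.shape != RESNET50_INPUT_SHAPE:
        raise ValueError(f"expected {RESNET50_INPUT_SHAPE}, got {image.shape}")
    return (image.astype(np.float32) - IMAGENET_MEAN)[np.newaxis, ...]

# A synthetic frame stands in for real pixel data.
frame = np.random.randint(0, 256, size=RESNET50_INPUT_SHAPE).astype(np.uint8)
batch = preprocess(frame)
print(batch.shape)  # (1, 224, 224, 3) -- ready to send to the inference service
```

The resulting `(1, 224, 224, 3)` float32 tensor is the shape the accelerated ResNet-50 model expects for a single-image request.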

Mainstream data center processors can address most workloads at optimal solution cost. However, the throughput and latency demands of workloads such as real-time AI inference and continuous model training are growing so rapidly that hardware acceleration may be required. In such cases, Intel® Xeon® Scalable processors in conjunction with Intel FPGAs can offer the best results. If you're considering FPGA projects on in-house hardware, we recently announced a number of resources that should remove the mystery and accelerate your schedule, including the Intel® Acceleration Stack for Intel® Xeon® processor with FPGA, Open Programmable Acceleration Engine (OPAE) technology, and Intel® Programmable Acceleration Cards.

AI spans a wide range of use cases, deployment models, performance needs, power requirements, and device types. For example, the solution for hyper-scale AI in a cloud data center is vastly different from that for autonomous driving or real-time quality inspection on an assembly line. We offer a portfolio of AI technologies, from high-performance Intel Xeon Scalable processors to ultra-low-power Intel® Movidius™ Vision Processing Units, Intel FPGAs, and our upcoming Intel® Nervana™ Neural Network Processor. We also work across the ecosystem to ensure that frameworks such as neon™, TensorFlow, Caffe*, and Theano* are optimized for our products. Likewise, the recently announced Microsoft WinML API takes advantage of the Intel Movidius VPU, our DX12-capable GPUs, and Intel CPUs with AVX-512 instructions.

We offer many developer resources to help you get your solution to market. For commercial developers, consider joining the AI Builders Program. For all developers, explore the Intel® AI Academy, which provides learning materials, Intel-optimized frameworks, and tools, along with a vibrant community of AI developers and students. You may also consider attending Intel AI DevCon, later this month in San Francisco.

It’s a great time to be working in AI. We’re here to help.


Roger Chandler

Vice President,
Core and Visual Computing Group,
General Manager,
Developer Programs and Initiatives

For more complete information about compiler optimizations, see our Optimization Notice.

1 comment


This is really interesting, but I can't see any details on how a Field Programmable Gate Array (FPGA) fits into this solution.  I can envision that it is essentially a hardware adaptable processor, that can be configured and reconfigured for the task at hand, but it's my understanding that the field programming of the gate array is not real-time. 

I understand that an FPGA would be particularly well suited to providing a hardware layer that adapts alongside adaptive software algorithms, but I don't understand how the FPGA is adjusted, how quick that adjustment is, and what the end advantage is. In short: what is the FPGA's role in the solution? Is it that Intel's FPGA is able to change its gate-array programming "in real time"? Is it that the FPGA makes processing faster once an AI has been trained, or is it even helpful in the training process? Or is it something completely different, such as the FPGA being pre-programmed with statistical or mathematical processes (such as wavelets) to give a faster hardware layer to an adaptive software layer?

Any links would be very much appreciated. 
