Insights from Intel Visionary Moh Haghighat
By Edward J. Correia
Applications that would benefit from such performance gains include those for speech and facial recognition, audio and video processing, complex 2D and 3D graphics processing, perceptual computing (such as hand and finger tracking), and gaming. Such applications generally require direct hardware access to perform adequately. But with SIMD.js, these types of resource-intensive apps can be deployed through a browser with excellent performance and significant power savings.
Edward Correia recently sat down with Moh Haghighat to discuss the status of the SIMD.js project, the technical details behind the library and its interfaces, the benefits of SIMD.js for multi-core systems, the development and debugging tools now available, the involvement of computer and browser makers in the browser runtime aspects of the project, interest in driving standardization, and the implications for HTML5 and the future of cross-platform development.
WHAT IS SIMD?
Can you give readers a brief explanation of SIMD and how it benefits application performance?
MH: SIMD instructions operate on multiple data elements of a vector simultaneously. For example, if you have a vector of four elements, you can say, "Load this into a register," and your register has enough space for four 32-bit elements. Then you either load a second vector and add the two, or add a vector directly from memory, and you get four additions at the same time using one instruction.
Now, if you have an 8-wide or even a 16-wide vector (the more recent Intel® architectures have up to 16 elements), then a single operation can give you a significant speed-up, by a factor of the vector length or even more, depending on the implementation.
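To make the four-at-a-time idea concrete, here is a minimal sketch in plain JavaScript. It does not use the SIMD.js API. It only emulates a four-lane add with ordinary scalar code and counts how many operations would be issued if each group of four additions were a single vector instruction. The function names (addScalar, addFourWide) and the operation counters are assumptions made for illustration.

```javascript
// Scalar emulation only: this is NOT the SIMD.js API.
// LANES is an assumed vector width of four 32-bit elements.
const LANES = 4;

// One addition per element: n elements cost n operations.
function addScalar(a, b) {
  const out = new Float32Array(a.length);
  let ops = 0;
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] + b[i];
    ops++;
  }
  return { out, ops };
}

// Four additions per step. On SIMD hardware each step could be one
// instruction, so we count one operation per group of four lanes.
function addFourWide(a, b) {
  if (a.length % LANES !== 0) {
    throw new RangeError("length must be a multiple of 4");
  }
  const out = new Float32Array(a.length);
  let ops = 0;
  for (let i = 0; i < a.length; i += LANES) {
    out[i] = a[i] + b[i];
    out[i + 1] = a[i + 1] + b[i + 1];
    out[i + 2] = a[i + 2] + b[i + 2];
    out[i + 3] = a[i + 3] + b[i + 3];
    ops++;
  }
  return { out, ops };
}

const a = Float32Array.of(1, 2, 3, 4, 5, 6, 7, 8);
const b = Float32Array.of(10, 20, 30, 40, 50, 60, 70, 80);

const scalar = addScalar(a, b);
const wide = addFourWide(a, b);

console.log(Array.from(wide.out)); // [11, 22, 33, 44, 55, 66, 77, 88]
console.log(scalar.ops, wide.ops); // 8 2
```

The results are identical. Only the number of counted operations changes, which is the factor-of-vector-length effect described above. Real speed-ups depend on the hardware and the JIT, which this emulation does not model.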
Is it possible to determine programmatically the width of the pipe in which you can operate and then program it accordingly to take as much advantage as possible?
MH: We have approached it in simpler steps. We currently support vectors of length four, [but having vectors of length] eight is under discussion. We've also discussed parallel abstractions, which would hide the vector length and let you write code without hard-coding your program to a specific vector length. However, if you rely too much on the runtime system and just-in-time (JIT) compiler to determine those things and how to compile, it's more difficult. So for now, we started with explicit vectors of length four.
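One way to picture an abstraction that hides the vector length is a helper that takes the width as a parameter, so the calling code never hard-codes it. The sketch below is an illustration under that assumption only: zipLanes and its width parameter are hypothetical names, not part of SIMD.js, and the code runs as ordinary scalar JavaScript.

```javascript
// Hypothetical helper, not part of SIMD.js: applies `op` element-wise,
// walking the inputs `width` lanes at a time. The caller supplies only the
// per-element operation and never refers to the vector length directly.
function zipLanes(a, b, op, width = 4) {
  if (a.length !== b.length) {
    throw new RangeError("inputs must have equal length");
  }
  if (!Number.isInteger(width) || width < 1) {
    throw new RangeError("width must be a positive integer");
  }
  const out = new Float32Array(a.length);
  const whole = a.length - (a.length % width);

  // Full groups of `width` lanes: each group stands in for one vector op.
  let i = 0;
  for (; i < whole; i += width) {
    for (let lane = 0; lane < width; lane++) {
      out[i + lane] = op(a[i + lane], b[i + lane]);
    }
  }
  // Leftover elements that do not fill a whole group are handled one by one.
  for (; i < a.length; i++) {
    out[i] = op(a[i], b[i]);
  }
  return out;
}

const xs = Float32Array.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
const ys = Float32Array.of(1, 1, 1, 1, 1, 1, 1, 1, 1, 1);
const add = (x, y) => x + y;

// The same calling code works for a width of four or eight.
const sum4 = zipLanes(xs, ys, add, 4);
const sum8 = zipLanes(xs, ys, add, 8);
console.log(Array.from(sum4).join(",") === Array.from(sum8).join(",")); // true
```

The trade-off mentioned in the answer shows up even in this toy version: once the width is a parameter, something has to choose it and handle the leftover elements, work that explicit length-four vectors avoid.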
In one of your HTML5 presentations, you spoke of bridging the gaps between the browser apps and native development. Can you explain?
SIMD.js isn't generally available yet. Is there something similar that developers can use today?
MH: The full interpreter patch has already landed in Firefox* Nightly. That is, if you write code with our SIMD API and run it in Firefox, the interpreter part is fully implemented. It will recognize those operations and execute them in interpreter mode using scalar operations. But that doesn't give you a speed-up, because for that the operations need to be mapped to vector instructions that are generated dynamically by the JIT compiler; this work is ongoing.
If you write to the API according to the straw-man spec we have with Google and Mozilla*, Firefox will recognize it and actually execute it, but slowly because our implementation is not yet complete. We are landing our full patch as a series of small incremental patches. Internally, we have fully working versions of both Firefox and Chromium* (the open source version of Chrome) with tremendous performance improvements.
On our SIMD benchmarks, shown in the speedups chart, we're getting improvements of 4x, 8x, and even more, even on vectors with four elements. This is a super-linear speedup. Mozilla wants us to break the full patch down into smaller incremental patches so they can test, verify, and then adopt them. This ongoing work with Mozilla and Google is going really well.
SIMD Speedups on Chromium
You've said that another gap lies in the use of parallelism to leverage the multi-core processors in all of today's devices. Is the project doing anything there?
HOW TO CHANGE YOUR APPS
As developers move forward, how significantly will their applications need to change in order to access native hardware and take advantage of SIMD instructions?
What hurdles exist before SIMD.js becomes part of the official HTML5 specification?
MH: We are working on that through ECMAScript. Once the SIMD.js spec is final, core compute-intensive algorithms can be written that way. Once written, you can embed them in libraries. It's the same as today when, for example, you operate on an HTML5 Canvas object and say, "Rotate this canvas base by x number of degrees." You don't really need to know how that rotation has been implemented, whether it uses a GPU or a CPU, or whether there's hardware acceleration. The developer doesn't need to know that.
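The library point can be sketched in a few lines of plain JavaScript. Everything here is hypothetical and for illustration only: rotatePoints, rotateKernelScalar, and activeKernel are invented names, and the kernel shown is ordinary scalar code, not SIMD.js. The idea is simply that callers depend on the public function while the kernel behind it could later be replaced by a vectorized one.

```javascript
// Hypothetical library internals. Points are packed as [x0, y0, x1, y1, ...].
function rotateKernelScalar(xy, cos, sin) {
  const out = new Float32Array(xy.length);
  for (let i = 0; i < xy.length; i += 2) {
    const x = xy[i];
    const y = xy[i + 1];
    out[i] = x * cos - y * sin;     // rotated x
    out[i + 1] = x * sin + y * cos; // rotated y
  }
  return out;
}

// Assumption: a library could point this at a vectorized kernel when one is
// available. Only the scalar kernel exists in this sketch.
const activeKernel = rotateKernelScalar;

// Public entry point: the caller passes degrees and never sees the kernel.
function rotatePoints(xy, degrees) {
  if (xy.length % 2 !== 0) {
    throw new RangeError("expected an even number of coordinates");
  }
  const radians = (degrees * Math.PI) / 180;
  return activeKernel(xy, Math.cos(radians), Math.sin(radians));
}

// Rotating (1, 0) and (0, 1) by 90 degrees gives approximately (0, 1) and (-1, 0).
const rotated = rotatePoints(Float32Array.of(1, 0, 0, 1), 90);
console.log(Array.from(rotated).map((v) => Math.round(v)));
```

Swapping the kernel would not change any calling code, which mirrors the Canvas example: the developer asks for a rotation and does not need to know how it is carried out.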
"We changed both Firefox and Chromium. It's roughly 20,000 lines of new code and changes in each browser. But once there, the developer doesn't need to do anything other than write the SIMD code and users won't need to modify anything in the VM. They just get the SIMD-enabled version of the browser or the run-time, and their program will work fine." Moh Haghighat, senior principal engineer, Intel
Describe the changes you're making to browser runtimes.
MH: This is one development where you're mapping a high-level language to really low-level machine instructions. We changed both the Firefox and Chromium engines. On the Mozilla side, the involved components are currently Mozilla's IonMonkey* [JIT] and the OdinMonkey* [JIT and optimizations] for the asm.js part.
So, if a SIMD-capable app runs on Chrome for Windows, let's say, in theory it will run unchanged on Chrome and Android?
To make all this possible, are the JIT and runtime doing the heavy lifting?
MH: Yes. Everything is actually done in both the VM and the JIT. Mainly in the JIT, because that is where object properties are mapped to your processor instructions. The JIT has to take care of register allocation, recognizing the operation, mapping, and so on. We're excited about bringing the fantastic high-performance, power-efficient SIMD capability to the most widely used language in a consistent fashion that will run on all devices.
A follow-up to this article, which will be published this summer, will cover Haghighat's discussion on development tools (including Intel® XDK), origins of the SIMD.js project, a move toward standardization, and the ECMAScript Committee. In the meantime, explore these resources for more information:
Originally developed at Intel Labs, River Trail is designed to provide data parallelism for Web applications. Check out this video to learn more.