Data-Parallelism Spanning From SSE to AVX to Larrabee to...

Greetings all, and thanks for reading my first Intel Software Network blog! I just took my wife to see the movie Julie & Julia and came home inspired to blog, and since the popcorn is still processing, I'm not asleep yet. I won't make a 365-day commitment to parallelize every numerical recipe or anything, but I will try to keep coming back, answer questions, follow up on new Intel technology developments, and so on.

I first started writing SIMD code in 1997, for the first instantiation of the Intel® Itanium™ processor (codenamed Merced). Intel's compiler for Itanium was not yet fully functional, so when the team wanted to demonstrate peak FFT performance, they assigned the task to an unsuspecting new team member (me) and handed me an assembler, an instruction-set manual, and some other relevant training. Wow, that was painful. When I moved to the team developing the first Intel® Pentium™ III (codenamed Katmai), writing SSE assembly code seemed easy by comparison. Thank you, out-of-order execution; goodbye, EPIC! Around the time the Intel® Pentium™ 4 processor began shipping, the Intel C/C++ compiler's good support for SSE2 intrinsics was starting to let us close-to-the-metal programmers step back from assembly code somewhat. However, as my scope widened from focused media-processing kernels (filters, codecs, etc.) to applications orders of magnitude larger, it became clear that the compiler and optimization tools were improving too gradually to satisfy my customers, who wanted to quickly develop and ship new code that would run at peak performance on the latest Intel hardware. For some years, I tried to help fill this gap by writing many tens of thousands of lines of SSE2 through SSE4 intrinsics code. Sure, it made the apps significantly faster, but now what?

Now, with new technologies like Intel AVX and Larrabee just around the corner, all of that code will have to be rewritten to take full advantage of either, because intrinsics code is written for a fixed vector width: 128 bits for SSE, versus 256 bits for AVX and 512 bits for Larrabee. At the same time, the software development landscape has changed, and it's no longer purely CPU-focused. The big question I think software developers want answered is "In what technology or technologies should we invest our limited development resources?" Intel, for its part, is certainly working on answering "What software technology should we develop and productize?" In my next posts, I will discuss both of these questions, and I look forward to your feedback.

Thanks for reading,

