Short-vector math: Intel Compiler vs. IPP

For several years, articles posted on ISN have described how to call the Intel Compiler's SVML (short-vector math library) directly from your own code. Last year, I ran into an issue calling vmlsPowf4 in an x64 build where the results were only sometimes wrong (which for most programmers is worse than always wrong). After posting the issue to Premier Support, I was surprised to learn that calling SVML functions directly is not actually supported. The functions were intended to be called only from compiler-generated code, and on x64 those calls do not use the standard ABI, so when I called the function from my own code, the results were wrong. The short-term solution in the 10.1 compiler, which only happened to work and was still not "supported" according to the compiler team, was for me to use __svml_powf4 instead of vmlsPowf4. But as more and more code is targeted for x64, we clearly needed a real solution.

The usefulness of highly optimized short-vector math functions is easy to see. The IPP team saw it too and added a set of functions to IPP 6.0 (see ippvm.h), including variants with higher and lower floating-point precision so you can choose more accuracy or more performance. Great feature, right? Well, I'm sure it's nice for some uses, but as is often the case when writing highly tuned code, the IPP functions don't quite suit my needs. My code needs to call math functions like pow, sin, cos, and acos on 4 floats at a time (one xmm register), and using IPP to do this is roughly 10x slower than using the unsupported SVML call. Some example code is attached - you may need the 11.0 compiler installed to open it in Visual Studio.

A solution on the horizon

According to the Intel Compiler team, direct calls to SVML functions will be supported starting with the 11.1 version, in both the Composer and Compiler Pro products. For example, instead of using __svml_powf4, I will use _mm_pow_ps, which looks just like an intrinsic for an actual instruction. (I have mixed feelings about this naming choice - what are readers' thoughts?) My intent here is to inform Intel Compiler users that calling SVML functions directly is currently unsupported, and to encourage them to test the new 11.1 feature when it becomes available. Also, I'm curious how many readers already call SVML functions directly, and how many of those who didn't know it was an option would like to do so in the future.

If you're not sure what I'm talking about, hopefully the attached code will clarify it for you. If it doesn't, well, this is a discussion forum...

- Eric
