  • Eric Palmer (Intel), December 10, 2008 12:22 PM PST
    Short-vector math: Intel Compiler vs. IPP

    Background

    For several years, there have been articles posted on ISN about how to call the Intel Compiler's SVML (short-vector math library) functions directly from your own code.  Last year, I ran into an issue calling vmlsPowf4 in an x64 build where the results were only sometimes wrong (which for most programmers is worse than always wrong).  After posting the issue to Premier Support, I was surprised to learn that calling SVML functions directly is not actually supported.  The functions were intended to be called only from compiler-generated code, and in x64 that code does not use the standard ABI, so calls from my own code produced wrong results.  The short-term solution in the 10.1 compiler, which only happened to work and was still not "supported" according to the compiler team, was for me to use __svml_powf4 instead of vmlsPowf4.  But as more and more code is targeted for x64, we clearly needed a real solution.

    The usefulness of highly optimized short-vector math functions is easy to see.  The IPP team saw it too and added a set of functions to IPP 6.0 (see ippvm.h), including variants with more and less floating-point precision so you can trade accuracy for performance.  Great feature, right?  Well, I'm sure it's nice for some uses, but as is often the case when writing highly tuned code, the IPP functions don't quite suit my needs.  My code needs to call math functions like pow, sin, cos, acos, etc. on 4 floats at a time (one xmm register), and using IPP to do this is roughly 10x slower than using the unsupported SVML call.  Some example code is attached - you may need the 11.0 compiler installed to open it in Visual Studio.

    A solution on the horizon

    According to the Intel Compiler team, they will begin supporting direct calls to SVML functions in the 11.1 version, with both the Composer and Compiler Pro products.  <Applause>  For example, instead of using __svml_powf4, I will use _mm_pow_ps - looking just like an intrinsic for an actual instruction.  (I have mixed feelings about this choice of naming - what are readers' thoughts?)  My intent here is to inform Intel Compiler users that calling SVML functions directly is currently unsupported, and to encourage them to test the new 11.1 feature when it becomes available.  Also, I'm curious how many readers already call SVML functions directly, and how many who didn't know it was an option would like to in the future.

    If you're not sure what I'm talking about, hopefully the attached code will clarify it for you.  If it doesn't, well this is a discussion forum...

     - Eric


