Not getting log() or pow() to vectorize

Michael Hlavinka:

Consider the following code:

double* a;               /* assume a points to an array of n doubles */
size_t n;
a[0:n] = log(a[0:n]);    /* Cilk Plus array notation */

The compiler reports that _log cannot be vectorized. It reports the same for the pow() function. However, changing to functions such as exp(), sin(), etc. allows vectorization. I thought that log() and pow() were vectorizable functions, as listed in http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-E98D4E0A-9730-425D-A898-3BB4AB9B2330.htm. Does anyone know the cause?

Thanks.

Sergey Kostrov:

I haven't verified your test case yet, but could you try a workaround like this:

- Make a for-loop ( from 0 to n-1 ) with 4-in-1 unrolling
- Use 4 temporary variables to calculate 4 log values
- Store the 4 calculated log values in an output array
- Repeat until the output array is filled

Please use the -vec-report3 option to see why your processing with the log function is not being vectorized.

Michael Hlavinka:

I can try those workarounds, but I don't know what is different about log() and pow().  This is the output from level 6:

vectorization support: call to function _log cannot be vectorized.

Same occurs with pow(), with the statement referencing the _pow function.

Tim Prince:

Your example would require #include <math.h> and possibly a change from size_t to int.

Michael Hlavinka:

Actually, I have #include <mathimf.h> in the file. Isn't that what is required? Why int? int is the same size as size_t in 32-bit builds, and I believe using int would be technically wrong for 64-bit builds.
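The size claim is easy to check on any given target (a small sketch of mine, not from the post; results depend on the ABI, with both typically 4 bytes on 32-bit builds and size_t typically 8 bytes on LP64/LLP64 64-bit builds):

```c
#include <stdio.h>
#include <stddef.h>

/* Print the sizes of int and size_t for the current target ABI. */
static void print_type_sizes(void)
{
    printf("sizeof(int)    = %zu\n", sizeof(int));
    printf("sizeof(size_t) = %zu\n", sizeof(size_t));
}
```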

Again, these would not explain why other math functions work.

Sergey Kostrov:

>>...Again, these would not explain why other math functions work.

Please post a complete reproducer with all include files and the list of command-line options you used.

Sergey Kostrov:

>>... The compiler reports that _log cannot be vectorized.

Is that a macro or a C function? Use a debugger to verify.

Michael Hlavinka:

I figured out what is preventing vectorization: the use of /fp:precise. However, I don't understand why that switch affects only certain functions while others can still be vectorized.

Sergey Kostrov:

>>...However, I don't understand why that switch will affect only certain functions while others can be vectorized...

It could be by design of the compiler. Please review the following topics:

Programming with Auto-parallelization
http://software.intel.com/sites/products/documentation/doclib/iss/2013/c...

Programming Guidelines for Vectorization
http://software.intel.com/sites/products/documentation/doclib/iss/2013/c...

Vectorization and Loops
http://software.intel.com/sites/products/documentation/doclib/iss/2013/c...

There are some restrictions on vectorization and parallelization of code described in those topics.

Tim Prince:

Quote:

Michael Hlavinka wrote:

I figured out what is causing it not to be vectorized.  The use of /fp:precise.  However, I don't understand why that switch will affect only certain functions while others can be vectorized.

So you can see why everyone has been asking for a reproducer.

In recent compilers, increasing numbers of SVML function invocations are disabled by /fp:precise. That some of them slipped by in the past may have been an oversight. SVML functions aren't designed to permit capturing exceptions on individual operands. If you wish to override this effect on math function vectorization, you may set /Qfast-transcendentals.

In principle, you may also need to consider the /Qimf- options. The SVML default "guarantees" accuracy only within 4 ulps (although it is usually better), which is not consistent with expectations for /fp:precise. The exp() and pow() functions (and their relatives) are notoriously difficult to vectorize while maintaining full accuracy for corner cases. The /Qimf-... options allow you to request higher-precision (slower) or lower-precision (faster) functions where they exist.
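For readers unfamiliar with the unit, the ulp ("unit in the last place") distance between two doubles can be measured by reinterpreting their bit patterns as integers (a sketch of mine, assuming finite, same-sign values; the helper name is not from the thread):

```c
#include <stdint.h>
#include <string.h>

/* Ulp distance between two finite doubles of the same sign:
   IEEE-754 doubles of one sign order the same way as their
   bit patterns interpreted as integers. */
static int64_t ulp_distance(double x, double y)
{
    int64_t ix, iy;
    memcpy(&ix, &x, sizeof ix);
    memcpy(&iy, &y, sizeof iy);
    return ix > iy ? ix - iy : iy - ix;
}
```

With a helper like this, a "4-ulp" guarantee means a result whose bit pattern is within 4 of the correctly rounded answer.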

I noticed a case this week where disabling SVML vectorization by -fp-model source doesn't change the vec-report output. Apparently the decision not to report the difference between full vectorization (with /Qcomplex-limited-range) and partial vectorization (without it) has been carried over to /Qfast-transcendentals.

I had to revise my recommendation for options that observe parentheses while allowing maximum optimization to include /Qfast-transcendentals:

/fp:source /Qftz /Qfast-transcendentals [/Qprec-div- /Qprec-sqrt-]

This still disables vectorization of sum and indexed max/min reductions.

Michael Hlavinka:

Thanks for the information, everyone. Do you still want a repro case? All I did was extract this from a much larger program, and my repro case really doesn't do anything more than what's shown here.

Tim, do you know the accuracy of the VC++ library in /fp:precise and /fp:fast modes? Since part of my application is compiled with it, I suspect I may need similar accuracy in the various modules.

Tim Prince:

Microsoft's /fp:fast and /fp:precise don't affect their math libraries, as far as I know. Most of them, particularly if based on x87 code, should be what Intel calls "high" accuracy. I don't believe there are any vector math functions in the Microsoft libraries. If it's critical, you may want the /Qimf-precision:high versions of the Intel vector libraries (high accuracy is the default for the scalar functions). Although ICL /fp:source is roughly equivalent to Microsoft /fp:fast, the more aggressive ICL default /fp:fast affects math function accuracy only when it promotes vectorization and imf-precision is set to medium (the default) or low (where double "low" is barely better than float high precision).

By the way, /Qimf-precision also affects vectorized divide and sqrt, but /Qprec-div /Qprec-sqrt will force those independently to full accuracy.

Sergey Kostrov:

>>...do you know the accuracy of the VC++ library in /fp:precise and /fp:fast mode? Since part of my application is compiled with it,
>>I suspect I may need similar accuracies for the various modules...

It is very easy to verify as follows ( or in a similar way ):
...
// Sub-Test 5.1 - Calculates Product of 0.1 * 0.1 - RTfloat
//
{
    CrtPrintf( RTU("Sub-Test 5.1 - RTfloat\n") );

    RTfloat fVal = 0.1f;
    RTfloat fRes = 0.0f;

    uiControlWordx87 = CrtControl87( _RTFPU_PC_24, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("24-bit  : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_53, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("53-bit  : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_64, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("64-bit  : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );

    uiControlWordx87 = CrtControl87( _RTFPU_CW_DEFAULT, _RTFPU_MCW_PC );
    fRes = fVal * fVal;
    CrtPrintf( RTU("Default : [ %1.1f * %1.1f = %.17f ]\n"), fVal, fVal, fRes );
}

// Sub-Test 5.2 - Calculates Product of 0.1 * 0.1 - RTdouble
//
{
    CrtPrintf( RTU("Sub-Test 5.2 - RTdouble\n") );

    RTdouble dVal = 0.1L;
    RTdouble dRes = 0.0L;

    uiControlWordx87 = CrtControl87( _RTFPU_PC_24, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("24-bit  : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_53, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("53-bit  : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );

    uiControlWordx87 = CrtControl87( _RTFPU_PC_64, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("64-bit  : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );

    uiControlWordx87 = CrtControl87( _RTFPU_CW_DEFAULT, _RTFPU_MCW_PC );
    dRes = dVal * dVal;
    CrtPrintf( RTU("Default : [ %1.1f * %1.1f = %.17f ]\n"), dVal, dVal, dRes );
}
...

You will need to comment out all calls to the CrtControl87 CRT function; the FPU settings then need to be set at compile time using the /fp:[ mode ] option.

Notes:

CrtControl87 = _control87
CrtPrintf = _tprintf
RTU = _T
RTfloat = float
RTdouble = double
etc
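A minimal portable equivalent of the test above (a sketch of mine, without the MSVC-specific x87 control-word calls) simply prints the 0.1 * 0.1 product at float and double precision so the rounding error of each can be compared:

```c
#include <stdio.h>

/* Print 0.1 * 0.1 computed at float and at double precision.
   The float product differs from 0.01 by roughly 7e-10; the
   double product differs by only about 2e-18. */
static void print_products(void)
{
    float  fVal = 0.1f;
    double dVal = 0.1;

    printf("float  : %.17f\n", (double)(fVal * fVal));
    printf("double : %.17f\n", dVal * dVal);
}
```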

Sergey Kostrov:

Here is a collection of IDZ threads related to different issues with floating-point data types, the FPU, etc. on Intel CPUs:

Forum topic: Support of 'long double' floating point data type on Intel CPUs ( A collection of threads )
Web-link: software.intel.com/en-us/node/375459

Forum topic: Mathimf and Windows
Web-link: software.intel.com/en-us/forums/topic/357759

Forum topic: Support of Extended or Quad IEEE FP formats
Web-link: software.intel.com/en-us/forums/topic/358472

Forum topic: Using 'long double' in Parallel Studio?
Web-link: software.intel.com/en-us/forums/topic/266290

Forum topic: Why function printf does not support long double?
Web-link: software.intel.com/en-us/forums/topic/372720

Forum topic: Mixing of Floating-Point Types ( MFPT ) when performing calculations. Does it improve accuracy?
Web-link: software.intel.com/en-us/forums/topic/361134

Forum topic: Test results for CRT-function 'sqrt' for different Floating Point Models
Web-link: software.intel.com/en-us/forums/topic/368241

Michael Hlavinka:

Sergey, thanks for the info.  I'll look into it.
