Exponent **2.d0 or **2: which gives the best performance and precision?

Hello!

It's Friday, time for a little question I have been wondering about for a while now. mecej4 wrote in another thread:

Expressions that involve the squaring of real variables are more efficiently handled by keeping the exponent (=2) as an integer and evaluating the power by multiplication. Promoting integer exponents to real would require the use of the mathematical equivalences

$x^y = \exp(y \ln x) = 2^{y \log_2 x}$

Evaluation of the transcendental functions may be much slower, and problems could arise if the variable x were not positive.
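To make the two evaluation paths concrete, here is a minimal Fortran sketch. The module and function names (power_paths, pow_by_squaring, pow_via_log) are hypothetical, and real compilers generate their own code for x**n and x**y, which need not match this. The integer path uses exponentiation by squaring, so x**4 costs only two multiplications, (x*x)*(x*x); the real path goes through log and exp and is only meaningful for x > 0.

```fortran
! Hypothetical module contrasting the two ways of evaluating a power.
module power_paths
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  ! Integer exponent: exponentiation by squaring, multiplications only.
  ! For n = 4 this amounts to (x*x)*(x*x).
  pure function pow_by_squaring(x, n) result(r)
    real(dp), intent(in) :: x
    integer,  intent(in) :: n
    real(dp) :: r, base
    integer  :: k
    r    = 1.0_dp
    base = x
    k    = abs(n)
    do while (k > 0)
      if (iand(k, 1) == 1) r = r * base   ! multiply in the current bit
      base = base * base                  ! square for the next bit
      k = ishft(k, -1)
    end do
    if (n < 0) r = 1.0_dp / r             ! negative exponent: reciprocal
  end function pow_by_squaring

  ! Real exponent via the identity x**y = exp(y*ln x):
  ! two transcendental calls, and log(x) requires x > 0.
  pure function pow_via_log(x, y) result(r)
    real(dp), intent(in) :: x, y
    real(dp) :: r
    r = exp(y * log(x))
  end function pow_via_log
end module power_paths
```

For example, pow_by_squaring(-2.0_dp, 2) returns 4, whereas pow_via_log(-2.0_dp, 2.0_dp) would take the logarithm of a negative number, which is exactly the problem with non-positive x mentioned above (in practice the result is usually NaN).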

To test the impact on performance and precision, I wrote a little program:

! ###############################################################
! ### This is a little program to test the exponent precision ###
! ###############################################################  
program exponent_precision
  
  use ifport
  implicit none
  
  integer             :: i
  integer             :: i_exponent
  integer, parameter  :: int_count = 100000000
  integer, parameter  :: dp = kind(1.d0)
  real(dp)            :: result_iex(int_count), result_rex(int_count)
  real(dp)            :: r_exponent
  real(dp)            :: diff
  real(dp)            :: t1, t2, t3
  
! ---------------------------------------------------------------------
  
  i_exponent = 4
  r_exponent = dble(i_exponent)
  
  result_iex(1) = 1000.d0**i_exponent
  
  result_rex(1) = 1000.d0**r_exponent
  
  ! Difference in per mille
  diff = (result_rex(1) - result_iex(1))/result_rex(1) * 1000.d0
  write(*,*) 'Difference between results: ',diff
  
  ! calculation time difference
  t1 = dclock()
  do i = 1, int_count
    result_iex(i) = (1000.1d0+dble(i))**i_exponent
  end do
  t2 = dclock()
  
  do i = 1, int_count
    result_rex(i) = (1000.1d0+dble(i))**r_exponent
  end do
  t3 = dclock()
  
  write(*,*) 'Elapsed time integer exponent in sec: ',t2-t1
  write(*,*) 'Elapsed time real    exponent in sec: ',t3-t2
  write(*,*) 'Factor: ', (t3-t2)/(t2-t1)
  read(*,*)
  
! ---------------------------------------------------------------------  
end program exponent_precision

Maybe not elegant, but the results are not what I expected:

The result with default debug (WIN32) project settings (ifort 12.1.6.369) is:

Difference between results:   0.000000000000000E+000
Elapsed time integer exponent in sec:    2.45599999999831
Elapsed time real    exponent in sec:    6.14600000000064
Factor:    2.50244299674465

Now the result with default release (WIN32) project settings (ifort 12.1.6.369) is:

Difference between results:   0.000000000000000E+000
Elapsed time integer exponent in sec:   0.000000000000000E+000
Elapsed time real    exponent in sec:    3.10200000000623
Factor:  Infinity

In both cases the type of the exponent (double precision or integer) has no impact on the precision of the results. In debug mode, the calculation with an integer exponent is about 2.5 times faster (in this particular case, on my Xeon X5 machine!). Good to know. And now comes what I did not expect: with optimization enabled (Maximize Speed), the calculation time with the integer exponent is below the resolution of dclock, so the factor becomes infinity. The optimizer did a good job! I don't know the impact in a real program, but I will prefer the integer exponent in the future.

Kind regards,

Johannes


There are benchmarks, such as Polyhedron, that are written gratuitously with integer-valued real exponents, where good results require the compiler to make the non-standard optimization of treating them as integers in simple cases.

In the more difficult cases, where the compiler doesn't recognize the integer-valued reals and make the substitution, integer exponents should produce better accuracy. As the case of negative numbers raised to a real power is not defined by Fortran, some compilers let you take your chances on whether raising to an even integer-valued power gives a "correct" numerical result or NaN. Fortran doesn't even set requirements for the case of a negative number raised to a zero power, and there are inconsistencies among various otherwise highly regarded implementations in the treatment of NaN raised to an integer or real zero power.
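A rough illustration of the substitution TimP describes, done here at run time rather than at compile time, might look like the following. pow_real_exponent is a hypothetical helper, not something the Intel compiler or the Polyhedron benchmarks provide, and it assumes an integer-valued exponent fits in a default integer.

```fortran
! Hypothetical helper: evaluate an integer-valued real exponent as an
! integer power, otherwise fall back to the general real power.
module real_exponent_dispatch
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  pure function pow_real_exponent(x, y) result(r)
    real(dp), intent(in) :: x, y
    real(dp) :: r
    ! Substitute only for whole numbers within default-integer range
    ! (NaN and Inf fail these tests and take the general path).
    if (y == aint(y) .and. abs(y) <= real(huge(1), dp)) then
      r = x**nint(y)   ! integer power: multiplications, valid for x < 0
    else
      r = x**y         ! general real power; x < 0 is not defined by Fortran
    end if
  end function pow_real_exponent
end module real_exponent_dispatch
```

With this helper, pow_real_exponent(-2.0_dp, 2.0_dp) returns 4 through the integer path, while a non-integer exponent with a negative base still reaches x**y, whose result is processor-dependent.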

The compiler will recognize integer-valued real constant exponents and treat them as integers. Your test program uses variables, so that can't be done. But it does definitely use different sequences for a real exponent and an integer exponent, with the latter being faster.
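The distinction can be sketched as three small functions; the module name exponent_forms and the function names are hypothetical. As described above, the Intel compiler treats the integer-valued real constant (and a PARAMETER holding it) like the integer literal; other compilers may or may not do the same, so the comments only say "may". The variable exponent cannot be specialized at compile time in any case.

```fortran
! Hypothetical module showing the three ways of writing the exponent.
module exponent_forms
  implicit none
  integer,  parameter :: dp = kind(1.d0)
  real(dp), parameter :: four = 4.0_dp   ! a PARAMETER acts like the literal 4.0_dp
contains
  pure function pow_int_literal(x) result(r)
    real(dp), intent(in) :: x
    real(dp) :: r
    r = x**4        ! integer exponent: evaluated by multiplication
  end function pow_int_literal

  pure function pow_real_constant(x) result(r)
    real(dp), intent(in) :: x
    real(dp) :: r
    r = x**four     ! integer-valued real constant: may be compiled as x**4
  end function pow_real_constant

  pure function pow_real_variable(x, y) result(r)
    real(dp), intent(in) :: x, y
    real(dp) :: r
    r = x**y        ! exponent known only at run time: general real power
  end function pow_real_variable
end module exponent_forms
```

All three agree to within a few units in the last place for positive x; the difference is in the code the compiler is able to generate.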

Steve - Intel Developer Support

Dear Steve,

Thanks for the hint that the compiler recognizes whole-number real exponents. I tested it with the example program above and replaced line 39 (the assignment in the real-exponent loop) with

result_rex(i) = (1000.1d0+dble(i))**4.d0

And again the results are not what I expected:

In debug mode:

Difference between results:   0.000000000000000E+000
Elapsed time integer exponent in sec:    2.43299999999726
Elapsed time real    exponent in sec:    2.43400000000111
Factor:    1.00041101520914

In release mode:

Difference between results:   0.000000000000000E+000
Elapsed time integer exponent in sec:   0.000000000000000E+000
Elapsed time real    exponent in sec:   0.243999999998778
Factor:  Infinity

With optimization in release mode, the real exponent is still slower than its integer counterpart: not by much, but measurably. In debug mode both calculation times are nearly equal and vary a little from run to run. Why is there a difference after optimization if the compiler recognizes the exponent as a whole number? Declaring r_exponent as a parameter gives the same results as using 4.d0 directly.

Again, writing the exponent as an integer seems to be the best solution.

Kind regards,

Johannes

Dear TimP,

I experimented a little with a negative base and a real exponent, (-1000.d0)**4.1d0, and the Intel compiler returns NaN when the result variable is a double precision real. Since the mathematical result is a complex number, I declared the result as complex and changed the base to the complex constant (-1000.d0,0.d0), i.e. (-1000.d0,0.d0)**4.1d0. The result is as expected. With a whole-number exponent, the double precision real result variable gives the correct result, too.
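The workaround above can be sketched as follows; the module and function names are hypothetical. A real result is only well defined for an integer exponent, while promoting the base to complex gives the principal value exp(y*log(z)). For (-1000,0)**4.1 that value has magnitude 1000**4.1 and argument 4.1*pi, which reduces to 0.1*pi modulo 2*pi, so both its real and imaginary parts are positive.

```fortran
! Hypothetical module: negative base with integer vs. real exponent.
module negative_base_demo
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  ! Integer exponent: well defined for a negative real base.
  pure function pow_int(x, n) result(r)
    real(dp), intent(in) :: x
    integer,  intent(in) :: n
    real(dp) :: r
    r = x**n
  end function pow_int

  ! Real exponent with a negative base: promote the base to complex,
  ! which yields the principal value exp(y*log(z)).
  pure function pow_complex_base(x, y) result(z)
    real(dp), intent(in) :: x, y
    complex(dp) :: z
    z = cmplx(x, 0.0_dp, kind=dp)**y
  end function pow_complex_base
end module negative_base_demo
```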

I haven't tested how gcc handles this, because so far none of my programs combine negative bases with real exponents...

Kind regards,

Johannes

I looked at the generated code and the optimizer removed the whole loop assigning to result_iex because you never used the result. This is a classic problem in constructing benchmarks. When I fix that, and replace the integer exponent with the constant 4, I get:

 Difference between results:   0.000000000000000E+000
 Elapsed time integer exponent in sec:   0.173999999999069
 Elapsed time real    exponent in sec:   0.172000000005937
 Factor:   0.988505747165849

So this proves my point - using an integer-valued real constant exponent is just as fast as an integer exponent. If the exponent is a variable, then this doesn't apply. A PARAMETER constant is the same as a literal.
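A minimal sketch of the kind of fix described above, under a few assumptions: timing uses the standard SYSTEM_CLOCK intrinsic instead of the ifport DCLOCK routine, both exponents are compile-time constants (4 and 4.0_dp), and the module and routine names are hypothetical. The essential change is that both result arrays are used after the timing loops, so neither loop can be removed as dead code.

```fortran
! Hypothetical benchmark module: the results are consumed after timing.
module pow_bench
  use iso_fortran_env, only: int64
  implicit none
  integer, parameter :: dp = kind(1.d0)
contains
  subroutine run_pow_benchmark(n, t_int, t_real, sum_int, sum_real)
    integer,  intent(in)  :: n
    real(dp), intent(out) :: t_int, t_real      ! elapsed wall-clock seconds
    real(dp), intent(out) :: sum_int, sum_real  ! checksums of the results
    real(dp), allocatable :: res_i(:), res_r(:)
    integer(int64) :: c0, c1, c2, rate
    integer        :: i

    allocate(res_i(n), res_r(n))
    call system_clock(count_rate=rate)

    call system_clock(c0)
    do i = 1, n
      res_i(i) = (1000.1_dp + real(i, dp))**4        ! integer constant exponent
    end do
    call system_clock(c1)
    do i = 1, n
      res_r(i) = (1000.1_dp + real(i, dp))**4.0_dp   ! integer-valued real constant
    end do
    call system_clock(c2)

    t_int  = real(c1 - c0, dp) / real(rate, dp)
    t_real = real(c2 - c1, dp) / real(rate, dp)

    ! Using both arrays here keeps the optimizer from discarding either loop.
    sum_int  = sum(res_i)
    sum_real = sum(res_r)
  end subroutine run_pow_benchmark
end module pow_bench
```

A driver program would call, for example, run_pow_benchmark(100000000, t_int, t_real, s_int, s_real) and print t_real/t_int together with the two checksums. With 10**8 elements, each array needs about 800 MB.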

Steve - Intel Developer Support
