Poor performance with quadruple precision on ia64


I compared the performance of 128-bit floating-point arithmetic on ia64 and ev68. The application makes intensive use of basic arithmetic (+, -, /, *) and occasionally calls standard math functions (sin, cos, sqrt). On both platforms, 128-bit real arithmetic is done by software emulation.

The execution time is very similar on the Itanium processor (ia64, 1.5 GHz, BULL NovaScale 4040) and the Alpha processor (ev68, 833 MHz, HP AlphaServer DS20E): 1 hour 00 minutes versus 1 hour 06 minutes.
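To make the comparison easier to reproduce, here is a minimal sketch of the kind of timing kernel I have in mind. It is not my real application: the names real_t, arith_kernel and time_kernel are made up for this example, and it assumes real_t can be switched between "long double" (Alpha) and _Quad (Intel C++ on ia64). It only exercises +, -, * and /.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical typedef: long double here; on ia64 with Intel C++ one
 * would presumably substitute _Quad (assumption, not tested here). */
typedef long double real_t;

/* Run n iterations that mix the four basic operations.
 * The recurrence is arbitrary; it only keeps the emulation routines busy
 * and returns a value so the loop cannot be optimized away. */
real_t arith_kernel(size_t n)
{
    real_t x = 1.0L;
    real_t acc = 0.0L;
    for (size_t i = 0; i < n; ++i) {
        x = x * 1.0000001L + 0.5L;        /* multiply, add */
        acc += x / (x + 1.0L) - 0.25L;    /* divide, subtract */
    }
    return acc;
}

/* Time one call of arith_kernel with clock(); returns CPU seconds
 * and stores the kernel result in *result. */
double time_kernel(size_t n, real_t *result)
{
    assert(result != NULL);
    clock_t start = clock();
    *result = arith_kernel(n);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Calling time_kernel with the same n on both machines and comparing the returned seconds should give a rough per-operation comparison, independent of the rest of the application.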

On the Itanium hardware, I use Intel C++ 8.1 (Build 20050203 Package ID: l_cc_pc_8.1.028) with the type _Quad for real numbers. For the standard math functions, I call Fortran wrapper functions that do nothing but call sin, cos, and sqrt. The Fortran compiler is Intel Fortran 8.0 (Build 20040416 Package ID: l_fc_pc_8.0.046).

On the Alpha hardware, I use the Compaq C compiler (V6.5-207) with the "long double" type, which is 128 bits wide, for real numbers.
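Since the width of "long double" depends on the compiler and platform, a small check like the following can confirm what a given build actually provides. This is only a sketch using the standard <float.h> macros; the function names are invented for this example, and it assumes FLT_RADIX is 2 so that LDBL_MANT_DIG counts bits.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Number of significand digits of long double in base FLT_RADIX
 * (bits, assuming FLT_RADIX == 2). */
int long_double_significand_bits(void)
{
    return LDBL_MANT_DIG;
}

/* IEEE 754 binary128 has a 113-bit significand (112 stored + 1 implicit)
 * and occupies 16 bytes. Returns 1 if long double matches that, else 0. */
int long_double_is_binary128(void)
{
    return LDBL_MANT_DIG == 113 && sizeof(long double) == 16;
}

/* Write a one-line summary into buf; returns the snprintf result. */
int describe_long_double(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "sizeof=%zu significand_bits=%d binary128=%s",
                    sizeof(long double), LDBL_MANT_DIG,
                    long_double_is_binary128() ? "yes" : "no");
}
```

If long_double_is_binary128() returns 0 on a given system, "long double" there is not a true quadruple-precision type, and timings obtained with it are not comparable with the _Quad results.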

I generated an execution profile on ia64. It shows that most of the time is spent in the quadruple-precision arithmetic routines (libirc.a: ia64_mulq.o, ia64_divq.o).

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
14.87      8.20     8.20                              lbl12
 9.38     13.37     5.17                              __eval_neg_poly
 8.53     18.07     4.70                              lbl44
 4.96     20.81     2.73                              lbl14
 4.50     23.29     2.48                              __quad_common
 3.83     25.40     2.11                              lbl36
 3.64     27.41     2.01                              mulq
 2.68     28.88     1.48                              __dpml_multiply__
 2.44     30.23     1.34                              ea_gt_eb
 2.20     31.44     1.21                              __mcount

I think the performance of 128-bit real arithmetic could be improved on ia64 hardware. Do you run benchmarks for it?
A speedup of quadruple-precision arithmetic would be very much appreciated.

