I've compared the performance of an f77 test program
that:
a) calls BLAS1 (daxpy) and BLAS2
(dgemv) subroutines;
b) does (theoretically) the same work, but with
explicitly coded f77 loops instead of library calls.
(I compiled with ifc -O3 -tpp6 on my old
Celeron 433 running RH 6.2.)
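For concreteness, the two variants (a) and (b) can be sketched
roughly as below. This is only an illustrative f77 sketch, not the
original test program: the program name BLCMP, the size N = 500 and
the test values are assumptions. Timing calls are left out because
f77 has no standard timer routine; wrap each variant in whatever
timer your compiler provides.

```fortran
      PROGRAM BLCMP
C     Sketch only: sizes, names and test values are assumptions,
C     not the original test program. Link against a BLAS library
C     (MKL, Atlas or the reference BLAS) to resolve DAXPY/DGEMV.
      INTEGER N
      PARAMETER (N = 500)
      DOUBLE PRECISION A(N,N), X(N), Y1(N), Y2(N), Z1(N), Z2(N)
      DOUBLE PRECISION ALPHA, ERR
      INTEGER I, J
C     Fill the operands; Y1/Y2 and Z1/Z2 start out identical.
      ALPHA = 2.0D0
      DO 20 J = 1, N
         X(J) = 1.0D0 / DBLE(J)
         Y1(J) = DBLE(J)
         Y2(J) = DBLE(J)
         Z1(J) = 0.0D0
         Z2(J) = 0.0D0
         DO 10 I = 1, N
            A(I,J) = DBLE(I+J)
   10    CONTINUE
   20 CONTINUE
C     (a) library calls: Y1 = Y1 + ALPHA*X (BLAS1),
C         Z1 = A*X (BLAS2)
      CALL DAXPY(N, ALPHA, X, 1, Y1, 1)
      CALL DGEMV('N', N, N, 1.0D0, A, N, X, 1, 0.0D0, Z1, 1)
C     (b) the same work as explicit loops
      DO 30 I = 1, N
         Y2(I) = Y2(I) + ALPHA*X(I)
   30 CONTINUE
C     Column-oriented loop order, so A is read with stride 1.
      DO 50 J = 1, N
         DO 40 I = 1, N
            Z2(I) = Z2(I) + A(I,J)*X(J)
   40    CONTINUE
   50 CONTINUE
C     Check that both variants agree up to rounding.
      ERR = 0.0D0
      DO 60 I = 1, N
         ERR = MAX(ERR, ABS(Y1(I)-Y2(I)), ABS(Z1(I)-Z2(I)))
   60 CONTINUE
      PRINT *, 'max abs difference = ', ERR
      END
```

Both variants do O(N) work for daxpy and O(N*N) work for dgemv on
the same amount of data, so there is little data reuse for a
library to exploit; that is the case being asked about here.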
I didn't find any significant performance
gain from the MKL (or Atlas) library calls
compared with directly compiled f77 loops :-(
(this was measured on some PIII-based systems).
I understand that BLAS3 calls work much
better and that using DGEMM (for example) will
give a speed-up, but that's BLAS3... Is there any
data available on the speed-up (compared
with hand-coded loops and compiler optimization)
from using BLAS1 or dgemv calls on modern Intel
P4 (w/SSE2) CPUs?
Or maybe similar data for IA-64?
Mikhail Kuzminsky
Zelinsky Inst. of Organic Chemistry
Moscow
BLAS Lev.1/2:MKL/Atlas/... calls vs compilation