Are VML routines 'pure'?

I'm in the process of taking an application with a lot of long (~200 element) vector operations performed in FORALL loops, and replacing them with VML calls. As the vectors are so long I would expect to get a performance enhancement from doing this, though I haven't got as far as testing this yet.

I notice, however, that the compiler won't let me have VML calls inside FORALL loops. If I replace the FORALL loops with more prosaic DO loops, then everything works fine. Is this because the VML procedures aren't 'pure', or is there some other reason? The 'mkl_vml.fi' interface doesn't declare the functions as pure, but can I safely write my own interface declaring them thus?

I'm concerned about this because some of my functions need to be passed as arguments and I believe that they need to be pure themselves for this reason. I can foresee problems if they contain calls to VML subroutines that aren't pure.
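For readers following along, the FORALL-to-DO rewrite described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `vec_exp` is a hypothetical stand-in written for this example, given the same (n, a, y) argument shape as a VML routine such as vdExp, so that the sketch compiles without MKL. The array sizes and data are invented. Note also that, as far as the Fortran 95 rules go, a FORALL body may contain only assignments (plus WHERE and nested FORALL), so a CALL statement would be rejected there whether or not the subroutine is declared pure.

```fortran
program vml_do_loop
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200, m = 4   ! assumed sizes, for illustration only
  real(dp) :: a(n, m), y(n, m)
  integer :: i, j

  ! Fill each column with sample data.
  do j = 1, m
    do i = 1, n
      a(i, j) = real(i, dp) / real(n * j, dp)
    end do
  end do

  ! A FORALL version such as
  !   forall (j = 1:m) call vec_exp(n, a(:, j), y(:, j))
  ! is not accepted, because a CALL statement cannot appear in a FORALL body.
  ! An ordinary DO loop over the columns is accepted:
  do j = 1, m
    call vec_exp(n, a(:, j), y(:, j))
  end do

  ! Self-check against the elemental intrinsic.
  if (maxval(abs(y - exp(a))) > 1.0e-12_dp) error stop 'mismatch'
  print *, 'ok'

contains

  ! Hypothetical stand-in with the (n, a, y) argument shape of a VML
  ! routine; deliberately not declared PURE, like the real interfaces.
  subroutine vec_exp(n, a, y)
    integer, intent(in) :: n
    real(dp), intent(in) :: a(n)
    real(dp), intent(out) :: y(n)
    y = exp(a)
  end subroutine vec_exp

end program vml_do_loop
```

In a real application the call to `vec_exp` would be replaced by the corresponding VML call, with the interface taken from the MKL include file rather than written by hand.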


Optimal performance of VML functions can be observed on multi-core/multi-processor systems for vector lengths greater than 200. Please have a look at the performance graphs available at http://www3.intel.com/cd/software/products/asmo-na/eng/266863.htm.

The VML interface file mkl_vml.fi is the same for both Fortran 90 and Fortran 95. This is one of the reasons we avoid using the 'pure' attribute for Vector Math functions, since 'pure' was only introduced in Fortran 95. Also, VML functions admit in-place calls, so use of the attribute should be additionally tested in your VML usage model.

Thank you; it sounds as though I'm right in the sweet spot to get the best value out of these. Can I safely assume that the benefit gained from the VML routines will offset the lost parallelizability in going from FORALL structures to DO loops?

What lost parallelizability? FORALL might be a help in diagnosing programming practices which defeat optimization, but it introduces obstacles of its own. Read about it in the comp.lang.fortran archives, for example.

I don't think there's even much demand for ifort to optimize forall as well as might be done, although it generally does as well as other compilers.

Fair enough. Metcalf, Reid & Cohen state (section 6.9):

"[The standard DO construct] represents a potentially severe impediment to optimization on a parallel processor so, for this purpose, Fortran has the forall statement"

...which is why I have made a lot of use of it in the non-VML version of my application, but you're quite right that comp.lang.fortran documents plenty of cases where forall causes more problems than it solves.

It looks like the way ahead is to use VML calls in DO loops and, if I find I really need to crank out an extra few percent in performance, use compiler directives like DISTRIBUTE POINT, IVDEP and LOOP COUNT to help things along.

Thank you.
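As a rough illustration of the directive approach mentioned above, here is a minimal sketch that places IVDEP and LOOP COUNT hints on a plain element-wise DO loop. It assumes the Intel Fortran `!DIR$` directive prefix (older releases also accept `!DEC$`); other compilers treat these lines as ordinary comments, so the program still builds and runs. The arrays, the trip count of 200 and the loop body are invented for the example, and DISTRIBUTE POINT is not shown. Whether the hints help at all has to be measured case by case.

```fortran
program directive_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 200          ! assumed vector length
  real(dp) :: a(n), b(n), c(n)
  integer :: i

  ! Sample data.
  do i = 1, n
    a(i) = real(i, dp)
    b(i) = 2.0_dp * real(i, dp)
  end do

  ! Hints for the loop that follows: the trip count is typically about 200,
  ! and assumed (unproven) dependences between iterations can be ignored.
  !DIR$ LOOP COUNT (200)
  !DIR$ IVDEP
  do i = 1, n
    c(i) = a(i) + b(i)
  end do

  ! Self-check: c should equal 3*a exactly for this data.
  if (any(c /= 3.0_dp * a)) error stop 'mismatch'
  print *, 'ok'
end program directive_sketch
```

The directives only advise the optimizer; they do not change the result of a loop that is genuinely free of dependences.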
