Vectorize max reduction

Hi,

I have a loop in C++ that looks like this:

float max = 0;
for (int i = 0; i < n; i++) {
  float localMax;
  // Vectorizable code setting localMax
  if (localMax > max)
    max = localMax;
}

Vectorizing the first part of the loop is easy. The compiler also recognizes that localMax is only alive in one loop iteration. However, the max reduction at the end stops the compiler from vectorizing anything. Is there a simple way (e.g. pragmas) to tell the compiler that this is a simple max reduction?

I only found some manuals on how to do this with a sum reduction.

Best regards
Sebastian

If you have targeted SSE2 (the icpc default) or a newer architecture, have not made the errors shown here, and (for icpc) have set -ansi-alias, this really should vectorize, provided you haven't created a namespace clash. So you may have omitted important information, besides omitting the array of which you want the max (if positive). MSVC doesn't vectorize reductions, even in VS2012.
Intel C++ 2013 supports the reduction best by e.g.
your_max = max(0.f, __sec_reduce_max(your_array[0:n]));

Needless to say, it's always good to show exactly what you tried.

This is the compiler version I'm currently working with:

icpc --version

icpc (ICC) 12.1.5 20120612

Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

I've set the compiler flag -fstrict-aliasing, but adding -ansi-alias as well does not help.

The part I commented out is rather complex (a Riemann solver with about 200 lines of code), but without any dependencies. This is also what the compiler tells me when compiling with "-vec-report3":

file.cpp(121): (col. 5) remark: loop was not vectorized: existence of vector dependence.

file.cpp(140): (col. 7) remark: vector dependence: assumed ANTI dependence between max line 140 and max line 140.

file.cpp(140): (col. 7) remark: vector dependence: assumed FLOW dependence between max line 140 and max line 140.

file.cpp(140): (col. 7) remark: vector dependence: assumed FLOW dependence between max line 140 and max line 140.

file.cpp(140): (col. 7) remark: vector dependence: assumed ANTI dependence between max line 140 and max line 140.

Using __sec_reduce_max does not work because I don't have an array of all localMax values. The values are computed on the fly and never stored longer than one iteration.

In fact, there are cases where defining and filling a local array, so as to write the vectorizable reduction separately, could be the key to performance optimization. In some cases, the compiler will end up fusing the loops, fulfilling your original intention to accomplish the job in a single loop. There may be improvements in vectorized reduction in recent compiler releases.
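The two-loop idea described above can be sketched roughly as follows. This is only an illustration under assumptions: sample_work is a hypothetical stand-in for the real per-iteration code, and max_two_pass and the temporary buffer are names made up for this sketch. Whether a given compiler vectorizes or fuses the loops is not guaranteed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the expensive, dependence-free per-iteration work.
inline float sample_work(float x) { return x * x - 2.0f * x; }

float max_two_pass(const float* in, std::size_t n) {
    std::vector<float> local(n);  // temporary array holding every localMax

    // Pass 1: each iteration is independent, so there is no loop-carried dependence.
    for (std::size_t i = 0; i < n; ++i)
        local[i] = sample_work(in[i]);

    // Pass 2: a plain max reduction over the stored values.
    float result = 0.0f;  // same starting value as in the original loop
    for (std::size_t i = 0; i < n; ++i)
        result = std::max(result, local[i]);
    return result;
}
```

The cost is the extra buffer of n floats; processing the input in fixed-size blocks would keep that buffer small.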
If the compiler were to implement
#pragma simd reduction(max: ...) private localMax
that would open up additional opportunities, analogous to those currently implemented for sum reduction.
Communications I have received are ambiguous about whether this is planned. If you are interested, you might submit a feature request on premier.intel.com so as to attempt to get an answer.
Unless you specifically want to test the limits by using identifiers that clash with the STL (on top of the common Microsoft-style non-portability issues), I would still suggest you don't use max as an identifier.
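For comparison, the OpenMP 4.0 specification defines a simd construct that accepts a reduction clause, and max is one of the C/C++ reduction operators. The sketch below shows what such a pragma-based max reduction looks like. It is not the Intel-specific pragma discussed above, it needs a compiler with OpenMP 4.0 simd support (an assumption; icpc 12.1 predates it), and sample_kernel and max_omp_simd are hypothetical names. The reduction variable is called best rather than max to avoid the identifier clash.

```cpp
// Hypothetical stand-in for the vectorizable code that sets localMax.
inline float sample_kernel(float x) { return 3.0f - x * x; }

float max_omp_simd(const float* in, int n) {
    float best = 0.0f;  // running maximum, starting at 0 as in the original loop

    // Declares 'best' as a max reduction; a compiler without OpenMP simd
    // support ignores the pragma and runs the loop as ordinary scalar code.
    #pragma omp simd reduction(max : best)
    for (int i = 0; i < n; ++i) {
        const float localMax = sample_kernel(in[i]);  // private to the iteration
        if (localMax > best)
            best = localMax;
    }
    return best;
}
```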

>>The compiler also recognizes that localMax is only alive in one loop iteration.

Did you try declaring 'localMax' as static, like this?
...
static float localMax;
...
Your original declaration without the static keyword inside some scope doesn't look good.

The primary point for definition of the variable is to give it local scope, which the code originally quoted does OK. The compiler diagnostics refer to the variable max, and it would be good first to choose a name which doesn't have C++ namespace conflicts. Beyond that, it might be useful to try one of the icpc extensions, but currently max/min aren't supported under #pragma reduction. So that leaves the option of making a local array of localMax and making a separate reduction loop, as with a Cilk reducer.
By the way, if you would like attention to this question by a compiler expert, a full reproducer should be submitted via your premier.intel.com account, or at least on the companion C++ forum, where it could be set as private so that only Intel personnel can see it, if that is your requirement. I have submitted premier issues on max reduction, but documented customer requirements might help to raise the priority.

Thx for the hints. I solved the problem like this


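float max[VECTOR_LENGTH] = {0};  // one running maximum per vector lane, all starting at 0

#pragma simd vectorlength(VECTOR_LENGTH)
for (int i = 0; i < n; i++) {
  float localMax;
  // Vectorizable code setting localMax
  // Update only the lane that belongs to this iteration
  max[i % VECTOR_LENGTH] = std::max(max[i % VECTOR_LENGTH], localMax);
}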

VECTOR_LENGTH should be something like 4 or 8 (depending on the hardware).
This code does most of the reduction work inside the loop. After the loop I only reduce the (small) "max" array.
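A complete version of this approach, including the final reduction of the small array, might look like the sketch below. The names kLanes, sample_value and max_striped are hypothetical, the per-iteration work is a placeholder, and the Intel-specific pragma is left out so that the code compiles anywhere; whether the loop is actually vectorized still depends on the compiler.

```cpp
#include <algorithm>

constexpr int kLanes = 8;  // assumed lane count: e.g. 4 floats for SSE, 8 for AVX

// Hypothetical stand-in for the vectorizable code that sets localMax.
inline float sample_value(float x) { return x - 1.0f; }

float max_striped(const float* in, int n) {
    float partial[kLanes] = {0.0f};  // one running maximum per lane, all starting at 0

    // Main loop: iteration i only touches lane i % kLanes, so kLanes
    // consecutive iterations update distinct array elements.
    for (int i = 0; i < n; ++i) {
        const float localMax = sample_value(in[i]);
        const int lane = i % kLanes;
        partial[lane] = std::max(partial[lane], localMax);
    }

    // Final step: reduce the small per-lane array to a single value.
    float result = partial[0];
    for (int lane = 1; lane < kLanes; ++lane)
        result = std::max(result, partial[lane]);
    return result;
}
```

Because max is associative and commutative, splitting the work across lanes and combining the lanes afterwards gives the same result as the sequential loop.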
