# Why ICC doesn't always vectorize this

This code doesn't vectorize whatever you do:

```
void split_icc_0(int count)
{
	int z, ndx;

	for (z = 0; z < count; z++) {
		ndx = inz[z];
		ip0[z] = ip[ndx];
		ip1[z] = ip[ndx + 1];
	}
}
```

But this code does vectorize:

```
void split_icc_1(int count)
{
	int z, ndx0, ndx1;

#pragma vector always
	for (z = 0; z < count; z++) {
		ndx0 = inz[z];
		ip0[z] = ip[ndx0];
		ndx1 = inz[z];
		ip1[z] = ip[ndx1 + 1];
	}
}
```

And also this variation:

```
ndx0 = inz[z];
ip0[z] = ip[ndx0];
ndx1 = inz[z] + 1;
ip1[z] = ip[ndx1];
```

But this variation again does not vectorize:

```
ndx0 = inz[z];
ip0[z] = ip[ndx0];
ndx1 = ndx0 + 1;
ip1[z] = ip[ndx1];
```

Nor this one:

```
ndx0 = inz[z];
ndx1 = inz[z] + 1;
ip0[z] = ip[ndx0];
ip1[z] = ip[ndx1];
```

Nor this one:

```
ndx0 = inz[z];
ndx1 = ndx0 + 1;
ip0[z] = ip[ndx0];
ip1[z] = ip[ndx1];
```

So why doesn't the compiler treat all of the above as the same damn thing, and why doesn't it always vectorize?

If you leave out `#pragma vector always`, the compiler reports that vectorization is possible but inefficient. I slapped together a quick test case, and the speedup is ~10%, which is not something to barf at.
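For anyone who wants to reproduce the comparison, here is a minimal sketch of a test harness. It is not the test case mentioned above: the thread never shows the array declarations, so the `int` element type, the size `N`, and the helpers `init_tables`, `variants_agree`, and `time_variant` are all assumptions made for illustration. The pragma is guarded so that compilers other than ICC simply skip it.

```c
#include <assert.h>
#include <string.h>
#include <time.h>

#define N 4096  /* hypothetical size; the thread does not give one */

/* Assumption: plain int arrays. ip has one extra element so that
   ip[ndx + 1] stays in bounds for every index stored in inz. */
int ip[N + 1], inz[N], ip0[N], ip1[N];

/* Variant reported in the thread as not vectorizing. */
void split_icc_0(int count)
{
	int z, ndx;

	for (z = 0; z < count; z++) {
		ndx = inz[z];
		ip0[z] = ip[ndx];
		ip1[z] = ip[ndx + 1];
	}
}

/* Variant reported in the thread as vectorizing. */
void split_icc_1(int count)
{
	int z, ndx0, ndx1;

#ifdef __INTEL_COMPILER
#pragma vector always
#endif
	for (z = 0; z < count; z++) {
		ndx0 = inz[z];
		ip0[z] = ip[ndx0];
		ndx1 = inz[z];
		ip1[z] = ip[ndx1 + 1];
	}
}

/* Fill the tables with deterministic data; every index lies in [0, N). */
void init_tables(void)
{
	int i;

	for (i = 0; i <= N; i++)
		ip[i] = 3 * i;
	for (i = 0; i < N; i++)
		inz[i] = (int)((7u * (unsigned)i) % N);
}

/* Returns 1 if both variants write the same first `count` results. */
int variants_agree(int count)
{
	static int ref0[N], ref1[N];
	size_t bytes;

	assert(count >= 0 && count <= N);
	bytes = (size_t)count * sizeof(int);

	split_icc_0(count);
	memcpy(ref0, ip0, bytes);
	memcpy(ref1, ip1, bytes);
	memset(ip0, 0, sizeof ip0);
	memset(ip1, 0, sizeof ip1);
	split_icc_1(count);

	return memcmp(ref0, ip0, bytes) == 0 && memcmp(ref1, ip1, bytes) == 0;
}

/* CPU seconds spent running `fn(count)` `reps` times. */
double time_variant(void (*fn)(int), int count, int reps)
{
	clock_t start = clock();
	int r;

	for (r = 0; r < reps; r++)
		fn(count);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call `init_tables()` once, check `variants_agree(N)`, then compare `time_variant(split_icc_0, N, reps)` with `time_variant(split_icc_1, N, reps)`. `clock()` is coarse, so `reps` has to be large for the ratio to mean anything, and the result will depend on the compiler version and flags.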

Igor, my old friend, once again you are a step ahead of us :-)

I spoke with our vectorizer expert and he is aware of similar problems and is working on some improvements for a future release.

It's difficult to answer your question without getting into too many details about the internal workings of the compiler, but you could say that it doesn't treat all the cases the same because, by the time the code reaches the optimization that deals with vectorization, the different cases don't all look the same. Add an indirect reference, which complicates the cost model, and it ends up not vectorizing when, as you discovered, it probably should.

Thanks for pointing it out, I'll try to post here when we have an improvement for this case.

Dale

> Igor, my old friend, once again you are a step ahead of us :-)

Well Dale, could that be anything else but a clear sign for Intel that I could be a valuable addition to your team? ;-)

> It's difficult to answer your question without getting into too many details about the internal workings of the compiler...

I suppose I might even understand parts of it if you delved into the details, because I recently read *Compilers: Principles, Techniques, and Tools* out of pure curiosity.

In the above case I guess that the compiler "sees" some sort of false dependency between ip[ndx] and ip[ndx + 1] even though it should be completely obvious that they do not have any, because they are both read-only references. I am really surprised that the compiler is not able to figure that one out.
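One way to test that guess is to rule out aliasing explicitly. The sketch below is a hypothetical variant, not something from the thread: it passes the arrays as C99 `restrict`-qualified pointers, which promises the compiler that the stores through `ip0` and `ip1` never overlap the data read through `ip` and `inz`. The function name, the parameter list, and the `int` element type are assumptions, and whether this changes ICC's decision is not something the thread establishes.

```c
/* Hypothetical variant of split_icc_0 with the arrays passed in.
   restrict (C99) states that the four pointers refer to
   non-overlapping storage for the duration of the call. */
void split_restrict(int count,
                    const int *restrict ip, const int *restrict inz,
                    int *restrict ip0, int *restrict ip1)
{
	int z;

	for (z = 0; z < count; z++) {
		int ndx = inz[z];      /* single index load, as in split_icc_0 */

		ip0[z] = ip[ndx];      /* gather element ndx */
		ip1[z] = ip[ndx + 1];  /* gather its right-hand neighbour */
	}
}
```

If the loop still fails to vectorize with aliasing ruled out this way, the cause is more likely the cost model for the indirect loads than a dependency between `ip[ndx]` and `ip[ndx + 1]`.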