I have code that uses gcc vector extensions and achieves 80% of the nominal peak performance with AVX vectors.
The gcc vector extensions allow me to write explicitly vectorized code with only very few intrinsics (sums, products and the like are all simply written a+b, a*b, etc.).
The performance is awesome, but when compiled with icc the code falls back to scalar data types, and the performance turns out to be horrible: nearly four times slower (reaching 20% of the nominal peak performance).
Clearly, despite trying hard, icc is not able to vectorize my inner loops correctly.
This would not bother me much, because gcc is available almost everywhere. However, I am currently trying to run this code on the MIC (Xeon Phi), where I have to go through icc, which leads to poor vectorization and poor performance. (20% of the peak of the MIC will be less than 80% of the peak of a 16-core AVX machine...)
Please support gcc vector extensions in icc!