icl vs. msvc9 frustration

tim18
July 10, 2009 8:40 AM PDT
#3 Reply to #2

/O1 has disabled vectorization since ICL 10.0; I mention it in case your loops are too short for vectorization to be useful.  In version 9.1, /O1 still vectorized, but without the extra unrolling, so it gave vector performance on shorter loops than /O2 did.
ICL vectorization typically takes loop iterations in groups of 8, with adjustments for 16-byte alignment before and after.  It doesn't often pay off for loops shorter than 16 plus the adjustments, and you will see performance peaking for loop lengths at intervals of 8.
In typical C or C++ code, unless arrays are declared with fixed size local to the function, the compiler can rarely see anything that would change its default assumption that it should optimize for a loop length of 100.
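To make the grouping concrete, here is a rough sketch in C of how a loop over n floats could be split into a scalar prologue up to the first 16-byte boundary, a main body taken in groups of 8, and a scalar remainder. The `loop_split` type and `split_loop` function are made-up names for illustration, the code assumes 4-byte floats, and it only models the iteration counts; it is not a description of what ICL does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: one way a loop of n floats might be split. */
typedef struct {
    size_t peel;       /* scalar iterations before the first 16-byte boundary */
    size_t main_iters; /* iterations covered by full groups of 8 */
    size_t remainder;  /* scalar iterations left over at the end */
} loop_split;

/* Hypothetical model of the "groups of 8 plus alignment adjustments" split.
   Assumes sizeof(float) == 4 and that p is at least float-aligned. */
static loop_split split_loop(const float *p, size_t n)
{
    loop_split s;
    uintptr_t addr = (uintptr_t)p;
    size_t misalign;

    assert(addr % sizeof(float) == 0);

    /* Bytes past the previous 16-byte boundary, converted to float counts. */
    misalign = (size_t)(addr % 16);
    s.peel = misalign ? (16 - misalign) / sizeof(float) : 0;
    if (s.peel > n)
        s.peel = n;            /* loop too short to reach the boundary */

    s.main_iters = ((n - s.peel) / 8) * 8;
    s.remainder = n - s.peel - s.main_iters;
    return s;
}
```

With this model, a 20-element loop starting 4 bytes past a 16-byte boundary gets 3 prologue iterations, 16 in the main body, and 1 left over, which is why short loops spend most of their time outside the vector body.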
If you know that no alignment adjustment is required at the beginning of the loop to make all data 16-byte aligned, but it's not visible to the compiler,
#pragma vector aligned
should speed up the loop, but it will break if your assertion is wrong. This pragma also overrides the compiler's cost/benefit analysis of whether vectorization is expected to pay off.
#pragma novector
would prevent vectorization of a loop.
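As a minimal sketch of where these pragmas go, assuming a C99 compiler: the pragma sits directly in front of the loop it applies to. The function names `scaled_add` and `sum_short` are hypothetical, and the `__INTEL_COMPILER` guard is my assumption for keeping other compilers from complaining about pragmas they don't know. The assert is there because the aligned pragma is a promise to the compiler, so it is worth checking in debug builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kernel: y[i] += a * x[i].
   The caller promises that x and y are 16-byte aligned and don't overlap. */
static void scaled_add(float *restrict y, const float *restrict x,
                       float a, size_t n)
{
    size_t i;

    /* Verify the alignment promise in debug builds; with NDEBUG defined
       a wrong promise can fault or give wrong results. */
    assert((uintptr_t)x % 16 == 0 && (uintptr_t)y % 16 == 0);

#ifdef __INTEL_COMPILER
#pragma vector aligned
#endif
    for (i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hypothetical short reduction where vectorization is not wanted. */
static float sum_short(const float *x, size_t n)
{
    float s = 0.0f;
    size_t i;

#ifdef __INTEL_COMPILER
#pragma novector
#endif
    for (i = 0; i < n; ++i)
        s += x[i];
    return s;
}
```

Arrays passed to `scaled_add` would need to be declared or allocated with 16-byte alignment, for example with `_Alignas(16)` in C11.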
Vectorization of loops of length 60 to 3000 should more than double the performance.  When combined with OpenMP or similar parallelization, the combined gain is better on the current Core i7 or Xeon 5500 CPUs than on the earlier ones.  Still, it is common to find a loop of length 1000 where either vectorization or parallelization gives good speedup, but there is no use in combining the optimizations, unless the parallelization can take place at a higher level.
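One way to read "parallelization at a higher level" is sketched below, under my own assumptions: threads divide an outer loop over rows, and the unit-stride inner loop is left for the compiler to vectorize. The function `scale_rows` and the row-major layout are hypothetical, and the OpenMP pragma only takes effect when the compiler's OpenMP option is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: multiply each row of a rows x cols row-major
   matrix by its own factor. */
static void scale_rows(float *m, const float *factor, int rows, int cols)
{
    int r;

    assert(rows >= 0 && cols >= 0);

    /* Threads split the outer loop, so each thread still gets whole
       rows that are long enough to vectorize. */
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (r = 0; r < rows; ++r) {
        float *row = m + (size_t)r * (size_t)cols;
        const float f = factor[r];
        int c;

        /* Unit-stride inner loop: a candidate for vectorization. */
        for (c = 0; c < cols; ++c)
            row[c] *= f;
    }
}
```

Splitting a single loop of length 1000 across threads instead would leave each thread with a much shorter loop, which is where the combined gain tends to disappear.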
