Vectorization and Threading are Crucial to Performance
On modern processors, it is becoming crucial to both vectorize (use SIMD instructions such as AVX) and thread software to realize the full performance potential of the processor. In some cases, code that is both vectorized and threaded can be up to 187X faster than code that is neither, and about 7X faster than code that is only threaded or only vectorized. That gap is growing with every new processor generation.
Threaded plus vectorized can be much faster than either one alone. The gap is growing with each new hardware generation.
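To make "threaded plus vectorized" concrete, here is a minimal sketch in C. It assumes a compiler with OpenMP 4.0 or later support (for example, built with an OpenMP flag such as -fopenmp or -qopenmp). The function names saxpy_scalar and saxpy_threaded_simd are hypothetical and chosen only for illustration. Without OpenMP enabled, the pragma is ignored and both functions run as ordinary serial loops, so the results are the same either way. Actual speedups depend on the hardware, the compiler, and the problem size.

```c
/* Baseline: one thread, and the compiler is left to decide on its own
   whether the loop can be vectorized. */
void saxpy_scalar(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Threaded and vectorized: the OpenMP directive asks the compiler to
   split the iterations across threads and to use SIMD instructions
   within each thread. The restrict qualifiers are an assumption that
   x and y do not overlap in memory. */
void saxpy_threaded_simd(int n, float a,
                         const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Both functions compute the same result. Only the second one explicitly requests the use of multiple cores and vector units.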
Intel® Advisor gives you data to forecast the performance gain before you invest significant effort in implementation. Implement only the options that have a high return on investment.
Data-Driven Vectorization Optimization and Threading Design
You need good data to make good design decisions. Which loops should be threaded and vectorized first? Is the performance gain worth the effort? Will the threading performance scale to larger core counts? Does this loop have a dependency that prevents vectorization? What are the trip counts and memory access patterns? Have I vectorized efficiently with the latest instructions, such as AVX2, or am I still using older SIMD instructions?
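As an illustration of the dependency question, the sketch below contrasts a loop with a loop-carried dependency against one whose iterations are independent. The function names prefix_sum and scale are hypothetical. Whether a particular compiler vectorizes either loop depends on the compiler and its options, so treat this as an example of the pattern, not a statement about any specific tool.

```c
/* Loop-carried dependency: iteration i reads a[i - 1], which was
   written by iteration i - 1. Processing several iterations at once
   would read stale values, so compilers typically refuse to
   auto-vectorize this loop as written. */
void prefix_sum(int n, float *a)
{
    for (int i = 1; i < n; ++i)
        a[i] += a[i - 1];
}

/* Independent iterations: each element is read and written exactly
   once, so the iterations can run in any order or several at a time.
   This is a natural candidate for vectorization. */
void scale(int n, float s, float *a)
{
    for (int i = 0; i < n; ++i)
        a[i] *= s;
}
```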
Vectorization Optimization: Guidance to Speed up your Application
Quickly find what’s blocking vectorization in the locations that matter the most. Intel Advisor sorts your loops by potential gain, makes compiler reports easier to read by showing their messages alongside your source code, and gives you tips for effective vectorization. It also provides key data such as trip counts, data dependencies, and memory access patterns so that you can vectorize safely and efficiently.
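To show why memory access patterns matter for vectorization, here is a small sketch comparing a strided access with a unit-stride access. The particle structure and both function names are hypothetical, made up for this example. In general, unit-stride access lets vector loads and stores fetch contiguous elements, while strided access tends to be less efficient, but the actual difference depends on the hardware and the compiler.

```c
/* Array of structures: consecutive mass values are separated by the
   x, y, and z fields, so the loop touches memory with a stride of
   four floats. */
struct particle { float x, y, z, mass; };

void scale_mass_aos(int n, struct particle *p, float s)
{
    for (int i = 0; i < n; ++i)
        p[i].mass *= s;
}

/* Structure of arrays: the mass values sit in their own contiguous
   array, so the loop touches memory with unit stride. */
void scale_mass_soa(int n, float *mass, float s)
{
    for (int i = 0; i < n; ++i)
        mass[i] *= s;
}
```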