Vectorization Series, Part 3 - What are the Benefits?

This will be the final post in my planned short vectorization series, although I reserve the right to post more on vectorization in the future! In the first post on this topic, I explained that vectorization is parallelism within a single CPU core, achieved by applying a CPU instruction to multiple data elements at once. In the second post, we discussed which types of applications can take advantage of vectorization: scientific, engineering, financial, media, and graphical. In this post, I want to explain why you would want to vectorize. What are the benefits?

There are two major reasons to vectorize your code now. The first is performance on current processors. Vectorizing a loop can deliver a significant performance boost. For example, if you have a loop performing a mathematical operation across array elements (such as the example in the first post), and the compiler can convert it to a vectorized loop working on 8 elements at a time, then you could achieve roughly an 8x speedup for that loop. Many factors affect the performance improvement you will actually see from vectorization: the number of data elements that can be processed at one time (which depends on the data size and the register size), whether the elements can be packed into a register efficiently (meaning they come from contiguous memory), and the trip count of the loop are all examples. You would also need to consider how "hot" the loop to be vectorized is - that is, whether the loop takes a significant percentage of your application time. But assuming the right conditions and possibly a bit of tuning, vectorization can deliver a sizable speedup on today's hardware. Our current generations of Intel® Core™ and Intel® Xeon® processors both support vectorization.

The second reason is scalability. While vectorization can be an important contributor to performance now, it is even more critical for future processor architectures, including the upcoming Intel® Many Integrated Core (MIC) architecture. If you vectorize your code now using one of the Intel "scales forward" methods, such as the Intel Compiler with Intel® Cilk™ Plus, you are prepared for these future architectures as well. The vectorization methods we support that will "scale forward" are:

- Using the Intel Compiler auto-vectorizer
- Using the Intel Compiler with Cilk Plus
- Using the Intel Compiler with vectorization-friendly Fortran constructs
- Using already-vectorized functions from libraries such as Intel® Integrated Performance Primitives (IPP) or Intel® Math Kernel Library (MKL)

With one of these methods, you can be assured that code that vectorizes today will be vectorizable by the Intel Compiler on future Intel architectures as well, without your needing to make changes. If you are not using the Intel Compiler, you still have options: parts of Cilk Plus are being implemented in GCC, other compilers have auto-vectorizers as well, and you can consider using the Intel Compiler on just a portion of your code.

If you want to take a deeper look at vectorization, you can start with our recorded overview webinar here, or you can look through our Vectorization Toolkit, which gives a 6-step process for vectorizing your application. If you have additional questions, let me know in the comments!

For more complete information about compiler optimizations, see our Optimization Notice.