The latest Intel® Xeon® processor E5 v3 family includes a feature called Intel® Advanced Vector Extensions 2 (Intel® AVX2), which can potentially improve application performance related to high performance computing, databases, and video processing. Here we will explain the context, and provide an example of how using Intel® AVX2 improved performance for a commonly known benchmark.
If you are familiar with the Intel® Integrated Performance Primitives (Intel® IPP) library you know that it is widely used to build applications built for the Microsoft* Windows* and Linux* operating systems – today's most prevalent "standard" desktop and server operating system (OS) platforms. What you may not know is that the Intel IPP library can also be used with applications built for some embedded and real-time operating systems (RTOS).
A simple, widely known and studied problem was posed to the class students: matrix multiplication. We made an internal contest, which was to obtain the fastest serial code in which the students learned a lot about compiler optimizations, and even more, the effect of caches in code performance. The objective of the contest was to extrapoloate this exercise into a massive multicore architecture. Students were given kickstart code with a naive C using an OpenMP implemention of the problem, and a series of rules.
The included source code implements a parallel search for intersections of input line segments within a 3-D space, as described in the included problem description text file. Three different methods of solution are initially considered. Complexity analysis and potential parallelization of the first two (brute force search, sweep-line algorithm) are considered and used to eliminate each from further consideration. The third method, Tree Decomposition, is chosen and explained in detail.