SSE
IPP Dispatcher Control Functions - ipp*Init*() functions
Intel® IPP 7.0 Release Notes
Understanding CPU Dispatching in the Intel® IPP Library
Classroom challenge: Matrix Multiplication, Performance and Scalability in OpenMP
A simple, widely known, and well-studied problem was posed to the class: matrix multiplication. We ran an internal contest to obtain the fastest serial code, through which the students learned a lot about compiler optimizations and, even more, about the effect of caches on code performance. The objective of the contest was to extrapolate this exercise to a massive multicore architecture. Students were given kickstart code containing a naive C implementation of the problem using OpenMP, and a set of rules.
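For readers who have not seen the exercise, a naive starting point of this kind might look like the sketch below. It is not the kickstart code handed to the students: the function names `matmul_naive` and `matmul_ikj`, the row-major `double` layout, and the choice to parallelize the outer loop are all assumptions made for illustration. The second function shows one common cache-oriented change, swapping the two inner loops so that the innermost loop walks both `b` and `c` contiguously.

```c
#include <assert.h>

/* Hypothetical naive version: c = a * b for n x n row-major matrices.
 * The pragma is ignored when the compiler is not run with OpenMP enabled. */
void matmul_naive(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                /* b is read with stride n here, which is hard on the cache */
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical i-k-j variant: same result, but the innermost loop
 * reads b and writes c with unit stride. */
void matmul_ikj(int n, const double *a, const double *b, double *c)
{
    assert(n > 0 && a && b && c);
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            c[i * n + j] = 0.0;
        }
        for (int k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (int j = 0; j < n; j++) {
                c[i * n + j] += aik * b[k * n + j];
            }
        }
    }
}
```

Each thread owns whole rows of `c`, so no synchronization is needed inside the loops. How much the loop interchange helps depends on matrix size, compiler, and hardware, so it is worth measuring rather than assuming.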
Parallel algorithm for finding intersections of line segments in 3-D (Dmitry Vyukov)
The included source code implements a parallel search for intersections of input line segments within a 3-D space, as described in the included problem description text file. Three different methods of solution are initially considered. Complexity analysis and potential parallelization of the first two (brute force search, sweep-line algorithm) are considered and used to eliminate each from further consideration. The third method, Tree Decomposition, is chosen and explained in detail.
SIMD tuning with ASM pt. 4 - Vectorization & ICC
float x[PTS];
float y[PTS];
for (int i = 0; i < PTS; i++) { // line 13 in orig source
x[i] += y[i]; // line 14 in orig source
}
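The loop above adds one float per iteration. As a rough illustration of what a packed (four floats at a time) version of the same loop could look like, here is a hand-written sketch using the SSE intrinsics from `<xmmintrin.h>`. It is not the code ICC generates, and the function name `add_arrays_sse` is made up for this example. Unaligned loads and stores are used so that no assumption is made about the alignment of the arrays.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_add_ps, ... */

/* Hypothetical helper: x[i] += y[i] for i in [0, n), four floats per step. */
void add_arrays_sse(float *x, const float *y, int n)
{
    assert(x && y && n >= 0);
    int i = 0;

    /* Packed part: each iteration handles four consecutive floats. */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);          /* load x[i..i+3] */
        __m128 vy = _mm_loadu_ps(y + i);          /* load y[i..i+3] */
        _mm_storeu_ps(x + i, _mm_add_ps(vx, vy)); /* packed add, store back */
    }

    /* Scalar remainder for the last n % 4 elements. */
    for (; i < n; i++) {
        x[i] += y[i];
    }
}
```

With arrays known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` could replace the unaligned forms. In practice a vectorizing compiler can often produce equivalent packed code from the plain loop.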
SIMD tuning with ASM pt. 3 - PS good, SS bad
.LBB52:
.loc 1 14 0
movss (%rbp,%rax,4), %xmm0
addss (%rdx,%rax,4), %xmm0
movss %xmm0, (%rbp,%rax,4)
addq $1, %rax
