Intel® Moderncode for Parallel Architectures

Intel® Modern Code Developer Community

Intel has launched the new Intel® Modern Code Developer Community. Check out the new site.

The Modern Code Developer program uses multi-level parallelism as its framework, making use of all the parallel performance features available on modern hardware: vectorization, multi-threading, and multi-node optimization. Explore how to write multi-level parallel algorithms that scale effectively on today’s hardware and forward to tomorrow’s.

Suggestion for Fortran SIMD optimization

In another thread on this forum, a user is running into an optimization issue related to vectorization. The gist is that the algorithm uses a temporary array, which lets the compiler determine that SIMD instructions can be used. That part works; however, the values in the temporary array are never used outside a small section of code. The user could instead move that section into an external function or subroutine attributed as a vector function/subroutine. If the routine is inlined, this would likely achieve the desired effect without the temporary array.
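To make the suggestion concrete, here is a minimal C analogue of the Fortran situation, assuming an OpenMP 4.0+ compiler (for example GCC or Clang with `-fopenmp-simd`, or the Intel compilers with `-qopenmp-simd`). The kernel expression, the function names, and the use of OpenMP `declare simd` rather than a compiler-specific vector attribute are illustrative assumptions; Fortran offers the same `!$omp declare simd` directive. The first routine keeps the temporary array; the second drops it and calls the vector function from a SIMD loop.

```c
#include <stddef.h>

/* Hypothetical element-wise kernel standing in for the work that used to
   fill the temporary array. "declare simd" asks an OpenMP-aware compiler
   to also generate a SIMD version callable from a vectorized loop. */
#pragma omp declare simd
static inline float kernel(float x, float y)
{
    return x * x + 0.5f * y;
}

/* Original pattern: tmp[] exists only so the first loop vectorizes
   cleanly; its values are consumed by the second loop and then discarded. */
void sum_with_temp(const float *a, const float *b, float *tmp,
                   float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        tmp[i] = a[i] * a[i] + 0.5f * b[i];
    for (size_t i = 0; i < n; ++i)
        out[i] += tmp[i];
}

/* Suggested pattern: no temporary array; the vector function is called
   inside a loop the compiler is asked to vectorize. */
void sum_with_vector_function(const float *a, const float *b,
                              float *out, size_t n)
{
#pragma omp simd
    for (size_t i = 0; i < n; ++i)
        out[i] += kernel(a[i], b[i]);
}
```

Without an OpenMP flag the pragmas are simply ignored and both routines still compute the same result; the pragmas only tell the compiler it may vectorize the call.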

Free Parallel Programming Training

Intel is offering free online training in collaboration with Colfax.

The upcoming training will start September 9th and includes 3 weeks of free remote access to an Intel® Xeon® and Intel® Xeon Phi™ server.

Another training series will start October 13th.

For more details and registration, please check: http://colfaxresearch.com/how-series/

You can also check the training page for more training options:

Multidimensional Transpose -- Prefetching

Hello,

I have been investigating the performance of two multi-dimensional transpositions. Among other things, I have noticed that some transpositions take more time than others, even though they move the same amount of data.
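One way to see why such transpositions can differ in speed is to write out the index arithmetic. The sketch below is a minimal scalar C illustration, not the poster's code: the function name `transpose3d`, the row-major layout, and the choice to iterate over the input in order are assumptions. Every permutation copies exactly `n[0]*n[1]*n[2]` elements, but the write stride of the innermost loop, `ws[2]`, depends on the permutation: it is 1 when `perm[2] == 2` and, for extents greater than one, larger for every other permutation. Strided stores are a common, though not the only, explanation for such timing gaps.

```c
#include <assert.h>
#include <stddef.h>

/* Out-of-place transpose of a row-major 3-D array A with extents n[0..2].
   Output axis d is input axis perm[d], so B has extents
   n[perm[0]] x n[perm[1]] x n[perm[2]]. Reads are contiguous; write
   strides follow from the permutation. */
void transpose3d(const double *A, double *B, const size_t n[3],
                 const int perm[3])
{
    /* perm must be a permutation of {0, 1, 2}. */
    for (int d = 0; d < 3; ++d)
        assert(perm[d] >= 0 && perm[d] < 3);
    assert(perm[0] != perm[1] && perm[0] != perm[2] && perm[1] != perm[2]);

    /* Extents and row-major strides of B, indexed by B's own axes. */
    size_t bext[3] = { n[perm[0]], n[perm[1]], n[perm[2]] };
    size_t bstride[3] = { bext[1] * bext[2], bext[2], 1 };

    /* ws[a] = distance in B for a unit step along INPUT axis a. */
    size_t ws[3];
    for (int d = 0; d < 3; ++d)
        ws[perm[d]] = bstride[d];

    /* Same number of copies for every perm; only ws[] differs. */
    size_t src = 0;
    for (size_t i = 0; i < n[0]; ++i)
        for (size_t j = 0; j < n[1]; ++j)
            for (size_t k = 0; k < n[2]; ++k)
                B[i * ws[0] + j * ws[1] + k * ws[2]] = A[src++];
}
```

For `n = {2, 3, 4}` and `perm = {2, 1, 0}`, the innermost loop writes with stride 6 instead of 1, even though all 24 elements are copied once.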

I ended up writing a small code generator that produces vectorized code (with AVX intrinsics) for a given transposition and all of its possible loop orders (I made sure that icc is not reordering the loops).
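A code generator of that kind can be sketched in a few lines. The version below is a simplified, hypothetical C illustration: it emits scalar code rather than AVX intrinsics, handles only the fixed transposition `B[k][j][i] = A[i][j][k]`, and assumes the user `#define`s the extents `N0`, `N1`, `N2` before compiling the generated functions. The name `generate_transpose_variants` is made up for this sketch. It writes one function per loop order (all 3! = 6 of them) into a caller-supplied buffer and returns the number of variants, or -1 if the buffer is too small.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* All 3! orderings of the loop indices i, j, k (outermost first). */
static const char *const ORDERS[6] = { "ijk", "ikj", "jik", "jki", "kij", "kji" };

/* Append formatted text at buf + *used; returns -1 if it does not fit. */
static int append(char *buf, size_t cap, size_t *used, const char *fmt, ...)
{
    if (*used >= cap)
        return -1;
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(buf + *used, cap - *used, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *used)
        return -1;
    *used += (size_t)w;
    return 0;
}

/* Emit one C function per loop order for B[k][j][i] = A[i][j][k],
   with hypothetical extents N0 (i), N1 (j), N2 (k). */
int generate_transpose_variants(char *buf, size_t cap)
{
    size_t used = 0;
    if (cap > 0)
        buf[0] = '\0';
    for (int v = 0; v < 6; ++v) {
        const char *order = ORDERS[v];
        if (append(buf, cap, &used,
                   "void transpose_%s(const double *A, double *B)\n{\n", order))
            return -1;
        /* One loop per index, nested in the chosen order. */
        for (int d = 0; d < 3; ++d) {
            char idx = order[d];
            const char *ext = idx == 'i' ? "N0" : idx == 'j' ? "N1" : "N2";
            if (append(buf, cap, &used, "%*sfor (int %c = 0; %c < %s; ++%c)\n",
                       4 * (d + 1), "", idx, idx, ext, idx))
                return -1;
        }
        /* Loop body is identical for every order; only the nesting changes. */
        if (append(buf, cap, &used,
                   "%*sB[(k * N1 + j) * N0 + i] = A[(i * N1 + j) * N2 + k];\n}\n\n",
                   16, ""))
            return -1;
    }
    return 6;
}
```

Printing the buffer with `fputs(buf, stdout)` yields six functions named `transpose_ijk` through `transpose_kji`, which can then be timed against each other.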
