Intel® Moderncode for Parallel Architectures

Intel® Modern Code Developer Community

Intel launches the new Intel® Modern Code Developer Community - check out the new site.

The Modern Code Developer program uses multi-level parallelism as its framework, exploiting all of the parallel performance features available on modern hardware through vectorization, multi-threading, and multi-node optimization. Explore how to deliver multi-level parallel algorithms that scale forward effectively on today’s and tomorrow’s hardware.

Fully scalable Parallel Varfiler



I have implemented a fully scalable Parallel Varfiler that uses a lightweight reader-writer mutex called MREW in a lock-striping manner. Please read about it and download it here:

I have also implemented a second fully scalable version that uses the scalable distributed reader-writer mutex in a lock-striping manner; you can find it here:
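For readers unfamiliar with the technique, here is a minimal C sketch of lock striping built on plain POSIX reader-writer locks. It is not the author's MREW or distributed reader-writer mutex; the stripe count and the hashing are illustrative assumptions only.

/* Minimal sketch of lock striping with reader-writer locks.
 * This is NOT the author's MREW / distributed RW mutex; it only
 * illustrates the general technique: hash each key to one of N
 * stripes so that threads working on different stripes never
 * contend on the same lock. */
#include <pthread.h>
#include <stddef.h>

#define NSTRIPES 64                       /* stripe count chosen for illustration */

static pthread_rwlock_t stripes[NSTRIPES];

static void stripes_init(void)
{
    for (size_t i = 0; i < NSTRIPES; ++i)
        pthread_rwlock_init(&stripes[i], NULL);
}

static size_t stripe_of(size_t key_hash)
{
    return key_hash % NSTRIPES;           /* pick the stripe guarding this key */
}

/* Readers on different stripes proceed in parallel. */
static void read_entry(size_t key_hash)
{
    pthread_rwlock_t *lk = &stripes[stripe_of(key_hash)];
    pthread_rwlock_rdlock(lk);
    /* ... read the bucket/variable guarded by this stripe ... */
    pthread_rwlock_unlock(lk);
}

/* A writer only blocks readers and writers that hash to the same stripe. */
static void write_entry(size_t key_hash)
{
    pthread_rwlock_t *lk = &stripes[stripe_of(key_hash)];
    pthread_rwlock_wrlock(lk);
    /* ... update the bucket/variable guarded by this stripe ... */
    pthread_rwlock_unlock(lk);
}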

Saa3d, please refrain from posting "likes" to old posts

While I do not wish to discourage you from leaving meaningful comments on new threads, I kindly ask that you not post what amounts to a Facebook "like" to old threads. Doing so greatly inconveniences the forum's contributors: we have to spend time re-reading old content only to discover that there was nothing new to read, and scrolling to the bottom of a thread takes time. Too many of these "likes" condition the valued responders NOT to read threads where your "like" is the last post.

About my SemaCondvar and SemaMonitor


I feel I should explain how my SemaCondvar and SemaMonitor objects work; you will find these classes in the SemaCondvar.pas file inside the zip file. SemaCondvar and SemaMonitor are new, portable synchronization objects: SemaCondvar combines all the characteristics of a semaphore and a condition variable, and SemaMonitor combines all the characteristics of a semaphore and an eventcount. They use only an event object and a very fast, efficient, and portable FIFO-fair lock, so they are both fast and FIFO fair.
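The classes themselves are in the Pascal unit mentioned above. Purely to illustrate the kind of combination being described (a counting semaphore whose waiters block and are woken in condition-variable style), here is a rough C sketch built on POSIX primitives; it does not reproduce the author's event object or FIFO-fair lock.

/* Rough conceptual sketch only: a counting "semaphore + condition
 * variable" hybrid built from POSIX primitives. It is not the
 * author's SemaCondvar (which is event-based and FIFO fair); it
 * just shows the semaphore-with-waiting behaviour described above. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    long            count;                /* available "permits" */
} sema_cond;

void sema_cond_init(sema_cond *s, long initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cond, NULL);
    s->count = initial;
}

/* Wait: block until a permit is available, then consume it. */
void sema_cond_wait(sema_cond *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)
        pthread_cond_wait(&s->cond, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Signal: release one permit and wake one waiter. */
void sema_cond_signal(sema_cond *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->cond);
    pthread_mutex_unlock(&s->lock);
}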

Writing data structures for parallel coding


I initially wrote this question on SO, but it seems nobody over there understands what I am asking. Searching the net led me here, so I'll ask here instead.


I am trying to wrap my mind around SoA [structure of arrays] in C programming.

I have some simple math functions for which I have written what I believe are pretty decent scalar implementations.

Here is a simple vector-3 data structure:
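The struct from the original post is not reproduced in this excerpt. What is usually meant is an array-of-structures vec3 and its structure-of-arrays counterpart, sketched below in C with placeholder names.

/* Illustrative only: a typical "array of structures" vec3 and the
 * equivalent "structure of arrays" layout that SIMD code prefers. */
#include <stddef.h>

/* AoS: each point packs x, y, z together; fields interleave in memory. */
typedef struct { float x, y, z; } vec3;

/* SoA: all x's contiguous, all y's contiguous, all z's contiguous,
 * which lets a vectorizing compiler load many lanes with unit stride. */
typedef struct {
    float *x;
    float *y;
    float *z;
    size_t n;
} vec3_soa;

/* The same scalar math written against the SoA layout: c = a + b. */
void vec3_soa_add(const vec3_soa *a, const vec3_soa *b, vec3_soa *c)
{
    for (size_t i = 0; i < a->n; ++i) {
        c->x[i] = a->x[i] + b->x[i];
        c->y[i] = a->y[i] + b->y[i];
        c->z[i] = a->z[i] + b->z[i];
    }
}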

Vectorization with SIMD-enabled functions works from functions, not from main()


I have run into a situation that I cannot explain. I have a loop with a SIMD-enabled function, and I use #pragma simd before it. This loop vectorizes if it is placed in a separate function, but does not vectorize if it is inside main(). I am using the Intel C++ compiler. Please see the code and vectorization reports below. Can anyone explain what is happening and whether there is a way to work around it?

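The poster's code and vectorization reports are not included in this excerpt. The situation described usually looks roughly like the following sketch for the Intel compiler (function and variable names are invented): a SIMD-enabled function marked __declspec(vector) (or __attribute__((vector)) on Linux) called from a #pragma simd loop that has been moved out of main().

/* Illustrative sketch of the situation described above, not the
 * poster's code: a SIMD-enabled function called from a loop that is
 * marked with #pragma simd and placed in its own function. */
#include <stdio.h>

#define N 1024

/* SIMD-enabled (vector) function: the Intel compiler generates a
 * vector variant callable from vectorized loops. */
__declspec(vector)
float saxpyish(float a, float x, float y)
{
    return a * x + y;
}

/* Placing the loop in its own function (rather than in main) is what
 * the poster reports makes the vectorizer accept it. */
void run(float a, const float *x, float *y)
{
#pragma simd
    for (int i = 0; i < N; ++i)
        y[i] = saxpyish(a, x[i], y[i]);
}

int main(void)
{
    static float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = (float)i; y[i] = 1.0f; }
    run(2.0f, x, y);
    printf("%f\n", y[N - 1]);
    return 0;
}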

Running MKL-BLAS on a dual-processor Xeon server

I am currently using the OpenBLAS libraries on a dual-processor Xeon (E5-2680) server with 4×8 GB RAM, and its performance is worse than that of a third-generation Core i7 PC with 32 GB RAM, also running OpenBLAS. I am interested in improving BLAS performance by switching to MKL-BLAS, and would like to know how to install and configure it for performance.
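As a hedged starting point (the exact install and link steps depend on the compiler, OS, and MKL version; the Intel MKL Link Line Advisor gives the precise link line), a minimal C test that calls MKL's CBLAS dgemm and fixes the thread count looks roughly like this. The matrix size and thread count are arbitrary illustration values.

/* Minimal sketch of calling MKL BLAS from C (not a tuning guide).
 * Build with something like:  icc dgemm_test.c -mkl=parallel
 * or consult the MKL Link Line Advisor for the exact link line. */
#include <stdio.h>
#include <mkl.h>

int main(void)
{
    const int n = 2000;                   /* square matrices for the test */
    double *A = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *B = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    double *C = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
    if (!A || !B || !C) return 1;

    for (long i = 0; i < (long)n * n; ++i) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    mkl_set_num_threads(16);              /* e.g. one thread per physical core */

    /* C = 1.0 * A * B + 0.0 * C */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, A, n, B, n, 0.0, C, n);

    printf("C[0] = %f\n", C[0]);
    mkl_free(A); mkl_free(B); mkl_free(C);
    return 0;
}

On a dual-socket machine it is usually worth comparing thread counts equal to the physical core count against hyperthreaded counts, since BLAS kernels rarely benefit from hyperthreading.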

Vectorization - single compilation unit doubles performance(!)

Hi - I'm using Visual Studio C++ 2012 on Windows 8 with Intel Compiler 16.0 to develop code implementing a digital signal processing algorithm. The main loop iterates over received 'symbol' data (1200 symbols) and lends itself well to vectorization. My laptop has an i5-4300U, which supports AVX2 instructions.
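The DSP code itself is not shown in this excerpt. As a hedged illustration of why a single compilation unit can matter so much (names invented): when the per-symbol helper is visible in the same translation unit, the compiler can inline it and vectorize the 1200-iteration loop, whereas an opaque call into another object file blocks vectorization unless interprocedural optimization is enabled.

/* Illustrative only: with process_symbol() defined in the same
 * compilation unit as the loop, the compiler can inline it and
 * vectorize the symbol loop; if it lives in a separate .c/.cpp file
 * (and IPO/LTO is off), the external call prevents vectorization. */
#define NSYM 1200

static inline float process_symbol(float re, float im)
{
    return re * re + im * im;             /* placeholder per-symbol work */
}

void demodulate(const float *re, const float *im, float *power)
{
    for (int i = 0; i < NSYM; ++i)        /* vectorizable when the callee is inlined */
        power[i] = process_symbol(re[i], im[i]);
}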

Subscribe to Intel® Moderncode for Parallel Architectures