This will be the final post in my planned short vectorization series, although I reserve the right to post more on vectorization in the future!
In my last blog, I introduced the concept of vectorization, which is parallelism across data elements in a regi
As part of my focus on software performance, I also support and consult on implementing scalable parallelism in applications.
Proposal: rename "for" in C and C++ to "serial_for". No more incumbent "for" (it was voted off the island).
It is time to make parallelism a full first-class citizen in C and C++. Hardware is once again ahead of software, and we need to close the gap so that application development is better able to uti
Ray-tracing is a classic example of an embarrassingly parallel algorithm; since each pixel is typically independent of the rest, theoretically every pixel can be done in parallel (given enough core
The N-Body problem is a classic example used frequently to demonstrate parallelization and how it improves performance.
This blog contains additional content for the article "Advanced Vectorization" from Parallel Universe #12:
The talk "Como domar uma fera de 1 TFlop que cabe na palma da sua mão" ("How to tame a 1 TFlop beast that fits in the palm of your hand") was presented on 3 July 2013 at FISL14 by Luciano Palma, Intel Community Manager for Servers and High...
Big Data requires processing huge amounts of data. Intel Advanced Vector Extensions 2 (Intel AVX2) widened most of the 128-bit integer SIMD instructions to 256 bits.
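To make the 128-bit versus 256-bit point concrete, here is a minimal sketch in plain C. It is a scalar model, not AVX2 intrinsics: it assumes 32-bit integer elements, and the names `LANES_128`, `LANES_256`, `vector_steps`, and `add_i32_256` are hypothetical, introduced only for illustration. A compiler may or may not turn the inner loop into real 256-bit instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lanes of 32-bit integers that fit in one register of each width.
   The 32-bit element size is an assumption made for this sketch. */
#define LANES_128 ((size_t)(128 / 32)) /* 4 lanes */
#define LANES_256 ((size_t)(256 / 32)) /* 8 lanes */

/* Hypothetical helper: register-wide steps needed to cover n elements. */
static size_t vector_steps(size_t n, size_t lanes) {
    return (n + lanes - 1) / lanes;
}

/* Scalar model of a 256-bit integer add: each outer iteration handles
   LANES_256 elements, the way one 256-bit instruction would. Inputs are
   assumed small enough that the sums do not overflow int32_t. */
static void add_i32_256(const int32_t *a, const int32_t *b,
                        int32_t *out, size_t n) {
    assert(a && b && out);
    size_t i = 0;
    /* Full register-width blocks. */
    for (; i + LANES_256 <= n; i += LANES_256) {
        for (size_t lane = 0; lane < LANES_256; ++lane) {
            out[i + lane] = a[i + lane] + b[i + lane];
        }
    }
    /* Remainder elements that do not fill a whole register. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Under these assumptions, `vector_steps(16, LANES_128)` is 4 while `vector_steps(16, LANES_256)` is 2: doubling the register width halves the number of integer operations issued for the same data.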