By Ingo Wald
Handling user-defined function-calls inside vector-loops
If you want to vectorize a loop that contains a call to a user-defined function, make that function a vector-elemental function, refactoring the code first if necessary.
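A minimal sketch of the idea, using the OpenMP 4.0 `declare simd` directive as a portable stand-in for a compiler-specific vector-elemental annotation. The function names `scale_and_offset` and `apply` are hypothetical and chosen only for illustration. If the compiler is not run with OpenMP SIMD support, the pragmas are ignored and the loop simply runs as scalar code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-defined function. "declare simd" asks the compiler to
   generate a vector variant in addition to the scalar one. "uniform" marks
   arguments that are the same for every lane; "notinbranch" states that the
   function is never called from inside a conditional in the SIMD loop. */
#pragma omp declare simd uniform(scale, offset) notinbranch
float scale_and_offset(float x, float scale, float offset)
{
    return x * scale + offset;
}

/* Hypothetical caller: the loop contains a user-defined function call,
   which can be vectorized because a vector variant of the callee exists. */
void apply(float *out, const float *in, size_t n, float scale, float offset)
{
    assert(out != NULL && in != NULL);

    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        out[i] = scale_and_offset(in[i], scale, offset);
}
```

Whether the call is actually vectorized depends on the compiler and its flags, so it is worth checking the vectorization report rather than assuming it.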
Intel recently released the 4th Generation Intel® Core™ processors, which have Intel® Transactional Synchronization Extensions (Intel® TSX) enabled. Intel TSX can improve the performance of applications that use lock-based synchronization to protect data structure updates. This feature allows multiple non-conflicting lock-protected changes to data to occur in parallel.
Your Intel® Xeon Phi™ coprocessor starter kit has all the tools needed to go parallel – now what? This TOP10 list provides suggestions for what to do next. Once you've learned more about parallel programming techniques and practiced some basic exercises, you'll be well prepared to optimize your own application!
Monte Carlo is a statistical computing method for solving complex scientific computing problems. It uses random numbers to simulate the uncertainty in a problem's inputs and repeatedly samples those inputs to obtain a deterministic result, solving problems that would otherwise be impossible. The method was pioneered by nuclear physicists involved in the Manhattan Project in the late 1940s. It is named after the biggest casino in the principality of Monaco.
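To make the repeated-sampling idea concrete, here is a minimal sketch that estimates π by drawing random points in the unit square and counting how many fall inside the quarter circle. This is a textbook stand-in, not the workload the article itself covers. The names `next_uniform` and `estimate_pi` are hypothetical, and the small linear congruential generator is an assumption made only so that the result is reproducible for a given seed.

```c
#include <assert.h>
#include <stdint.h>

/* Small 64-bit linear congruential generator (assumed here for
   reproducibility). The top 53 bits of the state are scaled to a
   double in the range [0, 1). */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0; /* divide by 2^53 */
}

/* Estimate pi: the fraction of random points (x, y) in the unit square
   that satisfy x^2 + y^2 <= 1 approaches pi / 4 as samples grows. */
double estimate_pi(uint64_t samples, uint64_t seed)
{
    assert(samples > 0);

    uint64_t state = seed;
    uint64_t inside = 0;
    for (uint64_t i = 0; i < samples; ++i) {
        double x = next_uniform(&state);
        double y = next_uniform(&state);
        if (x * x + y * y <= 1.0)
            ++inside;
    }
    return 4.0 * (double)inside / (double)samples;
}
```

The statistical error shrinks roughly with the square root of the number of samples, so each additional digit of accuracy costs about 100 times more samples. That cost is why Monte Carlo codes are a common target for parallelization.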
To improve the performance of applications and kernels, we are constantly searching for novel Best Known Methods (BKMs). As those searches grow more esoteric, it is important to keep the basics in mind and to remember how many performance improvements rely on them. This article will describe some common BKMs for improving parallel performance and show their application over this spectrum of processor architectures. The advice collected here should help you speed up your code, whether running on an Intel® Xeon Phi™ coprocessor or an Intel Xeon process