It finally happened!
Modern high performance computers are built with a combination of resources including:
This algorithm can be used to improve the performance of sparse matrix-vector and matrix-matrix multiplication in any numerical computation. Many applications in high-performance computing involve semi-sparse matrix computation. Semi-sparse matrices are also very common in the low-level engines of popular perceptual computing applications, especially speech and facial recognition....
A 3-part educational series on Optimization Techniques for the Intel® MIC Architecture is provided by Colfax Research. The series focuses on select topics on optimization of applications for Intel’s multi-core and manycore architectures (Intel® Xeon® processors and Intel® Xeon Phi™ processors).
We can hope that companies like Intel® will come along with a faster processor. (And this does tend to happen every year.) Or we can improve our compilers to produce better machine code. Or we can analyze our own code and change it to run more efficiently. For PHP, we do all three: we partner with the processor architects to improve the way they execute PHP; we look for changes we can make to the...
What three code modernization techniques would I suggest to help a programmer improve the execution performance of her code? There are too many specific techniques to choose from, so here are three recommendations that apply to any programmer, anywhere, anytime.
Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1 to cover three forms of parallelism: vectorization (single instruction, multiple data, or SIMD), shared memory parallelism (threading), and distributed memory computing.
I can. And if you read this post, you will be able to write one, too. (It might be a cool party trick or a sucker bet to make a little cash.)
Parallelize loops with Intel® Threading Building Blocks, using the lambda expression support in the Intel® C++ Compiler.
As I mentioned in my previous post about writing vectorized reduction code with Intel vector intrinsics, that part of the code was just the finishing touch on a loop computing the squared difference of complex values.
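For readers who want a concrete picture of that kind of loop, here is a minimal scalar sketch in portable C++. It assumes the computation is a sum of |a[i] - b[i]|² over two arrays of single-precision complex values; the function name `squared_difference` and the use of `std::vector<std::complex<float>>` are illustrative choices, not taken from the post, and no intrinsics are used here.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (name and signature are assumptions for illustration).
// Returns the sum over i of |a[i] - b[i]|^2, using the shorter of the two lengths.
float squared_difference(const std::vector<std::complex<float>>& a,
                         const std::vector<std::complex<float>>& b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    float sum = 0.0f;  // running reduction variable
    for (std::size_t i = 0; i < n; ++i) {
        // Difference of real and imaginary parts, handled separately.
        const float dre = a[i].real() - b[i].real();
        const float dim = a[i].imag() - b[i].imag();
        // |d|^2 = dre^2 + dim^2, accumulated into the reduction.
        sum += dre * dre + dim * dim;
    }
    return sum;
}
```

The loop body has no branches and a single accumulator, which is the shape a vectorized version would typically start from; whether a given compiler vectorizes it automatically depends on the compiler and its floating-point settings.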