Blog

1024cores: All about lock-free, concurrency, multicore and parallelism

It finally happened!

Author: Dmitry Vyukov Last updated: 2019/02/15 - 13:39
Article

Fast Gathering-based SpMxV for Linear Feature Extraction

This algorithm can be used to improve sparse matrix-vector and matrix-matrix multiplication in any numerical computation. As we know, there are lots of applications involving semi-sparse matrix computation in High Performance Computing. Additionally, in popular perceptual computing low-level engines, especially speech and facial recognition, semi-sparse matrices are found to be very common....
Last updated: 2018/12/12 - 18:00
Blog

PGO: Let It Go (PHP)

We can hope that companies like Intel® will come along with a faster processor. (And this does tend to happen every year.) Or we can improve our compilers to produce better machine code. Or we can analyze our own code and change it to run more efficiently. For PHP, we do all three: We partner with the processor architects to improve the way they execute PHP; we look for changes we can make to the...
Author: David S. (Blackbelt) Last updated: 2019/07/03 - 20:08
Blog

Three Pieces of Advice for Code Modernization Success

What three code modernization techniques would I suggest to help a programmer improve the execution performance of her code? With too many specific things to choose from, these are three recommendations for any programmer anywhere and anytime.
Author: Clay B. (Blackbelt) Last updated: 2018/12/12 - 18:08
Blog

Can You Write a Vectorized Reduction Operation?

I can. And if you read this post, you will be able to write one, too. (Might be a cool party trick or a sucker bet to make a little cash.)
Author: Clay B. (Blackbelt) Last updated: 2018/12/12 - 18:08
Blog

Reduce Boilerplate Code in Parallelized Loops with C++11 Lambda Expressions

Parallelize loops with Intel® Threading Building Blocks using Intel® C++ Compiler for lambda expressions.
Author: gaston-hillar (Blackbelt) Last updated: 2018/12/12 - 18:00
Blog

Vectorized Reduction 2: Let the Compiler do that Voodoo that it do so well

As I mentioned in my previous post about writing vectorized reduction code with Intel vector intrinsics, that part of the code was just the finishing touch on a loop computing the squared difference of complex values.
Author: Clay B. (Blackbelt) Last updated: 2018/12/12 - 18:08
Blog

Debug Intel® Transactional Synchronization Extensions

If printf or fprintf functions cause transaction aborts, use Intel® Processor Trace as a work-around.
Author: Roman Dementiev (Intel) Last updated: 2019/07/04 - 17:00
Blog

Rainbows, Unicorns and Performance Portability

An old Jewish fable tells about a poor man asking for advice from the rabbi. The family is large, the house is small, and it feels very crowded.

Last updated: 2018/12/12 - 18:08
Blog

Optimization of Classical Molecular Dynamics

CoMD is an open-source classical molecular dynamics code. One of its prime application areas is materials modeling.

Author: Andrey Vladimirov Last updated: 2018/12/12 - 18:00