This algorithm can be used to improve sparse matrix-vector and matrix-matrix multiplication in any numerical computation. As we know, there are lots of applications involving semi-sparse matrix computation in High Performance Computing. Additionally, in popular perceptual computing low-level engines, especially speech and facial recognition, semi-sparse matrices are found to be very common....
The generation of Huffman codes is used in many applications, among them the DEFLATE compression algorithm. The classical way to compute these codes uses a heap data structure. This approach is fairly efficient, but traditional software implementations contain lots of branches that are data-dependent and thus hard for general-purpose CPU hardware to predict. On modern processors with deep...
See how the new Intel® Advanced Vector Extensions 512CD and the Intel AVX512F subsets (available in the Intel® Xeon Phi processor and in future Intel Xeon processors) lets the compiler automatically generate vector code with no changes to the code.
The Intel® Compiler provides SIMD intrinsics APIs for short vector math library (SVML) and starting with Intel® Advanced Vector Extensions
Eduardo H. M. Cruz, Matheus S. Serpa, Arthur M. Krause, Philippe O. A. Navaux
This article will describe performance considerations for CPU inference using Intel® Optimization for TensorFlow*
This is an exercise in performance optimization on heterogeneous Intel architecture systems based on multi-core processors and manycore (MIC) coprocessors.
Exercise in performance optimization on Intel Architecture, including Intel® Xeon Phi™ processors.
Find out how to use the command-line interface in Intel® Advisor 2017 for a quick, initial analysis of loop performance that gives an overview of the hotspots in your code.