Intel MKL v11.0 added general support and specific optimizations for the latest Intel® AVX2 instructions. This article lists the functions that are optimized for Intel AVX2.
I was hoping to write a brief two-part overview of how to configure the various power settings for the Intel® Xeon Phi™ coprocessor.
Read an in-depth analysis of code modernization performance, conducted by optimizing original CPU code and re-running tests on the latest GPU/CPU hardware.
This algorithm can be used to improve sparse matrix-vector and matrix-matrix multiplication in any numerical computation. Many applications in High Performance Computing involve semi-sparse matrix computation. Additionally, in popular perceptual computing low-level engines, especially speech and facial recognition, semi-sparse matrices are found to be very common....
The Colfax Hands On Workshop (HOW) training series is an integral part of the Intel Modern Code Developer program which supports developers in leveraging application performance in code through a systematic optimization methodology. Attendees of these workshops may receive a certificate of completion. The certificate states the Fundamental level of accomplishment in the Parallel Programming Track...
This article demonstrates techniques that software developers can use to identify and fix NUMA-related performance issues in their applications.
Get a background on vectorization and learn different techniques to evaluate its effectiveness.
Fine-Tuning Optimization for a Numerical Method for Hyperbolic Equations Applied to a Porous Media Flow Problem with Intel® Tools
This paper presents an analysis of potential optimizations for a Godunov-type semi-discrete central scheme, applied to a particular hyperbolic problem arising in porous media flow, using OpenMP* and Intel® Advanced Vector Extensions 2.
This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
See how the new Intel® Advanced Vector Extensions 512CD and Intel AVX512F subsets (available in the Intel® Xeon Phi processor and in future Intel Xeon processors) let the compiler automatically generate vector code with no changes to the source code.