博客

Parallel Universe Magazine #12: Advanced Vectorization

This blog contains additional content for the article "Advanced Vectorization" from Parallel Universe #12:

作者: 最后更新时间: 2019/07/03 - 20:08
Article

Fast Gathering-based SpMxV for Linear Feature Extraction

This algorithm can be used to improve sparse matrix-vector and matrix-matrix multiplication in any numerical computation. As we know, there are lots of applications involving semi-sparse matrix computation in High Performance Computing. Additionally, in popular perceptual computing low-level engines, especially speech and facial recognition, semi-sparse matrices are found to be very common....
作者: 最后更新时间: 2018/12/12 - 18:00
Article

Fine-Tuning Optimization for a Numerical Method for Hyperbolic Equations Applied to a Porous Media Flow Problem with Intel® Tools

This paper presents an analysis for potential optimization for a Godunov-type semi-discrete central scheme, for a particular hyperbolic problem implicated in porous media flow, using OpenMP* and Intel® Advanced Vector Extensions 2.
作者: 最后更新时间: 2019/07/03 - 20:00
博客

Question: Does Software Actually Use New Instruction Sets?

作者: Engblom, Jakob (Intel) 最后更新时间: 2019/07/04 - 16:56
Article

Intel® Math Kernel Library Improved Small Matrix Performance Using Just-in-Time (JIT) Code Generation for Matrix Multiplication (GEMM)

    The most commonly used and performance-critical Intel® Math Kernel Library (Intel® MKL) functions are the general matrix multiply (GEMM) functions.

作者: Gennady F. (Blackbelt) 最后更新时间: 2019/03/21 - 03:01
Article

Understanding NUMA for 3D Isotropic Finite Difference (3DFD) Wave Equation Code

This article demonstrates techniques that software developers can use to identify and fix NUMA-related performance issues in their applications.
作者: Sunny G. (Intel) 最后更新时间: 2019/10/02 - 16:18
Article

Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
作者: 最后更新时间: 2019/10/15 - 15:30
Article

Recognize and Measure Vectorization Performance

Get a background on vectorization and learn different techniques to evaluate its effectiveness.
作者: David M. 最后更新时间: 2019/10/15 - 15:30
博客

Optimizing Big Data processing with Haswell 256-bit Integer SIMD instructions

Big Data requires processing huge amounts of data. Intel Advanced Vector Extensions 2 (aka AVX2) promoted most Intel AVX 128-bits integer SIMD instruction sets to 256-bits.

作者: gaston-hillar (Blackbelt) 最后更新时间: 2019/10/15 - 17:38
博客

BKMs on the Use of the SIMD Directive

We had an ask from one of the various "Birds of a Feather" meetings Intel® holds at venues such as at the Super Computing* (SC) and International Super Computing* (ISC) conferences.

作者: 最后更新时间: 2019/10/15 - 17:45