Blog

Parallel Universe Magazine #12: Advanced Vectorization

This blog post contains additional content for the article "Advanced Vectorization" in Parallel Universe #12.

Last updated: 2019/07/03 - 20:08
Article

Fast Gathering-based SpMxV for Linear Feature Extraction

This algorithm can be used to speed up sparse matrix-vector and matrix-matrix multiplication in any numerical computation. Many High Performance Computing applications involve semi-sparse matrix computation. Semi-sparse matrices are also very common in the low-level engines of popular perceptual computing applications, especially speech and facial recognition...
Last updated: 2018/12/12 - 18:00
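For context on the entry above, here is a minimal sketch of a plain compressed sparse row (CSR) matrix-vector product in C. It is not the article's gathering-based algorithm. It only shows where the gather comes from: the indirect load `x[col_idx[j]]`, which a compiler targeting Intel® AVX2 or Intel® AVX-512 can vectorize with gather instructions. The function name `csr_spmv` and the CSR layout are assumptions chosen for illustration.

```c
#include <stddef.h>

/* Hypothetical baseline: y = A * x, with A stored in CSR form.
 * row_ptr has nrows + 1 entries; the nonzeros of row r are
 * val[row_ptr[r]] .. val[row_ptr[r + 1] - 1], in columns col_idx[...]. */
void csr_spmv(int nrows, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    for (int r = 0; r < nrows; ++r) {
        double acc = 0.0;
        /* x[col_idx[j]] is an indirect load; when this loop is vectorized,
         * the compiler can emit a gather for it. 32-bit indices keep the
         * index vector compact. */
#pragma omp simd reduction(+:acc)
        for (int j = row_ptr[r]; j < row_ptr[r + 1]; ++j)
            acc += val[j] * x[col_idx[j]];
        y[r] = acc;
    }
}
```

For example, the matrix [[1, 0, 2], [0, 3, 0], [4, 0, 5]] is stored as `row_ptr = {0, 2, 3, 5}`, `col_idx = {0, 2, 1, 0, 2}`, `val = {1, 2, 3, 4, 5}`. Rows with few nonzeros give short inner loops, which limits how much a gather-based inner loop can gain; that is the kind of case a specialized semi-sparse scheme would target.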
Blog

Can You Write a Vectorized Reduction Operation?

I can. And if you read this post, you will be able to write one, too. (Might be a cool party trick or a sucker bet to make a little cash.)
Author: Clay B. (Blackbelt) | Last updated: 2018/12/12 - 18:08
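The post's own code is not reproduced here. As a minimal sketch under our own assumptions (C with the OpenMP 4.0 `simd` pragma; the names `sum_simd`, `sum_by_lanes` and `LANES` are invented for illustration), a vectorized sum reduction can be written two ways: by letting the compiler split the sum across vector lanes, or by spelling out the per-lane partial sums by hand.

```c
#include <stddef.h>

/* Let the compiler vectorize: the reduction clause allows it to keep
 * partial sums in separate vector lanes and combine them after the loop. */
double sum_simd(const double *a, size_t n)
{
    double sum = 0.0;
#pragma omp simd reduction(+:sum)
    for (size_t i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* The same idea written out by hand: LANES independent accumulators
 * (one per vector lane), a horizontal combine, then a scalar remainder loop. */
#define LANES 4

double sum_by_lanes(const double *a, size_t n)
{
    double partial[LANES] = {0.0};
    size_t i = 0;
    for (; i + LANES <= n; i += LANES)
        for (size_t k = 0; k < LANES; ++k)
            partial[k] += a[i + k];

    double sum = 0.0;
    for (size_t k = 0; k < LANES; ++k)   /* horizontal combine */
        sum += partial[k];
    for (; i < n; ++i)                   /* leftover elements */
        sum += a[i];
    return sum;
}
```

Both versions add the elements in a different order than a strictly sequential loop, so for general floating-point data the result can differ in the last bits. That reassociation is exactly what the `reduction` clause gives the compiler permission to do.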
Article

Fine-Tuning Optimization for a Numerical Method for Hyperbolic Equations Applied to a Porous Media Flow Problem with Intel® Tools

This paper analyzes potential optimizations of a Godunov-type semi-discrete central scheme for a particular hyperbolic problem arising in porous media flow, using OpenMP* and Intel® Advanced Vector Extensions 2 (Intel® AVX2).
Last updated: 2019/07/03 - 20:00
Blog

Question: Does Software Actually Use New Instruction Sets?

Author: Engblom, Jakob (Intel) | Last updated: 2019/07/04 - 16:56
Article

Intel® Math Kernel Library Improved Small Matrix Performance Using Just-in-Time (JIT) Code Generation for Matrix Multiplication (GEMM)

The most commonly used and performance-critical Intel® Math Kernel Library (Intel® MKL) functions are the general matrix multiply (GEMM) functions.

Author: Gennady F. (Blackbelt) | Last updated: 2019/03/21 - 03:01
Article

Understanding NUMA for 3D Isotropic Finite Difference (3DFD) Wave Equation Code

This article demonstrates techniques that software developers can use to identify and fix NUMA-related performance issues in their applications.
Author: Sunny G. (Intel) | Last updated: 2019/10/02 - 16:18
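One widely used fix for NUMA-related slowdowns in stencil codes is "first-touch" initialization. The sketch below is our own illustration, not code from the 3DFD article: it assumes Linux's default first-touch page placement and OpenMP, and it uses a hypothetical 1-D three-point stencil (`stencil_step`) as a stand-in for the 3-D finite-difference update.

```c
#include <stddef.h>

/* First-touch placement: on Linux, a page is normally allocated on the NUMA
 * node of the thread that first writes to it. Initializing the arrays in a
 * parallel loop with the same static schedule as the compute loop means each
 * thread later reads and writes mostly node-local memory. */
void first_touch_init(float *grid, size_t n)
{
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        grid[i] = 0.0f;
}

/* A 1-D three-point stencil standing in for the 3-D finite-difference
 * update; it uses the same schedule(static) as the initialization. */
void stencil_step(const float *in, float *out, size_t n)
{
    if (n < 3)
        return;
#pragma omp parallel for schedule(static)
    for (size_t i = 1; i < n - 1; ++i)
        out[i] = 0.25f * in[i - 1] + 0.5f * in[i] + 0.25f * in[i + 1];
}
```

The design choice that matters is the pairing: if the arrays were zeroed by a single thread (for example with one `memset` or `calloc`), their pages would typically all land on one node, and threads on other sockets would pay remote-memory latency on every stencil sweep.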
Article

Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Last updated: 2019/10/15 - 15:30
Article

Improve Performance with Vectorization

This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization.
Author: David M. | Last updated: 2019/10/15 - 15:30
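A step that vectorization guides commonly cover is removing pointer-aliasing doubt. As a minimal sketch (our own example, not taken from the article), the C99 `restrict` qualifier on a simple `saxpy` loop tells the compiler that the input and output arrays do not overlap.

```c
#include <stddef.h>

/* y = a*x + y. Without restrict, the compiler must assume x and y might
 * overlap and may decline to vectorize (or add a runtime overlap check).
 * restrict promises they do not alias, so the loop can be vectorized directly. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Whether the loop was actually vectorized can be checked with the compiler's optimization report, for example GCC's `-fopt-info-vec`. Note that `restrict` is a promise: calling `saxpy` with overlapping `x` and `y` is undefined behavior.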
Article

Recognize and Measure Vectorization Performance

Get background on vectorization and learn different techniques for evaluating its effectiveness.
Author: David M. | Last updated: 2019/10/15 - 15:30