Article

90 errors in open-source projects

There are actually 91 errors described in the article, but number 90 looks nicer in the title. The article is intended for C/C++ programmers, but developers working with other languages may also find it interesting.
Authored by Andrey Karpov (Blackbelt) Last updated on 06/20/2019 - 22:51
Article

Threading Intel® Integrated Performance Primitives Image Resize with Intel® Threading Building Blocks

Threading Intel® IPP Image Resize with Intel® TBB.pdf (157.18 KB) :
Authored by Jeffrey M. (Intel) Last updated on 07/31/2019 - 15:05
Article

OpenMP* SIMD for Inclusive/Exclusive Scans

The Intel® C++ Compiler 19.0 and the Intel® Fortran Compiler 19.1 support the OpenMP* SIMD SCAN feature for inclusive and exclusive scans.
Authored by Varsha M. (Intel) Last updated on 07/23/2019 - 09:16
Article

Performance of Classic Matrix Multiplication Algorithm on Intel® Xeon Phi™ Processor System

Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple, it could be easily implemented in any programming language. This paper shows that performance significantly improves when different optimization techniques are applied.
Authored by Last updated on 10/15/2019 - 15:30
Article

Recognize and Measure Vectorization Performance

Get a background on vectorization and learn different techniques to evaluate its effectiveness.
Authored by David M. Last updated on 10/15/2019 - 15:30
Article

Measuring performance in HPC

This is the first article in a series of articles about High Performance Computing with the Intel® Xeon Phi™ coprocessor.

Authored by Last updated on 10/15/2019 - 16:40
Article

Приводим данные и код в порядок: данные и разметка, часть 2

In this pair of articles on performance and memory covers basic concepts to provide guidance to developers seeking to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as shared memory parallelism (threading), and distributed memory computing.
Authored by David M. Last updated on 10/15/2019 - 16:40
Article

Putting Your Data and Code in Order: Data and layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as shared memory parallelism (threading), and distributed memory computing.
Authored by David M. Last updated on 10/15/2019 - 16:40
Article

整理您的数据和代码: 数据和布局 - 第 2 部分

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as shared memory parallelism (threading), and distributed memory computing.
Authored by David M. Last updated on 10/15/2019 - 16:40
Article

面向英特尔® 架构优化的 Caffe*:使用现代代码技巧

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Authored by Last updated on 10/15/2019 - 16:50