Article

自动矢量化失败后应该怎么办?

This article completes an analysis of a problem erroneously reported on the Intel® Developer Zone forum: Vectorization failed because of unsigned integer? It provides a more detailed examination showing that unsigned integer is not impacting compiler vectorization but what methodology to use when a modern C/C++ compiler fails to auto-vectorize for-loops.
Authored by Last updated on 07/05/2019 - 14:46
Article

An update to the integration of Intel® Media SDK and FFmpeg

Introduction
Authored by Liu, Mark (Intel) Last updated on 07/03/2019 - 20:00
Article

Performance of Classic Matrix Multiplication Algorithm on Intel® Xeon Phi™ Processor System

Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple, it could be easily implemented in any programming language. This paper shows that performance significantly improves when different optimization techniques are applied.
Authored by Last updated on 10/15/2019 - 15:30
Article

Recognize and Measure Vectorization Performance

Get a background on vectorization and learn different techniques to evaluate its effectiveness.
Authored by David M. Last updated on 10/15/2019 - 15:30
Article

Measuring performance in HPC

This is the first article in a series of articles about High Performance Computing with the Intel® Xeon Phi™ coprocessor.

Authored by Last updated on 10/15/2019 - 16:40
Article

面向英特尔® 架构优化的 Caffe*:使用现代代码技巧

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Authored by Last updated on 10/15/2019 - 16:50
Article

Applying Intel® Threading Building Blocks Observers for Thread Affinity on Intel® Xeon Phi™ Coprocessors

In spite of the fact that the Intel® Threading Building Blocks (Intel® TBB) library [1] [2] provides high-level task based parallelism intended to hide sof

Authored by Alex (Intel) Last updated on 10/15/2019 - 17:26
Article

Distributed Memory Coarray Programs with Process Pinning

This article describes a method to compile and run a distributed memory coarray program using Intel® Parallel Studio XE Cluster Edition for Linux . An example using Linux* is presented.
Authored by Kenneth Craft (Intel) Last updated on 10/15/2019 - 21:20
Article

Introduction to GEN Assembly

Download PDF (1.5 MB)

Download

Authored by Robert Ioffe (Intel) Last updated on 10/21/2019 - 08:18
Article

Using OpenCL™ 2.0 Read-Write Images

While Image convolution is not as effective with the new Read-Write images functionality, any image processing technique that needs be done in place may benefit from the Read-Write images. One example of a process that could be used effectively is image composition. In OpenCL 1.2 and earlier, images were qualified with the “__read_only” and __write_only” qualifiers. In the OpenCL 2.0, images can...
Authored by Last updated on 11/19/2019 - 14:05