Article

Scale-Up Implementation of a Transportation Network Using Ant Colony Optimization (ACO)

In this article an OpenMP* based implementation of the Ant Colony Optimization algorithm was analyzed for bottlenecks with Intel® VTune™ Amplifier XE 2016 together with improvements using hybrid MPI-OpenMP and Intel® Threading Building Blocks were introduced to achieve efficient scaling across a four-socket Intel® Xeon® processor E7-8890 v4 processor-based system.
Authored by Sunny G. (Intel) Last updated on 07/05/2019 - 19:10
Article

Caffe* Optimized for Intel® Architecture: Applying Modern Code Techniques

This paper demonstrates a special version of Caffe* — a deep learning framework originally developed by the Berkeley Vision and Learning Center (BVLC) — that is optimized for Intel® architecture.
Authored by Last updated on 07/06/2019 - 16:40
Article

Recipe: Building and Running YASK (Yet Another Stencil Kernel) on Intel® Processors

Yet Another Stencil Kernel (YASK), is a framework to facilitate design exploration and tuning of HPC kernels including vector folding, cache blocking, memory layout, loop construction, temporal wave-front blocking, and others.YASK contains a specialized source-to-source translator to convert scalar C++ stencil code to SIMD-optimized code.
Authored by Chuck Yount (Intel) Last updated on 03/21/2019 - 12:00
Article

Thread Parallelism in Cython*

Cython* is a superset of Python* that additionally supports C functions and C types on variable and class attributes. Cython generates C extension modules, which can be used by the main Python program using the import statement.
Authored by Nguyen, Loc Q (Intel) Last updated on 07/06/2019 - 16:30
Article

Performance of Classic Matrix Multiplication Algorithm on Intel® Xeon Phi™ Processor System

Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple, it could be easily implemented in any programming language. This paper shows that performance significantly improves when different optimization techniques are applied.
Authored by Last updated on 06/14/2019 - 11:50
Article

Intel® Xeon Phi™ Processor 7200 Family Memory Management Optimizations

This paper examines software performance optimization for an implementation of a non-library version of DGEMM executing on the Intel® Xeon Phi™ processor (code-named Knights Landing, with acronym K

Authored by Last updated on 07/06/2019 - 16:30
Article

Caffe* Scoring Optimization for Intel® Xeon® Processor E5 Series

    In continued efforts to optimize Deep Learning workloads on Intel® architecture, our engineers explore various paths leading to the maximum performance.

Authored by Gennady F. (Blackbelt) Last updated on 03/21/2019 - 12:28
Article

Direct N-body Simulation

Exercise in performance optimization on Intel Architecture, including Intel® Xeon Phi™ processors.
Authored by Mike P. (Intel) Last updated on 03/21/2019 - 12:00
Article

Hybrid Parallelism: A MiniFE* Case Study

This case study examines the situation where the problem decomposition is the same for threading as it is for Message Passing Interface* (MPI); that is, the threading parallelism is elevated to the same level as MPI parallelism.
Authored by David M. Last updated on 07/06/2019 - 16:40