Article

OpenMP* SIMD for Inclusive/Exclusive Scans

The Intel® C++ Compiler 19.0 and the Intel® Fortran Compiler 19.1 support the OpenMP* SIMD SCAN feature for inclusive and exclusive scans.
Authored by Varsha M. (Intel) Last updated on 07/23/2019 - 09:16
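
For readers unfamiliar with the terms, the difference between the two scan types can be sketched in a few lines. This is a plain Python illustration of what inclusive and exclusive scans compute, not the OpenMP* directive itself. The function names `inclusive_scan` and `exclusive_scan` are hypothetical and chosen only for this sketch.

```python
def inclusive_scan(values):
    """Running sum where each output includes the current element."""
    out = []
    total = 0
    for v in values:
        total += v          # update the accumulator first
        out.append(total)   # then record it
    return out


def exclusive_scan(values):
    """Running sum where each output excludes the current element."""
    out = []
    total = 0
    for v in values:
        out.append(total)   # record the accumulator first
        total += v          # then update it
    return out


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(inclusive_scan(data))
    print(exclusive_scan(data))
```

The only difference is whether the accumulator is read before or after it is updated in each iteration.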
Article

Using Tasks Instead of Threads

Tasks are a lightweight alternative to threads that provide faster startup and shutdown times, better load balancing, more efficient use of available resources, and a higher level of abstraction.
Authored by admin Last updated on 07/05/2019 - 09:41
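
As a rough illustration of the task idea, the sketch below submits small units of work to a fixed pool of worker threads instead of creating one thread per unit. It uses Python's standard `concurrent.futures` module as a stand-in, which is an assumption of this sketch and not necessarily the tasking library the article covers. The names `square` and `run_as_tasks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    """A small unit of work, standing in for a real task body."""
    return n * n


def run_as_tasks(values, max_workers=4):
    """Run one task per value on a fixed pool of reusable worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The pool hands tasks to whichever worker is free;
        # map() returns results in input order.
        return list(pool.map(square, values))


if __name__ == "__main__":
    print(run_as_tasks(range(5)))
```

The caller describes the work and leaves scheduling to the pool, which is the higher level of abstraction the summary refers to.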
Article

Multi-core Intermediate

Introduction
Authored by Nguyen, Khang T (Intel) Last updated on 07/13/2018 - 17:29
Article

Code Sample: Optimizing Binarized Neural Networks on Intel® Xeon® Scalable Processors

In the previous article, we discussed the performance and accuracy of Binarized Neural Networks (BNN). We also introduced a BNN coded from scratch in the Wolfram Language. The key component of this neural network is Matrix Multiplication.
Authored by Yash Akhauri Last updated on 03/21/2019 - 12:40
Article

Multithreaded Game Programming and Hyper-Threading Technology

by Will Damon

Last updated on 01/24/2018 - 12:12
Article

Measuring performance in HPC

This is the first in a series of articles about High Performance Computing with the Intel® Xeon Phi™ coprocessor.

Last updated on 07/06/2019 - 16:10
Blog post

Unleash the Parallel Performance of Python* Programs

[updated 10/5/2018]

Authored by Anton Malakhov (Intel) Last updated on 10/05/2018 - 18:24
Article

Putting Your Data and Code in Order: Data and layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to consider parallelism, covering vectorization (single instruction, multiple data, or SIMD), shared memory parallelism (threading), and distributed memory computing.
Authored by David M. Last updated on 07/06/2019 - 16:40
Article

Performance of Classic Matrix Multiplication Algorithm on Intel® Xeon Phi™ Processor System

Matrix multiplication (MM) of two matrices is one of the most fundamental operations in linear algebra. The algorithm for MM is very simple and can be easily implemented in any programming language. This paper shows that performance improves significantly when different optimization techniques are applied.
Last updated on 06/14/2019 - 11:50
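
For reference, the classic algorithm mentioned above can be written as a triple loop. The following is a minimal, unoptimized sketch using plain Python lists. The function name `matmul` is hypothetical, and the code is not taken from the paper.

```python
def matmul(a, b):
    """Classic triple-loop product of an n x k matrix and a k x m matrix."""
    n, k, m = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * m for _ in range(n)]
    for i in range(n):              # rows of a
        for j in range(m):          # columns of b
            total = 0
            for p in range(k):      # dot product of row i and column j
                total += a[i][p] * b[p][j]
            c[i][j] = total
    return c


if __name__ == "__main__":
    print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

This baseline performs n × m × k multiply-add operations. Optimized versions typically keep the same arithmetic and change the loop order or the memory access pattern.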
Article

Exploiting Data Parallelism in Ordered Data Streams

This article identifies some of the challenges of exploiting data parallelism in ordered data streams and illustrates strategies for addressing them while maintaining parallel performance.
Authored by admin Last updated on 07/05/2019 - 14:50