Article

A Parallel Stable Sort Using C++11 for TBB, Cilk Plus, and OpenMP

This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics.

Authored by Last updated on 08/01/2019 - 09:30
Article

Choosing the right threading framework

This is the second article in a series of articles about High Performance Computing with the Intel Xeon Phi.

Authored by Last updated on 07/06/2019 - 16:30
Blog post

Graduate Intern at Intel - Parallel Ray-Tracing

Ray-tracing is a classic example of an embarrassingly parallel algorithm; since each pixel is typically independent of the rest, theoretically every pixel can be done in parallel (given enough core

Authored by Last updated on 06/14/2017 - 15:37
Blog post

Graduate Intern at Intel - Parallel N-Body

The N-Body problem is a classic example used frequently to demonstrate parallelization and how it improves performance.

Authored by Last updated on 06/14/2017 - 15:46
Article
Article

Putting Your Data and Code in Order: Data and layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as shared memory parallelism (threading), and distributed memory computing.
Authored by David M. Last updated on 07/06/2019 - 16:40
Article

Why is Cilk™ Plus not speeding up my program? (Part 1)

In this article, I discuss some common performance pitfalls in Cilk™ Plus programs that prevent users from seeing speedups in their code, and describe some techniques for avoiding these pitfalls.
Authored by Last updated on 02/04/2019 - 10:40
Article

Why is Cilk™ Plus not speeding up my program? (Part 2)

This article concludes the discussion of common performance pitfalls in Cilk™ Plus programs and how to avoid them, as started in Part 1. .
Authored by Last updated on 02/04/2019 - 10:40
Article

Hybrid Parallelism: A MiniFE* Case Study

This case study examines the situation where the problem decomposition is the same for threading as it is for Message Passing Interface* (MPI); that is, the threading parallelism is elevated to the same level as MPI parallelism.
Authored by David M. Last updated on 07/06/2019 - 16:40
Article

Improve Performance with Vectorization

This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization.
Authored by David M. Last updated on 07/06/2019 - 16:40