Blog post

Some Performance Advantages of Using a Task-Based Parallelism Model

As part of my focus on software performance, I also support and consult on implementing scalable parallelism in applications.

Author Shannon Cepeda (Blackbelt) Last updated 04/02/2019 - 10:40
Blog post

Graduate Intern at Intel - Parallel Ray-Tracing

Ray-tracing is a classic example of an embarrassingly parallel algorithm; since each pixel is typically independent of the rest, theoretically every pixel can be done in parallel (given enough core

Last updated 14/06/2017 - 15:37
Blog post

Graduate Intern at Intel - Parallel N-Body

The N-Body problem is a classic example used frequently to demonstrate parallelization and how it improves performance.

Last updated 14/06/2017 - 15:46
Article

Efficient Parallelization

This article is part of the Intel® Modern Code Developer Community documentation, which helps developers improve application performance through a systematic, step-by-step optimization framework. This article addresses thread-level parallelization.
Author Ronald W Green (Blackbelt) Last updated 21/03/2019 - 12:00
Article

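高效并行化 (Efficient Parallelization)

Efficient Parallelization documentation

Compiler Methodology for Intel® Many Integrated Core Architecture

Efficient Parallelization

Author Ronald W Green (Blackbelt) Last updated 21/03/2019 - 12:00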
Article

Choosing the right threading framework

This is the second article in a series on High Performance Computing with the Intel Xeon Phi.

Last updated 06/07/2019 - 16:30
Article

Explicit Vector Programming – Best Known Methods

Vectorizing improves performance, and achieving high performance can save power. An introduction to tools for vectorizing compute-intensive processing.
Last updated 24/04/2019 - 11:25
Article

A Parallel Stable Sort Using C++11 for TBB, Cilk Plus, and OpenMP

This article describes a parallel merge sort code, and why it is more scalable than parallel quicksort or parallel samplesort. The code relies on the C++11 “move” semantics.

Last updated 01/08/2019 - 09:30
Video

Getting Better Performance on Dijkstra’s Shortest Path Graph Algorithm using the Intel® Compiler

We optimized a version of Dijkstra’s shortest path graph algorithm using a combination of Intel® Cilk™ Plus array notation and OpenMP* parallel for.

Last updated 04/03/2019 - 13:33
Article

Putting Your Data and Code in Order: Data and layout - Part 2

Apply the concepts of parallelism and distributed-memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1 to consider vectorization (single instruction, multiple data, or SIMD), shared-memory parallelism (threading), and distributed-memory computing.
Author David M. Last updated 06/07/2019 - 16:40