Mensajes en el blog

Parallel Universe Magazine #12: Advanced Vectorization

This blog contains additional content for the article "Advanced Vectorization" from Parallel Universe #12:

Autor Última actualización 03/07/2019 - 20:08

Intel® System Studio - Multicore Programming with Intel® Cilk™ Plus

Intel System Studio not only provides a variety of signal processing primitives via Intel® Integrated Performance Primitives (Intel® IPP), and Intel® Math Kernel Library (Intel® MKL), but also allows developing high-performance low-latency custom code (Intel C++ Compiler with Intel Cilk Plus). Since Intel Cilk Plus is built into the compiler, it can be used where it demands an efficient threading...
Autor Hans P. (Intel) Última actualización 11/12/2017 - 10:48

Cilk Plus Solver for a Chess Puzzle or: How I Learned to Love Fast Rejection

Intel® Cilk™ Plus enabled parallelizing a chess puzzle solver with a few changes.
Autor Última actualización 07/06/2017 - 09:12

A DPRNG for Cilk™ Plus?

Continuing my previous post, I describe some of the challenges in implementing DotMix, a determinstic parallel random-number generator (DPRNG) for Intel® Cilk™ Plus.
Autor Última actualización 20/03/2019 - 13:00

Using Pedigrees in Intel® Cilk™ Plus

Pedigrees are a new feature implemented in Intel Cilk Plus and currently available in Intel® Composer XE 2013. In this post, I explain what pedigrees are, how they work, and how you can use them in Cilk Plus. Pedigrees are a key component used in the implementation of DotMix, a contributed code for a deterministic parallel random-number generator (DPRNG) discussed in my previous post.
Autor Última actualización 11/10/2017 - 11:28

Why is Cilk™ Plus not speeding up my program? (Part 1)

In this article, I discuss some common performance pitfalls in Cilk™ Plus programs that prevent users from seeing speedups in their code, and describe some techniques for avoiding these pitfalls.
Autor Última actualización 04/02/2019 - 10:40

Choosing the right threading framework

This is the second article in a series of articles about High Performance Computing with the Intel Xeon Phi.

Autor Última actualización 06/07/2019 - 16:30

Late-initialization of frame descriptors in Cilk Plus/LLVM

The Intel® Cilk™ Plus C/C++ language extensions support the expression of portable and efficient task and vector parallel programs. Cilk Plus/LLVM is an implementation of these extensions in the Clang frontend for LLVM. In this article we explain one of the optimizations that we have implemented in Cilk Plus/LLVM: late-initialization of frame descriptors[1]. With this explanation, we provide a...
Autor Última actualización 07/06/2017 - 09:11

Putting Your Data and Code in Order: Data and layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on concepts discussed in Part 1, to consider parallelism, both vectorization (single instruction multiple data SIMD) as well as shared memory parallelism (threading), and distributed memory computing.
Autor David M. Última actualización 06/07/2019 - 16:40

Improve Performance with Vectorization

This article focuses on the steps to improve software performance with vectorization. Included are examples of full applications along with some simpler cases to illustrate the steps to vectorization.
Autor David M. Última actualización 06/07/2019 - 16:40