Article

Superscalar Programming 101 (Matrix Multiply) Part 1 of 5

Part one of a five-part series, this article teaches a methodology for interpreting statistics gathered during test runs and for using those interpretations to improve parallel code.
Author: jimdempseyatthecove (Blackbelt). Last updated: 04/07/2019 - 22:00
Article

Superscalar programming 101 (Matrix Multiply) Part 2 of 5

By Jim Dempsey

In my last article we left off with…

Author: jimdempseyatthecove (Blackbelt). Last updated: 04/07/2019 - 22:00
Article

Superscalar programming 101 (Matrix Multiply) Part 3 of 5

By Jim Dempsey

Author: jimdempseyatthecove (Blackbelt). Last updated: 04/07/2019 - 22:00
Article

Superscalar programming 101 (Matrix Multiply) Part 4 of 5

In the last installment (Part 3), we saw the effects of the QuickThread Parallel Tag Team method of Matrix Multiplication…

Author: jimdempseyatthecove (Blackbelt). Last updated: 04/07/2019 - 22:00
Article

Superscalar programming 101 (Matrix Multiply) Part 5 of 5

In Part 4, we saw the effects of the QuickThread Parallel Tag Team Transpose method of Matrix Multiplication performed…

Author: jimdempseyatthecove (Blackbelt). Last updated: 04/07/2019 - 22:00
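The Part 4 and Part 5 teasers refer to a "Transpose" method of matrix multiplication. As a rough, serial illustration of why transposing can help, the sketch below compares a naive row-major multiply with one that first copies B into a transposed scratch buffer, so that the innermost loop reads both operands with unit stride. This is not the QuickThread Parallel Tag Team code, whose details are not shown here; it assumes square n x n matrices of `double` in row-major order, and the function names `matmul_naive` and `matmul_transpose` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical naive multiply: C = A * B, all n x n, row-major.
   The inner loop walks B down a column (stride n), which tends to
   use the cache poorly for large n. */
static void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Hypothetical transpose variant: first copy B into Bt = B^T
   (Bt is caller-provided scratch of n*n doubles), so the inner loop
   reads row i of A and row j of Bt, both contiguous in memory. */
static void matmul_transpose(size_t n, const double *A, const double *B,
                             double *Bt, double *C)
{
    for (size_t k = 0; k < n; ++k)
        for (size_t j = 0; j < n; ++j)
            Bt[j * n + k] = B[k * n + j];

    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }
}
```

Both functions accumulate the products in the same order, so for the same inputs they produce identical results; the transpose version trades an extra O(n^2) copy and n*n doubles of scratch space for contiguous access in the O(n^3) inner loop.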
Article

Improving Averaging Filter Performance Using Intel® Cilk™ Plus

Intel® Cilk™ Plus is an extension to the C and C++ languages to support data and task parallelism. It provides three new keywords to i…

Author: Anoop M. (Intel). Last updated: 12/12/2018 - 18:00
Article

Vectorizing Loops with Calls to User-Defined External Functions

Introduction

Author: Anoop M. (Intel). Last updated: 12/12/2018 - 18:00
Article

Vectorization Essentials

Vectorization essentials for effectively using the features of the Intel® Xeon® product family
Author: admin. Last updated: 02/10/2019 - 15:11
Article

Putting Your Data and Code in Order: Data and layout - Part 2

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to consider parallelism, both vectorization (single instruction, multiple data, or SIMD) and shared memory parallelism (threading), as well as distributed memory computing.
Author: David M. Last updated: 15/10/2019 - 16:40
Article

Putting Your Data and Code in Order: Data and Layout - Part 2 (Chinese-language edition)

Apply the concepts of parallelism and distributed memory computing to your code to improve software performance. This paper expands on the concepts discussed in Part 1 to consider parallelism, both vectorization (single instruction, multiple data, or SIMD) and shared memory parallelism (threading), as well as distributed memory computing.
Author: David M. Last updated: 15/10/2019 - 16:40