SSE

Threading and the Intel® IPP Library – part 3 of 3

OpenMP Threading and Intel IPP

The low-level primitives within the IPP library generally represent basic atomic operations. This limits threading within the library to ~15-20% of the primitives. Intel OpenMP is used to implement internal threading and is enabled, by default, when you use one of the multi-threaded variants of the library. Multi-threaded versions of the library are only supported on Linux, Windows, and Mac OS X.

Threading and the Intel® IPP Library – part 2 of 3

Threading Choices for Your Intel IPP Application

Source code for some multi-threaded IPP application examples is included in the free sample downloads. Several of these examples implement threading at the application level, and some use the OpenMP* threading that is built into the Intel IPP library. In most cases the performance gains due to multi-threading are substantial.

Parallelization And Optimization of The Line Segment Intersection Problem


Line Segment Intersection Problem


1. Problem Statement

Write threaded code to find the pairs of input line segments that intersect in three-dimensional space. Each line segment is defined by 6 integers representing its two (x,y,z) endpoints.
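To make the problem statement concrete, here is a minimal serial sketch of the pairwise test. It is not the code from the post. The names `segments_intersect` and `intersecting_pairs` are hypothetical, and treating touching endpoints and collinear overlaps as intersections is an assumption. Integer coordinates keep the arithmetic exact.

```python
from itertools import combinations

ZERO = (0, 0, 0)

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def segments_intersect(s1, s2):
    """Exact test for two 3D segments with integer endpoints (hypothetical helper)."""
    (p1, p2), (p3, p4) = s1, s2
    d1, d2 = sub(p2, p1), sub(p4, p3)
    if d1 == ZERO:
        # Make the first segment the non-degenerate one, if there is one.
        p1, p2, p3, p4 = p3, p4, p1, p2
        d1, d2 = d2, d1
    if d1 == ZERO:
        return p1 == p3              # both segments are single points
    r = sub(p3, p1)
    c = cross(d1, d2)
    if c == ZERO:
        # Parallel (or second segment is a point): must lie on the same line.
        if cross(r, d1) != ZERO:
            return False
        a, b = dot(r, d1), dot(sub(p4, p1), d1)
        return max(a, b) >= 0 and min(a, b) <= dot(d1, d1)
    if dot(r, c) != 0:
        return False                 # not coplanar, so the lines are skew
    cc = dot(c, c)
    t = dot(cross(r, d2), c)         # parameter on segment 1, scaled by cc
    s = dot(cross(r, d1), c)         # parameter on segment 2, scaled by cc
    return 0 <= t <= cc and 0 <= s <= cc

def intersecting_pairs(segments):
    """Brute-force O(n^2) scan returning index pairs of intersecting segments."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(segments), 2)
            if segments_intersect(a, b)]
```

Each pair test is independent of the others, so a threaded version could split the pair list across workers. This sketch leaves that step out.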

A Taste of Vectorization

After talking with people in my day-to-day work, I realized that the topic of vectorization is still not clear to everyone.

A lot may already have been written about it, but let's try to summarize what we know.

As you know, in C/C++ we work with operands that must have a type, and the type implicitly determines the size, that is, the number of bytes needed to store the operands/variables.
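As a rough illustration of why operand size matters for vectorization, the sketch below uses Python's `ctypes` to query the size of a few C types on the current platform and to compute how many such operands fit in one 128-bit SSE register. The helper name `lanes_per_sse_register` is hypothetical, and the teaser itself does not show this calculation.

```python
import ctypes

SSE_REGISTER_BYTES = 16  # an SSE (XMM) register is 128 bits wide

# A few C operand types; exact sizes of some types (e.g. int, long) depend on the platform ABI.
C_TYPES = {
    "char": ctypes.c_char,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "float": ctypes.c_float,
    "double": ctypes.c_double,
}

def lanes_per_sse_register(ctype):
    """How many operands of this C type fit in one 128-bit register."""
    return SSE_REGISTER_BYTES // ctypes.sizeof(ctype)

if __name__ == "__main__":
    for name, ctype in C_TYPES.items():
        print(f"{name}: {ctypes.sizeof(ctype)} bytes, "
              f"{lanes_per_sse_register(ctype)} per SSE register")
```

The smaller the operand type, the more elements a single vector instruction can process at once.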

3D Running Average SSE algorithm

The 3D Running Average SSE algorithm is implemented for single-precision floating-point (FP SP) input data. The averaging window is fixed at 11, the value requested by the customer who initiated this work. Based on the ideas in the current implementation, it is simple to build versions for other averaging windows as well.
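For readers unfamiliar with the operation, here is a minimal one-dimensional sketch of a running average with a window of 11, using a sliding sum. It is not the SSE implementation described above. The function name `running_average` and the choice to emit only fully covered windows are assumptions, and a 3D version would need the same idea applied along each axis.

```python
def running_average(values, window=11):
    """Running mean over `values`; emits one output per fully covered window."""
    if window <= 0 or len(values) < window:
        return []
    total = sum(values[:window])          # sum of the first window
    out = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the entering sample, drop the leaving one.
        total += values[i] - values[i - window]
        out.append(total / window)
    return out
```

The sliding sum costs one addition and one subtraction per output regardless of the window size, which is also why changing the window from 11 to another value is a small change.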

Please find attached:

  • SSE
  • Parallel Computing
  • 16bit 3D Convolution: SSE4+OpenMP implementation on Penryn CPU

    The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit input data.

    The SSE speed-up over serial code is ~3x, and OpenMP on a 2-way Harpertown (Penryn) machine multiplies that by ~6x, so the overall SSE+OpenMP speed-up is ~18x.

    Please find attached:

  • SSE
  • Parallel Computing
  • Sun + Intel + OpenSolaris + 2 Years = The Year of Core

    Today is the second anniversary of the Sun and Intel joint agreement to optimize the Solaris operating system for Intel Xeon processors. Like last year, when I wrote this summary of our work, I decided to recap where we are to date.

    Like last year’s edition, this is pretty much off the top of my head.
