SSE

A Taste of Vectorization

Talking with people in my day-to-day work, I realized that the topic of vectorization is still not clear to everyone.

A lot may already have been written about it, but let's try to summarize what is known.

As is well known, in C/C++ we work with operands that must have a type, and the type implicitly defines the size, that is, the number of bytes needed to store the operand or variable.
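To make the link between type size and vectorization concrete, here is a minimal sketch. It assumes a 128-bit SSE register (16 bytes); `lanes_per_register` and `print_lanes` are hypothetical helper names introduced only for this illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One SSE (XMM) register is 128 bits wide, i.e. 16 bytes. */
enum { SSE_REGISTER_BYTES = 16 };

/* Hypothetical helper: how many elements of the given size fit
 * side by side in a single SSE register. */
size_t lanes_per_register(size_t element_size)
{
    assert(element_size > 0 && element_size <= SSE_REGISTER_BYTES);
    return SSE_REGISTER_BYTES / element_size;
}

/* Prints the size of a few common types and the matching lane count. */
void print_lanes(void)
{
    printf("char:   %zu byte(s), %zu lanes\n",
           sizeof(char), lanes_per_register(sizeof(char)));
    printf("short:  %zu byte(s), %zu lanes\n",
           sizeof(short), lanes_per_register(sizeof(short)));
    printf("float:  %zu byte(s), %zu lanes\n",
           sizeof(float), lanes_per_register(sizeof(float)));
    printf("double: %zu byte(s), %zu lanes\n",
           sizeof(double), lanes_per_register(sizeof(double)));
}
```

The smaller the type, the more operands one vector instruction can process at once, which is why the choice of type matters so much for vectorized code.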

3D Running Average SSE algorithm

The 3D Running Average SSE algorithm is implemented for single-precision floating-point (FP SP) input data. The averaging window is fixed at 11; this value was requested by the customer who initiated this work. Based on the ideas in the current implementation, it is simple to build versions for other averaging window sizes as well.
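The teaser does not show the kernel itself, so the following is only a plain scalar sketch of a running average with window 11 along a single axis, useful as a reference when checking a vectorized version. The function name `running_average_1d` and the border policy (border samples are copied unchanged) are assumptions made for this illustration. One possible way to obtain a 3D box average is to apply such a pass along each of the three axes in turn.

```c
#include <assert.h>
#include <stddef.h>

enum { WINDOW = 11, HALF = WINDOW / 2 };

/* Scalar reference: centered running average with a window of 11 samples.
 * Assumption: the first and last HALF samples are copied unchanged. */
void running_average_1d(const float *in, float *out, size_t n)
{
    size_t i;

    assert(in != NULL && out != NULL && in != out);

    if (n < WINDOW) {                 /* too short for a full window */
        for (i = 0; i < n; ++i)
            out[i] = in[i];
        return;
    }

    /* Copy the borders on both ends. */
    for (i = 0; i < HALF; ++i) {
        out[i] = in[i];
        out[n - 1 - i] = in[n - 1 - i];
    }

    /* Sum of the first full window, centered at index HALF. */
    float sum = 0.0f;
    for (i = 0; i < WINDOW; ++i)
        sum += in[i];

    /* Slide the window: add the entering sample, drop the leaving one. */
    for (i = HALF; ; ++i) {
        out[i] = sum / WINDOW;
        if (i + HALF + 1 >= n)
            break;
        sum += in[i + HALF + 1] - in[i - HALF];
    }
}
```

The sliding update keeps the cost per output sample constant regardless of the window size, at the price of accumulating rounding error over long rows; recomputing the sum periodically is one way to limit that.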

Please, find attached:

  • SSE
  • Parallel computing
  • 2x Shrink SSE algorithm

    The uploaded presentation describes the SSE implementation of a 2x image shrink in which each pixel occupies 4 bytes: three color components (R, G and B) and a fourth component, the weight A.

    The speed-up (compared with serial code) is 4.6x on the Merom platform and ~7x on the Penryn platform.

    Please find attached:

    1. A PowerPoint presentation describing this algorithm.
    2. A ZIP file containing the C project with the implementation, embedded in a simple benchmarking application. The project is built for Microsoft Visual Studio 2005.

    The application takes no command-line arguments; run it by name only.
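The teaser does not say how the four source pixels are combined, so the sketch below assumes the simplest choice: each output pixel is the rounded average of a 2x2 block, computed per channel. It is a scalar reference only, not the SSE code from the attachment, and the name `shrink2x_rgba` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { CHANNELS = 4 };   /* R, G, B and the weight A: one byte each */

/* Scalar reference for a 2x shrink of a tightly packed 4-byte-per-pixel image.
 * Assumption: each output channel is the rounded mean of a 2x2 source block.
 * dst must hold (src_w / 2) * (src_h / 2) pixels. */
void shrink2x_rgba(const uint8_t *src, size_t src_w, size_t src_h, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    assert(src_w % 2 == 0 && src_h % 2 == 0);

    const size_t dst_w = src_w / 2;
    const size_t dst_h = src_h / 2;
    const size_t row = src_w * CHANNELS;          /* bytes per source row */

    for (size_t y = 0; y < dst_h; ++y) {
        for (size_t x = 0; x < dst_w; ++x) {
            /* Top-left pixel of the 2x2 source block. */
            const uint8_t *p = src + (2 * y) * row + (2 * x) * CHANNELS;
            uint8_t *q = dst + (y * dst_w + x) * CHANNELS;

            for (size_t c = 0; c < CHANNELS; ++c) {
                unsigned sum = p[c] + p[CHANNELS + c]
                             + p[row + c] + p[row + CHANNELS + c];
                q[c] = (uint8_t)((sum + 2) / 4);  /* round to nearest */
            }
        }
    }
}
```

Because every channel is one byte, a 128-bit register holds four whole pixels, which is what makes this kind of loop a natural candidate for SSE.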

  • SSE
  • Parallel computing
  • 16bit 3D Convolution: SSE4+OpenMP implementation on Penryn CPU

    The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit source data.

    The SSE speed-up (compared with serial code) is ~3x; OpenMP on a 2-way Harpertown (Penryn) machine adds a further ~6x, so the overall SSE+OpenMP speed-up is ~18x.

    Please, find attached:

  • SSE
  • Parallel computing
  • Sun + Intel + OpenSolaris + 2 Years = The Year of Core

    Today is the second anniversary of the Sun and Intel joint agreement to optimize the Solaris operating system for Intel Xeon processors. Like last year, when I wrote this summary of our work, I decided to recap where we are to date.

    Like last year’s edition, this is pretty much off the top of my head.

    x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)

    Introduction

    This document details the difference between how assists are handled with x87 and Single Instruction Multiple Data (SIMD) instructions, and explains how to change their behavior when using Streaming SIMD Extensions (SSE) and SSE2.
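As an illustration of changing this behavior from C, here is a minimal sketch that toggles FTZ and DAZ through the MXCSR helper macros declared in `<xmmintrin.h>` and `<pmmintrin.h>`. It assumes an x86 target whose compiler does floating-point math with SSE instructions (the default on x86-64) and a CPU that supports DAZ; the function names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <float.h>
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _MM_SET_FLUSH_ZERO_MODE */
#include <pmmintrin.h>   /* _MM_SET_DENORMALS_ZERO_MODE */

/* volatile forces the division to happen at run time, under the
 * MXCSR settings that are active when the function is called. */
float divide_at_runtime(float x, float y)
{
    volatile float vx = x, vy = y;
    volatile float r = vx / vy;
    return r;
}

/* Returns 1 if FTZ turns a tiny (denormal) result into zero. */
int ftz_flushes_tiny_result(void)
{
    unsigned int saved = _mm_getcsr();

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_OFF);
    float gradual = divide_at_runtime(FLT_MIN, 3.0f);   /* denormal result */

    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    float flushed = divide_at_runtime(FLT_MIN, 3.0f);   /* flushed to zero */

    _mm_setcsr(saved);                                  /* restore caller state */
    assert(_mm_getcsr() == saved);
    return gradual > 0.0f && flushed == 0.0f;
}

/* Returns 1 if DAZ makes a denormal input behave as zero. */
int daz_zeroes_denormal_input(void)
{
    const float denormal = FLT_MIN / 4.0f;  /* below the smallest normal float */
    unsigned int saved = _mm_getcsr();

    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_OFF);
    float kept = divide_at_runtime(denormal, 0.25f);    /* equals FLT_MIN */

    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    float zeroed = divide_at_runtime(denormal, 0.25f);  /* input read as 0 */

    _mm_setcsr(saved);
    assert(_mm_getcsr() == saved);
    return kept > 0.0f && zeroed == 0.0f;
}
```

Both settings live in MXCSR, so they affect SSE and SSE2 arithmetic only; x87 instructions are not controlled by these bits. Saving and restoring the register keeps the change local to the code that needs it.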

  • Developers
  • Intel® Streaming SIMD Extensions
  • SSE2
  • SSE
  • Intel® Pentium® Processors
  • Precision and the Politeness of the Compiler

    In the search for higher truth, you sometimes have to stumble and then fully come to grips with the basics.

    Take, for example, the following code:

    #include <stdio.h>

    int main (void)
    {
      double a = 3.0, b = 7.0, c;

      c = a / b;

      if (c == a / b) {
        printf ("comparison succeeds\n");
      } else {
        printf ("unexpected result\n");
      }

      return 0;
    }

    It turns out that with gcc, for example (and probably with other compilers too), this program may well print "unexpected result".
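The excerpt stops before giving the reason. One well-known cause is x87 code generation: the quotient stored in `c` is rounded to a 64-bit double, while the second `a / b` may still sit in an 80-bit register when the comparison is made. The sketch below assumes that explanation and shows one common workaround, which is to force both values through double-sized memory with `volatile`; GCC's `-ffloat-store` option aims at a similar effect. The function name is hypothetical.

```c
#include <assert.h>

/* Returns 1 when a / b compares equal to itself after both results
 * have been rounded to double by a store to memory. */
int quotient_matches_when_stored(double a, double b)
{
    assert(b != 0.0);

    /* volatile forces each quotient to be written out as a 64-bit double,
     * discarding any extra precision held in a wider register. */
    volatile double first = a / b;
    volatile double second = a / b;

    return first == second;
}
```

With both sides rounded the same way, the comparison no longer depends on which value the compiler happened to keep in a register.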
