All Articles Tagged SSE

movaps running very slow[Forums]
I was doing some simple timing tests and I noticed that movaps, and most of the sse floating point instructions, were running slow. I have my code below. I tested doing a bunch of NOPs, another test with...

Posted: 2012-03-12 22:24:32
movaps, performance, SSE
how to optimize for loop bounds having multiple of 2 or 4?[Forums]
Hi, I have many loops in my code interacting on multiple arrays, all aligned on 32 byte boundaries. Here is a trivial example for illustration: foo (double * __restrict__ x, double * __restrict__ y, int...

Posted: 2012-01-26 17:33:55
AVX, optimization, SSE
Rotate shift in AVX and SSE[Forums]
I need to shift values in a simd register and replace from left or right. Basically I have an array like {4,5,4,5} in SSE or {4,5,4,5,4,5,4,5} in AVX and need to convert them to {5,4,5,4} or {5,4,5,4,5,4,5,4}....

Posted: 2011-11-10 13:57:08
AVX, permutation, shifting, simd, SSE
Inconsistent results with -g flag[Forums]
Intel(R) Fortran Compiler Professional for applications running on IA-32, Version 11.0 Build 20090131. I have a small program that I wrote that is producing different results based on whether or not...

Posted: 2011-03-21 12:58:25
floating-point, SSE
Kernel Template Library[Knowledgebase]
Kernel Template Library is a set of headers defining a methodology for defining High Level helper objects (Point, Matrix, etc.) and using them to express the algorithm of a Kernel.  It can then apply...

Posted: 2011-03-09 21:00:00 by Alex Wells (Intel)
AVX, C++0x, Lambda, simd, SSE, Vectorization
Free Speedup with Compiler Switches for Fast Math and Intel® Streaming SIMD Extensions[Knowledgebase]
Download Article Download Free Speedup with Compiler Switches for Fast Math and Intel® Streaming SIMD Extensions [PDF 238KB] Objective The intention of this introductory article is to make developers...

Posted: 2011-03-08 00:00:00 by Stan Melax (Intel)
game development, Intel SSE, SSE, vcsource_domain_graphics, vcsource_domain_media, vcsource_index, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_type_techarticle, visual computing
Links to instruction documentation[Forums]
The Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 2A and 2B (available here) are the instruction set reference. Haswell (2013) new instructions are in the programmer's...

Posted: 2010-12-31 07:07:46
AVX, intrinsic, sde, SSE
A Guide to Auto-vectorization with Intel® C++ Compilers[Knowledgebase]
Introduction The goal of this Guide is to provide guidelines for enabling compiler auto-vectorization with the Intel® C++ Compilers.  This document  is aimed at C/C++ programmers working on systems...

Posted: 2010-11-08 00:00:00 by mario.deilmann@intel.com
AVX, compiler, optimization, optimize, SSE, Vectorization, Vectorizer
Using Intel® Advanced Vector Extensions to Implement an Inverse Discrete Cosine Transform[Knowledgebase]
Download Article Download Using Intel® Advanced Vector Extensions to Implement an Inverse Discrete Cosine Transform [PDF 335KB] Download IDCT source code [Zip 22KB] Introduction Transform coding is...

Posted: 2010-06-07 21:00:00 by Richard Hubbard (Intel)
AVX, discrete cosine transform, IDCT, Intel AVX, SSE, visual computing
__m128 array becomes unaligned with IC optimization[Forums]
I'm sure this question has been asked dozens of times. I just can't seem to figure out how to structure a search query that finds the answer. PROBLEM: I allocated an __m128 aligned data array...

Posted: 2010-05-07 15:43:18
intel compiler, optimization, SSE
Intel® IPP Library Signal Processing Domain Overview[Knowledgebase]
If the images below are not loading, or are loading too slowly, please download and review this presentation in PDF format by following this link: IPP-Signal-Processing-Overview-2010-q1.pdf. An overview...

Posted: 2010-03-24 09:00:00 by Paul Fischer (Intel)
convolution, DFT, digital filters, discrete cosine transform, fast fourier transform, FFT, FIR filters, IIR filters, IPP, signal processing, simd, SSE
Intel® IPP 7.0 Library Getting Started[Knowledgebase]
Please see the following links for the latest information regarding the Intel® Integrated Performance Primitives (Intel® IPP) library: Intel® IPP Main Product Page Intel® IPP 7.0 Library Release...

Posted: 2010-06-07 21:00:00 by Paul Fischer (Intel)
AVX, getting started, IPP, simd, SSE
Intel® IPP 6.1 Library Getting Started [Knowledgebase]
Please see the following links for the latest information regarding the Intel IPP library: Intel IPP Main Product Page Intel IPP 6.1 Library Release Notes Intel IPP 6.1 Library Installation Guide Intel...

Posted: 2010-03-08 21:00:00 by Paul Fischer (Intel)
IPP, readme, release notes, simd, SSE
Intel® IPP 7.0 Library Installation Guide[Knowledgebase]
Please see the following links for the latest information regarding the Intel® Integrated Performance Primitives (Intel® IPP) library: Intel® IPP Main Product Page Intel® IPP 7.0 Library Release...

Posted: 2010-06-07 00:00:00 by Paul Fischer (Intel)
AVX, Install Guide, IPP, simd, SSE
Intel® IPP 7.0 Release Notes[Knowledgebase]
Intel® Integrated Performance Primitives Library 7.0 Release Notes This document provides a general summary of new features and important notes about the Intel® IPP library software product. Please...

Posted: 2011-03-22 00:00:00 by Ying Song (Intel)
AVX, IPP, release notes, simd, spiral, SSE, What's new
JPEG Bug Fix Details[Knowledgebase]
In reply to this forum thread http://software.intel.com/en-us/forums/intel-integrated-performance-primitives/topic/69755/DPD200130124 - JPEG color conversion function request. Added support for RGBA as...

Posted: 2010-01-08 00:00:00 by Paul Fischer (Intel)
bug, IPP, simd, SSE
Dynamic Volumetric Cloud Rendering for Games on Multi-Core Platforms[Knowledgebase]
by Sheng Guo, Cage Lu, Xiaochang Wu, Software and Services Group, Intel Corporation Introduction Clouds play an important role in creating images of outdoor scenery. Most games render clouds with the planar...

Posted: 2010-01-06 00:00:00 by Cage Lu (Intel)
dynamic volumetric clouds, game development, Intel SSE, Multi-threading, SSE, vcsource_domain_graphics, vcsource_index, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_type_samplecode, vcsource_type_techarticle, visual computing, volumetric cloud rendering, volumetric clouds
IPP Dispatcher Control Functions - ipp*Init*() functions[Knowledgebase]
Are you confused by the various ipp*Init*() functions? So was I. So I asked the Intel IPP engineering team for some clarification, and this is what I found. The Intel® IPP Dispatcher One of the most...

Posted: 2010-02-02 09:00:00 by Paul Fischer (Intel)
AVX, dispatch, init, IPP, simd, SSE
Use Intel® IPP on Compatible AMD* Processors[Knowledgebase]
The Intel® IPP library is optimized for Intel and compatible processors. The IPP library can be used on either Intel processors or compatible AMD* processors, both 32-bit and 64-bit. Actually,...

Posted: 2010-01-23 09:00:00 by Ying H (Intel)
amd support, IPP cpu optimization, non-intel processors, simd, SSE
Interview with Anatoliy Kuznetsov, the author of BitMagic C++ Library[Knowledgebase]
Abstract In this article, Anatoliy Kuznetsov answers the questions and tells us about the open BitMagic C++ Library. Introduction In my regular browsing through 64-bit programming related websites,...

Posted: 2009-11-19 13:00:00 by Andrey Karpov
BitMagic, C++, cuda, SSE
Wiener Filtering Using Intel® Advanced Vector Extensions[Knowledgebase]
1 Introduction Intel® Advanced Vector Extensions (Intel® AVX) is a 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE) and is designed for applications that are floating...

Posted: 2010-01-26 00:00:00 by Kit Chung (Intel)
AVX, image processing, least mean square filtering, Parallel Programing, Sandy Bridge, SSE, vcsource_domain_media, vcsource_index, vcsource_os_linux, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_type_techarticle, video processing, visual computing, wiener filtering
Image Processing Acceleration Techniques using Intel® Streaming SIMD Extensions and Intel® Advanced Vector Extensions[Knowledgebase]
Introduction Modern Intel processors feature acceleration through the use of SIMD (Single Instruction Multiple Data) instructions that include a wide range of available Intel® Streaming SIMD Extensions...

Posted: 2010-01-26 00:00:00 by Petter Larsson (Intel)
AVX, compiler, image processing, media, SSE, vcsource_domain_media, vcsource_index, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_type_techarticle, visual computing
Optimization of Image Processing Algorithms: A Case Study[Knowledgebase]
Abstract High quality image and video processing has become an important part in many professional and consumer applications. Unfortunately, it often comes with a high performance price. In such cases,...

Posted: 2009-09-22 00:00:00 by Guy Ben Haim (Intel)
halftoning, image processing, SSE, vcsource_domain_media, vcsource_index, vcsource_os_linux, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_product_icc, vcsource_type_casestudy, visual computing
SSE optimizations and IEEE 754[Forums]
Hi, I'm using the options below as my best tradeoff between SSE performance and IEEE 754 conformance. Are there any other flags I could add that would bring me even closer to precise arithmetic without...

Posted: 2009-08-24 14:54:06
IEEE floating point, performance, SSE
How Special Silicon Facilitates Parallel Arithmetic[Knowledgebase]
One of the most effective forms of parallelization is found deep inside Intel® x86 processors: the ability to execute parallel calculations with a single instruction. You can do this manually or let the...

Posted: 2009-06-18 00:00:00 by Vincent Scotto (Intel)
game development, Multi-thread apps for Multi-Core, physics, Sandy Bridge, simd, SSE, vcsource_domain_graphics, vcsource_index, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_product_icc, vcsource_type_techarticle, visual computing
Code Samples for the Intel® Integrated Performance Primitives (Intel® IPP) Library 7.0*[Knowledgebase]
Note: the links to the IPP samples provided on this page are licensed under the Intel End User License Agreement, and are for Intel IPP 7.0.7. -- Follow this link if you need the Intel IPP samples...

Posted: 2011-05-05 00:00:00 by Paul Fischer (Intel)
h.264, Intel IPP sample, IPP, IPP 7.0, jpg, samples, simd, SSE, UIC sample application, UMC
SSE and sorting[Forums]
Is anyone aware of any sorting algorithms that use SSE? Not parallel sorting algorithms like Intel's TBB, but single-threaded sorts that leverage the SSE functionality for comparison, swapping, etc.

Posted: 2009-02-24 20:27:52
sorting, SSE
Identifying JVM SIMD and SSE Usage with the VTune™ Performance Analyzer[Knowledgebase]
by Levent Akyil Leveraging SIMD and SSE (Streaming SIMD Extensions) support available on target processors is one of the key optimization techniques JVMs use (or should use). The question is how to identify...

Posted: 2010-09-21 18:29:33 by Levent (Intel)
JVM, simd, SSE, Vtune
Advanced Encryption Standard (AES) Instructions Set Rev 2[Knowledgebase]
Introduction Intel®’s AES instructions are a new set of processor instructions that will be introduced in Intel processors, starting from the processor called Westmere. These instructions enable fast...

Posted: 2010-01-28 00:00:00 by Shay Gueron (Intel)
AES, AES-NI, SSE, symmetric encryption
The Intel® AVX Realization of Lanczos interpolation in Intel® IPP 2D Resize Transform[Knowledgebase]
Download PDF Download The Intel® AVX Realization of Lanczos interpolation in Intel® IPP 2D Resize Transform [PDF 174KB] Introduction This paper presents the interpolation algorithm based on...

Posted: 2010-01-24 09:00:00 by Vincent Scotto (Intel)
AVX, AVX optimization in IPP, image processing, ipp filter, ippiResize, resize, Sandy Bridge, simd, SSE
Short-vector math: Intel Compiler vs. IPP[Forums]
Background For several years, there have been articles posted on ISN about how to use the Intel Compiler's SVML (short-vector math library) manually for "manual" integration into your own code.  Last...

Posted: 2008-12-10 13:22:04
Math Library, optimization, SSE
Understanding CPU Dispatching in the Intel® IPP Library[Knowledgebase]
Note: this article describes the SIMD support in versions 5.3 through 6.1 of the Intel IPP library. The minimum SIMD requirements have changed with release 7.0 of the Intel IPP library. For more information...

Posted: 2010-02-01 09:00:00 by Ying Song (Intel)
AES, CPU dispatch, CPU-specific code, dispatch, IPP, simd, SSE
High Clocks Per Instruction Retired when vectorizing the loop.[Knowledgebase]
Introduction Sometimes when we vectorize a loop, we get a high Clocks Per Instruction Retired (CPI) value. This happens when there is high bus utilization and the bus gets saturated. The subtraction...

Posted: 2008-12-28 11:30:00 by Om Sachan (Intel)
BUS Saturation, hardware prefetcher, High CPI, Memory latency, simd, SSE, SSE1, SSE2, SSE3, SSE4, Vectorizer, Vtune
Is there an IPP function to detect the processor type?[Knowledgebase]
In Intel IPP v6.0, there is a new function named ippGetCpuFeatures() that can be used to detect your processor features. It is declared in ippcore.h. This function retrieves CPU features like those returned...

Posted: 2010-01-24 09:00:00 by Ying Song (Intel)
Detect Intel CPU, IPP, ippGetCpuFeatures, ippGetCpuType, simd, SSE
Optimized Matrix Library for use with the Intel® Pentium® 4 Processor's SSE2 Instructions[Knowledgebase]
Introduction In January 2000, Intel published an optimized matrix library (4D single-precision matrix and vector classes) for use with Pentium® III Streaming SIMD (Single Instruction Multiple Data)...

Posted: 2009-01-13 00:00:00 by Linda Swink (Intel)
pentium, pentium4, SSE
Optimizing for the Intel® Pentium® 4 Processor Using Assembly Language[Knowledgebase]
by Khang Nguyen, Intel Corporation Introduction When talking about optimizing programs for the Intel® Pentium® 4 processor, people usually think about using Streaming SIMD Extensions (SSE) and Streaming...

Posted: 2008-10-20 00:00:00 by Khang Nguyen (Intel)
pentium, pentium4, SSE, SSE2
Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSE3_ATOM, SSSE3, SSE4.1, SSE4.2, AVX, AVX2) and processor-specific optimizations[Knowledgebase]
  What are the IA-32 and Intel® 64 processor targeting options in the 11.1, 12.0 and 12.1 compilers? Which processor-specific option is best for my processor? Which processor is targeted...

Posted: 2010-01-24 00:00:00 by Anand Mudliar (Intel)
atom, AVX, compiler, Core 2 Duo, Core 2 Quad, Core i7, dual-core, MMX, pentium, SSE, SSE2, SSE3, SSE4.1, SSE4.2, SSSE3, vcsource_domain_graphics, vcsource_domain_media, vcsource_index, vcsource_os_windows, vcsource_platform_desktoplaptop, vcsource_product_icc, xeon
3D Running Average SSE algorithm[Knowledgebase]
The 3D Running Average SSE algorithm is implemented for FP (SP) input data. The averaging window is fixed at 11; this value was requested by the customer who initiated this work. Based on the current implementation...

Posted: 2009-01-25 23:00:00 by Zvi Danovich (Intel)
SSE
2x Shrink SSE algorithm[Knowledgebase]
The uploaded presentation describes the SSE implementation of a 2x image shrink, where each pixel contains 4 bytes: three color components (R, G, and B) and a fourth component, the weight A. Speed-up (compared with...

Posted: 2009-01-25 23:00:00 by Zvi Danovich (Intel)
SSE
16bit 3D Convolution: SSE4+OpenMP implementation on Penryn CPU[Knowledgebase]
The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit original data. The SSE speed-up (compared with serial code) is ~3x; OpenMP on a 2-way Harpertown (Penryn) machine raises it...

Posted: 2009-01-25 23:00:00 by Zvi Danovich (Intel)
SSE
Fast Color Conversion Using Streaming SIMD Extensions and MMX™ Technology[Knowledgebase]
Datatype color conversions are a common requirement in 3-D application pipelines. In a simple lighting scheme, these conversions happen at least once per color channel, red, green, blue (R, G, B) per vertex,...

Posted: 2009-06-18 00:00:00 by Linda Swink (Intel)
global illumination, MMX, simd, SSE, visual computing
x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)[Knowledgebase]
Introduction This document details the difference between how assists are handled with x87 and Single Instruction Multiple Data (SIMD) instructions, and gives information on how to change their behavior...

Posted: 2008-10-17 00:00:00 by Shawn Casey (Intel)
Code, simd, SSE, SSE2
Motion Estimation Algorithms Using Streaming SIMD Extensions 3[Knowledgebase]
Introduction The Streaming SIMD Extensions 3 (SSE3) for IA-32 Intel Architecture accelerates performance of Streaming SIMD Extensions 2 (SSE2) technology, Streaming SIMD Extensions (SSE) technology,...

Posted: 2011-07-29 00:00:00 by Linda Swink (Intel)
block matching, C/C++, SSE
Intel® AVX Realization Of IIR Filter For Complex Float Data[Knowledgebase]
Download PDF Download Intel® AVX Realization Of IIR Filter For Complex Float Data [PDF 128KB] Introduction This paper describes complex Infinite Impulse Response (IIR) filter implementation...

Posted: 2010-01-26 09:00:00 by Igor Astakhov (Intel)
filter, Intel AVX, IPP, Sandy Bridge, signal processing, simd, SSE, IIR