SSE

Motion Estimation Algorithms Using Streaming SIMD Extensions 3 [Knowledgebase]
Introduction: The Streaming SIMD Extensions 3 (SSE3) for IA-32 Intel Architecture accelerates performance of Streaming SIMD Extensions 2 (SSE2) technology, Streaming SIMD Extensions (SSE) technology ...

Posted: 2009-03-10 17:47:01 by
C/C++, block matching, SSE
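The core of SIMD block matching is the sum of absolute differences (SAD) between a block of the current frame and candidate blocks of a reference frame. The sketch below is not the article's code. It assumes 8-bit luma frames whose row pitch equals the width, a 16x16 block, and an exhaustive search over a +/-range window. It uses only the SSE2 intrinsic `_mm_sad_epu8` (PSADBW), even though the article targets SSE3. The names `sad16x16` and `full_search` are hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* SAD between two 16x16 blocks of 8-bit pixels that share row pitch `stride`. */
static unsigned sad16x16(const uint8_t *cur, const uint8_t *ref, ptrdiff_t stride)
{
    __m128i acc = _mm_setzero_si128();
    for (int y = 0; y < 16; ++y) {
        __m128i a = _mm_loadu_si128((const __m128i *)(cur + y * stride));
        __m128i b = _mm_loadu_si128((const __m128i *)(ref + y * stride));
        /* PSADBW leaves one partial sum per 8-byte half, in each 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(a, b));
    }
    /* Combine the low and high 64-bit lanes (each partial sum fits in 32 bits). */
    return (unsigned)_mm_cvtsi128_si32(acc) +
           (unsigned)_mm_cvtsi128_si32(_mm_srli_si128(acc, 8));
}

/* Exhaustive search for the 16x16 block at (bx, by) of `cur` within a +/-range
   window of `ref`; both frames are w x h with pitch w. Returns the best SAD and
   stores the winning displacement (motion vector) in *mvx, *mvy. */
static unsigned full_search(const uint8_t *cur, const uint8_t *ref, int w, int h,
                            int bx, int by, int range, int *mvx, int *mvy)
{
    unsigned best = UINT_MAX;
    const uint8_t *blk = cur + (ptrdiff_t)by * w + bx;
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int x = bx + dx, y = by + dy;
            if (x < 0 || y < 0 || x + 16 > w || y + 16 > h)
                continue; /* candidate block would fall outside the frame */
            unsigned s = sad16x16(blk, ref + (ptrdiff_t)y * w + x, w);
            if (s < best) {
                best = s;
                *mvx = dx;
                *mvy = dy;
            }
        }
    }
    return best;
}
```

A production encoder would normally add early termination or a faster search pattern. This sketch only shows the SAD kernel and the brute-force loop around it.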
x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) [Knowledgebase]
Introduction: This document details the difference between how assists are handled with x87 and Single Instruction Multiple Data (SIMD) instructions, and gives information on how to change their behav ...

Posted: 2008-10-17 12:32:19 by Shawn Casey (Intel)
simd, SSE2, SSE, Code
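For SSE code, FTZ and DAZ are two bits of the MXCSR control/status register (bit 15 for FTZ, bit 6 for DAZ), and they do not affect x87 arithmetic. The sketch below toggles both bits with the `_mm_getcsr`/`_mm_setcsr` intrinsics from `xmmintrin.h`. The helper name `set_ftz_daz` is hypothetical. The sketch assumes a processor that supports DAZ; on some early IA-32 processors, software should check the MXCSR_MASK value reported by FXSAVE before setting bit 6.

```c
#include <xmmintrin.h> /* _mm_getcsr, _mm_setcsr */

#define MXCSR_DAZ (1u << 6)  /* denormal inputs are treated as zero */
#define MXCSR_FTZ (1u << 15) /* denormal results are flushed to zero (underflow masked, the default) */

/* Turns FTZ and DAZ on or off for SSE arithmetic on the calling thread and
   returns the previous MXCSR value so the caller can restore it with _mm_setcsr. */
static unsigned int set_ftz_daz(int enable)
{
    unsigned int old = _mm_getcsr();
    unsigned int csr = enable ? (old | MXCSR_FTZ | MXCSR_DAZ)
                              : (old & ~(MXCSR_FTZ | MXCSR_DAZ));
    _mm_setcsr(csr);
    return old;
}
```

Returning the old MXCSR lets a library enable the modes around a hot loop and then put the caller's settings back, rather than changing the thread's floating-point behavior permanently.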
16bit 3D Convolution: SSE4+OpenMP implementation on Penryn CPU [Knowledgebase]
The attached presentation describes an SSE3/SSE4 implementation of 3D convolution for 16-bit source data. The SSE speed-up (compared with serial code) is ~3x; OpenMP on a 2-way Harpertown (Penryn) machine rises ...

Posted: 2009-01-26 13:58:46 by Zvi Danovich (Intel)
SSE
2x Shrink SSE algorithm [Knowledgebase]
The uploaded presentation describes the SSE implementation of a 2x image shrink in which each pixel occupies 4 bytes: the three color components R, G and B, and a fourth component, the weight A. Speed-up (compared with ...

Posted: 2009-01-26 14:01:01 by Zvi Danovich (Intel)
SSE
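A 2x shrink maps each 2x2 block of input pixels to one output pixel. The sketch below is an assumption about the approach, not the presentation's code. It averages all four channels, including A, as a plain rounded box filter, (a + b + c + d + 2) >> 2, using SSE2. It does not apply A as a weight to R, G and B, because the excerpt does not say how the weight is used. The function name `shrink2x_rows` is hypothetical.

```c
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Shrinks two input rows of 4-byte pixels (row0 above row1) into one output
   row of out_w pixels; each input row must hold at least 2 * out_w pixels.
   Each output channel is the rounded mean of the corresponding 2x2 block. */
static void shrink2x_rows(const uint8_t *row0, const uint8_t *row1,
                          uint8_t *dst, size_t out_w)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i two = _mm_set1_epi16(2);
    size_t j = 0;
    for (; j + 2 <= out_w; j += 2) {
        /* 4 input pixels (16 bytes) per row produce 2 output pixels. */
        __m128i t = _mm_loadu_si128((const __m128i *)(row0 + 8 * j));
        __m128i b = _mm_loadu_si128((const __m128i *)(row1 + 8 * j));
        /* Widen bytes to 16 bits so a sum of four bytes cannot overflow. */
        __m128i lo = _mm_add_epi16(_mm_unpacklo_epi8(t, zero),
                                   _mm_unpacklo_epi8(b, zero)); /* pixels 0,1 */
        __m128i hi = _mm_add_epi16(_mm_unpackhi_epi8(t, zero),
                                   _mm_unpackhi_epi8(b, zero)); /* pixels 2,3 */
        /* Add horizontal neighbors: (0 + 1) in the low half, (2 + 3) in the high half. */
        __m128i sum = _mm_add_epi16(_mm_unpacklo_epi64(lo, hi),
                                    _mm_unpackhi_epi64(lo, hi));
        sum = _mm_srli_epi16(_mm_add_epi16(sum, two), 2);
        /* Pack back to bytes and store the 8 bytes of the two output pixels. */
        _mm_storel_epi64((__m128i *)(dst + 4 * j), _mm_packus_epi16(sum, zero));
    }
    for (; j < out_w; ++j) /* scalar tail for an odd output width */
        for (int c = 0; c < 4; ++c) {
            unsigned s = row0[8 * j + c] + row0[8 * j + 4 + c]
                       + row1[8 * j + c] + row1[8 * j + 4 + c];
            dst[4 * j + c] = (uint8_t)((s + 2) >> 2);
        }
}
```

Widening to 16 bits gives an exact rounded mean. A cheaper variant built from two `_mm_avg_epu8` steps would round twice and can differ by one in some channels.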
3D Running Average SSE algorithm [Knowledgebase]
The 3D Running Average SSE algorithm is implemented for single-precision floating-point (SP FP) input data. The averaging window is fixed at 11, the value requested by the customer who initiated this work. Based on the current implementation ...

Posted: 2009-01-26 14:02:05 by Zvi Danovich (Intel)
SSE
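A running (box) average can be updated incrementally. When the window slides by one sample, the sample entering the window is added and the one leaving it is subtracted, so the cost per output does not depend on the window length. Because a 3D box average is separable, one way to build it is a 1D pass along each axis. The sketch below shows only the pass along one axis, for single-precision data, and processes four adjacent columns per `__m128`. The window of 11 comes from the excerpt. The function name, the row-major layout, the requirement that `width` be a multiple of 4, and the "valid-only" output size (rows - 10 rows) are assumptions.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics */

#define RUNAVG_WIN 11 /* averaging window quoted in the excerpt */

/* Running average along the row axis of a rows x width float grid (row-major).
   `out` receives rows - RUNAVG_WIN + 1 rows; requires rows >= RUNAVG_WIN and
   width % 4 == 0. Each __m128 carries the window sums of 4 adjacent columns. */
static void running_avg_rows(const float *in, float *out, size_t rows, size_t width)
{
    const __m128 inv = _mm_set1_ps(1.0f / RUNAVG_WIN);
    for (size_t x = 0; x < width; x += 4) {
        /* Full sum of the first window. */
        __m128 sum = _mm_setzero_ps();
        for (size_t r = 0; r < RUNAVG_WIN; ++r)
            sum = _mm_add_ps(sum, _mm_loadu_ps(in + r * width + x));
        _mm_storeu_ps(out + x, _mm_mul_ps(sum, inv));
        /* Slide: add the row entering the window, subtract the row leaving it. */
        for (size_t r = 0; r + RUNAVG_WIN < rows; ++r) {
            sum = _mm_add_ps(sum, _mm_loadu_ps(in + (r + RUNAVG_WIN) * width + x));
            sum = _mm_sub_ps(sum, _mm_loadu_ps(in + r * width + x));
            _mm_storeu_ps(out + (r + 1) * width + x, _mm_mul_ps(sum, inv));
        }
    }
}
```

The add/subtract recurrence can accumulate floating-point rounding error along long lines. Periodically recomputing the full window sum is a simple way to bound that drift.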
Intel® compiler options for SSE generation (SSE2, SSE3, SSSE3, SSE4) and processor-specific optimizations [Knowledgebase]
What are the IA-32 and Intel® 64 processor targeting options in the 11.x compilers? Which processor-specific option is best for my processor? What set of Processor-Specific Optimization o ...

Posted: 2009-07-13 14:35:04 by
dual-core, xeon, pentium, SSE2, SSE3, SSE, Core 2 Duo, SSE4.2, SSSE3, SSE4.1, MMX, Core 2 Quad, atom, Core i7, compiler, AVX
High Clocks Per Instruction Retired when vectorizing the loop. [Knowledgebase]
Introduction: Sometimes when we vectorize a loop, we get a high Clocks Per Instruction Retired (CPI) value. This happens when bus utilization is high and the bus becomes saturated. The subtrac ...

Posted: 2009-07-14 03:49:00 by
simd, SSE2, SSE3, SSE4, SSE, High CPI, Vectorizer, hardware prefetcher, SSE1, Memory latency, BUS Saturation, Vtune
Short-vector math: Intel Compiler vs. IPP [Forums]
Background: For several years, articles have been posted on ISN about how to use the Intel Compiler's SVML (short-vector math library) manually, integrating it into your own code. Las ...

Posted: 2008-12-10 13:22:04 by Eric Palmer (Intel)
Math Library, SSE, optimization
Description of the AES Encryption Standard (Advanced Encryption Standard) [Knowledgebase]
Introduction: The AES encryption standard is the official U.S. government standard for symmetric encryption ...

Posted: 2009-05-26 04:06:57 by Shay Gueron (Intel)
AES, SSE, symmetric encryption
Identifying JVM SIMD and SSE Usage with the VTune™ Performance Analyzer [Knowledgebase]
By Levent Akyil. Leveraging SIMD and SSE (Streaming SIMD Extensions) support available on target processors is one of the key optimization techniques JVMs use (or should use). The question is how to i ...

Posted: 2009-01-21 06:22:27 by Levent (Intel)
java, simd, JVM, SSE, Vtune
Sun + Intel + OpenSolaris + 2 Years = The Year of Core [Blogs]
Today is the second anniversary of the Sun and Intel joint agreement to optimize the Solaris operating system for Intel Xeon processors. Like last year, when I wrote this summary of our work, I decid ...

Posted: 2009-01-22 23:02:24 by David Stewart (Intel)
virtualization, power management, Threading Building Blocks, SSE, IOMMU, Intel Core Microarchitecture, ACPI, OpenSolaris, IOAT, DTrace, new instructions, NUMA, Xen, xVM, Dunnington, Fault Management Architecture, Intel Centrino2 processor, Intel Xeon processor, powertop, x2APIC, Wireless, FMA, Nehalem, Solaris, visual computing
SSE and sorting [Forums]
Is anyone aware of any sorting algorithms that use SSE? Not parallel sorting algorithms like those in Intel's TBB, but single-threaded sorts that leverage the SSE functionality for comparison, swapping, etc. ...

Posted: 2009-02-24 20:27:52 by mrosenrosen
SSE, sorting
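One common answer to this question is a sorting network. A sorting network is a fixed sequence of compare-exchange steps, and each step maps to one `_mm_min_ps` and one `_mm_max_ps` with no branches. The sketch below is an illustration, not code from the thread. It sorts four independent groups of four floats at once with the 5-comparator network for four inputs, then transposes with `_MM_TRANSPOSE4_PS` so that each sorted group becomes contiguous. The function name `sort4_columns` is hypothetical. Larger single-threaded sorts can be built from such kernels plus SIMD merge steps, which are not shown. NaN inputs are not handled, because MINPS/MAXPS are not symmetric with respect to NaN.

```c
#include <xmmintrin.h> /* SSE: _mm_min_ps, _mm_max_ps, _MM_TRANSPOSE4_PS */

/* Compare-exchange on 4 lanes at once: afterwards a <= b in every lane. */
#define CMPSWAP(a, b) do { __m128 lo_ = _mm_min_ps(a, b); \
                           b = _mm_max_ps(a, b); a = lo_; } while (0)

/* Sorts 16 floats as 4 independent groups: on return, v[4k..4k+3] holds the
   values originally at v[k], v[k+4], v[k+8], v[k+12], in ascending order. */
static void sort4_columns(float v[16])
{
    __m128 r0 = _mm_loadu_ps(v), r1 = _mm_loadu_ps(v + 4);
    __m128 r2 = _mm_loadu_ps(v + 8), r3 = _mm_loadu_ps(v + 12);
    /* Optimal 5-comparator network for 4 inputs, applied to all 4 columns. */
    CMPSWAP(r0, r1); CMPSWAP(r2, r3);
    CMPSWAP(r0, r2); CMPSWAP(r1, r3);
    CMPSWAP(r1, r2);
    /* Transpose so that each sorted column becomes a contiguous row. */
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(v, r0);
    _mm_storeu_ps(v + 4, r1);
    _mm_storeu_ps(v + 8, r2);
    _mm_storeu_ps(v + 12, r3);
}
```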
How Special Silicon Facilitates Parallel Arithmetic [Knowledgebase]
One of the most effective forms of parallelization is found deep inside Intel® x86 processors: the ability to execute parallel calculations with a single instruction. You can do this manually or let ...

Posted: 2009-06-18 18:31:49 by
simd, Multi-thread apps for Multi-Core, SSE, game development, physics, visual computing
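As a minimal illustration of one instruction operating on several values, the sketch below adds two float arrays four elements at a time with the SSE intrinsic `_mm_add_ps` (ADDPS), then falls back to scalar code for any leftover elements. The function name `add_arrays` is hypothetical. The same loop written without intrinsics is also a typical candidate for compiler auto-vectorization.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE: __m128, _mm_add_ps */

/* dst[i] = a[i] + b[i]; each ADDPS performs four single-precision additions. */
static void add_arrays(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned loads: no alignment assumed */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; ++i) /* scalar tail when n is not a multiple of 4 */
        dst[i] = a[i] + b[i];
}
```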
Using Intel® VTune™ Performance Analyzer and Intel® Integrated Performance Primitives for Real-time Media Optimization [Knowledgebase]
By Sam Siewert. Real-time Media and Encoding Tools: Real-time media has become pervasive with Web 2.0 viral video services, Internet Protocol Television (IPTV), mobile media for Web-enabled phones, an ...

Posted: 2009-07-01 17:24:00 by
IPP, SSE, VTune analyzer, Vtune, visual computing, video encoding
Parallelization And Optimization of The Line Segment Intersection Problem [Blogs]
Line Segment Intersection Problem. 1. Problem Statement: Write threaded code to find pairs of input line segments that intersect within three-dimensional space. Line segments are defined ...

Posted: 2009-08-12 08:19:32 by Dmitriy Vyukov
parallelization, Threading Building Blocks, SSE, optimization, fork-join, line segment intersection
SSE optimizations and IEEE 754 [Forums]
Hi, I'm using the options below as my best trade-off between SSE performance and IEEE 754 conformance. Are there any other flags I could add that would bring me even closer to precise arithmetic witho ...

Posted: 2009-08-24 14:54:06 by jeff_keasler
SSE, performance, IEEE floating point
Optimization of Image Processing Algorithms: A Case Study [Knowledgebase]
Abstract: High-quality image and video processing has become an important part of many professional and consumer applications. Unfortunately, it often comes with a high performance price. In such case ...

Posted: 2009-09-23 16:34:13 by Guy Ben Haim (Intel), Victoria Zhislina (Intel), Sagi Schein
SSE, visual computing, image processing, bilateral filter, halftoning
Image Processing Acceleration Techniques using Intel® Streaming SIMD Extensions and Intel® Advanced Vector Extensions [Knowledgebase]
Introduction: Modern Intel processors feature acceleration through the use of SIMD (Single Instruction Multiple Data) instructions that include a wide range of available Intel® Streaming SIMD Extens ...

Posted: 2009-10-02 13:56:04 by Petter Larsson (Intel), Eric Palmer (Intel)
SSE, compiler, AVX, visual computing, image processing, media
Wiener Filtering Using Intel® Advanced Vector Extensions [Knowledgebase]
1 Introduction: Intel® Advanced Vector Extensions (Intel® AVX) is a 256-bit instruction set extension to Intel® Streaming SIMD Extensions (Intel® SSE) and is designed for applications that are float ...

Posted: 2009-10-12 16:53:48 by
SSE, AVX, visual computing, image processing, least mean square filtering, wiener filtering, video processing
Data-Parallelism Spanning From SSE to AVX to Larrabee to... [Blogs]
Greetings all, and thanks for reading my first Intel Software Network blog! I just took my wife to see the movie Julie and Julia, and was inspired to blog, and since the popcorn is still processing, ...

Posted: 2009-10-02 01:47:03 by Eric Palmer (Intel)
SSE, cuda, optimization, openCL
Interview with Anatoliy Kuznetsov, author of the BitMagic C++ Library [Knowledgebase]
Abstract: In this article, Anatoliy Kuznetsov, a senior software development engineer at NCBI, ...

Posted: 2009-11-16 01:53:32 by Andrey Karpov
C++, SSE, cuda, BitMagic