| Link/Url | Tags |
|---|---|
| Using SSE2 to Evaluate a Hidden Markov Model with Viterbi Decoding [Knowledgebase] Introduction The Streaming SIMD Extensions 2 (SSE2) technology introduces new Single Instruction Multiple Data (SIMD) double-precision floating-point instructions and new SIMD integer instruction ... Posted: 2009-01-12 13:53:08 by | pentium, SSE2, Code |
| Block-Matching In Motion Estimation Algorithms [Knowledgebase] Introduction The Streaming SIMD Extensions 2 (SSE2) technology introduces new Single Instruction Multiple Data (SIMD) double-precision floating-point instructions and new SIMD integer instructio ... Posted: 2009-01-14 09:50:11 by | pentium, SSE2, Code |
| Reducing the Impact of Misaligned Memory Accesses [Knowledgebase] Introduction Misalignment of memory access is a problem commonly encountered when optimizing code with Streaming SIMD Extensions 2 (SSE2). An SSE2 algorithm often requires loading and storing da ... Posted: 2009-06-19 18:44:32 by Michael Stoner (Intel) | SSE2, coding, visual computing |
| How to Reduce the Impact of Misaligned Memory Accesses [Knowledgebase] Challenge Reduce the impact of misaligned memory accesses in an SSE2 algorithm. Misalignment of memory access is a problem commonly encountered when optimizing code with Streaming SIMD Extensions ... Posted: 2008-12-11 13:07:00 by | SSE2 |
| How to Vectorize Code on 32-Bit Intel® Architecture [Knowledgebase] Challenge Vectorize code for greater performance. The SIMD features of Streaming SIMD Extensions (SSE), Streaming SIMD Extensions 2 (SSE2) and MMX™ technology require new methods of coding algo ... Posted: 2008-12-10 11:02:53 by | SSE2 |
| Fast Random Number Generator on the Intel® Pentium® 4 Processor [Knowledgebase] by Kipp Owens, Applications Engineer & Rajiv Parikh, Sr. Applications Engineer, Software Solutions Group, Intel Corporation. Abstract This paper shows how to speed up a commonly used pseudo-ran ... Posted: 2009-02-25 12:56:55 by Rajiv Parikh (Intel), kippowens | SSE2 |
| x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) [Knowledgebase] Introduction This document details the difference between how assists are handled with x87 and Single Instruction Multiple Data (SIMD) instructions, and gives information on how to change their behav ... Posted: 2008-10-17 12:32:19 by Shawn Casey (Intel) | simd, SSE2, SSE, Code |
| Intel® compiler options for SSE generation (SSE2, SSE3, SSSE3, SSE4) and processor-specific optimizations [Knowledgebase] What are the IA-32 and Intel® 64 processor targeting options in the 11.x compilers? Which processor-specific option is best for my processor? What set of Processor-Specific Optimization o ... Posted: 2009-07-13 14:35:04 by | dual-core, xeon, pentium, SSE2, SSE3, SSE, Core 2 Duo, SSE4.2, SSSE3, SSE4.1, MMX, Core 2 Quad, atom, Core i7, compiler, AVX |
| High Clocks Per Instruction Retired when vectorizing the loop. [Knowledgebase] Introduction Sometimes when we vectorize a loop, we get a high Clocks Per Instruction Retired (CPI) value. This happens when there is high bus utilization and the bus gets saturated. The subtrac ... Posted: 2009-07-14 03:49:00 by | simd, SSE2, SSE3, SSE4, SSE, High CPI, Vectorizer, hardware prefetcher, SSE1, Memory latency, BUS Saturation, Vtune |
| winnt.h declaration incompatible with func declared in emmintrin.h [Forums] I assume that this forum is trolled by the Intel C++ experts for answers the communal mind doesn't have. I'm using Visual C++ 2008 (9.0.21022.8). And IC C++ 11.0.066. I'm working on a module that uses o ... Posted: 2009-06-10 14:14:18 by Taylor Kidd | SSE2, winnt.h, emmintrin.h |
| clflush over the LAPIC mapping [Forums] I am remapping the LAPIC registers page to some virtual address. During the remap procedure, virtual memory subsystem invalidates the TLB for the page and then does clflush over the region, looping fr ... Posted: 2009-09-14 07:49:56 by kostikbel1 | cache, SSE2 |
| Problem with SSE2 code [Forums] Hi! I have a problem with a SSE2 code, that I can't resolve. The piece of code is this: asm("movupd xmm1, [xp]"); xp+=2; asm("movupd xmm0 , [yp]"); yp+=2; asm("addpd xmm1, xmm0\n"); asm("movupd [yp], xmm1\ ... Posted: 2009-09-18 11:06:32 by ijjys | SSE2 |