Getting Started with Intel® Cilk™ Plus Array Notations
A simple introduction to using the Array Notations feature in Intel® Cilk™ Plus. Type: Performance and Optimization |
Vectorization Cilk Plus Array Notations |
03/29/2012
|
Getting Started with Intel® Cilk™ Plus SIMD Vectorization and Elemental Functions
A tutorial on how to use #pragma simd and elemental function features in Intel® Cilk™ Plus. Type: Performance and Optimization |
simd vector Vectorization pragma Cilk Plus simd loop vectorlength |
03/20/2012
|
Getting Code Ready for Parallel Execution with Intel® Parallel Composer
This article provides an overview of the methods available in Intel® Parallel Composer, along with a comparison of their key benefits. Type: Performance and Optimization |
OpenMP Vectorization Parallel Composer compiler threading auto-parallelization |
11/02/2011
|
Threading and Intel® Integrated Performance Primitives
Threading and Intel® Integrated Performance Primitives (PDF 230KB)
Abstract
There is no universal threading solution that works for all applications. Likewise, there are multiple ways for application ... Type: Performance and Optimization |
|
11/02/2011
|
A Guide to Auto-vectorization with Intel® C++ Compilers
How to use the automatic vectorizer of the Intel® C/C++ Compiler to optimize your application using Intel Streaming SIMD Extensions (Intel SSE) or Intel Advanced Vector Extensions (Intel AVX). Type: Performance and Optimization |
SSE Vectorization Vectorizer optimization optimize compiler AVX |
09/06/2011
|
Performance Tools for Software Developers - SSE generation and processor-specific optimizations continue
Can I combine the processor values and target more than one processor?
How do I generate optimized code for both Intel and AMD* architectures?
Where can I find more information on processor-specific optimizations? Type: Performance and Optimization |
|
07/26/2011
|
Kernel Template Library
A template library for expressing kernels with high-level objects that allow auto-vectorization. Type: Performance and Optimization |
C++0x simd SSE Vectorization AVX Lambda |
03/10/2011
|
How to manually target 2nd generation Intel Core processors with support for Intel AVX
Manual CPU dispatch may be used to write code that will be executed only on Intel processors with support for Intel® Advanced Vector Extensions, such as 2nd generation Intel® Core™ processors (formerly code named “Sandy Bridge”). Type: Performance and Optimization |
CPU dispatch |
01/14/2011
|
Step-by-Step Application Performance Tuning with Intel Compilers
A step-by-step introduction to application performance tuning using the Intel® Compilers version 12 for IA-32 and Intel® 64 processors that are included with Intel® Parallel Studio 2011 and Intel® Parallel Studio XE. Type: Performance and Optimization |
Tuning optimization |
11/12/2010
|
Being Successful with the Intel® Compilers -- You Need to Know
Tips and techniques on using the Intel Compilers to maximize your application performance. Type: Performance and Optimization |
Vectorization IPO floating point optimize Od O0 O1 O2 O3 PGO interprocedural profile guided precision fp x87 compiler |
11/09/2010
|
Elemental functions: Writing data parallel code in C/C++ using Intel® Cilk™ Plus
Intel® Cilk™ Plus provides simple-to-use extensions to the C and C++ languages for expressing data and task parallelism. This article describes one of these programming constructs: “elemental functions”. Type: Performance and Optimization |
Intel® Cilk™ Plus elemental function __declspec(vector) |
11/04/2010
|
Information about the FTC Decision and Order on the Intel® Compilers Reimbursement Fund
Info on where to go for the FTC Compiler Reimbursement Fund. Type: Performance and Optimization |
|
11/02/2010
|
Intel® Compiler Options for Intel® SSE and Intel® AVX generation (SSE2, SSE3, SSE3_ATOM, SSSE3, SSE4.1, SSE4.2, AVX, AVX2) and processor-specific optimizations
Explains which Intel® Compiler switches to use to target and optimize for a specific platform, microarchitecture, CPU or processor. Type: Performance and Optimization |
dual-core xeon pentium SSE2 SSE3 SSE Core 2 Duo SSE4.2 SSSE3 SSE4.1 MMX Core 2 Quad atom Core i7 compiler AVX |
09/02/2010
|
Guided Auto-Parallel (GAP)
Guided Auto-Parallel (GAP) is a compiler feature that gives the user guidance on what changes are necessary for the compiler to automatically vectorize or parallelize a serial application. Type: Performance and Optimization |
GAP |
09/01/2010
|
Accelerate Your Application via IPP Image Processing in Parallel Studio - C code vs. IPP Resize
This article shows how to use IPP image processing functions to accelerate an application, and provides a sample that shows the performance difference between IPP and general C code when resizing an image, a widely used operation in image processing. Type: Performance and Optimization |
sample code Composer ippiResizeSqrPixel ippiResize image processing MSVC project msvc2010 msvc2005 IPP 7.0 parallel Studio 2011 composer 2011 |
08/30/2010
|
Intel® Integrated Performance Primitives 7.0 Beta Program
Intel IPP 7.0 beta features and registration/download/support info. Type: Performance and Optimization |
Beta intel ipp new features intel ipp beta program intel ipp 7.0 beta Latest features in IPP IPP 7.0 IPP 7.0 beta |
07/11/2010
|
Boosting OpenSSL AES Encryption with Intel® IPP
The latest version of IPP crypto adopts AES-NI, which gives users an automatic boost from new silicon without any extra work. The article shows the performance gain of OpenSSL AES with the IPP AES functions. Type: Performance and Optimization |
AES encryption IPP Cryptography Library Westmere Cryptography OpenSSL IPP Cryptography IPP cpu optimization openssl-ipp AES-NI decryption |
03/31/2010
|
clock() or gettimeofday() or ippGetCpuClocks()?
There are various functions you may use to measure the computation time of IPP functions. The method we recommend is ippGetCpuClocks() from IPP itself. Type: Performance and Optimization |
best timing function IPP timing measure time |
03/29/2010
|
IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems
When running the Intel IPP crypto sample for OpenSSL on Hyper-Threading systems, the AES benchmark application reports slow performance. Users need to use the correct threading settings to avoid the problem. Type: Performance and Optimization |
Hyper-Threading OpenMP AES Multi-threading OpenSSL openssl-ipp |
02/05/2010
|
/Qvec-reportN doesn't work with /Qipo in the IDE
xilink.exe doesn't support /Qvec-report, so it's impossible to change the default vectorization messaging behavior (i.e. none) when using IPO in the Microsoft Visual Studio* IDE. Type: Performance and Optimization |
IDE Vectorizer IPO /Qvec-report /Qipo |
02/01/2010
|
Use Intel® IPP on Compatible AMD* Processors
Using Intel IPP on Intel or compatible AMD* processors. Type: Performance and Optimization |
simd SSE amd support IPP cpu optimization non-intel processors |
01/28/2010
|
How to Compile for the Intel® Core™ i5 processor and Intel® Xeon® 5600 processor series with AES-NI
The Intel C/C++ Compiler version 11 supports Advanced Encryption Standard New Instructions (AES-NI) via intrinsic functions to improve performance for encryption and decryption. Type: Performance and Optimization |
AES Algorithm AES AESNI Core i5 |
01/25/2010
|
AES-NI support in Intel® IPP
Intel®’s Advanced Encryption Standard (AES) instruction set is supported in the latest Intel IPP version. Type: Performance and Optimization |
AES Algorithm AES Westmere Cryptography Nehalem OpenSSL openssl-ipp |
01/24/2010
|
How to Compile for Intel® AVX
Use the Intel Compiler 11.1 or 12.0 with the switch /QxAVX (Windows*) or -xavx (Linux*) to compile applications for Intel® Advanced Vector Extensions (Intel® AVX). Type: Performance and Optimization |
compiler AVX Intel 2nd Generation Core processor |
07/16/2009
|
Performance Tools for Software Developers - Loop blocking
Loop blocking is a combination of strip mining and loop interchange that enhances reuse of local data. It helps nested loops that manipulate arrays too large to fit into the cache. Loop blocking allows reuse of the arrays by transforming the ... Type: Performance and Optimization |
cache blocking Loop blocking Performance and optimization |
07/13/2009
|
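The loop blocking entry above describes the technique as strip mining plus loop interchange. The following is a minimal sketch of that idea applied to matrix multiplication. It is not taken from the listed article. The names `matmul_naive`, `matmul_blocked` and `blocked_matches_naive`, the dimension `N` and the `block` parameter are all hypothetical and chosen only for illustration.

```c
#include <assert.h>

#define N 48  /* matrix dimension; an illustrative assumption */

/* Reference version: plain triple loop computing c = a * b
 * on row-major N x N matrices. */
static void matmul_naive(const double *a, const double *b, double *c)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += a[i * N + k] * b[k * N + j];
            c[i * N + j] = sum;
        }
}

static int min_int(int x, int y) { return x < y ? x : y; }

/* Blocked version: each loop is strip-mined into an outer loop over
 * tiles (ii, kk, jj) and an inner loop within a tile, and the tile
 * loops are interchanged to the outside so that one tile of a, b and c
 * is reused while it is likely still in cache. */
static void matmul_blocked(const double *a, const double *b, double *c,
                           int block)
{
    assert(block > 0);
    for (int i = 0; i < N * N; i++)
        c[i] = 0.0;
    for (int ii = 0; ii < N; ii += block)
        for (int kk = 0; kk < N; kk += block)
            for (int jj = 0; jj < N; jj += block)
                for (int i = ii; i < min_int(ii + block, N); i++)
                    for (int k = kk; k < min_int(kk + block, N); k++) {
                        double aik = a[i * N + k];
                        for (int j = jj; j < min_int(jj + block, N); j++)
                            c[i * N + j] += aik * b[k * N + j];
                    }
}

/* Returns 1 if the blocked and naive results agree for the given tile
 * size, 0 otherwise. Inputs are small integers stored as doubles, so
 * every product and sum is exactly representable. */
static int blocked_matches_naive(int block)
{
    static double a[N * N], b[N * N], c_ref[N * N], c_blk[N * N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i * N + j] = (double)(i + j);
            b[i * N + j] = (double)(i - j);
        }
    matmul_naive(a, b, c_ref);
    matmul_blocked(a, b, c_blk, block);
    for (int i = 0; i < N * N; i++)
        if (c_ref[i] != c_blk[i])
            return 0;
    return 1;
}
```

Calling `blocked_matches_naive(16)` from a `main` function checks that the transformation preserves the result. Whether it improves performance depends on the matrix size, the tile size and the cache hierarchy, so the tile size would need to be tuned by measurement on the target processor.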