<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated on Sun, 12 Feb 2012 07:03:00 -0800 -->
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <atom:link href="http://software.intel.com/en-us/articles/intel-ipp-kb/type/performance-and-optimization/feed/" rel="self" type="application/rss+xml" />
    <title>Intel Software Network articles Feed</title>
    <link>http://software.intel.com/en-us/articles/intel-ipp-kb/type/performance-and-optimization/</link>
    <description></description>
    <language>en-us</language>
    <item>
      <title>Generic Static Library Dispatching with the Intel® IPP 7.0 Library</title>
      <description><![CDATA[ <p>The px_/mx_ prefixes have been restored to the static generic library (as of version 7.0.4) so that you can now link against the static generic library <em>and</em> the standard Intel IPP product library within a single application.</p>
<blockquote>
<p>Note: The feature described in this article is only relevant if you need to deploy your Intel IPP application on platforms that do not support at least SSE2 for an IA-32 (32-bit) application or at least SSE3 for an Intel64 (64-bit) platform.<br /><b><br />If all of your platforms support at least SSE2 (32-bit) or SSE3 (64-bit) you do not need to use the procedure described in this article and you do not need to download the generic px/mx static library!<br /><br /></b>If you are unsure what level of SIMD instructions your target platform(s) support, please visit <a target="_blank" href="http://ark.intel.com/">ark.intel.com</a> and search for your specific processor(s).</p>
</blockquote>
<p>Unlike the dynamic library, the automatic dispatcher in the static library <em>will not</em> recognize the generic library and <em>will not</em> automatically dispatch to the generic optimizations provided in the generic px/mx add-on static library; instead, you must call the generic functions directly using the px_/mx_ prefix (as if you were calling an optimized library function directly). This means that if you choose to include the generic static library as part of your application you must decide whether to call the dispatched library or the equivalent generic library function at each Intel IPP function call within your application. Such a decision should be based on an initial evaluation of the platform that determines if you need to use the generic static library functions or if it is safe to call the standard dispatched library functions.</p>
<blockquote>
<p>Note: The <a href="http://software.intel.com/en-us/articles/ipp-dispatcher-control-functions-ippinit-functions/">ippInit() function</a> normally used to initialize the static library dispatcher determines the <a href="http://software.intel.com/en-us/articles/understanding-simd-optimization-layers-and-dispatching-in-the-intel-ipp-70-library/">level of SSE instructions</a> supported on the target processor at runtime using the CPUID instruction.<br /><b><br />The manufacturer string returned by the CPUID instruction is not used as part of this test; however, the CPUID results are interpreted according to Intel processor conventions.<br /></b><br />This means that if a non-Intel processor reports the SIMD instructions it supports in a way that is compatible with an Intel processor, the test passes (assuming the reported SIMD level is supported by the library); if not, the test fails. It is believed, but cannot be proven, that all x86-compatible processors report their support for SSE2 and SSE3 in a manner that is compatible with Intel processors. After SSE3 (e.g., SSSE3, SSE4.1, etc.) the SIMD instruction sets in use diverge across manufacturers and are, generally, not compatible with the Intel SSE (and AVX) instructions.<br /><br />Additionally, at this time we are not planning to restore the generic optimization library as an integral dispatched layer within the Intel IPP product. We periodically must make some difficult choices regarding what we can continue to optimize, test and validate. Given that the SSE2 instruction set has been supported by nearly every x86-compatible processor produced for nearly a decade, the number of platforms that cannot run an application that employs the Intel IPP 7.0 library today is very, very small. 
The generic px/mx layers are still integrated in the 6.1 version of the Intel IPP library.<br /><br />Please refer to our <a href="http://software.intel.com/en-us/articles/optimization-notice/">Optimization Notice</a> for more information regarding performance and optimization choices in Intel software products.</p>
</blockquote>
<h2 class="sectionHeading">Calling the Generic PX/MX Functions in an Application</h2>
<p>With this version of the px/mx generic add-on static library you can now call the generic functions within the same application as you call the dispatched functions. You must, however, implement an additional layer that "manually dispatches" between the generic functions and the standard functions, since the static library dispatcher cannot, for technical reasons, be integrated with the generic px/mx static add-on library. (This is not an issue with the standard dynamic library.)</p>
<p>The basic idea is best shown by a simple example for use with the px (32-bit) version of the generic static library:</p>
<p>#include &lt;string.h&gt; <br />#include "ipp.h" <br />#include "ipp_generic.h" <br /><br />Ipp64u ipp_cpuid = 0 ; <br />IppStatus ipp_init_status = ippInit() ; <br />IppStatus status ; <br /><br />// determine processor type/status and set "ipp_cpuid" <br />// see SIMD detection example further in article... <br /><br />Ipp8u src[] = "to be copied" ; <br />Ipp8u dst[256] ; <br />int len = (int)strlen( (const char*)src ) + 1 ; // include the terminating null <br /><br />if( ipp_cpuid &lt; ippCPUID_SSE2 )<br />  status = px_ippsCopy_8u( src, dst, len ) ; <br />else <br />  status = ippsCopy_8u( src, dst, len ) ;</p>
<p><br />In the example above, during initialization you must determine whether the application should use the “generic” px code or the standard library. If the runtime processor supports only SIMD instruction sets below SSE2 (for example, only MMX or SSE), the application calls the generic px functions; otherwise, it calls the standard library functions.</p>
<p>If you are writing a 64-bit application you use the mx prefix on the generic function call and the conditional check is against ippCPUID_SSE3, since SSE3 is the minimum level supported by the standard dispatched library (SSE2 is the minimum level supported by the 32-bit library).</p>
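<p>When many call sites need the same decision, one option is to make it once and keep a function pointer. The following is a minimal sketch of that idea, not code from the library: <em>demo_generic_copy</em> and <em>demo_dispatched_copy</em> are hypothetical stand-ins for px_ippsCopy_8u (or mx_ippsCopy_8u) and ippsCopy_8u so that the sketch compiles without the Intel IPP headers, and <em>minimum_level</em> is assumed to be ippCPUID_SSE2 in a 32-bit build or ippCPUID_SSE3 in a 64-bit build.</p>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int DemoStatus;   /* hypothetical stand-in for IppStatus */
typedef DemoStatus (*DemoCopyFn)(const unsigned char *src, unsigned char *dst, int len);

/* Records which path ran last; for illustration only. */
const char *demo_last_path = "none";

/* Hypothetical stand-in for the generic entry point (px_ or mx_ prefix). */
DemoStatus demo_generic_copy(const unsigned char *src, unsigned char *dst, int len)
{
    assert(src != NULL && dst != NULL && len >= 0);
    memcpy(dst, src, (size_t)len);
    demo_last_path = "generic";
    return 0;
}

/* Hypothetical stand-in for the standard dispatched entry point. */
DemoStatus demo_dispatched_copy(const unsigned char *src, unsigned char *dst, int len)
{
    assert(src != NULL && dst != NULL && len >= 0);
    memcpy(dst, src, (size_t)len);
    demo_last_path = "dispatched";
    return 0;
}

/* Choose the entry point once, using the same comparison as the example above. */
DemoCopyFn demo_select_copy(uint64_t cpu_features, uint64_t minimum_level)
{
    return (cpu_features < minimum_level) ? demo_generic_copy : demo_dispatched_copy;
}
```

<p>An application built this way would call demo_select_copy() once after initialization and use the returned pointer at every call site, which keeps the level check out of inner loops.</p>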
<h2 class="sectionHeading">Building Your Generic Include File</h2>
<p>The example above includes a file called "ipp_generic.h," which is not distributed with either the standard product or the add-on generic library. You must build this include file for use with your application.</p>
<p>For example, assume that you are using all of the ippsCopy() functions. In that case you would copy from the standard ipps.h header file the function declarations for the ippsCopy() functions you are using. Then add "px_" (or "mx_" for 64-bit applications) to the name of each function declaration. This will provide you with the external function declarations you need in order to call the generic functions. In this case, ipp_generic.h would look like:</p>
<p>IPPAPI(IppStatus, px_ippsCopy_8u,( const Ipp8u* pSrc, Ipp8u* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_16s,( const Ipp16s* pSrc, Ipp16s* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_16sc,( const Ipp16sc* pSrc, Ipp16sc* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_32f,( const Ipp32f* pSrc, Ipp32f* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_32fc,( const Ipp32fc* pSrc, Ipp32fc* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_64f,( const Ipp64f* pSrc, Ipp64f* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_64fc,( const Ipp64fc* pSrc, Ipp64fc* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_32s,( const Ipp32s* pSrc, Ipp32s* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_32sc,( const Ipp32sc* pSrc, Ipp32sc* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_64s,( const Ipp64s* pSrc, Ipp64s* pDst, int len )) <br />IPPAPI(IppStatus, px_ippsCopy_64sc,( const Ipp64sc* pSrc, Ipp64sc* pDst, int len ))<br /> </p>
<p>Make sure you include ipp.h before you include your custom ipp_generic.h file, so all macro and data type definitions have been taken care of before you declare your generic functions.</p>
<p>Of course, you must also be sure to include the appropriate generic library in the list of libraries that your application will link against. The "USE_IPP" feature that does this automatically for you in Microsoft* Visual Studio* <em>WILL NOT</em> do this for you!</p>
<p>The ZIP file attached to this KB article is an example of how you can set up your ipp_generic.h file automatically using a C macro redefinition. The ZIP file also includes a simple test application.</p>
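<p>The contents of the ZIP file are not reproduced here. As a rough illustration of the general technique only, the sketch below writes a function list once and expands it twice with different macros, once for the plain names and once with a px_ prefix. Every name in it (<em>DemoStatus</em>, <em>demoCopy_8u</em>, <em>DEMO_COPY_LIST</em>, and so on) is hypothetical, and the placeholder function bodies exist only so that the sketch links without the Intel IPP libraries.</p>

```c
#include <assert.h>
#include <string.h>

typedef int DemoStatus;          /* hypothetical stand-in for IppStatus */
typedef unsigned char Demo8u;    /* hypothetical stand-in for Ipp8u */
typedef short Demo16s;           /* hypothetical stand-in for Ipp16s */

/* The function list is written once, in the same (type, name, args) shape
   as the IPPAPI lines shown above. */
#define DEMO_COPY_LIST(API) \
    API(DemoStatus, demoCopy_8u,  (const Demo8u *pSrc,  Demo8u *pDst,  int len)) \
    API(DemoStatus, demoCopy_16s, (const Demo16s *pSrc, Demo16s *pDst, int len))

/* Two expansions of the same list: plain names and px_-prefixed names. */
#define DEMO_DECLARE_STANDARD(type, name, args) type name args;
#define DEMO_DECLARE_GENERIC(type, name, args)  type px_##name args;

DEMO_COPY_LIST(DEMO_DECLARE_STANDARD)
DEMO_COPY_LIST(DEMO_DECLARE_GENERIC)

/* Placeholder bodies so the sketch links; a real build links the libraries instead. */
DemoStatus demoCopy_8u(const Demo8u *pSrc, Demo8u *pDst, int len)
{
    assert(len >= 0);
    memcpy(pDst, pSrc, (size_t)len * sizeof *pSrc);
    return 0;
}

DemoStatus demoCopy_16s(const Demo16s *pSrc, Demo16s *pDst, int len)
{
    assert(len >= 0);
    memcpy(pDst, pSrc, (size_t)len * sizeof *pSrc);
    return 0;
}

DemoStatus px_demoCopy_8u(const Demo8u *pSrc, Demo8u *pDst, int len)
{
    return demoCopy_8u(pSrc, pDst, len);
}

DemoStatus px_demoCopy_16s(const Demo16s *pSrc, Demo16s *pDst, int len)
{
    return demoCopy_16s(pSrc, pDst, len);
}
```

<p>The benefit of this pattern is that the prefixed declarations cannot drift out of step with the plain ones, because both come from the same list.</p>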
<h2 class="sectionHeading">Determining What Level of SIMD Your Processor Supports</h2>
<p>To determine if your processor will be supported by the standard Intel IPP 7.0 library you can use the following test:</p>
<p>Ipp64u u64FeaturesMask = ippCPUID_GETINFO_A ; <br />Ipp32u u32CpuidInfoRegs[] = { 1, 0, 0, 0 } ; <br />IppStatus ipp_status ; <br /><br />ipp_status = ippGetCpuFeatures( &amp;u64FeaturesMask, u32CpuidInfoRegs ) ; <br />if( ipp_status != ippStsNoErr ) <br />  /* handle error condition returned in ipp_status */ ; <br /><br />// keep only the low nine bits of the feature mask <br />ipp_cpuid = u64FeaturesMask &amp; 0x1ff ;</p>
<p>The contents of ipp_cpuid can be compared against the "CPU Features Mask" enumerations to determine which level of SIMD instructions is supported (see the sample code earlier in this article). A complete table of "CPU Features Mask" enumerations is provided here:<br /><br /><a target="_blank" href="http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/ippxe/ipp_manual_lnx/hh_goto.htm#IPPS/ipps_ch3/functn_ippGetCpuFeatures.htm">http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/ippxe/ipp_manual_lnx/hh_goto.htm#IPPS/ipps_ch3/functn_ippGetCpuFeatures.htm</a><br /><br />The definition of the "CPU Features Mask" is located inside the ippdefs.h header file.<br /><br />This is just one example; other methods, such as your compiler's cpuid intrinsic, can also determine the SIMD instructions supported by your runtime processor.</p>
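<p>The comparison described in this article can be wrapped in a small helper. The sketch below is illustrative only: <em>demo_needs_generic</em> is a hypothetical name, and the two constants are assumed values standing in for ippCPUID_SSE2 and ippCPUID_SSE3, which a real build should take from ippdefs.h.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; a real build takes ippCPUID_SSE2 and ippCPUID_SSE3 from ippdefs.h. */
#define DEMO_CPUID_SSE2 0x0004u
#define DEMO_CPUID_SSE3 0x0008u

/* Returns 1 when the generic px/mx functions are needed, 0 when the
   standard dispatched library can be used. */
int demo_needs_generic(uint64_t features_mask, unsigned pointer_bits)
{
    uint64_t level;
    uint64_t minimum;

    assert(pointer_bits == 32 || pointer_bits == 64);

    level = features_mask & 0x1ff;   /* same masking as the test above */
    minimum = (pointer_bits == 64) ? DEMO_CPUID_SSE3 : DEMO_CPUID_SSE2;
    return level < minimum;
}
```

<p>Passing the pointer width as a parameter keeps the helper testable on any host; an application could instead pass a value derived from sizeof(void*).</p>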

<table cellpadding="5" cellspacing="0" rules="none" border="1">
<tbody>
<tr>
<th align="left" valign="middle" >Optimization Notice</th>
</tr>
<tr bgcolor="#ccecff">
<td>
<p>Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p align="right">Notice revision #20110804</p>
</td>
</tr>
</tbody>
</table>
 ]]></description>
      <link>http://software.intel.com/en-us/articles/generic-library-dispatching-with-the-ipp-70-library/</link>
      <pubDate>Tue, 03 May 2011 09:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/generic-library-dispatching-with-the-ipp-70-library/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/generic-library-dispatching-with-the-ipp-70-library/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
    </item>
    <item>
      <title>Intel® Atom™ Processors support in the Intel® Integrated Performance Primitives (Intel® IPP) Library</title>
      <description><![CDATA[ <p><em>All versions of the Intel® IPP library will run on Intel® Atom™ processors. The table below represents the Intel IPP library functions that have been "hand-tuned" for optimal performance on Intel Atom processors in version 7.0.2 of the library.</em></p>
<p>Hand-tuned optimizations designed to maximize performance of the Intel IPP library on Intel Atom processors were added beginning with v6.0 of the Intel IPP library. For maximum performance on Intel Atom processors, we recommend that you upgrade to version 7.0 of the Intel IPP library.</p>
<p>Both static and dynamic/shared libraries in v7.0 of the Intel IPP library include Intel Atom processor optimizations. Applications linked with version 7.0 of the Intel IPP library will be dispatched to the "<strong>s8</strong>" optimized library for IA-32 and the "<strong>n8</strong>" library for Intel® 64 whenever they execute on an Intel Atom processor.</p>
<blockquote>
<p>For more information regarding dispatching please see <em><a href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp">Understanding CPU Dispatching in the Intel® IPP Library</a></em> or check the Intel IPP <em>Getting_Started.htm</em> and <em>userguide_*.pdf</em> files in the <a target="_blank" href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-documentation/">Intel IPP documentation</a>.</p>
</blockquote>
<p>In the v6.x Intel IPP library, only the dynamic/shared libraries contain Intel Atom processor optimizations; there are no Intel Atom processor optimizations in the v6.x dispatched static libraries. However, the v6.x dispatched static libraries will safely run on an Intel Atom processor by dispatching to the <strong>v8/u8</strong> libraries optimized for the Merom microarchitecture (Intel Core 2 processor), which is also designed for use with the same Intel Supplemental SSE3 SIMD instruction set (SSSE3) that Intel Atom processors support. A separate non-dispatched static Intel IPP library for Linux* is available on the IA-32 platform (and as part of the Intel Atom SDK).</p>
<p>The functions listed below have been hand-optimized for the Intel Atom processor in the version of the Intel IPP library named at the beginning of this article.</p>
<blockquote>
<p><em><strong>Note:</strong> Every Intel IPP library primitive is available for use with the Intel Atom processor; this list simply shows those functions that have been specially hand-tuned for the Intel Atom processor. Hand-tuning is not required to achieve optimum performance for all IPP functions. If you use specific Intel IPP functions that are not listed in the following table and would like to see them added to the priority list for Atom optimization, please create a thread on the <a target="_blank" href="http://software.intel.com/en-us/forums/intel-integrated-performance-primitives/">IPP forum</a> stating which functions you would like to see added.</em></p>
</blockquote>
<div align="center">
<table width="600" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td align="left" valign="top">
<table width="275" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td>
<p><b>Signal Processing</b></p>
<pre>ippsAddProduct_32f
ippsAddProduct_32fc
ippsAddProduct_32s_Sfs
ippsAddProduct_64f
ippsAddProduct_64fc
ippsAdd_32f    
ippsAdd_32fc   
ippsAdd_64f    
ippsAdd_64fc
ippsAdd_32f_I
ippsAdd_32fc_I
ippsAdd_32s_Sfs
ippsConvert_16s32f
ippsConvert_16u32f
ippsConvert_32f16s_Sfs
ippsConvert_32f16u_Sfs
ippsConvert_32f32s_Sfs
ippsConvert_32f8s_Sfs
ippsConvert_32f8u_Sfs
ippsConvert_32s32f
ippsConvert_32s64f
ippsConvert_64f32s_Sfs
ippsConvert_8s32f
ippsConvert_8u32f
ippsCopy_16s
ippsCopy_64s
ippsDFTFwd_CToC_32fc
ippsDFTFwd_CToC_64fc
ippsDiv_16sc_Sfs
ippsDiv_16s_Sfs
ippsDiv_16u_Sfs
ippsDiv_32f
ippsDiv_32fc
ippsDiv_32s16s_Sfs
ippsDiv_32s_Sfs
ippsDiv_64f
ippsDiv_64fc
ippsDotProd_32f32fc64fc
ippsDotProd_32f64f
ippsDotProd_32fc64fc
ippsDotProd_64f
ippsDotProd_64f64fc
ippsDotProd_64fc
ippsFFTFwd_CToC_32fc
ippsFFTFwd_CToC_64fc
ippsFilterMedian_32f
ippsFilterMedian_32s
ippsFilterMedian_64f
ippsFIR32f_16s_Sfs
ippsFIR64f_16s_Sfs
ippsFIR64f_32s_Sfs
ippsFIR_32f
ippsFIR_64f
ippsJoin_32f16s_D2L
ippsLShiftC_32s_I
ippsMax_32s
ippsMean_32f
ippsMin_32s
ippsMul_32f       
ippsMul_32fc      
ippsMul_64f       
ippsMul_64fc
ippsMul_32f_I
ippsMulC_32f
ippsMulC_32f_I
ippsNorm_L2_32f
ippsNormDiff_L2_32f
ippsRShiftC_32s_I
ippsSampleUp_32f
ippsScale_32f_I
ippsSqr_32f
ippsSqr_32f_I
ippsSqr_32fc
ippsSqr_64f
ippsSqr_64fc
ippsSqrt_16s_Sfs
ippsSqrt_16u_Sfs
ippsSqrt_32f
ippsSqrt_32f_I
ippsSqrt_32fc
ippsSqrt_64f
ippsSqrt_64fc
ippsSub_32s_Sfs
ippsSub_32f       
ippsSub_32f_I
ippsSub_32fc      
ippsSub_64f       
ippsSub_64fc      
ippsSum_32f
ippsThreshold_LTVal_32f_I
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><strong>Speech Coding</strong></p>
<pre>ippsAdaptiveCodebookSearch_RTA_32f
ippsSBADPCMEncode_G722_16s
ippsSBADPCMDecode_G722_16s
ippsDCTFwd_G7221_16s
ippsDCTInv_G7221_16s
ippsDecomposeDCTToMLT_G7221_16s
ippsDecomposeMLTToDCT_G7221_16s
ippsEnvelopFrequency_G7291_16s
ippsFilterHighpass_G7291_16s
ippsFilterLowpass_G7291_16s
ippsFIRSubbandLow_EC_32sc_Sfs
ippsFIRSubbandLowCoeffUpdate_EC_32sc_I
ippsFixedCodebookSearch_RTA_32f
ippsFixedCodebookSearchRandom_RTA_32f
ippsLSPToLPC_RTA_32f
ippsLSPQuant_RTA_32f
ippsMDCTFwd_G7291_16s
ippsMDCTPostProcess_G7291_16s
ippsQMFDecode_G722_16s
ippsQMFDecode_G7291_16s
ippsQMFEncode_G722_16s
ippsQMFEncode_G7291_16s
ippsSubbandAnalysis_16s32sc_Sfs
ippsSubbandController_EC_32f
ippsSubbandControllerUpdate_EC_32f
ippsSubbandSynthesis_32sc16s_Sfs
ippsTiltCompensation_G7291_16s
ippsToeplizMatrix_G729_32f
ippsToneDetect_EC_32f
</pre>
<p> </p>
</td>
</tr>
<tr>
<td align="left" valign="top">
<p><strong>Data Compression</strong></p>
<pre>ippsCRC32_8u
ippsEncodeRLE_BZ2_8u
ippsReduceDictionary_8u_I
ippsVLCCountBits_16s32s
ippsVLCDecodeOne_1u16s
ippsVLCDecodeUTupleBlock_1u16s
ippsVLCDecodeUTupleOne_1u16s
ippsVLCEncodeInit_32s
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><strong>Audio Coding</strong></p>
<pre>ippsMDCTInvWindow_MP3_32s
ippsPow43Scale_16s32s_Sf
ippsPow43_16s32f
ippsPredictCoef_SBR_C_32fc_D2L
ippsSynthesisDownFilter_SBR_CToR_32fc32f_D2L
ippsSynthesisDownFilter_SBR_RToR_32f_D2L
ippsSynthesisFilter_PQMF_MP3_32f
ippsSynthesisFilter_SBR_CToR_32fc32f_D2L
ippsSynthesisFilter_SBR_RToR_32f_D2L
ippsVLCDecodeEscBlock_AAC_1u16s
ippsVLCDecodeEscBlock_MP3_1u16s
ippsVLCDecodeUTupleEscBlock_AAC_1u16s
ippsVLCDecodeUTupleEscBlock_MP3_1u16s
</pre>
<p> </p>
</td>
</tr>
</tbody>
</table>
</td>
<td align="left" valign="top">
<table width="275" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td align="left" valign="top">
<p><b>Image Processing</b></p>
<pre>ippiAdd_16s_C1IRSfs
ippiConvert_16u32f_C1R 
ippiConvert_16u8u_C3R 
ippiConvert_8u16s_C1R
ippiConvert_8u16u_C1R 
ippiCopyReplicateBorder_16s_C1R
ippiCopy_8u_C1R 
ippiCopy_8u_C3R 
ippiCopy_8u_C4R 
ippiDilate_32f_AC4R 
ippiDilate_32f_C1R 
ippiDilate_32f_C3R 
ippiDiv_16s_AC4RSfs 
ippiDiv_16s_C1RSfs 
ippiDiv_16s_C3RSfs 
ippiDiv_16s_C4RSfs 
ippiDiv_16u_AC4RSfs 
ippiDiv_16u_C1RSfs 
ippiDiv_16u_C3RSfs 
ippiDiv_16u_C4RSfs 
ippiDiv_32f_AC4R 
ippiDiv_32f_C1R 
ippiDiv_32f_C3R 
ippiDiv_32f_C4R 
ippiErode_32f_AC4R 
ippiErode_32f_C1R 
ippiErode_32f_C3R 
ippiFilter32f_8u_AC4R
ippiFilter32f_8u_C1R 
ippiFilter32f_8u_C3R 
ippiFilter32f_8u_C4R 
ippiFilterGauss_16s_AC4R 
ippiFilterGauss_16s_C1R 
ippiFilterGauss_16s_C3R 
ippiFilterGauss_16s_C4R 
ippiFilterGauss_32f_AC4R 
ippiFilterGauss_32f_C1R 
ippiFilterGauss_32f_C3R 
ippiFilterGauss_32f_C4R 
ippiFilter_16s_AC4R 
ippiFilter_16s_C1R 
ippiFilter_16s_C3R 
ippiFilter_16s_C4R 
ippiFilter_16u_AC4R 
ippiFilter_16u_C1R 
ippiFilter_16u_C3R 
ippiFilter_16u_C4R 
ippiFilter_8u_AC4R 
ippiFilter_8u_C1R 
ippiFilter_8u_C3R 
ippiFilter_8u_C4R 
ippiGetPerspectiveQuad
ippiMirror_16u_C1IR
ippiMirror_16u_C4R 
ippiMirror_32s_C4R 
ippiMirror_8u_C4R 
ippiMul_32f_AC4R 
ippiMul_32f_C1R 
ippiMul_32f_C3R 
ippiMul_32f_C4R 
ippiSet_16u_C3R 
ippiSet_16u_C4R 
ippiSet_32f_C1R 
ippiSet_32f_C3R 
ippiSet_8u_C3R 
ippiSet_8u_C4R 
ippiSqrt_16s_AC4RSfs 
ippiSqrt_16s_C1RSfs 
ippiSqrt_16s_C3RSfs 
ippiSqrt_16u_AC4RSfs 
ippiSqrt_16u_C1RSfs 
ippiSqrt_16u_C3RSfs 
ippiSqrt_32f_AC4R 
ippiSqrt_32f_C1R 
ippiSqrt_32f_C3R 
ippiSqr_32f_AC4R 
ippiSqr_32f_C1R 
ippiSqr_32f_C3R 
ippiSqr_32f_C4R 
ippiSub_16s_C1IRSfs
</pre>
<p> </p>
</td>
</tr>
<tr>
<td align="left" valign="top">
<p><b>Color Conversion</b></p>
<pre>ippiBGR555ToYCbCr420_16u8u_C3P3R
ippiBGR555ToYCbCr422_16u8u_C3C2R
ippiBGR555ToYCbCr422_16u8u_C3P3R
ippiBGR555ToYCrCb420_16u8u_C3P3R
ippiBGR555ToYUV420_16u8u_C3P3R
ippiBGR565ToYCbCr411_16u8u_C3P3R
ippiBGR565ToYCbCr420_16u8u_C3P3R
ippiBGR565ToYCbCr422_16u8u_C3C2R
ippiBGR565ToYCbCr422_16u8u_C3P3R
ippiBGR565ToYCrCb420_16u8u_C3P3R
ippiBGR565ToYUV420_16u8u_C3P3R
ippiBGRToCbYCr422_8u_AC4C2R
ippiBGRToHLS_8u_AC4P4R
ippiBGRToHLS_8u_AP4C4R
ippiBGRToHLS_8u_AP4R
ippiBGRToHLS_8u_C3P3R
ippiBGRToHLS_8u_P3C3R
ippiBGRToHLS_8u_P3R
ippiBGRToYCbCr422_8u_AC4C2R
ippiBGRToYCbCr422_8u_AC4P3R
ippiBGRToYCbCr422_8u_C3C2R
ippiBGRToYCbCr422_8u_C3P3R
ippiCbYCr422ToBGR_8u_C2C4R
ippiHLSToBGR_8u_AC4P4R
ippiHLSToBGR_8u_AP4C4R
ippiHLSToBGR_8u_AP4R
ippiHLSToBGR_8u_C3P3R
ippiHLSToBGR_8u_P3C3R
ippiHLSToBGR_8u_P3R
ippiRGB565ToYUV422_16u8u_C3P3R
ippiRGBToCbYCr422Gamma_8u_C3C2R
ippiRGBToCbYCr422_8u_C3C2R
ippiRGBToYCbCr422_8u_C3C2R
ippiRGBToYCbCr422_8u_C3P3R
ippiRGBToYCbCr_8u_P3R
ippiRGBToYCrCb422_8u_P3C2R
ippiRGBToYUV420_8u_P3
ippiRGBToYUV420_8u_P3R
ippiRGBToYUV422_8u_C3C2R
ippiRGBToYUV422_8u_C3P3
ippiRGBToYUV422_8u_C3P3R
ippiRGBToYUV422_8u_P3
ippiRGBToYUV422_8u_P3R
ippiRGBToYUV_8u_AC4R
ippiRGBToYUV_8u_C3R
ippiRGBToYUV_8u_P3R
ippiYCbCr422To420_Interlace_8u_P3R
ippiYCbCr422ToBGR_8u_C2C3R
ippiYCbCr422ToBGR_8u_C2P3R
ippiYCbCrToRGB_8u_P3R
ippiYUV422ToRGB_8u_P3C3
ippiYUV422ToRGB_8u_P3C3R
ippiYUVToRGB_8u_P3C3R
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><strong>Video Coding</strong></p>
<pre>ippiReconstructLumaIntra4x4_H264High_32s16u_IP1R
ippiFilterDeblockingLumaVerEdge_H264_16u_C1IR
ippiFilterDeblockingLumaHorEdge_H264_16u_C1IR
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><strong>Miscellaneous</strong></p>
<pre>ippsFindCAny_8u
ippmInvert_m_32f
ippmMul_tm_32f
</pre>
<p> </p>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
<p><br />Functions not listed above are either hand-optimized for the Merom microarchitecture (SSSE3) or for prior SIMD instruction sets that are compatible with the Intel Atom processor (such as SSE2). In addition, the entire Intel Atom optimized library is <em>compiler-optimized</em> for the Intel Atom processor using the Intel Compiler <em>xSSE3_ATOM</em> switch (enable Atom optimizations) in order to take advantage of features unique to the Intel Atom processor.</p>
<p>Please see <em><a target="_blank" href="http://software.intel.com/en-us/articles/atom-optimized-compiler/">Optimized for the Intel® Atom™ Processor with Intel's Compiler</a></em> for more information and check out the <a href="http://software.intel.com/en-us/intel-parallel-studio-home/">Intel Parallel Studio web site</a> where you can learn more about the tools available to develop, debug, and tune your multi-threaded applications.</p>
<p>
<table cellpadding="5" cellspacing="0" rules="none" border="1">
<tbody>
<tr>
<th align="left" valign="middle" >Optimization Notice</th>
</tr>
<tr bgcolor="#ccecff">
<td>
<p>Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p align="right">Notice revision #20110804</p>
</td>
</tr>
</tbody>
</table>
</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/new-atom-support/</link>
      <pubDate>Mon, 31 Jan 2011 09:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/new-atom-support/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/new-atom-support/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
    </item>
    <item>
      <title>Intel® Integrated Performance Primitives (Intel® IPP) Functions Optimized for Intel® Advanced Vector Extensions (Intel® AVX)</title>
      <description><![CDATA[ <blockquote>
<ul>
<li><i>The table below reflects the Intel AVX support provided in the Intel IPP 7.0.2 library release.</i></li>
<li><i>Intel AVX optimized code is available in both the 32-bit and 64-bit editions of the 7.0 library.</i></li>
<li><i>There is very limited support for Intel AVX in the 6.1 library; if you plan to use Intel IPP on an Intel AVX platform you should upgrade to the 7.0 version of the Intel IPP library.</i></li>
</ul>
</blockquote>
<p><a target="_blank" href="http://www.intel.com/software/avx">Intel® AVX (Intel® Advanced Vector Extensions)</a> is a 256-bit instruction set extension to SSE designed to provide even higher performance for applications that are floating-point intensive. Intel AVX adds new functionality to the existing Intel SIMD instruction set (based on SSE) and includes a more compact SIMD encoding format. A large number (200+) of Intel SSEx instructions have been "upgraded" in AVX to take advantage of features like a distinct destination operand and flexible memory alignment. Approximately 100 of the legacy 128-bit Intel SSEx instructions have been promoted to process 256-bit vector data. In addition, approximately 100 new data processing and arithmetic operations, not present in the legacy Intel SSEx SIMD instruction set, have been added.</p>
<p>The primary benefits of Intel AVX are:</p>
<ul>
<li>Support for wider vector data (up to 256-bit). </li>
<li>Efficient instruction encoding scheme that supports 3 and 4 operand instruction syntaxes. </li>
<li>Flexible programming environment, ranging from branch handling to relaxed memory alignment requirements. </li>
<li>New data manipulation and arithmetic compute primitives, including broadcast, permute, fused-multiply-add, etc.</li>
</ul>
<hr />
<p><em>ippGetCpuFeatures()</em> reports information regarding the SIMD features available on your processor. Alternatively, <em>ippGetCpuType()</em> detects the processor type in your system; a return value of <em>ippCpuAVX</em> means your processor supports the Intel AVX instruction set. These functions are declared in <i>ippcore.h</i>.</p>
<p>Mask the feature value returned by <i>ippGetCpuFeatures()</i> with <em>ippCPUID_AVX</em> (0x0100) to determine if the Intel AVX SIMD instructions are supported by your processor (the result of the mask with ippCPUID_AVX is nonzero). To determine if your operating system also supports the Intel AVX instructions (saves the extended SIMD registers), mask the returned value with <i>ippAVX_ENABLEDBYOS</i> (0x0200). Both conditions (i.e., CPU and OS support) must be met before your application can utilize the Intel AVX SIMD instructions.</p>
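<p>The two-part check can be expressed as a single mask test. The sketch below is a minimal illustration that assumes the feature mask has already been obtained; the helper names are hypothetical, and the two constants simply repeat the values quoted above so that the sketch compiles without the Intel IPP headers.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the text above; a real build takes them from the Intel IPP headers. */
#define DEMO_CPUID_AVX        0x0100u
#define DEMO_AVX_ENABLEDBYOS  0x0200u

/* Returns 1 only when every bit in required is also set in features_mask. */
int demo_has_all(uint64_t features_mask, uint64_t required)
{
    assert(required != 0);   /* an empty requirement is almost certainly a bug */
    return (features_mask & required) == required;
}

/* AVX is usable only when both the CPU bit and the OS bit are set. */
int demo_avx_usable(uint64_t features_mask)
{
    return demo_has_all(features_mask, DEMO_CPUID_AVX | DEMO_AVX_ENABLEDBYOS);
}
```

<p>Comparing the masked result against the full requirement, rather than testing it for nonzero, is what prevents a processor with AVX but an operating system without AVX support from passing the check.</p>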
<hr />
<p><br />The Intel IPP library has been optimized for a variety of SIMD instruction sets. Automatic "dispatching" detects the SIMD instruction set that is available on the running processor and selects the optimal SIMD instructions for that processor. Please review <i><a href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/">Understanding CPU Dispatching in the Intel® IPP Library</a></i> for more information regarding dispatching.</p>
<p>Intel AVX optimization in the Intel IPP library consists of "hand-optimized" and "compiler-tuned" functions – code that has been directly optimized for the Intel AVX instruction set. Given the large number of primitives in the Intel IPP library, it is impossible to directly optimize every Intel IPP function for the large set of new instructions represented by the Intel AVX instruction set within the period of a single product release or update (processor-specific optimizations may also take into consideration cache size and number of cores/threads). Therefore, the functions in the table below represent those that either receive the greatest benefit from the new Intel AVX instructions or are the most widely used by Intel IPP customers.</p>
<blockquote>
<p>If you have some specific Intel IPP functions that are not listed in the following table, and would like to see them added to the priority list for future AVX optimization, please create a thread on the <a target="_blank" href="http://software.intel.com/en-us/forums/intel-integrated-performance-primitives/">IPP forum</a> stating which functions you would like to see added to the AVX optimization priority list.</p>
</blockquote>
<p>Functions directly optimized for Intel AVX are added to the table below as they become available with each new release or update of the library.</p>
<p>The following conventions are used in the table below to allow multiple similar functions to be denoted on a single line:</p>
<ul>
<li>{x} - Braces enclose a required (function name) element. </li>
<li>[x] - Square brackets enclose an optional (function name) element. </li>
<li>| - A vertical line indicates an exclusive choice within a set of optional or required elements. </li>
<li>{x|y|z} - Example of three mutually exclusive choices within a required element in the function name. </li>
<li>[x|y|z] - Example of three mutually exclusive choices within an optional element in the function name. </li>
</ul>
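<p>To make the notation concrete, the sketch below expands one table entry into the individual function names it stands for. It is an illustration only: the names <em>demo_expand_names</em> and <em>DEMO_NAME_LEN</em> are hypothetical, and the code assumes groups are never nested, which holds for the entries in this table.</p>

```c
#include <assert.h>
#include <string.h>

#define DEMO_NAME_LEN 128

typedef struct {
    char (*names)[DEMO_NAME_LEN];   /* caller-provided output array */
    int capacity;
    int count;
} DemoNameList;

/* Expand the rest of pat onto buf[0..len) and store each finished name. */
static int demo_expand_from(const char *pat, char *buf, size_t len, DemoNameList *out)
{
    const char *end;
    const char *alt;
    char close;

    /* Copy literal characters up to the next group. */
    while (*pat != '\0' && *pat != '{' && *pat != '[') {
        if (len + 1 >= DEMO_NAME_LEN) return -1;
        buf[len++] = *pat++;
    }
    if (*pat == '\0') {
        if (out->count >= out->capacity) return -1;
        buf[len] = '\0';
        strcpy(out->names[out->count++], buf);
        return 0;
    }

    close = (*pat == '{') ? '}' : ']';
    end = strchr(pat, close);
    if (end == NULL) return -1;     /* unbalanced group */

    /* An optional group also yields the name with the element left out. */
    if (*pat == '[' && demo_expand_from(end + 1, buf, len, out) != 0) return -1;

    /* Try each alternative separated by a vertical line. */
    for (alt = pat + 1; alt <= end; ) {
        const char *bar = (const char *)memchr(alt, '|', (size_t)(end - alt));
        const char *stop = (bar != NULL) ? bar : end;
        size_t n = (size_t)(stop - alt);
        if (len + n >= DEMO_NAME_LEN) return -1;
        memcpy(buf + len, alt, n);
        if (demo_expand_from(end + 1, buf, len + n, out) != 0) return -1;
        alt = stop + 1;
    }
    return 0;
}

/* Returns the number of names written, or -1 on error. */
int demo_expand_names(const char *pattern, char names[][DEMO_NAME_LEN], int capacity)
{
    char buf[DEMO_NAME_LEN];
    DemoNameList out;

    assert(pattern != NULL && names != NULL && capacity > 0);
    out.names = names;
    out.capacity = capacity;
    out.count = 0;
    if (demo_expand_from(pattern, buf, 0, &out) != 0) return -1;
    return out.count;
}
```

<p>For example, passing the entry ippsAdd_{32f|64f}[_I] produces four names: ippsAdd_32f, ippsAdd_32f_I, ippsAdd_64f, and ippsAdd_64f_I.</p>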
<div align="center">
<table width="700" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td align="left" valign="top">
<table width="300" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td>
<p><b>Signal Processing</b></p>
<pre>ippsAbs_{16s|32s|32f|64f}[_I] 
ippsAdd_{32f|32fc|64f|64fc}[_I] 
ippsAddC_{32f|64f}[_I] 
ippsAddProductC_32f 
ippsAddProduct_{32fc|64f|64fc} 
ippsAutoCorr_{32f|64f}
ippsConv_32f 
ippsConvert_{8s|8u|16s|16u|32s|64f}32f 
ippsConvert_{32s|32f}64f 
ippsConvert_32f{8s|8u|16s|16u}_Sfs 
ippsConvert_64f32s_Sfs 
ippsCopy_{16s|32s|32f|64f} 
ippsCrossCorr_{32f|64f} 
ippsDFTFwd_CToC_{32f|32fc|64f|64fc} 
ippsDFTFwd_RTo{CCS|Pack|Perm}_{32f|64f} 
ippsDFTInv_CCSToR_{32f|64f} 
ippsDFTInv_CToC_{32f|32fc|64f|64fc} 
ippsDFTInv_{Pack|Perm}ToR_{32f|64f} 
ippsDFTOutOrd{Fwd|Inv}_CToC_{32fc|64fc} 
ippsDiv[C]_32f[_I] 
ippsDotProd_32f64f 
ippsFFTFwd_CToC_{32f|32fc|64f|64fc}[_I] 
ippsFFTFwd_RTo{CCS|Pack|Perm}_{32f|64f}[_I] 
ippsFFTInv_CCSToR_{32f|64f}[_I] 
ippsFFTInv_CToC_{32f|32fc|64f|64fc}[_I] 
ippsFFTInv_{Pack|Perm}ToR_{32f|64f}[_I] 
ippsFIR64f_32f[_I] 
ippsFIR64fc_32fc[_I] 
ippsFIRLMS_32f 
ippsFIR_{32f|32fc|64f|64fc}[_I] 
ippsIIR32fc_16sc_[I]Sfs 
ippsIIR64fc_32fc[_I] 
ippsIIR_32f[_I] 
ippsLShiftC_16s_I 
ippsMagnitude_16sc_Sfs 
ipps{Min|Max}Indx_{32f|64f} 
ippsMul_32fc[_I] 
ippsMul[C]_{32f|32fc|64f|64fc}[_I] 
ippsMulC_64f64s_ISfs 
ipps{Not|Or}_8u 
ippsPhase_{16s|16sc|32sc}_Sfs 
ippsPowerSpectr_{32f|32fc} 
ippsRShiftC_16u_I 
ippsSet_{8u|16s|32s} 
ippsSqr_{8u|16s|16u|16sc}_[I]Sfs 
ippsSqr_{32f|32fc|64f|64fc}[_I] 
ippsSqrt_32f[_I] 
ippsSub_{32f|32fc|64f|64fc}[_I] 
ippsSubC_{32f|32fc|64f|64fc}[_I] 
ippsSubCRev_{32f|32fc|64f|64fc}[_I] 
ippsSum_{32f|64f} 
ippsThreshold_{32f|GT_32f|LT_32f}[_I] 
ippsThreshold_{GT|LT}Abs_{32f|64f}[_I] 
ippsThreshold_GTVal_32f[_I] 
ippsWinBartlett_{32f|32fc|64f|64fc}[_I] 
ippsWinBlackman_{32f|64f|64fc}[_I] 
ippsWinBlackmanOpt_{32f|64f|64fc}[_I] 
ippsWinBlackmanStd_{32f|64f|64fc}[_I] 
ippsWinKaiser_{32f|64f|64fc}[_I] 
ippsZero_{8u|16s|32f}
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><b>SPIRAL (GEN) Functions</b></p>
<pre>ippgDFTFwd_CToC_8_64fc 
ippgDFTFwd_CToC_12_64fc 
ippgDFTFwd_CToC_16_{32fc|64fc}
ippgDFTFwd_CToC_20_64fc
ippgDFTFwd_CToC_24_64fc
ippgDFTFwd_CToC_28_64fc 
ippgDFTFwd_CToC_32_{32fc|64fc}
ippgDFTFwd_CToC_36_64fc
ippgDFTFwd_CToC_40_64fc
ippgDFTFwd_CToC_44_64fc 
ippgDFTFwd_CToC_48_{32fc|64fc}
ippgDFTFwd_CToC_52_64fc 
ippgDFTFwd_CToC_56_64fc 
ippgDFTFwd_CToC_60_64fc 
ippgDFTFwd_CToC_64_{32fc|64fc} 
ippgDFTInv_CToC_8_64fc 
ippgDFTInv_CToC_12_64fc 
ippgDFTInv_CToC_16_{32fc|64fc} 
ippgDFTInv_CToC_20_64fc 
ippgDFTInv_CToC_24_64fc 
ippgDFTInv_CToC_28_64fc 
ippgDFTInv_CToC_32_{32fc|64fc} 
ippgDFTInv_CToC_36_64fc 
ippgDFTInv_CToC_40_64fc 
ippgDFTInv_CToC_44_64fc 
ippgDFTInv_CToC_48_{32fc|64fc} 
ippgDFTInv_CToC_52_64fc 
ippgDFTInv_CToC_56_64fc 
ippgDFTInv_CToC_60_64fc 
ippgDFTInv_CToC_64_{32fc|64fc}
</pre>
<p> </p>
</td>
</tr>
<tr>
<td align="left" valign="top">
<p><b>Audio Coding</b></p>
<pre>ippsDeinterleave_32f
</pre>
<p> </p>
</td>
</tr>
<tr>
<td align="left" valign="top">
<p><b>Speech Coding</b></p>
<pre>ippsAdaptiveCodebookSearch_RTA_32f
ippsFixedCodebookSearch_RTA_32f
ippsFixedCodebookSearchRandom_RTA_32f
ippsHighPassFilter_RTA_32f
ippsLSPQuant_RTA_32f
ippsLSPToLPC_RTA_32f
ippsPostFilter_RTA_32f_I
ippsQMFDecode_RTA_32f
ippsSynthesisFilter_G729_32f
</pre>
<p> </p>
</td>
</tr>
<tr>
<td align="left" valign="top">
<p><b>Color Conversion</b></p>
<pre>ippiRGBToHLS_8u_AC4R
ippiRGBToHLS_8u_C3R
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><b>Realistic Rendering</b></p>
<pre>ipprCastEye_32f
ipprCastShadowSO_32f
ipprDot_32f_P3C1M
ipprHitPoint3DEpsM0_32f_M
ipprHitPoint3DEpsS0_32f_M
ipprMul_32f_C1P3IM
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><b>Computer Vision</b></p>
<pre>ippiEigenValsVecs_[8u]32f_C1R 
ippiFilterGaussBorder_32f_C1R 
ippiMinEigenVal_[8u]32f_C1R 
ippiNorm_Inf_{8u|8s|16u|32f}_C{1|3C}MR 
ippiNorm_L1_{8u|8s|16u|32f}_C{1|3C}MR 
ippiNorm_L2_{8u|8s|16u|32f}_C{1|3C}MR 
ippiNormRel_L2_32f_C3CMR 
ippiUpdateMotionHistory_[8u|16u]32f_C1IR
</pre>
<p> </p>
</td>
</tr>
</tbody>
</table>
</td>
<td align="left" valign="top">
<table width="350" cellpadding="2" cellspacing="0" border="0">
<tbody>
<tr>
<td align="left" valign="top">
<p><b>Image Processing</b></p>
<pre>ippiAddC_32f_C1[I]R 
ippiConvert_32f* 
ippiCopy_16s* 
ippiCopy_8u* 
ippiConvFull_32f_{AC4|C1|C3}R 
ippiConvValid_32f_{AC4|C1|C3}R 
ippiCrossCorrFull_NormLevel_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_64f_C1R 
ippiCrossCorrFull_NormLevel_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_NormLevel_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrFull_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrFull_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrSame_NormLevel_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_NormLevel_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrSame_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrSame_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrValid_{8u32f|8s32f|16u32f|32f}_C1R 
ippiCrossCorrValid_NormLevel_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_64f_C1R 
ippiCrossCorrValid_NormLevel_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_NormLevel_8u_{AC4|C1|C3|C4}RSfs 
ippiCrossCorrValid_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiCrossCorrValid_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiDCT8x8FwdLS_8u16s_C1R 
ippiDCT8x8Fwd_16s_C1[I|R] 
ippiDCT8x8Fwd_32f_C1[I] 
ippiDCT8x8Fwd_8u16s_C1R 
ippiDCT8x8InvLSClip_16s8u_C1R 
ippiDCT8x8Inv_16s8u_C1R 
ippiDCT8x8Inv_16s_C1[I|R] 
ippiDCT8x8Inv_2x2_16s_C1[I] 
ippiDCT8x8Inv_32f_C1[I] 
ippiDCT8x8Inv_4x4_16s_C1[I] 
ippiDCT8x8Inv_A10_16s_C1[I] 
ippiDCT8x8To2x2Inv_16s_C1[I] 
ippiDCT8x8To4x4Inv_16s_C1[I] 
ippiDFTFwd_CToC_32fc_C1[I]R 
ippiDFTFwd_RToPack_32f_{AC4|C1|C3|C4}[I]R 
ippiDFTFwd_RToPack_8u32s_{AC4|C1|C3|C4}RSfs 
ippiDFTInv_CToC_32fc_C1[I]R 
ippiDFTInv_PackToR_32f_{AC4|C1|C3|C4}[I]R 
ippiDFTInv_PackToR_32s8u_{AC4|C1|C3|C4}RSfs 
ippiDilate3x3_32f_C1[I]R 
ippiDilate3x3_64f_C1R 
ippiDivC_32f_C1[I]R 
ippiDiv_32f_{C1|C3}[I]R 
ippiDotProd_32f64f_{C1|C3}R 
ippiErode3x3_64f_C1R 
ippiFFTFwd_CToC_32fc_C1[I]R 
ippiFFTFwd_RToPack_32f_{AC4|C1|C3|C4}[I]R 
ippiFFTFwd_RToPack_8u32s_{AC4|C1|C3|C4}RSfs 
ippiFFTInv_CToC_32fc_C1[I]R 
ippiFFTInv_PackToR_32f_{AC4|C1|C3|C4}[I]R 
ippiFFTInv_PackToR_32s8u_{AC4|C1|C3|C4}RSfs 
ippiFilter_32f_{C1|C3|C4}R 
ippiFilter_32f_AC4R 
ippiFilter_64f_{C1|C3}R 
ippiFilter32f_{8s|8u|16s|16u|32s}_C{1|3|4}R 
ippiFilter32f_{8u|16s|16u}_AC4R 
ippiFilter32f_{8s|8u}16s_C{1|3|4}R 
ippiFilterBox_8u_{C1|C3}R 
ippiFilterBox_32f_{C1|C4|AC4}R 
ippiFilterColumn32f_{8u|16s|16u}_{C1|C3|C4|AC4}R 
ippiFilterColumn_32f_{C1|C3|C4|AC4}R 
ippiFilterGauss_32f_{C1|C3}R 
ippiFilterHipass_32f_{C1|C3|C4|AC4}R 
ippiFilterLaplace_32f_{C1|C3|C4|AC4}R 
ippiFilterLowpass_32f_{C1|C3|AC4}R 
ippiFilterMax_32f_{C1|C3|C4|AC4}R 
ippiFilterMedian_32f_C1R 
ippiFilterMin_32f_{C1|C3|C4|AC4}R 
ippiFilterRow_32f_{C1|C3|C4|AC4}R 
ippiFilterRow32f_{8u|16s|16u}_{C1|C3|C4|AC4}R 
ippiFilterSobelHoriz_32f_{C1|C3}R 
ippiFilterSobelVert_32f_{C1|C3}R 
ippiMean_32f_{C1|C3}R 
ippiMulC_32f_C1[I]R 
ippiMul_32f_{C1|C3|C4}[I]R 
ippiResizeSqrPixel_{32f|64f}_{C1|C3|C4|AC4}R 
ippiResizeSqrPixel_{32f|64f}_{P3|P4}R 
ippiSqrDistanceFull_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceFull_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiSqrDistanceSame_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceSame_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiSqrDistanceValid_Norm_16u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_8s32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_8u32f_{AC4|C1|C3|C4}R 
ippiSqrDistanceValid_Norm_8u_{AC4|C1|C3|C4}RSfs 
ippiSqrt_32f_C1R 
ippiSqrt_32f_C3IR 
ippiSubC_32f_C1[I]R 
ippiSub_32f_{C1|C3|C4}[I]R 
ippiSum_32f_C{1|3}R 
ippiTranspose_32f_C1R
</pre>
<p> </p>
</td>
</tr>
<tr>
<td>
<p><b>Image Compression</b></p>
<pre>ippiPCTFwd_JPEGXR_32f_C1IR 
ippiPCTFwd16x16_JPEGXR_32f_C1IR 
ippiPCTFwd8x16_JPEGXR_32f_C1IR 
ippiPCTFwd8x8_JPEGXR_32f_C1IR 
ippiPCTInv_JPEGXR_32f_C1IR_128 
ippiPCTInv16x16_JPEGXR_32f_C1IR 
ippiPCTInv8x16_JPEGXR_32f_C1IR 
ippiPCTInv8x8_JPEGXR_32f_C1IR
</pre>
<p> </p>
</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</div>
<p>Those functions that have not been directly optimized for AVX (i.e., functions that do not appear in the table) have been compiled using the Intel Compiler "xG" switch (enable AVX optimization). Additional performance improvements are achieved by adherence to an AVX ABI (application binary interface) feature that inserts the special AVX "vzeroupper" instruction after any function with AVX code to eliminate any AVX to SSE transition penalties.</p>
<p>For those functions that are not directly optimized for AVX, the g9/e9 library reuses the optimizations from earlier, compatible SSE layers, such as those tuned for the p8/y8 libraries and the SIMD layers that preceded them (e.g., SSE4.x, AES-NI and SSE2/3). Thus, functions not listed above will include the highest level of directly optimized code based on the AES-NI, SSE4.x, SSSE3, SSE3 and SSE2 SIMD instruction sets, wherever applicable.</p>
<p>For more information about the g9/e9 optimization layer and Intel AVX in the Intel IPP library, please refer to the <i><a target="_blank" href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-documentation/">Intel Integrated Performance Primitives for Windows* OS on Intel® 64 Architecture 'User's Guide'</a></i>.</p>
<p>Review <a href="http://software.intel.com/en-us/articles/how-to-compile-for-intel-avx/"><em>How to Compile for Intel® AVX</em></a> for more information and check out the <a href="http://software.intel.com/en-us/intel-parallel-studio-home/">Intel Parallel Studio web site</a> where you can learn more about the tools available to develop, debug, and tune your multi-threaded applications.</p>
<p>
<table cellpadding="5" cellspacing="0" rules="none" border="1">
<tbody>
<tr>
<th align="left" valign="middle" >Optimization Notice</th>
</tr>
<tr bgcolor="#ccecff">
<td>
<p>Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p align="right">Notice revision #20110804</p>
</td>
</tr>
</tbody>
</table>
</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/intel-ipp-functions-optimized-for-intel-avx-intel-advanced-vector-extensions/</link>
      <pubDate>Mon, 31 Jan 2011 09:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/intel-ipp-functions-optimized-for-intel-avx-intel-advanced-vector-extensions/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/intel-ipp-functions-optimized-for-intel-avx-intel-advanced-vector-extensions/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
    </item>
    <item>
      <title>Information about the FTC Decision and Order on the Intel® Compilers Reimbursement Fund</title>
      <description><![CDATA[ Information on the Intel Compiler Reimbursement Fund referenced in Section VII.D of the FTC Decision and Order is available now. Please see the site, <a href="http://www.CompilerReimbursementProgram.com">www.CompilerReimbursementProgram.com</a>, for further information. ]]></description>
      <link>http://software.intel.com/en-us/articles/information-about-the-ftc-decision-and-order-on-the-intel-compilers-reimbursement-fund/</link>
      <pubDate>Mon, 01 Nov 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/information-about-the-ftc-decision-and-order-on-the-intel-compilers-reimbursement-fund/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/information-about-the-ftc-decision-and-order-on-the-intel-compilers-reimbursement-fund/</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Software Development Tool Suites for Intel® Atom™ Processor Knowledge Base</category>
      <category>Intel® Fortran Compiler for Linux* Knowledge Base</category>
      <category>Intel® Fortran Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
      <category>Intel® Math Kernel Library Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
      <category>Intel® Visual Fortran Compiler for Windows* Knowledge Base</category>
    </item>
    <item>
      <title>A Tool for Listing the Intel IPP Functions used by Your Application</title>
      <description><![CDATA[ <blockquote>
<p><strong>20 Sep 2011 – article update</strong> – the ZIP file has been updated to include a missing DLL, and it now contains both AWK and GAWK executables. If you already have the necessary UNIX utilities on your Windows system (e.g., Cygwin or MinGW) you may not need the contents of the ZIP file. <em><strong>In any case, please download the "ipp-fn-survey.awk" script file; it is required regardless of the UNIX utilities you use to run the script!</strong></em> On Windows, this AWK script has only been verified against the old GnuWin32 utils provided here. The GnuWin32 utils distributed in the attached ZIP file can be found here: <a href="http://gnuwin32.sourceforge.net/packages.html" title="http://gnuwin32.sourceforge.net/packages.html">http://gnuwin32.sourceforge.net/packages.html</a>.</p>
</blockquote>
<p>In the interest of developing a list of "most frequently used" functions by customers of the Intel IPP library, we have created a simple <a href="http://software.intel.com/file/31656">AWK script</a> that can be used on either a Windows or Linux system to extract the names of the Intel IPP functions used in your application, without having to reveal any of your application source.</p>
<p>The number of Intel IPP functions present in the library is quite large; as a result, optimizing the entire library for new SIMD architectures must be performed in phases. In general, we would like to optimize first those functions that are most popular and relevant to our customers, in order to ensure that your applications receive the maximum benefit as new SIMD architectures and/or extensions to those architectures are introduced.</p>
<p>Attached to this KB article are two files:</p>
<ul>
<li><a target="_blank" href="http://software.intel.com/file/31656">ipp-fn-survey.awk</a> </li>
<li><a target="_blank" href="http://software.intel.com/file/38595">GnuWin32-utils.zip</a> </li>
</ul>
<p>The AWK script scans C/C++ source code to identify any function that conforms to the standard naming scheme used by the Intel IPP library. This AWK script is provided in source format, so you can inspect the script to determine exactly what it does.</p>
<p>In essence, this script:</p>
<ol>
<li>removes /* */ and // style comments to avoid false detections within comments </li>
<li>removes " " string constants to avoid false detections within strings </li>
<li>searches for and lists the names of all Intel IPP "core" functions </li>
<li>searches for and lists the names of all Intel IPP "administrative" functions </li>
</ol>
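<p>The four steps above can be sketched in a few lines of Python. This is an illustration of the idea only, not a translation of the AWK script: the two name patterns (<code>ipp</code> plus a lowercase domain letter for the primitives, <code>ipp</code> plus an uppercase letter for the administrative functions) are assumptions made for this sketch, and character constants such as <code>'"'</code> are not handled.</p>

```python
import re

# Steps 1 and 2: one pass that matches comments and string constants.
# String constants are matched so that a '//' inside a string is not
# mistaken for a comment; everything matched is replaced by a space.
_NOISE = re.compile(
    r'/\*.*?\*/'              # /* ... */ comments (may span lines)
    r'|//[^\n]*'              # // comments up to end of line
    r'|"(?:\\.|[^"\\\n])*"',  # "..." string constants with escapes
    re.DOTALL,
)

# Steps 3 and 4: assumed name shapes, not taken from the AWK script.
_PRIMITIVE = re.compile(r'\bipp[a-z][A-Z]\w*')   # e.g. ippsCopy_8u
_ADMIN = re.compile(r'\bipp[A-Z]\w*')            # e.g. ippInit


def survey(source):
    """Return the sorted, de-duplicated IPP-style names found in source."""
    code = _NOISE.sub(" ", source)
    found = set(_PRIMITIVE.findall(code)) | set(_ADMIN.findall(code))
    return sorted(found)


sample = '''
/* ippsAdd_32f is only mentioned in a comment */
ippInit();
puts("ippsZero_8u in a string");   // ippsSet_8u
ippsCopy_8u(src, dst, len);
'''
print(survey(sample))
```

<p>In the sample, only <code>ippInit</code> and <code>ippsCopy_8u</code> are reported; the names inside the comments and the string constant are ignored.</p>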
<p>The script is not perfect and may miss a few Intel IPP functions. Please see the comments inside the AWK script for more details on known or suspected issues.</p>
<blockquote>
<p>If the source tree you scan includes the IPP header files (ippac.h, ippcc.h, ippch.h, etc.) be sure to exclude those files from the search; otherwise, you will include every IPP function listed in those headers, even if you are not using them in your source code.</p>
</blockquote>
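<p>One way to apply this exclusion is to filter the file list before it reaches the script. The Python sketch below is a hedged illustration: the function name <code>source_files</code> is hypothetical, and it assumes that every IPP header is named <code>ipp*.h</code>, so a header of your own with such a name would be skipped as well.</p>

```python
import os
import re
import tempfile

# Same extension pattern as the find commands in this article.
_SOURCE = re.compile(r".+\.[ch]p*", re.IGNORECASE)
# Assumption: IPP headers are named ipp*.h (ippac.h, ippcc.h, ipp.h, ...).
_IPP_HEADER = re.compile(r"ipp\w*\.h", re.IGNORECASE)


def source_files(root):
    """Yield C/C++ files under root, skipping files that look like IPP headers."""
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            if _SOURCE.fullmatch(name) and not _IPP_HEADER.fullmatch(name):
                yield os.path.join(folder, name)


# Small demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    for name in ("main.cpp", "util.h", "ippcc.h", "notes.txt"):
        open(os.path.join(root, name), "w").close()
    kept = [os.path.basename(path) for path in source_files(root)]
print(kept)
```

<p>In the demonstration, <code>main.cpp</code> and <code>util.h</code> are kept, while <code>ippcc.h</code> and <code>notes.txt</code> are skipped.</p>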
<p class="sectionHeading">Basic Operation on Windows:</p>
<p>To run this script on a Windows machine you will need a copy of the GnuWin32 GAWK application (or a compatible AWK). To build a concise, sorted list of the Intel IPP functions used within your application, it is also helpful to use the UNIX-compatible find, sort and uniq utilities. All of these utilities are provided in the <a target="_blank" href="http://software.intel.com/file/38595">GnuWin32-utils.zip attachment</a> to this KB article. With the GnuWin32 applications installed on your Windows system, use the following command line:</p>
<p>&gt;find src_dir -regex ".+\.[ch]p*" -exec gawk -f ipp-fn-survey.awk {} ; | sort | uniq &gt;ipp-survey.txt</p>
<p>where "src_dir" is the root of the directory tree you wish to scan for the names of IPP functions. The simplest way to do this is to copy the AWK file and the contents of the ZIP file (GnuWin32-utils.zip) to "src_dir" and type:</p>
<p>&gt;find . -regex ".+\.[ch]p*" -exec gawk -f ipp-fn-survey.awk {} ; | sort | uniq &gt;ipp-survey.txt</p>
<blockquote>
<p>Note: on a Windows system "find" needs to be a UNIX compatible version of find, not the Microsoft find. To avoid name conflicts with the Microsoft find.exe you might need to rename the GnuWin32 "find.exe" file to "uxfind.exe" and then run the samples above by referencing "uxfind" rather than "find" at the beginning of each script execution command.</p>
</blockquote>
<p>If you used the command line above, the results of the scan will be found in the file named "ipp-survey.txt" in the starting directory.</p>
<p class="sectionHeading">Basic Operation on Linux:</p>
<p>To run this script on a Linux machine you will need the standard Gnu AWK, find, sort, uniq and xargs utilities that are normally present on your Linux system. Use the following command line:</p>
<p>$find src_dir -iregex ".+\.[ch]p*" | xargs ./ipp-fn-survey.awk | sort | uniq &gt;ipp-survey.txt</p>
<p>where "src_dir" is the root of the directory tree you wish to scan for the names of IPP functions. The simplest way to do this is to copy the AWK file to "src_dir" and type:</p>
<p>$find . -regextype posix-awk -iregex ".+\.[ch]p*" | xargs ./ipp-fn-survey.awk | sort | uniq &gt;ipp-survey.txt</p>
<blockquote>
<p>Note: GAWK on some Linux systems may not honor IGNORECASE -- resulting in a few false finds. Also, make sure the "executable" bits are set properly on the AWK script.</p>
</blockquote>
<p>If you used the command line above, the results of the scan will be found in the file named "ipp-survey.txt" in the starting directory.</p>
<blockquote>
<p>Some Linux systems have awk installed rather than gawk. The first line of the ipp-fn-survey.awk file references /usr/bin/gawk as the script interpreter. If your system contains only /usr/bin/awk, you need to change that first line. To determine which application is on your system, type "which gawk" at a command prompt. If nothing is returned, type "which awk". If your system contains only awk, or the application is located somewhere other than /usr/bin, you will have to edit the first line of the ipp-fn-survey.awk script.</p>
</blockquote>
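<p>The same check can be scripted. The following Python sketch uses the standard <code>shutil.which</code> function to look for gawk first and awk second; the helper name <code>find_awk</code> is hypothetical, and the printed line assumes the usual <code>-f</code> form of an executable AWK script, so compare it with the actual first line of the script before editing.</p>

```python
import shutil


def find_awk(candidates=("gawk", "awk")):
    """Return (name, path) of the first AWK interpreter found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


found = find_awk()
if found:
    # Suggested interpreter line; '-f' is the usual form for AWK scripts
    # and is an assumption about how the survey script is written.
    print("#!" + found[1] + " -f")
else:
    print("no AWK interpreter found on PATH")
```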
<p class="sectionHeading">Terms and Conditions</p>
<p>This script will not reveal any source code or parameter names, only the names of the IPP functions referenced in the source code that it scans.</p>
<blockquote>
<p><strong>Note</strong> that this script is being provided under <a href="http://software.intel.com/sites/products/documentation/EULA/Intel_SW_Dev_Products_EULA.pdf"><strong>the terms and conditions of the Intel IPP EULA.</strong></a> Please <em>ensure you agree to those terms</em> before using the script to provide us with a list of the Intel IPP primitives you are using.</p>
</blockquote>
<p>If you are able to run this script (or something similar) on your application source code and generate a list of IPP functions that you would like to share with us, please reply to this posting (using the private option, if you prefer) with some information about your company and your application, as well as the list of Intel IPP functions created by the script or an email address we may use to contact you directly. Your input is very valuable for the prioritization of future optimizations within the Intel IPP library.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/a-tool-for-listing-ipp-apis-used-by-your-application/</link>
      <pubDate>Tue, 19 Oct 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/a-tool-for-listing-ipp-apis-used-by-your-application/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/a-tool-for-listing-ipp-apis-used-by-your-application/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
    </item>
    <item>
      <title>Understanding SIMD Optimization Layers and Dispatching in the Intel® IPP 7.0 Library</title>
      <description><![CDATA[ <p>This article describes the Intel® Integrated Performance Primitives (Intel® IPP) optimization layers present in the 7.0 version of the library. The article titled <a target="_blank" href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/"><em>Understanding CPU Dispatching in the Intel® IPP Library</em></a> describes the same features for previous versions of the library (5.3 thru 6.1).</p>
<blockquote>
<p><strong>IMPORTANT!</strong> <em>The minimum SIMD instruction level supported by version 7.0 of the Intel IPP library has changed!</em> Applications built with this version of the library require processors that support at least the Intel® Streaming SIMD Extensions 2 (Intel® SSE2) instruction set when built for Intel IA-32 processors (ia32) and at least the Intel® Streaming SIMD Extensions 3 (Intel® SSE3) instruction set when built for Intel® 64 processors (intel64). The non-optimized layers of the library (px on ia32 and mx on intel64) have been removed; the w7 and m7 optimization layers are now the <em>default</em> optimization layers.</p>
</blockquote>
<p>The standard distribution of the Intel IPP library contains multiple, functionally-identical, SIMD-specific, optimized libraries (or layers) that are automatically “dispatched” at run-time. The “dispatcher” directs your calls to the appropriate optimized library layer based on SIMD capabilities discovered during library initialization. This is done to maximize each function’s use of the runtime processor's underlying SIMD instructions and other architecture-specific features.</p>
<blockquote>
<p>Note: you can build custom processor-specific libraries that do not require the dispatcher, but that is outside the scope of this article. Please read this <a href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-intel-ipp-linkage-models-quick-reference-guide/">IPP linkage models article</a> for information on how to build custom versions of the IPP library.</p>
</blockquote>
<p>Dispatching selects the Intel IPP optimized library layer that corresponds to the runtime CPU's SIMD instruction set. For example, on a Windows installation, the <em>$(IPPROOT)\..\redist\intel64\ipp</em> directory contains a file named <em >ippiu8-7.0.dll</em>, which holds version ‘7.0’ of the optimized image processing library for 64-bit processors that support the Intel SSSE3 instructions; ‘ippi’ denotes the image processing domain, ‘u8’ denotes the SSSE3 instruction set for 64-bit processors and ‘7.0’ denotes the library’s version number.</p>
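<p>The naming scheme in this example can be taken apart programmatically. The Python sketch below is inferred from that single file name only; the assumed layout (<code>ipp</code>, domain letters, a two-character layer code, a hyphen and the version) and the helper name <code>parse_dll_name</code> are assumptions for illustration, not a documented specification.</p>

```python
import re

# Assumed layout, inferred from the single example in the text:
# 'ipp' + domain letter(s) + two-character layer code + '-' + version + '.dll'
_DLL_NAME = re.compile(
    r"(?P<domain>ipp[a-z]+?)(?P<layer>[a-z][0-9])-(?P<version>\d+\.\d+)\.dll"
)


def parse_dll_name(filename):
    """Split a name such as 'ippiu8-7.0.dll' into domain, layer and version."""
    match = _DLL_NAME.fullmatch(filename)
    if match is None:
        return None
    return match.groupdict()


print(parse_dll_name("ippiu8-7.0.dll"))
```

<p>For <em>ippiu8-7.0.dll</em> the sketch reports the domain <code>ippi</code>, the layer <code>u8</code> and the version <code>7.0</code>.</p>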
<p>In the general case, the “dispatcher” identifies the run-time processor only once, at library initialization time, and sets up a variable internal to the library that directs your calls to the SIMD-specific functions that match the runtime processor. For example, <em>ippsCopy_8u()</em> has multiple implementations stored in the library, each optimized for a specific SIMD instruction set. The <em>u8_ippsCopy_8u()</em> version of <em>ippsCopy_8u()</em> is called by the dispatcher when running on an Intel® Core™ 2 Duo processor in 64-bit addressing mode, because <em>u8_ippsCopy_8u()</em> is optimized for the SSSE3 instruction set architecture supported by that processor in 64-bit addressing mode.</p>
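<p>As a conceptual model only, the "identify once, then route every call" behavior might look like the following Python sketch. The class, the stand-in functions and the fake detection routine are all hypothetical; the real dispatcher is internal to the library and is not implemented this way.</p>

```python
# Hypothetical stand-ins for two layer-specific builds of one function.
def m7_copy(src):
    return ("m7", list(src))


def u8_copy(src):
    return ("u8", list(src))


class Dispatcher:
    """Chooses a layer once, then routes every call to that layer."""

    def __init__(self, implementations, detect_layer):
        self._implementations = implementations
        self._layer = detect_layer()      # runs only once, at initialization

    def call(self, name, *args):
        return self._implementations[name][self._layer](*args)


detections = []


def fake_detect():
    # Stand-in for CPU identification; a real dispatcher would query the CPU.
    detections.append(1)
    return "u8"


dispatcher = Dispatcher({"ippsCopy_8u": {"m7": m7_copy, "u8": u8_copy}}, fake_detect)
print(dispatcher.call("ippsCopy_8u", [1, 2, 3]))
print(dispatcher.call("ippsCopy_8u", [4, 5]))
print(len(detections))
```

<p>Both calls are routed to the u8 stand-in, and the detection routine runs a single time regardless of how many calls follow.</p>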
<blockquote>
<p>Note: IPP architectures generally correspond to SIMD (MMX, SSE, AES, etc.) instruction sets, with some minor variations (see the p8 and y8 optimization layers).</p>
</blockquote>
<p><b>Initializing the IPP Dispatcher</b></p>
<p>Identifying the runtime processor and initializing the dispatcher should be the first action you take with the Intel IPP library. If you are using the standard dynamic link library this process is handled automatically when the Intel IPP shared library is initialized. If you are using a static library you must perform this step manually. <a href="http://software.intel.com/en-us/articles/ipp-dispatcher-control-functions-ippinit-functions/">See this article on the ipp*Init*() functions</a> for more information on how to do this.</p>
<p>Because the minimum SIMD instruction set is SSE2 on IA-32 and SSE3 on Intel 64 processors, it is recommended that you <em>ALWAYS</em> call the <code>ippInit()</code> function before making any other calls to the Intel IPP library. This advice applies regardless of whether you are linking against the static or dynamic form of the library (even though the dynamic library will also perform this call). <br /><br />Calling the <code>ippInit()</code> function with the shared libraries (DLL and SO) will generate an error message to a dialog box or error console if the <code>ippInit()</code> function detects that the runtime CPU is not supported by the Intel IPP library. Calling the <code>ippInit()</code> function in the static versions of the library will not generate a console or dialog message. Both versions of the <code>ippInit()</code> function will return an error code when a non-supported CPU is detected.</p>
<blockquote>
<p>It is important that you call the <code>ippInit()</code> function at the beginning of your application to ensure that the processor on which your application is running supports the Intel IPP library. If the <code>ippInit()</code> function returns an error code, you should close your application gracefully; otherwise, your application may be terminated unexpectedly by an <em>invalid instruction fault</em> because it is running on an unsupported processor.</p>
</blockquote>
<p>The following table lists the SIMD architecture codes supported by version 7.0 of the Intel IPP library.</p>
<table width="700" cellpadding="0" cellspacing="0" border="1">
<tbody>
<tr>
<td width="114"><strong>Platform</strong></td>
<td width="84" ><strong>Architecture</strong></td>
<td width="238"><strong>SIMD Requirements</strong></td>
<td width="163"><strong>Processor / µarchitecture</strong></td>
<td width="100"><strong>Notes</strong></td>
</tr>
<tr>
<td>IA-32</td>
<td >w7</td>
<td>SSE2</td>
<td>P4, Xeon, Centrino</td>
<td>SSE2 default</td>
</tr>
<tr>
<td></td>
<td >v8</td>
<td>Supplemental SSE3</td>
<td>Core 2, Xeon® 5100, Atom</td>
<td></td>
</tr>
<tr>
<td></td>
<td >s8</td>
<td>Supplemental SSE3 (<a href="http://software.intel.com/en-us/articles/new-atom-support/">compiled for Atom</a>)</td>
<td>Atom</td>
<td></td>
</tr>
<tr>
<td></td>
<td >p8</td>
<td>SSE4.1, SSE4.2 and AES-NI</td>
<td>Penryn, Nehalem, Westmere</td>
<td>see next section</td>
</tr>
<tr>
<td></td>
<td >g9</td>
<td><a href="http://www.intel.com/software/avx">AVX</a></td>
<td>Sandy Bridge µarchitecture</td>
<td></td>
</tr>
<tr>
<td></td>
<td ></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Intel® 64 (EM64T)</td>
<td >m7</td>
<td>SSE3</td>
<td>Prescott</td>
<td>SSE3 default</td>
</tr>
<tr>
<td></td>
<td >u8</td>
<td>Supplemental SSE3</td>
<td>Core 2, Xeon® 5100, Atom</td>
<td></td>
</tr>
<tr>
<td></td>
<td >n8</td>
<td>Supplemental SSE3 (<a href="http://software.intel.com/en-us/articles/new-atom-support/">compiled for Atom</a>)</td>
<td>Atom</td>
<td></td>
</tr>
<tr>
<td></td>
<td >y8</td>
<td>SSE4.1, SSE4.2, AES-NI</td>
<td>Penryn, Nehalem, Westmere</td>
<td>see next section</td>
</tr>
<tr>
<td></td>
<td >e9</td>
<td><a href="http://www.intel.com/software/avx">AVX</a></td>
<td>Sandy Bridge µarchitecture</td>
<td></td>
</tr>
</tbody>
</table>
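<p>To make the table concrete, the following Python sketch picks a layer from a set of supported SIMD features. It is a simplified model: the feature labels are names chosen for this sketch rather than CPUID flag names, the order in which layers are tried is an assumption, and the real dispatcher may apply additional rules.</p>

```python
# Layers from the table above, ordered from most to least capable.
# The feature names are labels chosen for this sketch, not CPUID flag names.
LAYERS = {
    "ia32": [
        ("g9", {"avx"}),
        ("p8", {"sse4.1"}),
        ("s8", {"ssse3", "atom"}),
        ("v8", {"ssse3"}),
        ("w7", {"sse2"}),
    ],
    "intel64": [
        ("e9", {"avx"}),
        ("y8", {"sse4.1"}),
        ("n8", {"ssse3", "atom"}),
        ("u8", {"ssse3"}),
        ("m7", {"sse3"}),
    ],
}


def select_layer(platform, features):
    """Return the first layer whose requirements are all present, else None."""
    for layer, required in LAYERS[platform]:
        if required.issubset(features):
            return layer
    return None   # unsupported CPU: below SSE2 (ia32) or SSE3 (intel64)


print(select_layer("ia32", {"sse2", "sse3"}))
print(select_layer("intel64", {"sse2", "sse3", "ssse3", "sse4.1", "sse4.2"}))
print(select_layer("ia32", {"sse"}))
```

<p>In this model a 32-bit SSE3 processor falls through to the w7 layer, and a processor below the minimum level matches no layer at all, which corresponds to the case in which <code>ippInit()</code> returns an error code.</p>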
<p><br />For information about support for non-Intel processors, please read <a target="_blank" href="http://software.intel.com/en-us/articles/use-ipp-on-amd-processor/"><em>Use Intel® IPP on Intel or Compatible AMD* Processors</em></a>.</p>
<p>If you compare the dispatch table above to the <a target="_blank" href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/">5.3 thru 6.1 dispatch table</a> you will note that the Intel SSE3 optimization layer (t7) has been removed from the 32-bit edition (ia32) of the library. 32-bit applications built with the 7.0 version of the library that execute on an SSE3 processor will automatically use the Intel SSE2 optimization layer (w7). In most cases, the impact of this change is minor, since the performance difference between the Intel SSE3 (t7) and Intel SSE2 (w7) optimization layers in the Intel IPP library is minimal. Processors that support the Intel SSSE3 instruction set (v8 and s8 optimization layers) are not affected by this change. (Note: this change does not impact applications built using the 64-bit edition of the library, which now uses the Intel SSE3 optimization layer (m7) as its default path.)</p>
<p><b>P8/Y8 Internal Run-Time Dispatcher</b></p>
<p>Within the 32-bit p8 and equivalent 64-bit y8 architectures there is an additional "runtime dispatcher," a mini-dispatcher. The Nehalem and Westmere processor microarchitectures add SIMD instructions beyond those defined by SSE4.1: the Nehalem processor microarchitecture added the SSE4.2 SIMD instructions and the Westmere processor microarchitecture added Intel® AES-NI.</p>
<p>Creating two separate optimization layers within the IPP library for the small set of instructions added by SSE4.2 and AES-NI would be very space inefficient, so they are bundled into the SSE4.1 library (p8/y8) as minor variants to that optimization layer. When you call a function that includes, for example, AES-NI optimizations, an additional jump directs your call to the AES-NI version within the p8/y8 library if your runtime processor supports these instructions. Because the enhancements affect the optimization of only a small number of Intel IPP functions, this additional overhead occurs infrequently and only when your application is executing on a p8/y8 architecture processor that supports these extra instructions.</p>
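<p>The extra jump described above can be pictured with a small Python sketch. Everything in it is hypothetical (the helper <code>make_p8_function</code>, the feature labels and the two stand-in implementations); it only models the idea that a few functions inside one layer carry a second, per-call check.</p>

```python
def make_p8_function(base_impl, variants):
    """Build a p8/y8-style entry point with an internal mini-dispatch.

    variants maps an extra feature (e.g. 'aes-ni') to a specialised
    implementation; functions without variants pay no extra jump.
    """
    if not variants:
        return base_impl

    def entry(cpu_features, *args):
        for feature, impl in variants.items():
            if feature in cpu_features:
                return impl(cpu_features, *args)   # the additional jump
        return base_impl(cpu_features, *args)

    return entry


# Hypothetical stand-ins for an SSE4.1 build and an AES-NI build.
def encrypt_sse41(cpu_features, block):
    return "sse4.1 path: " + block


def encrypt_aesni(cpu_features, block):
    return "aes-ni path: " + block


encrypt = make_p8_function(encrypt_sse41, {"aes-ni": encrypt_aesni})
print(encrypt({"sse4.1", "sse4.2", "aes-ni"}, "block"))
print(encrypt({"sse4.1"}, "block"))
```

<p>Functions that have no SSE4.2 or AES-NI variant are returned unchanged, which mirrors the point that the additional overhead applies to only a small number of functions.</p>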
<p><b>S8/N8 (Atom) Dispatch</b></p>
<p>Unlike preceding versions of the library, the 7.0 version of the Intel IPP library <em>does</em> include Atom-optimized variants of the library within all formats (static and dynamic) of the library. For this reason, the Linux distribution of the 7.0 version of the Intel IPP library no longer includes a separate Atom-specific version of the library, since Atom-specific optimizations have been fully merged into all formats of the standard library files. <br /><br />Please read <a href="http://software.intel.com/en-us/articles/new-atom-support/"><em>Intel® Atom™ Processors Support in the Intel® Integrated Performance Primitives (Intel® IPP) Library</em></a> for more information regarding Atom optimizations in the IPP library.</p>
<p><strong>Processor Architecture Table</strong></p>
<p><span >The following table was copied from an <a target="_blank" href="http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/" >Intel Compiler Pro options article</a> describing some compiler architecture options. It contains a list of Intel processors showing which processors support which SIMD instructions. For the latest table please refer to the original article; it gets updated on a regular basis. Please note that the behavior of the Intel Compiler SIMD dispatcher described in <a target="_blank" href="http://software.intel.com/en-us/articles/performance-tools-for-software-developers-intel-compiler-options-for-sse-generation-and-processor-specific-optimizations/" >that article</a> does not apply to the Intel IPP library.</span></p>
<blockquote>The Intel IPP library dispatching mechanism behaves differently than that found in the Intel Compiler products, and may also behave differently than other Intel library products.</blockquote>
<p>Additional information regarding dispatching and how it relates to <a target="_blank" href="http://software.intel.com/en-us/articles/use-ipp-on-amd-processor/">non-Intel processors can be found here</a>. How to identify your specific processor is <a target="_blank" href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-is-there-any-function-to-detect-processor-type/">described here</a>. To correlate a processor family name with an Intel CPU brand name, use the following web site: <a target="_blank" href="http://ark.intel.com/">ark.intel.com</a>.</p>
<p><b><b>SSE</b>4.2</b><br />Intel® Core™ i7 processors<br />Intel® Core™ i5 processors<br />Intel® Core™ i3 processors<br />Intel® Xeon® 55XX series</p>
<p><b><b>SSE</b>4.1<br /></b>Intel® Xeon® 74XX series<br />Quad-Core Intel® Xeon 54XX, 33XX series<br />Dual-Core Intel® Xeon 52XX, 31XX series<br />Intel® Core™ 2 Extreme 9XXX series<br />Intel® Core™ 2 Quad 9XXX series<br />Intel® Core™ 2 Duo 8XXX series<br />Intel® Core™ 2 Duo E7200</p>
<p><b><b>SSSE</b>3</b><br />Quad-Core Intel® Xeon® 73XX, 53XX, 32XX series<br />Dual-Core Intel® Xeon® 72XX, 53XX, 51XX, 30XX series<br />Intel® Core™ 2 Extreme 7XXX, 6XXX series<br />Intel® Core™ 2 Quad 6XXX series<br />Intel® Core™ 2 Duo 7XXX (except E7200), 6XXX, 5XXX, 4XXX series<br />Intel® Core™ 2 Solo 2XXX series<br />Intel® Pentium® dual-core processor E2XXX, T23XX series</p>
<p><b><b>SSE</b>3</b><br />Dual-Core Intel® Xeon® 70XX, 71XX, 50XX Series<br />Dual-Core Intel® Xeon® processor (ULV and LV) 1.66, 2.0, 2.16<br />Dual-Core Intel® Xeon® 2.8<br />Intel® Xeon® processors with SSE3 instruction set support<br />Intel® Core™ Duo<br />Intel® Core™ Solo<br />Intel® Pentium® dual-core processor T21XX, T20XX series<br />Intel® Pentium® processor Extreme Edition<br />Intel® Pentium® D<br />Intel® Pentium® 4 processors with SSE3 instruction set support</p>
<p><b><b>SSE</b>2</b><br />Intel® Xeon® processors<br />Intel® Pentium® 4 processors<br />Intel® Pentium® M</p>
<p><b>IA32</b><br />Intel® Pentium® III Processor<br />Intel® Pentium® II Processor<br />Intel® Pentium® Processor</p>
<p>
<table cellpadding="5" cellspacing="0" rules="none" border="1">
<tbody>
<tr>
<th align="left" valign="middle" >Optimization Notice</th>
</tr>
<tr bgcolor="#ccecff">
<td>
<p>Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p align="right">Notice revision #20110804</p>
</td>
</tr>
</tbody>
</table>
</p>
<br />*Other names and brands may be claimed as the property of others. ]]></description>
      <link>http://software.intel.com/en-us/articles/understanding-simd-optimization-layers-and-dispatching-in-the-intel-ipp-70-library/</link>
      <pubDate>Mon, 04 Oct 2010 09:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/understanding-simd-optimization-layers-and-dispatching-in-the-intel-ipp-70-library/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/understanding-simd-optimization-layers-and-dispatching-in-the-intel-ipp-70-library/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
    </item>
    <item>
      <title>Accelerate Your Application via IPP Image Processing in Parallel Studio - C code vs. IPP Resize</title>
      <description><![CDATA[ <p align="left"><strong>Summary</strong><br />Intel® Parallel Studio 2011 was released recently. Intel IPP, a key component of Intel® Parallel Composer, gives developers an easy way to accelerate digital media applications. This article shows how to use an IPP image processing function to develop a parallel-ready application, and provides a sample that compares the performance of IPP with general C code when resizing an image, a widely used operation in image processing. Tests show that the IPP function can run 44x faster than the corresponding C code. With threading enabled, the speedup can exceed 50x on a Core 2 Quad 2.66GHz machine. <br /><br /><a href="http://software.intel.com/file/29998"><strong>Attached</strong></a> is the sample project, a Parallel Composer 2011 project for the Microsoft Visual Studio 2005 IDE. <br />For developers who have installed Intel Parallel Composer with Microsoft Visual Studio 2010, <a href="http://software.intel.com/file/32831"><strong>here</strong></a> is the corresponding project. <br /><b><br />How to build the Sample</b></p>
<p>1. Build system requirements</p>
<p>Software:<br />•   <a href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-for-windows-compiling-and-linking-with-microsoft-visual-c-and-intel-c-compilers/#10#10">Intel Parallel Studio 2011 and Microsoft* Visual Studio 2005 or later</a><br />•   (Optional) Install the static IPP library separately from http://software.intel.com/en-us/articles/intel-ipp-static-libraries/</p>
<p>Hardware: A recent dual-core or quad-core machine running Windows XP, Windows Vista, or Windows 7</p>
<p>2. Download Resize_Image_PS_VS2005.zip and unzip it to a directory, referred to here as &lt;WorkDIR&gt;.</p>
<p>3. Go to &lt;WorkDIR&gt; and double-click Resize Image.sln. The Microsoft Visual Studio 2005 IDE opens automatically.</p>
<p>4. From the <b>main toolbar</b>, select <b>Project &gt;&gt; Intel Parallel Composer 2011 &gt;&gt; Select Build Component</b></p>
<p>(or right-click the project in Solution Explorer), check <b>Use IPP</b>, and click OK.</p>
<p>5. Build the application: from the <b>main toolbar</b>, select <strong>Build &gt;&gt; Build Solution</strong>.<br /><br />For build details, see <a href="http://software.intel.com/en-us/articles/use-intel-ipp-in-intel-parallel-composer/"><strong>Use Intel IPP in Intel® Parallel Composer</strong></a>.</p>
<p><b>How to run the application</b> </p>
<p>1. Run the application<br />From the <strong>main toolbar</strong>, select <b>Debug</b> &gt;&gt; <strong>Start Without Debugging</strong>. The application window opens. Click Open File and select LennaC1.bmp. <br /><strong><img src="http://software.intel.com/file/29994" alt="ReadLenna.JPG" title="ReadLenna.JPG" /></strong></p>
<p>2. Click the menu item "Process =&gt; Resize image" to resize the image. <br /> Enter the zoom factors for the horizontal (x) and vertical (y) directions in the Resize dialog box, then click OK.  <img src="http://software.intel.com/file/29995" alt="Process.JPG" title="Process.JPG" /></p>
<p>3. Click LennaC1.bmp and repeat step 2, this time making sure to click the USE_IPP button. The result is shown in the image below.  <strong><img src="http://software.intel.com/file/29997" alt="result1.JPG" title="result1.JPG" /></strong></p>
<p><b>IPP Function Adoption: <br /></b>Assume the sample is an application whose performance we want to improve with an IPP function.  <br />1. Find the C code image resize function in RESIZE.cpp:</p>
<p>unsigned long C_Code_Resize(unsigned char * src, int srcWidth, int srcHeight,   int srcStep, unsigned char* dst, int dstWidth, int dstHeight, int dstStep, double m_zoom_x, double m_zoom_y, int interpolation)</p>
<p> It is called by the function ProcessImage(CSampleDoc *pSrc) in ippiAddC.cpp.<br /><br />2. Check the IPP manual ippiman.pdf and find that the function ippiResizeSqrPixel provides the same functionality. Then replace the C function with the IPP function.   <br />Declare a similar function in RESIZE.cpp:<br />unsigned long IPP_Resize( void* src, int srcWidth, int srcHeight,int srcStep,  void* dst,  int dstWidth, int dstHeight, int dstStep, double m_nzoom_x, double m_nzoom_y, int interpolation)</p>
<p align="left"> Call it in ProcessImage(CSampleDoc *pSrc) in ippiAddC.cpp instead of calling C_Code_Resize().  (To compare performance, we keep the C function call here as well.)</p>
<p> if (m_USE_IPP)<br />{<br />             ippStaticInit();<br />       //---- call the IPP function code to resize an image  -----//<br />         run_time = IPP_Resize(pSrc-&gt;DataPtr(),pSrc-&gt;Width(),pSrc-&gt;Height(),pSrc-&gt;Step(),(Ipp8u*)pDst-&gt;DataPtr(),        pDst-&gt;Width(),pDst-&gt;Height(),pDst-&gt;Step(),m_zoom_x,m_zoom_y,m_Interpolation);<br />}<br />else{         //---- call the C code to resize an image  -----//<br />         run_time = C_Code_Resize((unsigned char *)pSrc-&gt;DataPtr(),pSrc-&gt;Width(),<br />         pSrc-&gt;Height(),pSrc-&gt;Step(), (unsigned char *)pDst-&gt;DataPtr(), pDst-&gt;Width(),pDst-&gt;Height(),pDst-&gt;Step(),m_zoom_x,m_zoom_y,m_Interpolation);<br />}     <br /><br />3. Write the IPP code to replace the C code.  The table below shows the original C code and the IPP code. </p>
<p>
<table width="588" cellpadding="0" cellspacing="0" border="1">
<tbody>
<tr>
<td width="284" valign="top">
<p>The C code</p>
</td>
<td width="304" valign="top">
<p>The IPP code</p>
</td>
</tr>
<tr>
<td width="284" valign="top">
<p>unsigned long C_Code_Resize(unsigned char * src, int srcWidth, int srcHeight,int srcStep, unsigned char* dst, int dstWidth, int dstHeight, int dstStep, double m_zoom_x, double m_zoom_y, int interpolation)</p>
<p align="left">{//---------- Perform first-order (linear) interpolation ---<br />     // timing variables<br />     unsigned long start_clock,stop_clock;<br />     start_clock = RUNTIME;</p>
<p align="left">     const unsigned char *tmpSrc;<br />    unsigned char *tmpRef;<br />    int width = srcWidth;<br />    int height = srcHeight;<br />    double xInv = 1.0 /  m_zoom_x;<br />    double yInv = 1.0 /  m_zoom_y;</p>
<p align="left">    int colInd, rowInd;<br />    int i, j, xSrc0, xSrc1, ySrc0, ySrc1, wdroi, hdroi;<br />    int idxl, idyt, icol, jrow;<br />    double row, col;<br />    double y1, y2, y3, y4, v, v1, v2, tempV,tempV2;</p>
<p align="left">     idxl=0;<br />     idyt=0; <br />    wdroi = dstWidth;<br />    hdroi = dstHeight;</p>
<p align="left">     tmpSrc = src;<br />for(int kloop=0;kloop&lt;LOOP;kloop++) </p>
<p align="left">{  <br />  tmpRef = dst ;<br />    for (j = 0, jrow = idyt; j &lt; hdroi; j++, jrow++) {         row = (jrow + 0.5) * yInv - 0.5;</p>
<p align="left">        rowInd = (int)floor(row);<br />        ySrc0 = ts_iGetCoord_vs(rowInd, rowInd,  0, srcHeight, srcHeight);<br />        ySrc1 = ts_iGetCoord_vs(rowInd, rowInd + 1, 0, srcHeight, srcHeight);<br />        for (i = 0, icol = idxl; i &lt; wdroi; i++, icol++) { <br />            col = (icol + 0.5) * xInv - 0.5;<br />            colInd = (int)floor(col);<br />            xSrc0 = ts_iGetCoord_vs(colInd, colInd,   0, srcWidth, srcWidth);<br />            xSrc1 = ts_iGetCoord_vs(colInd, colInd + 1, 0, srcWidth, srcWidth);<br />            y1 = (double)tmpSrc[ySrc0 * srcStep + xSrc0];<br />            y2 = (double)tmpSrc[ySrc0 * srcStep + xSrc1];<br />            y3 = (double)tmpSrc[ySrc1 * srcStep + xSrc0];<br />            y4 = (double)tmpSrc[ySrc1 * srcStep + xSrc1];  <br /> ts_iLinearCalcSP_vs(col + 0.5, colInd + 0.5, colInd + 1.5, y1, y2, &amp;v1);            ts_iLinearCalcSP_vs(col + 0.5, colInd + 0.5, colInd + 1.5, y3, y4, &amp;v2);<br />ts_iLinearCalcSP_vs(row + 0.5, rowInd + 0.5, rowInd + 1.5, v1, v2, &amp;v);<br />              //(ts_isaturate_vs(v);<br />            tempV = (int)(v + EXP + 0.5);             tmpRef[i] =(unsigned char)((tempV &gt; 255) ? 255 : (tempV &lt; 0) ? 0 : tempV);<br />        }<br />        tmpRef += dstStep;<br />  }  <br />}</p>
<p align="left">     stop_clock = RUNTIME;</p>
<p align="left">     int mhz;</p>
<p align="left">    ippGetCpuFreqMhz(&amp;mhz);</p>
<p align="left">     return (stop_clock - start_clock)/mhz/LOOP;</p>
<p>}</p>
</td>
<td width="304" valign="top">
<p align="left">unsigned long IPP_Resize(void* src, int srcWidth, int srcHeight,int srcStep,  void* dst,  int dstWidth, int dstHeight, int dstStep,   double m_nzoom_x, double m_nzoom_y, int interpolation)</p>
<p align="left">  {</p>
<p align="left">      //   define record time variable<br />    unsigned long start_clock,stop_clock;     start_clock= RUNTIME;</p>
<p align="left"> // define IPP function parameter</p>
<p align="left">     IppiRect srcRoi = {0,0, srcWidth, srcHeight};</p>
<p align="left">     IppiRect dstRoi={0,0, dstWidth,dstHeight};</p>
<p align="left"> </p>
<p align="left">     IppiSize srcSize = {srcWidth, srcHeight};</p>
<p align="left">    IppiSize dstSize = {dstWidth, dstHeight};</p>
<p align="left"> </p>
<p align="left">     int BufferSize;</p>
<p align="left">     ippiResizeGetBufSize(srcRoi, dstRoi, 1, interpolation, &amp;BufferSize);</p>
<p align="left">     Ipp8u* pBuffer=ippsMalloc_8u(BufferSize);</p>
<p align="left"> <br /><br />     for(int i=0;i&lt;LOOP;i++)    </p>
<p align="left">     //---------- Perform IPP function:ippiResizeSqrPixel_8u_C1R  -------------------------------------------//</p>
<p align="left">     ippiResizeSqrPixel_8u_C1R((Ipp8u*)src, srcSize, srcStep, srcRoi, (Ipp8u*)dst, dstStep, dstRoi, m_nzoom_x,m_nzoom_y,0, 0, interpolation, pBuffer);</p>
<p align="left">    ippsFree(pBuffer);<br />    stop_clock = RUNTIME;<br />      int mhz;<br />    ippGetCpuFreqMhz(&amp;mhz);<br />     return (stop_clock - start_clock)/mhz/LOOP;<br />}</p>
</td>
</tr>
</tbody>
</table>
</p>
<p> </p>
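<p>The C routine in the table relies on helper functions (ts_iGetCoord_vs, ts_iLinearCalcSP_vs) that are not listed in this article. The following is a minimal, self-contained sketch of the same idea, first-order (bilinear) interpolation with pixel-center mapping, for an 8-bit single-channel image. The function and helper names are hypothetical, border handling by index clamping is an assumption, and the output is not claimed to match ippiResizeSqrPixel bit for bit.</p>

```c
#include <assert.h>

/* Hypothetical stand-in for the sample's coordinate helper:
   clamp an index into [0, size - 1] (border replication). */
static int clamp_index(int v, int size) {
    if (v < 0) return 0;
    return v >= size ? size - 1 : v;
}

/* Integer floor of a double without needing the math library. */
static int floor_to_int(double x) {
    int i = (int)x;                          /* truncates toward zero */
    return (x < (double)i) ? i - 1 : i;
}

/* Bilinear (first-order) resize of an 8-bit, single-channel image.
   Steps are row strides in bytes; zoom factors are destination/source ratios. */
void resize_bilinear_8u(const unsigned char *src, int srcW, int srcH, int srcStep,
                        unsigned char *dst, int dstW, int dstH, int dstStep,
                        double zoomX, double zoomY) {
    assert(src && dst && srcW > 0 && srcH > 0 && zoomX > 0.0 && zoomY > 0.0);
    const double xInv = 1.0 / zoomX, yInv = 1.0 / zoomY;
    for (int j = 0; j < dstH; j++) {
        /* Map the destination pixel center back into source coordinates. */
        const double row = (j + 0.5) * yInv - 0.5;
        const int r0 = floor_to_int(row);
        const double fy = row - r0;
        const unsigned char *top = src + clamp_index(r0, srcH) * srcStep;
        const unsigned char *bot = src + clamp_index(r0 + 1, srcH) * srcStep;
        for (int i = 0; i < dstW; i++) {
            const double col = (i + 0.5) * xInv - 0.5;
            const int c0 = floor_to_int(col);
            const double fx = col - c0;
            const int x0 = clamp_index(c0, srcW), x1 = clamp_index(c0 + 1, srcW);
            /* Interpolate along x on both rows, then along y. */
            const double v1 = top[x0] + fx * (top[x1] - top[x0]);
            const double v2 = bot[x0] + fx * (bot[x1] - bot[x0]);
            const double v = v1 + fy * (v2 - v1);
            const int q = (int)(v + 0.5);    /* round to nearest */
            dst[j * dstStep + i] = (unsigned char)(q > 255 ? 255 : (q < 0 ? 0 : q));
        }
    }
}
```

<p>For example, resizing the two-pixel row {0, 100} with a horizontal zoom of 2 yields {0, 25, 75, 100} in this sketch: the two outer pixels fall outside the source pixel centers and are clamped, while the two inner pixels are weighted 0.75/0.25 between the neighbors.</p>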
<p><b>Performance Gain</b> </p>
<p>On one test machine (Core 2 Quad 2.66GHz), as the result image shows, the performance gain is 15654/353 = <strong>44x</strong>.</p>
<p>The test links against the serial IPP static library. Because ippiResize is threaded in the dynamic library and in the threaded IPP static library, enabling multithreading raises the performance gain to more than <strong>50x</strong> (depending on the number of cores and the image size).<br /><br /><strong>Conclusion<br /></strong>Intel® Parallel Studio 2011 provides developers with a suite of tools for easily developing parallel applications on multi-core platforms. IPP is a key component of Intel® Parallel Studio. It provides thousands of highly optimized functions that support the development of high-performance digital media applications. This article describes a brief way to adopt an IPP function in place of source code in a Parallel Studio project and gain a speedup of over<strong> 40x</strong>.  </p>
]]></description>
      <link>http://software.intel.com/en-us/articles/accelerate-your-application-via-ipp-image-processing-in-parallel-studio-c-code-vs-ipp-resize/</link>
      <pubDate>Sun, 29 Aug 2010 09:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/accelerate-your-application-via-ipp-image-processing-in-parallel-studio-c-code-vs-ipp-resize/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/accelerate-your-application-via-ipp-image-processing-in-parallel-studio-c-code-vs-ipp-resize/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>OpenMP and the Intel® IPP Library</title>
      <description><![CDATA[ <p><strong>Introduction</strong></p>
<p>The low-level <em>primitives</em> within the Intel IPP library generally represent basic atomic operations. This limits threading within the library to ~15-20% of the functions. <a target="_blank" href="http://openmp.org/">OpenMP</a> is enabled by default when you use one of the multi-threaded variants of the Intel IPP library. A list of the threaded primitives in the IPP library is provided in the <em>ThreadedFunctionsList.txt</em> file located in the library’s doc directory.</p>
<p>The quickest way to multi-thread an Intel IPP application is to use the built-in OpenMP threading of the library. There’s no significant code rework required on your part and, depending on the IPP primitives you use, it may provide additional performance improvements.</p>
<p align="center"><a href="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/12/threading-ipp.png"><img height="431" width="580" src="http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/12/threading-ipp.png" alt="IPP Internal Threading Model" /></a></p>
<p align="center">IPP Internal Threading Model</p>
<p>If you use multiple threads in your own application (above the Intel IPP library) we generally recommend that you disable the library’s built-in threading. Doing so eliminates competition between the library’s OpenMP threading and your application’s threading, and avoids oversubscription of software threads to the available hardware threads.</p>
<p>Disabling internal IPP library threading in a multi-threaded application is not a hard and fast rule. For example, if your application has just two threads (e.g., a GUI thread and a background thread) and the IPP library is only being used by the background thread, using the internal IPP threading probably makes sense.</p>
<p>For a quick summary of the differences between OpenMP and other threading technologies please read <a href="http://software.intel.com/en-us/articles/intel-threading-building-blocks-openmp-or-native-threads/">Intel® Threading Building Blocks, OpenMP, or native threads?</a></p>
<p><strong>Controlling OpenMP Threading in the Intel IPP Primitives</strong></p>
<p>The default <em>maximum</em> number of OpenMP threads used by the multi-threaded IPP primitives is equal to the number of <em>hardware threads</em> in the system, which is determined by the number and type of CPUs in your system. That means that a quad-core processor with <a href="http://software.intel.com/en-us/articles/intel-hyper-threading-technology-your-questions-answered/">Intel® HT</a> has eight hardware threads (four cores, each core has two threads), and a dual-core CPU without Intel HT has only two hardware threads.</p>
<p>There are two IPP primitives for control and status of the OpenMP threading used within the library: <em>ippSetNumThreads()</em> and <em>ippGetNumThreads()</em>. You call <em>ippGetNumThreads</em> to determine the current <em>thread cap</em> and <em>ippSetNumThreads</em> to change the thread cap. <em>ippSetNumThreads</em> will not allow you to set the thread cap beyond the number of available hardware threads. This thread cap is an upper bound on the number of threads that can be used within a multi-threaded primitive. Some IPP functions may use fewer threads than specified by the thread cap, but they will never use more than the thread cap.</p>
<p>To disable OpenMP threading within the library you need to call <em>ippSetNumThreads(1)</em> near the beginning of your application. Or, you can link your application with the single-threaded variant of the library.</p>
<p>The OpenMP library used by the IPP library references several <a href="http://software.intel.com/sites/products/documentation/hpc/compilerpro/en-us/cpp/lin/compiler_c/optaps/common/optaps_par_var.htm">configuration environment variables</a>. In particular, OMP_NUM_THREADS sets the default number of threads (the thread cap) to be used by the OpenMP library at run time. However, the IPP library will override this setting by limiting the number of OpenMP threads used by your application to be either the number of hardware threads in the system, as described above, or the value specified by a call to <em>ippSetNumThreads</em>, whichever is smaller. OpenMP applications on your system that <em>do not</em> use the Intel IPP library might still be affected by the OMP_NUM_THREADS environment variable; likewise, any such OpenMP applications <em>will not</em> be affected by a call to the <em>ippSetNumThreads</em> function within your Intel IPP application.</p>
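<p>The thread-cap rules described above can be summarized in a small model written in C. This is a sketch of the stated behavior only, not Intel IPP code: the function names are hypothetical, and the real <em>ippSetNumThreads</em> may reject an out-of-range request rather than clamp it as the model does.</p>

```c
#include <assert.h>

/* Hardware threads = cores x threads per core (two per core with Intel HT, else one). */
int hardware_threads(int cores, int ht_enabled) {
    assert(cores > 0);
    return cores * (ht_enabled ? 2 : 1);
}

/* Hypothetical model of the effective thread cap:
   requested <= 0 means no explicit cap was set, so the default (hardware threads) applies;
   otherwise the cap is the smaller of the request and the hardware thread count. */
int effective_thread_cap(int hw_threads, int requested) {
    assert(hw_threads > 0);
    if (requested <= 0) return hw_threads;
    return requested < hw_threads ? requested : hw_threads;
}
```

<p>With this model, a quad-core processor with Intel HT has eight hardware threads, a request for one thread yields a cap of 1 (threading disabled), and a request for 16 threads on that processor is still capped at 8.</p>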
<p><strong>Nested OpenMP</strong></p>
<p>If your application that is using the Intel IPP library also implements multi-threading via OpenMP, the threaded Intel IPP primitives your application calls may execute as single-threaded primitives. This happens when an IPP primitive is called within an OpenMP parallelized section of code and if <em>nested parallelization</em> has been disabled, which is the default case for the Intel OpenMP library.</p>
<p>By nesting parallel OpenMP regions you risk creating a large number of threads that can effectively <em>oversubscribe</em> the number of hardware threads available. Creating a parallel region always incurs overhead, and the overhead associated with nesting parallel OpenMP regions may outweigh the benefit.</p>
<p>In general, OpenMP threaded applications that use the IPP library should disable multi-threading within the library, either by calling <em>ippSetNumThreads(1)</em> or by using the single-threaded static Intel IPP library.</p>
<p><strong>Core Affinity</strong></p>
<p>Some of the Intel IPP primitives in the signal processing domain are designed to execute parallel threads that exploit a merged L2 cache. These functions (single and double precision FFT, Div, Sqrt, etc.) need a shared cache in order to achieve their maximum multi-threaded performance. In other words, the threads within these primitives should, ideally, execute on CPU cores located on a single die with a shared or unified cache. To ensure this condition is met, the following OpenMP environment variable should be set before an application using the Intel IPP library runs:</p>
<p>KMP_AFFINITY=compact</p>
<p>On processors with two or more cores on a single die, this condition is satisfied automatically and the environment variable is superfluous. However, for systems with two or more dies (e.g., a Pentium D or a multi-socket motherboard), where the cache serving each die is not shared, failing to set this OpenMP environment variable can actually result in performance degradation for this class of multi-threaded Intel IPP primitives.</p>
<p>Additionally, some IPP functions perform best when Intel Hyper-Threading Technology is disabled or not used by the threads within the multi-threaded Intel IPP library. Hyper-Threading has been seen to negatively impact, for example, the performance of the IPP cryptography sample based on OpenSSL. In this case you should follow the instructions in this KB article:</p>
<p><a href="http://software.intel.com/en-us/articles/performance-of-crypto-sample-for-openssl-slowing-down-on-hyper-threading-systems/">IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems</a></p>
<p><b>Multi-threaded FFT Functions</b></p>
<p>The multi-threaded FFT functions were originally developed as part of the v8/u8 libraries (Core 2 with a shared cache architecture). These functions specifically exploit a shared-cache architecture in order to achieve higher performance in a multi-threaded environment. If this shared-cache condition is not met you may see a performance <i>degradation</i>.</p>
<p>As noted above, for processors that use libraries higher than the v8/u8 optimization (e.g., p8/y8), you <i>must</i> set the KMP_AFFINITY environment variable equal to "compact" (as shown above) to avoid this potential performance degradation.</p>
<p>Within the FFT functions, threading starts with an order of 12 for 64fc data types and an order of 13 for 32fc data types.</p>
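<p>Since an FFT of order <i>n</i> operates on 2<sup>n</sup> points, the thresholds above correspond to 4096 points for 64fc data and 8192 points for 32fc data. The following C sketch encodes that rule. The type and function names are hypothetical, and it only models the thresholds stated in this article; whether a given call actually runs threaded also depends on the library variant and the thread cap.</p>

```c
#include <assert.h>

/* Hypothetical tags for the two complex data types discussed in the text. */
typedef enum { FFT_32FC, FFT_64FC } fft_type;

/* Number of points in an FFT of the given order: N = 2^order. */
unsigned long fft_length(int order) {
    assert(order >= 0 && order < 31);
    return 1UL << order;
}

/* Smallest order at which, per the text, the threaded path starts. */
int fft_threading_min_order(fft_type t) {
    return t == FFT_64FC ? 12 : 13;
}

/* Returns 1 if an FFT of this type and order is large enough to be threaded. */
int fft_may_use_threads(fft_type t, int order) {
    return order >= fft_threading_min_order(t);
}
```

<p>In this sketch, an order-12 FFT qualifies for threading with 64fc data but not with 32fc data, which first qualifies at order 13.</p>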
]]></description>
      <link>http://software.intel.com/en-us/articles/openmp-and-the-intel-ipp-library/</link>
      <pubDate>Wed, 14 Apr 2010 09:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/openmp-and-the-intel-ipp-library/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/openmp-and-the-intel-ipp-library/</guid>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
    </item>
    <item>
      <title>clock() or gettimeofday() or ippGetCpuClocks()?</title>
      <description><![CDATA[ <p>Three functions are commonly used to measure the timing of a computation, an application, or a function in Intel® IPP: clock(), gettimeofday(), and ippGetCpuClocks(). Each function is described below, along with the reasons to use ippGetCpuClocks() in your IPP applications instead of clock() or gettimeofday().</p>
<p>clock():   The granularity of the clock() function depends on the implementation provided by each compiler vendor. The C standard does not specify the granularity of clock(); an implementation may check the time once a second and increment the value by CLOCKS_PER_SEC. This means that, depending on the implementation, you may get zero, CLOCKS_PER_SEC, CLOCKS_PER_SEC * 2, and so on, without ever getting an intermediate value. Do not use clock() if you need fine granularity.</p>
<p><br />gettimeofday():  It returns the wall-clock time (as seconds and microseconds). In practice its precision can be much coarser: at millisecond granularity, for example, the precision on a 3 GHz machine is only about 3 million CPU clocks. If your application only performs calculations, clock() and gettimeofday() will be fairly close. Whenever the application waits for something (for example, disk I/O), clock() will lag behind gettimeofday(). clock() can also advance faster than gettimeofday() if multiple threads are running in the same process.</p>
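<p>The difference between CPU time and wall-clock time described above can be observed with a short C sketch that uses only the standard clock() and the POSIX gettimeofday(). The helper names (cpu_seconds, wall_seconds, burn) are hypothetical, and the sketch assumes a POSIX system.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* CPU time consumed by this process so far, in seconds (C standard clock()). */
double cpu_seconds(void) {
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Wall-clock time in seconds since the epoch (POSIX gettimeofday()). */
double wall_seconds(void) {
    struct timeval tv;
    int rc = gettimeofday(&tv, NULL);
    assert(rc == 0);
    (void)rc;                                /* silence unused warning when NDEBUG is set */
    return (double)tv.tv_sec + (double)tv.tv_usec * 1e-6;
}

/* Busy work to time; volatile keeps the loop from being optimized away. */
double burn(long n) {
    volatile double acc = 0.0;
    for (long i = 0; i < n; i++) acc += (double)i * 0.5;
    return acc;
}
```

<p>Sampling both timers before and after a call such as burn(1000000L) gives two elapsed values. For a compute-only, single-threaded workload they should be close; if the workload sleeps or waits on I/O, the wall-clock difference grows while the CPU-time difference does not.</p>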
<p><br />ippGetCpuClocks():  The IPP function ippGetCpuClocks() provides a precision of 1 CPU clock. If you want the finest granularity or precision, we highly recommend using ippGetCpuClocks(). It can be used even when your program is parallel and runs on multiple cores: all TSC counters are synchronized and show the same clock count, as if there were only one counter in the system.</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing/</link>
      <pubDate>Sat, 27 Mar 2010 00:00:00 -0700</pubDate>
      <comments>http://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/best-timing-function-for-measuring-ipp-api-timing/</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
    <item>
      <title>AES-NI support in Intel® IPP</title>
      <description><![CDATA[ <p>The Advanced Encryption Standard New Instructions (AES-NI) introduced in the new generation of Core i7 processors (Westmere microarchitecture) offer a significant performance increase for cryptography and data compression. Please see this <a href="http://software.intel.com/en-us/articles/advanced-encryption-standard-aes-instructions-set/">AES technical article</a> for more information about AES-NI.<br /><br />Intel IPP 6.1 update 2 includes optimizations for the AES-NI instructions, which have been steadily improved in later versions. Discussions in the article <a href="http://software.intel.com/en-us/articles/new-nehalem-support/">Intel® Core<sup>TM</sup> i7 processor Support</a> and in the forum thread <a href="http://software.intel.com/en-us/forums/showthread.php?t=71133">AES-NI support for Westmere</a> are clarified below.<br /><br /><strong>1. The "p8" (IA32) and "y8" (Intel 64) IPP architectures include AES-NI optimizations for Westmere.</strong> <br /><br />If you build your application with IPP 6.1 update 2 or higher and run it on a Westmere microarchitecture processor, the p8/y8 library will use code that has been optimized for your processor. 
The following functions in the Intel IPP cryptography add-on library are optimized for Westmere (in IPP 6.1 update 2 and later):<br /><br />ippsRijndael128{Encrypt|Decrypt{ECB|CBC|CFB|OFB|CTR} }<br />ippsRijndael128CCM{Encrypt|Decrypt},<br />ippsRijndael128GCMProcess{IV|AAD}<br /><br />ippsRijndael192{Encrypt|Decrypt{ECB|CBC|CFB|OFB|CTR} }<br />ippsRijndael256{Encrypt|Decrypt{ECB|CBC|CFB|OFB|CTR} }<br /><br />ippsDAARijndael128Update, ippsDAARijndael128Final<br />ippsDAARijndael192Update, ippsDAARijndael192Final<br />ippsDAARijndael256Update, ippsDAARijndael256Final<br /><br />ippsXCBCRijndael128Update, ippsXCBCRijndael128Final<br /><br />The functions below are also optimized for Westmere, starting in IPP 6.1 update 3:<br /><br />ippsCRC32_8u<br />ippsCRC32_BZ2_8u<br /><br />You may need to update your IPP version to take full advantage of the IPP library optimizations for the Westmere microarchitecture. Run the <em>cpuinfo</em> sample in the <em>ipp-samples/advanced-usage/cpuinfo</em> folder on your Westmere processor to ensure the p8 or y8 code is recommended as the library architecture to be used.<br /><br /><strong>2. Penryn (SSE4.1), Nehalem (SSE4.2) and Westmere (AES-NI) share the same optimized IPP library: "p8" (for IA32) and "y8" (for Intel 64).</strong> <br /><br />Please see the article <a href="http://software.intel.com/en-us/articles/intel-integrated-performance-primitives-intel-ipp-understanding-cpu-optimized-code-used-in-intel-ipp/">Understanding CPU Dispatching in the Intel® IPP Library</a> for more information about the IPP library dispatching mechanism.<br /><br />The AES-NI instructions introduced with the Westmere microarchitecture processors are beneficial mostly to cryptography algorithms and a small subset of data compression algorithms. 
Rather than increase the size of the IPP library by adding a new IPP architecture with a limited set of functions that can take advantage of these new instructions, we extended the run-time dispatcher to check for support of AES-NI and branch to AES-optimized code within the Core i7 optimized library. <br /><br /><strong>3. AES performance test on Westmere<br /></strong>The application note <a href="http://software.intel.com/en-us/articles/boosting-openssl-aes-encryption-with-intel-ipp/"><b>Boosting OpenSSL AES Encryption with Intel® IPP </b></a> provides performance data comparing the IPP AES functions with OpenSSL AES encryption. <br /><br />The article <a href="http://software.intel.com/en-us/articles/performance-of-crypto-sample-for-openssl-slowing-down-on-hyper-threading-systems/">IPP Crypto Sample Performance for OpenSSL too Slow on Hyper-Threading Systems</a> describes the performance test method in more detail. In summary, use one of the following solutions to ensure appropriate results:</p>
<ul>
<li>disable Intel HT Technology (usually done via a configuration switch in the BIOS) and set KMP_AFFINITY=compact</li>
<li>disable multi-threading by linking with the static single-threaded version of the Intel IPP library if HT is enabled</li>
<li>disable multi-threading within the multi-threaded Intel IPP libraries by calling <em>ippSetNumThreads(1)</em> if HT is enabled</li>
<li>configure OpenMP to use 1/2 of the available logical threads if HT is enabled <em>and</em> set the KMP_AFFINITY environment variable as follows: <em>KMP_AFFINITY=granularity=fine,compact,1,0</em> </li>
</ul>
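<p>The last option in the list above can be sketched as a small launcher that exports the OpenMP settings and then starts the benchmark, so that the threading runtime sees them at startup. This is only an illustration under stated assumptions: the helper names (<em>half_logical_threads</em>, <em>export_benchmark_env</em>, <em>run_benchmark</em>) are hypothetical and not part of Intel IPP, the logical processor count is supplied by the caller, and a POSIX system is assumed. OMP_NUM_THREADS is the standard OpenMP variable; KMP_AFFINITY is honored only by the Intel OpenMP runtime.</p>

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Half of the logical processors, but never fewer than one thread. */
int half_logical_threads(long logical)
{
    if (logical < 2)
        return 1;
    return (int)(logical / 2);
}

/* Export the settings from the list above into this process's environment.
 * Returns 0 on success and -1 on failure. */
int export_benchmark_env(long logical)
{
    char count[16];

    snprintf(count, sizeof count, "%d", half_logical_threads(logical));
    if (setenv("OMP_NUM_THREADS", count, 1) != 0)
        return -1;
    return setenv("KMP_AFFINITY", "granularity=fine,compact,1,0", 1);
}

/* Hypothetical launcher: set the environment, then replace this process
 * with the benchmark named in argv[0]. Returns only if something failed. */
int run_benchmark(long logical, char *const argv[])
{
    if (export_benchmark_env(logical) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1; /* reached only when execvp itself fails */
}
```

<p>The variables are set before the benchmark process starts because an OpenMP runtime typically reads its environment once, during initialization; changing them later inside the same process may have no effect.</p>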
<p><br />The Intel® C/C++ Compiler version 11 also includes support for AES-NI (see <a href="http://software.intel.com/en-us/articles/how-to-compile-for-the-intel-core-i5-processor-with-aes-ni/">How to Compile for the Intel® Core<sup>TM</sup> i5 processor with AES-NI</a>), as do the Microsoft* Visual Studio* 2008 Service Pack 1 compiler and gcc version 4.4.<br /><br /><strong>How to Download the Cryptography Library Add-on for the Intel IPP Library</strong></p>
<p>The cryptography component of the IPP library is subject to US Export Administration Regulations and other US laws. To obtain the Intel IPP cryptography libraries, which must be downloaded separately, <a s_oid="https://registrationcenter.intel.com/regcenter/dplrequestgen.aspx?productid=1338" s_oidt="0" target="_blank" href="https://registrationcenter.intel.com/regcenter/dplrequestgen.aspx?productid=1338">register for eligibility</a> and follow the instructions you receive in the registration email. If you have additional questions, review this knowledge base article on <a target="_blank" href="http://software.intel.com/en-us/articles/download-ipp-cryptography-libraries">how to download the cryptography library</a> component of the IPP library.</p>
<p>You must have a valid Intel IPP license key to install and use the Intel IPP libraries.</p>
<p>To see the advantage of AES-NI optimization in the crypto engine, refer to the <a href="http://software.intel.com/en-us/articles/demo-advantage-of-westmere-crypto-acceleration-engine/">AES-NI demo</a>. </p>
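<p>To check whether a machine reports AES-NI before relying on the AES-optimized branch of the p8/y8 code, the CPUID feature bits can be inspected directly. The sketch below is not the Intel IPP dispatcher; it only mirrors the idea described in section 2 (one SSE4.1-level library plus an extra run-time check for AES-NI). The function and enum names are hypothetical. The bit positions (CPUID leaf 1, ECX bit 19 for SSE4.1 and bit 25 for AES-NI) are the documented ones, and <em>__get_cpuid</em> is the helper from the GCC/Clang <em>cpuid.h</em> header; on other compilers or non-x86 targets the reader function simply returns 0.</p>

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_CPUID 1
#else
#define HAVE_CPUID 0
#endif

/* CPUID leaf 1, ECX register feature bits. */
#define ECX_SSE41 (1u << 19)
#define ECX_AESNI (1u << 25)

/* Hypothetical labels for the code path a p8/y8-style library could take. */
enum code_path {
    PATH_BELOW_P8 = 0,   /* no SSE4.1: p8/y8 code is not applicable */
    PATH_P8 = 1,         /* SSE4.1-level code without AES-NI */
    PATH_P8_AESNI = 2    /* SSE4.1-level code plus the AES-NI branch */
};

/* Returns 1 when the AES-NI bit is set in the given ECX value. */
int ecx_has_aesni(unsigned ecx)
{
    return (ecx & ECX_AESNI) != 0;
}

/* Simplified selection: one shared library level, one extra AES-NI check. */
enum code_path select_code_path(unsigned ecx)
{
    if (!(ecx & ECX_SSE41))
        return PATH_BELOW_P8;
    return ecx_has_aesni(ecx) ? PATH_P8_AESNI : PATH_P8;
}

/* Reads ECX from CPUID leaf 1; returns 0 when CPUID is unavailable. */
unsigned read_leaf1_ecx(void)
{
#if HAVE_CPUID
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return ecx;
#endif
    return 0;
}
```

<p>Calling <em>select_code_path(read_leaf1_ecx())</em> on the target machine gives a quick indication of which branch applies; the <em>cpuinfo</em> sample shipped with the IPP samples remains the authoritative check.</p>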
<p>
<table cellpadding="5" cellspacing="0" rules="none" border="1">
<tbody>
<tr>
<th align="left" valign="middle" >Optimization Notice</th>
</tr>
<tr bgcolor="#ccecff">
<td>
<p>Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.</p>
<p align="right">Notice revision #20110804</p>
</td>
</tr>
</tbody>
</table>
</p> ]]></description>
      <link>http://software.intel.com/en-us/articles/aes-ni-support-in-intel-ipp/</link>
      <pubDate>Fri, 05 Feb 2010 09:00:00 -0800</pubDate>
      <comments>http://software.intel.com/en-us/articles/aes-ni-support-in-intel-ipp/#comments</comments>
      <guid isPermaLink="true">http://software.intel.com/en-us/articles/aes-ni-support-in-intel-ipp/</guid>
      <category>Intel® C++ Compiler for Linux* Knowledge Base</category>
      <category>Intel® C++ Compiler for Mac OS X* Knowledge Base</category>
      <category>Intel® C++ Compiler for Windows* Knowledge Base</category>
      <category>Intel® Integrated Performance Primitives Knowledge Base</category>
      <category>Intel® Parallel Composer Knowledge Base</category>
    </item>
  </channel></rss>
