Intel IPP support for Intel® AVX2

Haswell is the codename for the next-generation x86 processor microarchitecture (a "tock").  Haswell's new instructions accelerate a broad category of applications and usage models. For full details, download the Intel® Advanced Vector Extensions Programming Reference (319433). The new instruction set builds on the instructions of the Intel® microarchitecture code-named Ivy Bridge, which include the digital random number generator, half-float (float16) accelerators, and an extended set of Intel® Advanced Vector Extensions (Intel® AVX) instructions.

The instructions fit into the following categories:

AVX2 - Integer data types expanded to 256-bit SIMD. AVX2's integer support is particularly useful for processing the visual data commonly encountered in consumer imaging and video processing workloads. Haswell provides both Intel® Advanced Vector Extensions (Intel® AVX) for floating-point data types and AVX2 for integer data types.
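
As a rough illustration of 256-bit integer SIMD on pixel data, the sketch below adds two rows of 8-bit pixels with saturation, 32 pixels per iteration. It is not taken from Intel IPP; the function name `add_sat_u8` is hypothetical. The AVX2 path is compiled only when the compiler defines `__AVX2__` (for example with `-mavx2` on GCC or Clang), and a scalar loop handles the tail and serves as the fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = min(a[i] + b[i], 255) for n pixels. */
void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *dst, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && dst != NULL);
#if defined(__AVX2__)
    /* One 256-bit register holds 32 unsigned 8-bit pixels. */
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_adds_epu8(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop when AVX2 is not enabled. */
    for (; i < n; ++i) {
        unsigned sum = (unsigned)a[i] + (unsigned)b[i];
        dst[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
}
```

Both paths produce the same result, so the vector loop can be enabled or disabled at build time without changing behavior.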

Bit manipulation instructions are useful for compressed databases, hashing, large-number arithmetic, and a variety of general-purpose code.
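
To make the bit manipulation category more concrete, here is a portable C sketch of three operations that have single-instruction forms in the BMI1 group (BLSR, BLSI, and TZCNT), used to walk the set bits of a 64-bit bitmap. The function names are hypothetical, and the code is plain C rather than intrinsics; whether a compiler maps these idioms to the dedicated instructions depends on the compiler and its flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x & (x - 1) clears the lowest set bit (the operation BLSR performs). */
uint64_t clear_lowest_set(uint64_t x) { return x & (x - 1); }

/* x & -x isolates the lowest set bit (the operation BLSI performs). */
uint64_t isolate_lowest_set(uint64_t x) { return x & (0 - x); }

/* Number of trailing zero bits; 64 for x == 0 (the TZCNT convention). */
unsigned trailing_zeros(uint64_t x)
{
    unsigned n = 0;
    if (x == 0) return 64;
    while ((x & 1u) == 0) { x >>= 1; ++n; }
    return n;
}

/* Writes the positions of all set bits into out (room for 64 entries)
   and returns how many were written. */
unsigned set_bit_positions(uint64_t bitmap, unsigned *out)
{
    unsigned count = 0;
    assert(out != NULL);
    while (bitmap != 0) {
        out[count++] = trailing_zeros(bitmap);
        bitmap = clear_lowest_set(bitmap);
    }
    return count;
}
```

The loop runs once per set bit rather than once per bit position, which is why this pattern shows up in sparse bitmap and compressed-index code.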

Gather instructions are useful for vectorized code that accesses non-adjacent data elements. Haswell gather operations are mask-based for safety (like the conditional loads and stores introduced in Intel® AVX): elements whose mask bit is clear are not loaded. This makes gathers well suited to clipping values, clamping at boundaries, and similar conditional operations.
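
The masked gather behavior can be sketched as follows. The wrapper name `gather8_i32` and its parameter layout are assumptions made for illustration. When `__AVX2__` is defined the sketch uses the `_mm256_mask_i32gather_epi32` intrinsic; otherwise a scalar loop models the same semantics: a lane whose mask element has its high bit set loads `table[idx[i]]`, and every other lane keeps its fallback value without touching memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical wrapper around an 8-lane masked gather of 32-bit values. */
void gather8_i32(const int32_t *table, const int32_t idx[8],
                 const int32_t mask[8], const int32_t fallback[8],
                 int32_t out[8])
{
    assert(table != NULL && idx != NULL && mask != NULL);
    assert(fallback != NULL && out != NULL);
#if defined(__AVX2__)
    __m256i vidx  = _mm256_loadu_si256((const __m256i *)idx);
    __m256i vmask = _mm256_loadu_si256((const __m256i *)mask);
    __m256i vsrc  = _mm256_loadu_si256((const __m256i *)fallback);
    /* Scale 4: indices count 32-bit elements, addresses count bytes. */
    __m256i r = _mm256_mask_i32gather_epi32(vsrc, (const int *)table,
                                            vidx, vmask, 4);
    _mm256_storeu_si256((__m256i *)out, r);
#else
    for (int i = 0; i < 8; ++i) {
        /* High bit set means the element is negative as int32_t. */
        out[i] = (mask[i] < 0) ? table[idx[i]] : fallback[i];
    }
#endif
}
```

Because masked-off lanes are never dereferenced, an out-of-range index is harmless as long as its mask element is zero, which is the safety property the text refers to.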

Any-to-any permutes are highly useful shuffle operations. Haswell adds support for DWORD and QWORD granularity and allows permutes across an entire 256-bit register.
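
A minimal sketch of a DWORD-granularity permute across a full 256-bit register is shown below. The function name `permute8_i32` is hypothetical. With `__AVX2__` defined it uses the `_mm256_permutevar8x32_epi32` intrinsic; the scalar branch models the same rule, where output lane `i` receives source lane `idx[i] & 7`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] = src[idx[i] & 7] for 8 lanes of 32 bits. */
void permute8_i32(const int32_t src[8], const int32_t idx[8], int32_t dst[8])
{
    assert(src != NULL && idx != NULL && dst != NULL);
#if defined(__AVX2__)
    __m256i v   = _mm256_loadu_si256((const __m256i *)src);
    __m256i sel = _mm256_loadu_si256((const __m256i *)idx);
    /* Any source lane may go to any destination lane, crossing the
       128-bit halves of the register. */
    _mm256_storeu_si256((__m256i *)dst, _mm256_permutevar8x32_epi32(v, sel));
#else
    int32_t tmp[8];
    for (int i = 0; i < 8; ++i) tmp[i] = src[idx[i] & 7];
    for (int i = 0; i < 8; ++i) dst[i] = tmp[i]; /* safe if dst == src */
#endif
}
```

An index vector of 7, 6, 5, 4, 3, 2, 1, 0 reverses the register, and a vector of identical indices broadcasts one lane to all eight.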

Vector-vector shifts shift each element of a vector by an amount taken from the corresponding element of a second vector. They are critical in vectorized loops with variable shifts.
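
The per-element shift can be sketched like this. The function name `shift_left_var8` is hypothetical. With `__AVX2__` defined it uses the `_mm256_sllv_epi32` intrinsic; the scalar branch models the same behavior, including the rule that a shift count of 32 or more yields zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = v[i] << c[i], or 0 when c[i] >= 32. */
void shift_left_var8(const uint32_t v[8], const uint32_t c[8], uint32_t out[8])
{
    assert(v != NULL && c != NULL && out != NULL);
#if defined(__AVX2__)
    __m256i vv = _mm256_loadu_si256((const __m256i *)v);
    __m256i vc = _mm256_loadu_si256((const __m256i *)c);
    /* Each 32-bit lane is shifted by the count in the matching lane. */
    _mm256_storeu_si256((__m256i *)out, _mm256_sllv_epi32(vv, vc));
#else
    for (int i = 0; i < 8; ++i) {
        /* Guard the count: shifting a 32-bit value by 32 or more is
           undefined in C, while the vector instruction returns 0. */
        out[i] = (c[i] < 32u) ? (v[i] << c[i]) : 0u;
    }
#endif
}
```

Before per-element counts were available, a loop with a different shift amount per element generally had to fall back to scalar code or to multiplication tricks.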

Floating-Point Multiply Accumulate - The floating-point multiply-accumulate instructions significantly increase peak FLOPS and provide improved precision, which further benefits transcendental mathematics. They are broadly usable in high-performance computing, professional-quality imaging, and face detection. They operate on scalars, 128-bit packed single- and double-precision data types, and 256-bit packed single- and double-precision data types. [These instructions were described previously, in the initial Intel® AVX specification].
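
A small sketch of multiply-accumulate on 256-bit packed single-precision data follows. The function name `axpy_f32` is hypothetical. The vector path uses the `_mm256_fmadd_ps` intrinsic and is compiled only when the compiler defines `__FMA__` (for example with `-mfma` on GCC or Clang); otherwise the scalar loop computes the same expression with a separate multiply and add.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__FMA__)
#include <immintrin.h>
#endif

/* Hypothetical helper: y[i] = alpha * x[i] + y[i] for n elements. */
void axpy_f32(float alpha, const float *x, float *y, size_t n)
{
    size_t i = 0;
    assert(x != NULL && y != NULL);
#if defined(__FMA__)
    __m256 va = _mm256_set1_ps(alpha);
    /* Eight fused multiply-adds per iteration, one rounding each. */
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        _mm256_storeu_ps(y + i, _mm256_fmadd_ps(va, vx, vy));
    }
#endif
    /* Scalar tail, and the whole loop when FMA is not enabled. */
    for (; i < n; ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

The fused form rounds once instead of twice, so for general inputs the two paths can differ in the last bit; that single rounding is the source of the improved precision mentioned above.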

 


The following functions are optimized to benefit from Haswell's new instructions.

Video Coding

ippiSAD16x16_8u32s

ippiDecodeCAVLCCoeffs_H264_1u16s

ippiInterpolateBlock_H264_8u_P3P1R  (8x8)

ippiCopy_8u_C1R

ippiTransformQuantFwd4x4_H264_16s_C1

ippiFilterDeblockingChroma_HorEdge_H264_8u_C1IR

ippiSATD8x8_8u32s_C1R

ippiReconstructLumaIntra8x8MB_H264_16s8u_C1R

ippiSATD16x16_8u32s_C1R

ippsSet_8u

ippiInterpolateBlock_H264_8u_P3P1R  (16x16)

ippiDecodeExpGolombOne_H264_1u32s

ippiSAD8x8_8u32s_C1R

ippiInterpolateLuma_H264_8u_C1R

ippiTransformQuantInvAddPred4x4_H264_16s_C1IR

ippiFilterDeblockingChroma_HorEdge_H264_8u_C1IR

ippiInterpolateBlock_H264_8u_P2P1R

ippiReconstructLumaIntra4x4MB_H264_16s8u_C1R

ippiFilterDeblockingLuma_VerEdge_H264_8u_C1IR

ippiInterpolateChroma_H264_8u_C1R

ippiFilterDeblockingLuma_HorEdge_H264_8u_C1IR

 

 

JPEG

ippiDCTQuantFwd8x8LS_JPEG_8u16s_C1R

ippiDCTQuantInv8x8LS_JPEG_16s8u_C1R

 

Data Compression

ippsDecodeHuff_BZ2_8u16u

ippsDeflateHuff_8u

ippsEncodeHuff_8u

 

Cryptography

ippsSHA256Update

ippsSHA512Update

ippsHMACSHA256Update

ippsSHA512MessageDigest

ippsSHA256MessageDigest

ippsHMACSHA512MessageDigest

ippsHMACSHA256MessageDigest

ippsHMACSHA384MessageDigest

ippsHMACSHA224MessageDigest

 

 

Computer Vision

ippiAbsDiff_8u_C1R

ippiMean_StdDev_32f_C1R

ippiMean_StdDev_8u_C1R

 

Color Conversion

ippiYCbCr420ToBGR_8u_P3C3R

ippiBGRToCbYCr422_709HDTV_8u_AC4C2R

ippiYCbCr422ToBGR_8u_C2C3R

ippiYCbCr420ToBGR_709HDTV_8u_P3C4R

ippiYCbCr422ToBGR_8u_C2C4R

ippiBGRToYCbCr420_709HDTV_8u_AC4P3R

ippiCbYCr422ToBGR_709HDTV_8u_C2C3R

ippiBGRToYCbCr420_709CSC_8u_C3P3R

ippiCbYCr422ToBGR_709HDTV_8u_C2C4R

ippiBGRToYCrCb420_709CSC_8u_C3P3R

ippiBGRToCbYCr422_709HDTV_8u_C3C2R

ippiYCbCrToBGR_709CSC_8u_P3C3R

ippiBGRToCbYCr422_8u_AC4C2R

ippiBGRToYCbCr422_8u_AC4C2R

ippiBGRToYCbCr422_8u_C3C2R

ippiBGRToYCrCb420_8u_C3P3R

ippiRGBToGray_8u_C3C1R

ippiCbYCr422ToBGR_8u_C2C4R

ippiYCbCr422ToBGR555_8u16u_C2C3R

ippiYCbCr422ToBGR565_8u16u_C2C3R

ippiYCbCr420ToBGR444_8u16u_P3C3R

ippiYCbCr420ToBGR555_8u16u_P3C3R

ippiYCbCr420ToBGR565_8u16u_P3C3R

ippiYCbCr420ToBGR_709CSC_8u_P3C3R

ippiBGRToYCbCr420_709CSC_8u_AC4P3R

ippiBGRToYCrCb420_709CSC_8u_AC4P3R

ippiBGRToYCbCr420_709CSC_8u_C3P2R

ippiYCbCrToBGR_709CSC_8u_P3C4R

ippiCbYCr422ToYCbCr422_8u_C2R

ippiCbYCr422ToYCbCr420_8u_C2P2R

ippiCbYCr422ToYCrCb420_8u_C2P3R

ippiYCbCr420ToCbYCr422_8u_P2C2R

ippiYCbCr422ToCbYCr422_8u_C2R

ippiYCbCr422ToYCbCr420_8u_C2P2R

ippiYCbCr420ToYCbCr422_8u_P2C2R

ippiYCrCb420ToCbYCr422_8u_P3C2R

ippiYCrCb420ToYCbCr422_8u_P3C2R

 

 

Image Processing

ippiAdd_32f_C1IR

ippiMinMax_16s_C1R

ippiAdd_32f_C1R

ippiMinMax_32f_C1R

ippiAdd_8u_C1RSfs

ippiMinMax_8u_C1R

ippiAddC_16s_C1RSfs

ippiMirror_8u_C1R

ippiAddC_32f_C1R

ippiMirror_8u_C1IR

ippiAddC_8u_C1RSfs

ippiMirror_8u_C3R

ippiAnd_8u_C1R

ippiMirror_8u_C3IR

ippiAndC_16u_C1R

ippiMul_32f_C1R

ippiCompareC_32f_C1R

ippiMulC_16s_C1RSfs

ippiCompareC_8u_C1R

ippiMulC_8u_C1RSfs

ippiConvert_16s8u_C1R

ippiNot_8u_C1R

ippiConvert_16u32f_C1R

ippiOr_8u_C1R

ippiConvert_16u8u_C1R

ippiScale_32f8u_C1R

ippiConvert_32f16u_C1R

ippiSqrt_32f_C1R

ippiConvert_32f8u_C1R

ippiSub_32f_C1R

ippiConvert_8u16s_C1R

ippiSub_32f_C1IR

ippiConvert_8u16u_C1R

ippiSub_8u_C1RSfs

ippiConvert_8u32f_C1R

ippiSubC_32f_C1R

ippiCountInRange_8u_C1R

ippiSum_32f_C1R

ippiDCT8x8Fwd_16s_C1I

ippiSum_8u_C1R

ippiDeinterlaceFilterCAVT_8u_C1R

ippiSwapChannels_8u_C3R

ippiDilate3x3_8u_C1R

ippiZigzagInv8x8_16s_C1

ippiDiv_32f_C1R

ippiDCT8x8Inv_16s_C1

ippiDivC_32f_C1R

ippiDCT8x8Inv_16s_C1I

ippiErode3x3_8u_C1R

ippiDCT8x8Inv_A10_16s_C1

ippiFilter_8u_C1R

ippiDCT8x8Inv_A10_16s_C1I

ippiFilterMedian_8u_C1R

ippiAdd_8u_C3RSfs

ippiMax_32f_C1R

ippiAdd_8u_C4RSfs

ippiMax_8u_C1R

ippiAdd_8u_C1IRSfs

ippiMaxIndx_32f_C1R

ippiAdd_8u_C3IRSfs

ippiMean_32f_C1R

ippiAdd_8u_C4IRSfs

ippiMean_8u_C1R

ippiAdd_16u_C1RSfs

ippiAdd_16u_C3RSfs

ippiSub_16u_C4IRSfs

ippiAdd_16u_C4RSfs

ippiSub_16s_C1RSfs

ippiAdd_16u_C1IRSfs

ippiSub_16s_C3RSfs

ippiAdd_16u_C3IRSfs

ippiSub_16s_C4RSfs

ippiAdd_16u_C4IRSfs

ippiSub_16s_C1IRSfs

ippiAdd_16u_AC4IRSfs

ippiSub_16s_C3IRSfs

ippiAdd_16s_C1RSfs

ippiSub_16s_C4IRSfs

ippiAdd_16s_C3RSfs

ippiSub_16sc_C1RSfs

ippiAdd_16s_C4RSfs

ippiSub_16sc_C3RSfs

ippiAdd_16s_C1IRSfs

ippiSub_16sc_C1IRSfs

ippiAdd_16s_C3IRSfs

ippiSub_16sc_C3IRSfs

ippiAdd_16s_C4IRSfs

ippiSub_32sc_C1RSfs

ippiAdd_16sc_C1RSfs

ippiSub_32sc_C3RSfs

ippiAdd_16sc_C3RSfs

ippiSub_32sc_C1IRSfs

ippiAdd_16sc_C1IRSfs

ippiSub_32sc_C3IRSfs

ippiAdd_16sc_C3IRSfs

ippiSubC_8u_C1RSfs

ippiAdd_32sc_C1RSfs

ippiSubC_8u_C1IRSfs

ippiAdd_32sc_C3RSfs

ippiSubC_16u_C1RSfs

ippiAdd_32sc_C1IRSfs

ippiSubC_16u_C1IRSfs

ippiAdd_32sc_C3IRSfs

ippiSubC_16s_C1RSfs

ippiAddC_8u_C3RSfs

ippiSubC_16s_C1IRSfs

ippiAddC_8u_C4RSfs

ippiSubC_16sc_C1RSfs

ippiAddC_8u_C1IRSfs

ippiSubC_16sc_C1IRSfs

ippiAddC_8u_C3IRSfs

ippiSubC_32sc_C1RSfs

ippiAddC_16u_C1RSfs

ippiSubC_32sc_C1IRSfs

ippiAddC_16u_C1IRSfs

ippiSqrt_32f_C3R

ippiAddC_16s_C1IRSfs

ippiSqrt_32f_AC4R

ippiAddC_16sc_C1RSfs

ippiSqrt_32f_C1IR

ippiAddC_16sc_C1IRSfs

ippiSqrt_32f_C3IR

ippiAddC_32sc_C1RSfs

ippiSqrt_32f_AC4IR

ippiAddC_32sc_C1IRSfs

ippiSqrt_32f_C4IR

ippiSub_8u_C3RSfs

ippiMinMax_32f_C3R

ippiSub_8u_C4RSfs

ippiMinMax_32f_C4R

ippiSub_8u_C1IRSfs

ippiMinMax_32f_AC4R

ippiSub_8u_C3IRSfs

ippiAnd_8u_C3R

ippiSub_8u_C4IRSfs

ippiAnd_8u_C4R

ippiSub_16u_C1RSfs

ippiNot_8u_C3R

ippiSub_16u_C3RSfs

ippiNot_8u_C4R

ippiSub_16u_C4RSfs

ippiOr_8u_C3R

ippiSub_16u_C1IRSfs

ippiOr_8u_C4R

ippiSub_16u_C3IRSfs

ippiXorC_8u_C1R

ippiXorC_8u_C1IR

ippiFilterMedianVerrt_f8u_C3R

ippiScale_32f8u_C3R

ippiFilterMedianVerrt_f8u_C4R

ippiScale_32f8u_C4R

ippiMulC_16s_C1IRSfs

ippiFilterMedianHoriz_f8u_C1R

ippiScale_32f8u_C3R

ippiFilterMedianVerrt_f8u_C1R

ippiScale_32f8u_C4R

 

 

Signal Processing

ippsAbs_32f

ippsSubC_16s_Sfs

ippsAdd_16s_Sfs

ippsSubC_16s_I

ippsAdd_32f

ippsSubC_16s_ISfs

ippsAdd_32s_Sfs

ippsSubC_16sc_Sfs

ippsAddC_16s_Sfs

ippsSubC_16sc_ISfs

ippsAddC_32f

ippsSubC_32s_Sfs

ippsAddProductC_32f

ippsSubC_32s_ISfs

ippsConvert_16s32f

ippsSubC_32sc_Sfs

ippsConvert_32f16s_Sfs

ippsSubC_32sc_ISfs

ippsConvert_32f32s_Sfs

ippsMulC_8u_Sfs

ippsConvert_32f64f

ippsDivC_8u_Sfs

ippsConvert_32s32f_Sfs

ippsFIR64fc_16sc_Sfs

ippsConvert_64f32f

ippsFIR64fc_32sc_Sfs

ippsCopy_16s

ippsFIR64fc_32fc

ippsCopy_32f

ippsFIR_32fc

ippsCopy_32fc

ippsSum_32f

ippsCopy_32s

ippsSum_32s_Sfs

ippsCopy_64f

ippsSum_64f

ippsCopy_8u

ippsSwapBytes_16u

ippsDiv_32f

ippsMin_32s

ippsDivC_32f

ippsMin_16s

ippsDivC_64f

ippsMaxIndx_32s

ippsDotProd_32f

ippsLShiftC_16u

ippsExp_32f

ippsLShiftC_16u_I

ippsFFTFwd_CToC_32fc

ippsLShiftC_16s_I

ippsFFTFwd_RToCCS_32f

ippsAddC_32sc

ippsFIR_32f

ippsAdd_8u_Sfs

ippsFlip_32f

ippsAdd_8u_ISfs

ippsLn_32f

ippsAdd_16u

ippsLShiftC_16s

ippsAdd_16u_Sfs

ippsLShiftC_32s

ippsAdd_16u_ISfs

ippsMagnitude_32fc

ippsAddC_8u_Sfs

ippsMax_16s

ippsAddC_8u_ISfs

ippsMax_32f

ippsThreshold_GT_32f

ippsMax_32s

ippsThreshold_LT_32f

ippsMaxEvery_32f

ippsAutoCorr_16s_Sfs

ippsMaxIndx_32f

ippsAutoCorr_32f

ippsMean_32f

ippsAutoCorr_32fc

ippsMin_32f

ippsAutoCorr_64f

ippsMinIndx_32f

ippsAutoCorr_64fc

ippsMinIndx_32s

ippsAutoCorr_NormA_32f

ippsMinMax_32f

ippsAutoCorr_NormA_32fc

ippsMove_16s

ippsAutoCorr_NormA_64f

ippsMove_32f

ippsAutoCorr_NormA_64fc

ippsMove_8u

ippsAutoCorr_NormB_32f

ippsMul_32f

ippsAutoCorr_NormB_32fc

ippsMul_32fc

ippsAutoCorr_NormB_64f

ippsMul_64f

ippsAutoCorr_NormB_64fc

ippsMulC_16s_Sfs

ippsConv_32f

ippsMulC_32f

ippsConv_64f

ippsMulC_64f

ippsCrossCorr_32f

ippsPowerSpectr_32fc

ippsCrossCorr_32fc

ippsRealToCplx_32f

ippsCrossCorr_64f

ippsRShiftC_16s

ippsCrossCorr_64fc

ippsRShiftC_32s

ippsDCTFwd_32f

ippsSampleUp_32f

ippsDCTFwd_32f_I

ippsSet_16s

ippsDCTFwd_64f

ippsSet_32f

ippsDCTFwd_64f_I

ippsSet_32fc

ippsDCTInv_32f

ippsSet_32s

ippsDCTInv_32f_I

ippsSet_64f

ippsDCTInv_64f

ippsSet_8u

ippsDCTInv_64f_I

ippsSortAscend_32s

ippsDFTFwd_CToC_32f

ippsSqr_32f

ippsDFTFwd_CToC_32fc

ippsSqrt_32f

ippsDFTFwd_CToC_64f

ippsStdDev_32f

ippsDFTFwd_CToC_64fc

ippsSub_32f

ippsDFTFwd_RToCCS_32f

ippsSub_32fc

ippsDFTFwd_RToCCS_64f

ippsSub_32s_Sfs

ippsDFTFwd_RToPack_32f

ippsSubC_32f

ippsDFTFwd_RToPack_64f

ippsDFTFwd_RToPerm_32f

ippsFFTInv_CToC_64f_I

ippsDFTFwd_RToPerm_64f

ippsFFTInv_CToC_64fc

ippsDFTInv_CCSToR_32f

ippsFFTInv_CToC_64fc_I

ippsDFTInv_CCSToR_64f

ippsFFTInv_PackToR_32f

ippsDFTInv_CToC_32f

ippsFFTInv_PackToR_32f_I

ippsDFTInv_CToC_32fc

ippsFFTInv_PackToR_64f

ippsDFTInv_CToC_64f

ippsFFTInv_PackToR_64f_I

ippsDFTInv_CToC_64fc

ippsFFTInv_PermToR_32f

ippsDFTInv_PackToR_32f

ippsFFTInv_PermToR_32f_I

ippsDFTInv_PackToR_64f

ippsFFTInv_PermToR_64f

ippsDFTInv_PermToR_32f

ippsFFTInv_PermToR_64f_I

ippsDFTInv_PermToR_64f

ippsFIR_64f

ippsFFTFwd_CToC_32f

ippsFIR_64fc

ippsFFTFwd_CToC_32f_I

ippsIIR_32f

ippsFFTFwd_CToC_32fc

ippsIIR_32fc

ippsFFTFwd_CToC_32fc_I

ippsIIR_64f

ippsFFTFwd_CToC_64f

ippsIIR_64fc

ippsFFTFwd_CToC_64f_I

ippsFFTFwd_CToC_64fc

ippsMinEvery_32f

ippsFFTFwd_CToC_64fc_I

ippsDiv_32f_I

ippsFFTFwd_RToCCS_32f

ippsAdd_8u16u

ippsFFTFwd_RToCCS_32f_I

ippsAdd_16s

ippsFFTFwd_RToCCS_64f

ippsAdd_16s_I

ippsFFTFwd_RToCCS_64f_I

ippsAdd_16s_ISfs

ippsFFTFwd_RToPack_32f

ippsAdd_16sc_Sfs

ippsFFTFwd_RToPack_32f_I

ippsAdd_16sc_ISfs

ippsFFTFwd_RToPack_64f

ippsAdd_32s_ISfs

ippsFFTFwd_RToPack_64f_I

ippsAdd_32sc_Sfs

ippsFFTFwd_RToPerm_32f

ippsAdd_32sc_ISfs

ippsFFTFwd_RToPerm_32f_I

ippsAddC_16s_Sfs

ippsFFTFwd_RToPerm_64f

ippsAddC_16s_I

ippsFFTFwd_RToPerm_64f_I

ippsAddC_16s_ISfs

ippsFFTInv_CCSToR_32f

ippsAddC_16sc_Sfs

ippsFFTInv_CCSToR_32f_I

ippsAddC_16sc_ISfs

ippsFFTInv_CCSToR_64f

ippsAdd_32f_I

ippsFFTInv_CCSToR_64f_I

ippsAdd_32fc

ippsFFTInv_CToC_32f

ippsAdd_32fc_I

ippsFFTInv_CToC_32f_I

ippsAdd_64f

ippsFFTInv_CToC_32fc

ippsAdd_64f_I

ippsFFTInv_CToC_32fc_I

ippsAdd_64fc

ippsFFTInv_CToC_64f

ippsAdd_64fc_I

ippsAddC_32f_I

ippsAddC_16u_Sfs

ippsDiv_32f_I

ippsAddC_16u_ISfs

ippsDivC_32f_I

ippsAddC_32s_Sfs

ippsDivC_64f_I

ippsAddC_32s_ISfs

ippsAddC_32sc_Sfs

ippsSubC_16u_ISfs

ippsAddC_32sc_ISfs

ippsSubC_16s_Sfs

ippsSub_8u_Sfs

ippsSubC_16s_I

ippsSub_8u_ISfs

ippsSubC_16s_ISfs

ippsSub_16u_Sfs

ippsSubC_16sc_Sfs

ippsSub_16u_ISfs

ippsSubC_16sc_ISfs

ippsSub_16s

ippsSubC_32s_Sfs

ippsSub_16s_Sfs

ippsSubC_32s_ISfs

ippsSub_16s_I

ippsSubC_32sc_Sfs

ippsSub_16s_ISfs

ippsSubC_32sc_ISfs

ippsSub_16sc_Sfs

ippsMulC_8u_Sfs

ippsSub_16sc_ISfs

ippsDivC_8u_Sfs

ippsSub_32s_ISfs

ippsFIR64fc_16sc_Sfs

ippsSub_32sc_Sfs

ippsFIR64fc_32sc_Sfs

ippsSub_32sc_ISfs

ippsFIR64fc_32fc

ippsSubC_8u_Sfs

ippsFIR_32fc

ippsSubC_8u_ISfs

ippsSubC_16u_Sfs

Note: A few domain functions are deprecated in Intel IPP 8.2 and later versions. For information about installing the additional domains, refer to the Knowledge Base article: https://software.intel.com/en-us/articles/install-the-additional-domains-for-intel-ipp-82

 

References:

 

Intel® Integrated Performance Primitives (Intel® IPP) Functions Optimized for Intel® Advanced Vector Extensions (Intel® AVX)

Haswell New Instruction Descriptions Now Available!

Intel® Advanced Vector Extensions Programming Reference

 

 

 

For more complete information about compiler optimizations, see our Optimization Notice.