IPP initialization issue with .NET

When I run the code below, 7 is written to the console (only SSE2 is enabled). If I run some FFT methods and then re-run the initialization, I get 3295 (SSE 4.2 is enabled). What can I do to get consistent results from this API?

I am using .NET 4.0 with P/Invoke signatures for all methods used below:

class Program
{
    static void Main(string[] args)
    {
        IppCpuType cpuType = core.ippGetCpuType();
        core.ippInitCpu(cpuType);
        if (cpuType == IppCpuType.ippCpuAVX)
            core.ippEnableCpu(cpuType);

        core.ippInit();

        ulong features = core.ippGetEnabledCpuFeatures();

        Console.WriteLine(features);
    }
}

Thank you,
Greg Chernis

As far as I know, there is an issue with the ippInit function ( a bug was detected several days ago ). Which instruction set do you want to use, SSE2 or SSE4?

As a follow-up, please take a look at:

Forum Topic: Load and Unload issues with Waterfall DLLs ( Instruction Set specific )
Web-link: software.intel.com/en-us/forums/topic/385488

Hello Sergey,

I have access to Westmere and Sandy Bridge Xeon-based machines. I'd like to use SSE 4.2 or AVX, whichever is available.

Thank you,

-Greg Chernis

Also, I don't see how the mentioned issue is related to the issue I am having...

Hi Gregory,

>>...If I run some FFT methods and re-run initialization, I get 3295 (SSE 4.2).

I checked ippdefs.h header:
...
typedef enum {
    ...
    ippCpuSSE42 = 0x45, /* Processor supports Streaming SIMD Extensions 4.2 instruction set */
    ippCpuAVX   = 0x46, /* Processor supports Advanced Vector Extensions instruction set */
    ...
} IppCpuType;
...
and I don't see any code or number that matches 3295. Could you explain how you got it?

>>...What can I do to get consistent results from this API?

Ideally, I would initialize ( with ippInit ) once at the beginning and would not re-initialize until all processing is completed. I'd like to understand why you need to re-initialize the IPP libraries after some processing is done.

The IPP Architecture Reference Manual, Volume 1 describes GetEnabledCpuFeatures() as a method that returns a set of flags, which are also defined in ippcore.h. They are the same flags that GetCpuFeatures() uses. The value 3295, or CDF in hexadecimal, has the SSE 4.2 bit set. That's how I know that all is well. It appears that I can get 7 (which represents SSE 2 only) right after initialization, but things get better (SSE 4.2) when I run the initialization code again.

Thanks for looking at this with me,

-Greg

I should also mention that the problem is intermittent, though reproducible.

Hi Greg,

You should use the ippInit() function only. Don't use EnableCPU at all: it has already been deprecated and does nothing. It is also not clear to me from your code what the purpose of calling InitCPU is. You are mixing two different methods: CpuType (a deprecated approach, don't use it) and CpuFeatures. All you need to do is (1) call ippInit and (2) then call GetCpuFeatures. All other calls in your initialization code are unnecessary.

regards, Igor

Igor,

You're probably looking at a different manual from the one I am reading ( Document number A24968-036US ). This particular manual does not describe deprecation the way you do.

I am using redistributable DLLs.

If I simply run GetEnabledCpuFeatures() before and after a call to FFTGetSize_C_32fc(), I get 7 (SSE2 enabled) before and hex CDF (SSE 4.2 enabled) after the call.  On a Sandy Bridge machine, I get hex FDF (AVX enabled).

Does this look like the correct way to do things?

>>...If I simply run GetEnabledCpuFeatures() before and after a call to FFTGetSize_C_32fc(), I get 7 (SSE2 enabled) before and
>>hex CDF (SSE 4.2 enabled) after the call. On a Sandy Bridge machine, I get hex FDF (AVX enabled)...

I wonder if you could execute pure C/C++ tests ( without .NET ) on your computers?

I surely can run native code, but I prefer .NET code as I will have to inter-operate with native code from a large existing .NET application.

>>...I surely can run native code, but I prefer .NET code...

Gregory, I asked whether you could run a simple test ( implemented in C/C++ ), if that is possible. I understand that re-implementing some of the .NET code is not an option.

I will also follow up with some advice later.

It looks like checking for enabled features after a call to GetSize() routines works well.  Case closed.  Sergey and Igor, thank you for all the help!

Hi Gregory,

>>...It looks like checking for enabled features after a call to GetSize() routines works well...

Thanks for the update. Let me do one more post.

You should always watch out for CPU Dispatching DLLs ( also known as Waterfall DLLs ). This applies to both the IPP and MKL libraries. If an incorrect set of CPU Dispatching DLLs is used, it usually hurts application performance. Here is an example with MKL:

[ Test 1 - 64-bit Windows 7 - Default SSE2 DLLs are used ]

> Test1153 Start <
Sub-Test 1.1 - Runtime binding of MKL functions
Dynamic library mkl_rt.dll loaded
Initialization Done
Sub-Test 1.3
Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
Major version : 11
Minor version : 0
Update version : 2
Product status : Product
Build : 20130124
Processor optimization: Default processor
Sub-Test 3.2 - SGEMM
Matrix multiplication C[ 8192x8192 ] = A[ 8192x8192 ] * B[ 8192x8192 ]
Allocating memory for matrices ( 32-byte alignment )
Intializing matrix data
Measuring performance of SGEMM function
Iteration 01 - Completed in 17.847 secs
Iteration 02 - Completed in 16.895 secs
Iteration 03 - Completed in 16.614 secs
Iteration 04 - Completed in 16.661 secs
Iteration 05 - Completed in 17.515 secs
Deallocating memory
Dynamic library mkl_rt.dll unloaded
> Test1153 End <

[ Test 2 - 64-bit Windows 7 - AVX DLLs are used ]

> Test1153 Start <
Sub-Test 1.1 - Runtime binding of MKL functions
Dynamic library mkl_rt.dll loaded
Initialization Done
Sub-Test 1.3
Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
Major version : 11
Minor version : 0
Update version : 2
Product status : Product
Build : 20130124
Processor optimization: Intel(R) Advanced Vector Extensions (Intel(R) AVX) Enabled Processor
Sub-Test 3.2 - SGEMM
Matrix multiplication C[ 8192x8192 ] = A[ 8192x8192 ] * B[ 8192x8192 ]
Allocating memory for matrices ( 32-byte alignment )
Intializing matrix data
Measuring performance of SGEMM function
Iteration 01 - Completed in 8.237 secs
Iteration 02 - Completed in 7.457 secs
Iteration 03 - Completed in 7.566 secs
Iteration 04 - Completed in 7.488 secs
Iteration 05 - Completed in 7.550 secs
Deallocating memory
Dynamic library mkl_rt.dll unloaded

As you can see, Test 2 runs more than twice as fast (!). Sorry for using a test with MKL, but it clearly demonstrates how performance suffers when the default SSE2 DLLs are used instead of the AVX ones.

When IPP is properly initialized, I see nearly double the performance on otherwise similar machines with AVX!
