ippiFilterGauss_8u_C1R 5x5 crashes on Core Duo/Xeon


Hi all,

ippiDemo.exe crashes in ippiFilterGauss_8u_C1R (5x5) on Core Duo/Xeon. It crashes every time, and this bug affects our customers. Is anyone else seeing this?

Are there any updates newer than IPP 5.1.1?

As a workaround, can you force the IPP runtime to install and use the PX DLLs on Core Duo/Xeon?

(The PX DLLs work fine on the above CPUs.)

Thanks!


Hello,

Could you please provide more information about your issue: what image sizes and what parameters do you use in the function call? I've just tried this function on a Pentium M with the 256x256 8u C1 Jaehne image and found no problem. Could you also describe which IPP component versions are reported in the Help->About dialog?

I will also try it on a Core 2 Duo.

Regards,
Vladimir

Tested ippiDemo.exe on Intel Core 2 Duo 2.66 GHz, Win32, IPP v5.1.1, Jaehne image, 256x256, 8u, C1, ippiFilterGauss_8u_C1R, 5x5 kernel: no problem detected.

Vladimir
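For anyone comparing an optimized build against portable output (the role the PX DLLs play in this thread), a plain C reference of the 5x5 Gaussian can serve as a cross-check. This is a minimal sketch, assuming an 8-bit single-channel image with row strides in bytes. The helper name `gauss5x5_ref_8u` is hypothetical, and the integer weights (summing to 571) are our understanding of the 5x5 mask given in the IPP manual, so confirm them for your IPP version. Rounding may differ from IPP's by one gray level, so compare with a tolerance of 1.

```c
#include <assert.h>
#include <stdint.h>

/* 5x5 Gaussian weights, normalized by their sum (571). Assumed to match the
   mask the IPP manual lists for the 5x5 case; verify for your version. */
static const int kGauss5x5[5][5] = {
    { 2,  7,  12,  7,  2},
    { 7, 31,  52, 31,  7},
    {12, 52, 127, 52, 12},
    { 7, 31,  52, 31,  7},
    { 2,  7,  12,  7,  2},
};

/* Hypothetical reference filter for an 8-bit, single-channel image.
   srcStep/dstStep are row strides in bytes. Only the interior is written:
   a 2-pixel border is skipped so no pixel outside the image is read. */
void gauss5x5_ref_8u(const uint8_t *src, int srcStep,
                     uint8_t *dst, int dstStep, int width, int height)
{
    assert(src && dst && width >= 5 && height >= 5);
    for (int y = 2; y < height - 2; ++y) {
        for (int x = 2; x < width - 2; ++x) {
            int acc = 0;
            for (int ky = -2; ky <= 2; ++ky)
                for (int kx = -2; kx <= 2; ++kx)
                    acc += kGauss5x5[ky + 2][kx + 2] *
                           src[(y + ky) * srcStep + (x + kx)];
            /* Round to nearest; weights sum to 571, so the result fits in 0..255. */
            dst[y * dstStep + x] = (uint8_t)((acc + 285) / 571);
        }
    }
}
```

The sketch leaves the outer two rows and columns untouched because a 5x5 mask needs two pixels of neighbors on every side; a flat input image comes out unchanged in the interior.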

Hi Vladimir,

Thanks for the response.

The image size is 640x480 mono 8-bit, but it seems to crash with any image size. I have two customers reporting the crash: one has a T2500 2 GHz CPU, the other has a Xeon (details below). Both run IPP in DLL form (they use the IPP 5.1.1 runtime install). I have reproduced the crash on the Xeon. The crash goes away if you:

In windows\system32, locate "ippiw7-5.1.dll" and rename it to "ippiw7-5.1-org.dll".

In windows\system32, locate "ippipx-5.1.dll" and copy it to the old name "ippiw7-5.1.dll".

This looks like a clear-cut bug in ippiw7-5.1.dll. Please try it on a T2500 or a Xeon.

Thanks!

CPU-Z 1.37 report file

Processor(s)

Number of processors
2

Number of cores
1 per processor

Number of threads
2 (max 2) per processor

Name
Intel Xeon

Code Name
Prestonia

Specification
Intel Xeon CPU 2.40GHz

Package
Socket 604 mPGA

Family/Model/Stepping
F.2.5

Extended Family/Model
F.2

Brand ID
11

Core Stepping
M0

Technology
0.13 um

Core Speed
2399.2 MHz

Multiplier x Bus speed
18.0 x 133.3 MHz

Rated Bus speed
533.2 MHz

Stock frequency
2400 MHz

Instruction sets
MMX, SSE, SSE2

L1 Data cache
8 KBytes, 4-way set associative, 64-byte line size

Trace cache
12 Kuops, 8-way set associative

L2 cache
512 KBytes, 8-way set associative, 64-byte line size

Chipset & Memory

Northbridge
ServerWorks ID0017 rev. 32

Southbridge
ServerWorks ID0225 rev. 00

Memory Type

Memory Size
1024 MBytes

System

System Manufacturer
Dell Computer Corporation

System Name
PowerEdge 1600SC

System S/N
382PM21

BIOS Vendor
Dell Computer Corporation

BIOS Version
A12

BIOS Date
10/19/2004

Memory SPD

Software

Windows Version
Microsoft Windows XP Professional Service Pack 2 (Build 2600)

DirectX Version
9.0c

Well, theoretically it could be a bug, but we need to reproduce it somehow. I've tried the V8 and W7 DLLs on a Core 2 Duo with your image size and can't reproduce the issue.

Could you also check that there is no older version of IPP on your system? What is reported in the ippiDemo Help->About dialog?

Vladimir

Hi Vladimir,

The image size is 640x480. Any ROI size crashes. The IPP version is the 5.1.1 runtime (the latest on your web site). Systems: T2500 (2 GHz) and dual-CPU Xeon 2.4 GHz. OS: XP Pro.

In ippiDemo I do the following:

1. Load image

2. Select rectangular ROI

3. Select crashing filter.

4. Observe crash.

I'll try to get the "ippiDemo Help->About dialog" info, but I don't have access to the affected system right now; I need to ask the customer to provide it.

The crash was initially reported with our product, which uses IPP. We then reproduced it with ippiDemo.

Thank you.

Thanks for the additional info. I recommend that you also submit this issue to Intel Premier Support; then you will be notified when an update is available. As a workaround, you could remove ippiw7-5.1.dll from the system. In that case the application will automatically choose a less optimized version (A6 here), so you should not lose much performance.

Vladimir
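The workaround above relies on the runtime falling back to the next library variant when the preferred one is missing. The sketch below models that fallback in plain C; it is not IPP's dispatcher. The variant list `w7`, `a6`, `px` and its order are assumptions drawn from the DLL names in this thread, and `pick_variant` and `variant_dll_name` are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed preference order, most to least optimized (illustrative only). */
static const char *const kVariants[] = { "w7", "a6", "px" };
#define NUM_VARIANTS (sizeof kVariants / sizeof kVariants[0])

/* Hypothetical inputs: cpu_ok[i] says the CPU can run variant i,
   present[i] says the DLL for variant i is installed.
   Returns the suffix of the first usable variant, or NULL if none. */
const char *pick_variant(const int cpu_ok[], const int present[])
{
    for (size_t i = 0; i < NUM_VARIANTS; ++i)
        if (cpu_ok[i] && present[i])
            return kVariants[i];
    return NULL;
}

/* Builds a file name following the pattern in this thread, e.g. "ippiw7-5.1.dll". */
void variant_dll_name(const char *suffix, char *buf, size_t len)
{
    assert(suffix && buf && strlen(suffix) == 2);
    snprintf(buf, len, "ippi%s-5.1.dll", suffix);
}
```

With `w7` marked as not installed, the sketch picks `a6` (file name "ippia6-5.1.dll"), which mirrors the behavior described above; if nothing usable is present, it returns NULL.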

Please also make sure that there is no conflict between two versions of IPP on the target system (or a conflict with some other software). We are still not able to reproduce the issue, so I can't confirm that there is a bug in IPP v5.1.1.

Vladimir

Thanks Vladimir.

After further investigation, I've found that there was a similar issue with another filtering function (ippiFilterLaplace_8u_C1R). That issue was related to OpenMP threading inside the IPP DLLs. These functions use a similar approach to internal threading, so it might be the same issue.

Until IPP v5.2 beta, where this issue is fixed, you have several options to work around it:

1. Use less optimized DLLs (A6 or PX). In this case you will lose performance, especially with PX.

2. If your application was compiled with the Intel C/C++ compiler, you can recompile it with the following changes (in this case you will not lose any performance):

The bug in the code of ippiFilterLaplace_8u_C1R has been found. It will be fixed in the next library version. We can suggest a temporary solution for the customer.
For the application to run correctly, insert these three lines at the beginning of the program:

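int numThreads;
ippGetNumThreads(&numThreads);    /* query how many threads IPP uses internally */
omp_set_num_threads(numThreads);  /* set the OpenMP default thread count to match */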

For the build to succeed, you also need to include omp.h:

#include <omp.h>

3. If your application was built with a non-Intel compiler, you can disable IPP internal threading by calling ippSetNumThreads(1). In this case only one thread will be launched, but the most appropriate optimized code will still run.

4. You can also disable IPP internal threading without recompiling your application. In that case, set the environment variable OMP_NUM_THREADS=1 before launching the application.

Regards,
Vladimir

It all makes sense now. Thanks again.

This is a lifesaver. I had been forced to use IPP 5.1.0.

Will solution #2 work with VC2005, which supports OpenMP?

Since the Mac OS X dylibs also have a threading problem, I strongly suggest a bug-fix release for IPP 5.1.1, because as it stands it is pretty much unusable on high-performance platforms such as our dual-Xeon systems and the Mac Pro.

I understand that IPP 5.2 is coming, but I would be reluctant to use it in our product release until it has proven stable. I think many of your customers would really appreciate a bug-fix release for IPP 5.1.1.

Best regards,

Albert

Hello Albert,

I've submitted your request to Intel Premier Support, so you can expect someone will contact you soon.

Regarding your previous question, I think that option #2 should work with VC2005.

By the way, do you have any other comments regarding IPP? How do you find its functionality/performance/usability? Do you see any missed functionality which you may want to have in IPP?

Regards,
Vladimir

Hi Vladimir,

Thanks for the response and for asking for my comments!

For me, one of the most important things is IPP's internal threading in the major image processing functions. It really gets high performance out of multiprocessor systems without complicating development. I hope more effort goes into making it more efficient.

I am using IPP 5.1 on both Windows and Mac. I cannot get the threading to work on a dual-Xeon system or on a Mac Pro. I tried your fix, but it does NOT work for me.

Does IPP 5.2 have max/min(a, b), where a and b are images? What about color space conversion for data types other than _8u?

Since more and more processors are being added to high-end systems, I don't know how much improvement you can get by threading single functions. For example, ippiAdd on large images must be limited more by memory access than by computation. I suspect many image processing functions cannot take advantage of the cache because of the large image sizes.
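To put rough numbers on the cache argument, the CPU-Z report earlier in this thread lists a 512 KB L2 cache for the Xeon. The helpers below (hypothetical names, plain C) compute image footprints: a 640x480 8u frame takes 307,200 bytes and fits on its own, two such frames (614,400 bytes) already exceed the cache, and the same frame in 32f takes 1,228,800 bytes. The check is deliberately crude: it ignores associativity, code, and any other data competing for the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by a width x height image with the given channel count
   and bytes per channel (1 for 8u, 4 for 32f), assuming no row padding. */
size_t image_bytes(size_t width, size_t height, size_t channels, size_t bytes_per_ch)
{
    assert(width && height && channels && bytes_per_ch);
    return width * height * channels * bytes_per_ch;
}

/* Rough check: can n_images images of img_bytes each be resident in a
   cache of cache_bytes at the same time? */
int fits_in_cache(size_t img_bytes, size_t n_images, size_t cache_bytes)
{
    return img_bytes * n_images <= cache_bytes;
}
```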

If you could find a way to group functions together, performance might increase significantly. Say I need to compute C = k*A + B/A, where A, B, and C are large images:

ippiMulC_(k, A, C);    // C = k*A
ippiDiv_(A, B, TMP);   // TMP = B/A (TMP is a temporary image)
ippiAdd_(TMP, C);      // C = C + TMP

Since the images are large and the three operations sweep through the whole images one after another, the data is mostly outside the cache and the images are accessed 7 times. Threading MulC and Div will not help much.
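A plain C stand-in for the three-call sequence above may make the access pattern concrete. This is a minimal sketch, assuming contiguous single-channel 32f images of `n` pixels with no row padding; `mul_c`, `div_img`, `add_inplace`, and `three_pass` are hypothetical names, not IPP functions. Each helper sweeps the entire image before the next one starts, and a full-size `tmp` image is required.

```c
#include <assert.h>
#include <stddef.h>

/* C = k*A over the whole image. */
static void mul_c(float k, const float *a, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) c[i] = k * a[i];
}

/* TMP = B/A over the whole image. */
static void div_img(const float *a, const float *b, float *tmp, size_t n)
{
    for (size_t i = 0; i < n; ++i) tmp[i] = b[i] / a[i];
}

/* C += TMP over the whole image. */
static void add_inplace(const float *tmp, float *c, size_t n)
{
    for (size_t i = 0; i < n; ++i) c[i] += tmp[i];
}

/* C = k*A + B/A as three sequential full-image passes; for images larger
   than the cache, each pass streams its operands from main memory again. */
void three_pass(float k, const float *a, const float *b, float *c,
                float *tmp, size_t n)
{
    assert(a && b && c && tmp);
    mul_c(k, a, c, n);
    div_img(a, b, tmp, n);
    add_inplace(tmp, c, n);
}
```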

If you could have a function like this:

ippiFunction op[3];

op[0] = ippiMulC_;
op[1] = ippiDiv_;
op[2] = ippiAdd_;

ippiApplyFunctions(A, B, C, op, ...);

Then you would have many ways to perform the operations internally, at the pixel level or the tile level, so that memory is accessed only 3 times instead of 7.
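A pixel-level fused version of the same formula shows the "3 accesses" case: it reads A and B once, writes C once, and needs no temporary image. This is a minimal sketch, assuming contiguous single-channel 32f images; the function name `fused` is hypothetical and not an IPP API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fused kernel: C = k*A + B/A in a single pass over n pixels. */
void fused(float k, const float *a, const float *b, float *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; ++i)
        c[i] = k * a[i] + b[i] / a[i];
}
```

For a = {1, 2, 4}, b = {2, 4, 8}, and k = 3, it yields c = {5, 8, 14}.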

Just an idea, don't know if you can even do it this way...

Albert

Hi, Albert,

I can answer your second question. We are now working on an add-on IPP feature that allows executing sequences of image processing operations. The code for a sequence of operations will look close to the way you would naturally write it, and for big images (larger than the L2 cache) it will run 2-3 times faster.
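The post does not describe the add-on's interface, so the sketch below only illustrates the general idea of tile-level chaining from Albert's proposal: every operation in the sequence runs on one small tile before the loop moves to the next tile. It assumes contiguous single-channel 32f images; `tile_op`, `apply_tiled`, `kExampleOps`, and the `op_*` functions are hypothetical and are not IPP APIs. The scratch buffer only needs to hold one tile, replacing the full-size TMP image of the original three-call sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-tile operation: every op sees the same tile of A, B, C
   plus a tile-sized scratch buffer. */
typedef void (*tile_op)(const float *a, const float *b, float *c,
                        float *scratch, size_t len, float k);

static void op_mulc(const float *a, const float *b, float *c,
                    float *scratch, size_t len, float k)
{
    (void)b; (void)scratch;
    for (size_t i = 0; i < len; ++i) c[i] = k * a[i];          /* C = k*A */
}

static void op_div(const float *a, const float *b, float *c,
                   float *scratch, size_t len, float k)
{
    (void)c; (void)k;
    for (size_t i = 0; i < len; ++i) scratch[i] = b[i] / a[i]; /* S = B/A */
}

static void op_add(const float *a, const float *b, float *c,
                   float *scratch, size_t len, float k)
{
    (void)a; (void)b; (void)k;
    for (size_t i = 0; i < len; ++i) c[i] += scratch[i];       /* C += S */
}

/* The sequence from Albert's example: C = k*A + B/A. */
static const tile_op kExampleOps[] = { op_mulc, op_div, op_add };

/* Runs the whole op sequence on one tile of at most `tile` pixels before
   moving on. `scratch` must hold `tile` floats. */
void apply_tiled(const tile_op *ops, size_t nops, float k,
                 const float *a, const float *b, float *c, size_t n,
                 float *scratch, size_t tile)
{
    assert(ops && a && b && c && scratch && tile > 0);
    for (size_t start = 0; start < n; start += tile) {
        size_t len = (n - start < tile) ? n - start : tile;
        for (size_t j = 0; j < nops; ++j)
            ops[j](a + start, b + start, c + start, scratch, len, k);
    }
}
```

Choosing `tile` so that the tiles of A, B, C and the scratch buffer together fit in L2 is what would let the chained operations reuse cached data; the right size is machine-dependent.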

I hope to see this feature in the next release. For an out-of-cycle release, please ask via premier.intel.com.

Thanks,

Alexander
