Regarding ippiResize* function usage

We recently noticed that the ippiResize* functions are a common forum topic. I would like to share the following information with forum participants; please reply to this thread if you have any additional comments.

First, starting with Intel IPP v6.0, several APIs are deprecated, including ippiResize(). Please see this article in the Intel IPP Knowledge Base for more details.

Second, there is a known issue with the new image resize function ippiResizeSqrPixel() in the current Intel IPP v6.0. Please see the article "Resize function ippiResizeSqrPixel() crashed for small image" for details. It also includes a C code sample showing ippiResizeSqrPixel() usage.

Additionally, take care with the [src/dst]Step parameters. Unexpected errors may come from a wrong step value. For example, the step in bytes is not always equal to nChannel*width.
It may be ((nChannel*srcWidth+3)>>2)<<2 for a BMP image with 4-byte-aligned rows, or a multiple of 32 when the image is allocated with ippiMalloc.
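To make the step arithmetic above concrete, here is a minimal sketch in plain C (no IPP calls) that rounds a row length in bytes up to a power-of-two alignment. The function name aligned_step and its parameters are hypothetical, chosen for illustration only. With align = 4 it reduces to the ((nChannel*srcWidth+3)>>2)<<2 expression for 8-bit data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round a row length in bytes up to the next
   multiple of `align`. `align` must be a power of two, e.g. 4 for
   BMP rows or 32 for the ippiMalloc case mentioned in the post. */
size_t aligned_step(size_t width, size_t channels,
                    size_t bytes_per_sample, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t row_bytes = width * channels * bytes_per_sample;
    /* Same idea as ((n + 3) >> 2) << 2, generalised to any power of two. */
    return (row_bytes + align - 1) & ~(align - 1);
}
```

For a 5-pixel-wide, 3-channel, 8-bit image the tight row is 15 bytes, but the 4-byte-aligned step is 16, which is the kind of mismatch that produces wrong results when width*channels is passed as the step.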

Best Regards,
Ying

Could you provide more information on ippiResizeSqrPixel's usage of an external buffer? The buffer seems to be large enough to require a separate allocation for every resized image, as opposed to using the destination buffer or the stack.
The deprecated ippiResize did not require one; did it allocate internally, or was it more efficient in this regard?

Hi,

Each language (C++, C#, Delphi, and others) and each platform has its own memory manager. With a function that takes an external buffer (ippiResizeSqrPixel), the user controls memory allocation in the application itself. Such functions are therefore preferable to the old one (ippiResize), which allocates and frees memory internally.

Thanks,
Beg

Dear Mr. Ying,
I am new to the IPP community. I was using OpenCV for image processing software development, but because of its 8-bit limitation (my image data is 16-bit), I started trying IPP last week with the trial version. I am using the function ippiFilterMedian_16u_C1R for median filtering.

Regarding your suggestion,
"Additionally, take care with the [src/dst]Step parameters. Unexpected errors may come from a wrong step value. For example, the step in bytes is not always equal to nChannel*width.
It may be ((nChannel*srcWidth+3)>>2)<<2 for a BMP image with 4-byte-aligned rows, or a multiple of 32 when the image is allocated with ippiMalloc.",
I have a question about the 16-bit single-channel data in my program, where I set:
int dstStep = dstWidth * 2; // 1 WORD = 2 bytes, where dstWidth = srcWidth - (nKernelWidth - 1)

Am I doing this correctly? Please advise.
The function builds OK and works fine.

Thanks in advance.
John

Hi John,

Could you tell me how you create, load, or store your dst image data?
For example, if you use the OpenCV API cvCreateImage() to create a dst image:

int dstWidth = 7;
int dstHeight = 1;
CvSize tempSize = {dstWidth, dstHeight};
dst = cvCreateImage( tempSize, 16, 1 );
then each row of dst is 4-byte aligned, so dst->widthStep is 16, not 7x2=14, and it is safer to pass dst->widthStep here.
If you are using malloc():
dst = (short *) malloc(dstWidth*dstHeight*sizeof(short));
then dstStep is dstWidth*2.
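The point that the step is counted in bytes, not in 16-bit samples, can be sketched as follows. This is an illustrative helper in plain C (the name pixel_at_16u is hypothetical and not part of IPP or OpenCV); it assumes the step is a multiple of 2 bytes so that every row stays aligned for 16-bit access.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pointer to pixel (x, y) of a single-channel
   16-bit image. `step` is the distance between row starts in BYTES,
   so the row offset is applied to a byte pointer before converting
   back to a uint16_t pointer. */
uint16_t *pixel_at_16u(void *base, size_t step, size_t x, size_t y)
{
    assert(step % sizeof(uint16_t) == 0);  /* assumption: rows stay 2-byte aligned */
    return (uint16_t *)((unsigned char *)base + y * step) + x;
}
```

With a padded step of 16 bytes, row 1 starts at sample index 8; with the tight step of 14 bytes it starts at sample index 7. Passing the wrong one shifts every row after the first.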

Best Regards,
Ying

Hi,

We have noticed that calling ippiResize or ippiResizeCenter repeatedly in sequence, with a small delay (< 100 ms) between calls, uses less CPU than ippiResizeSqrPixel. The comparison is based on resizing a 1280x1024 image to 877x602. We have tried both the single-threaded and the parallel algorithm described in the documentation, but CPU usage is still high with both. We cannot figure out what is wrong with the ippiResizeSqrPixel function.

Can anyone please shed light on this issue?

Hello,

One quick thought: the delay may be related to multi-threading (thread overhead). Could you please attach a small test program (especially the call order and the time measurement) so we can reproduce the problem?

Thanks
Ying

Hi Ying Hu,

Gladly! ;-) Please find the two attached C++ source files. I am using OpenCV 1.0 to load the image into a buffer and run the various resize functions in the IPPLibTester class. I am using the IPP 6.1.1.035 library to run this example.

Regards,

I2R D&T Team

Hi Ying Hu,

Did you manage to resolve the issue I raised?

Regards,

I2R D&T Team

Hello I2R D&T Team

Sorry, I only just noticed your reply. Could you please give me more information, such as:
1. The IPP version and the libraries you are linking, your OS, and your CPU type?
2. How do you measure what you observe: CPU usage in Task Manager, or other time-measurement functions?

In general, high performance is expected to correspond to high CPU usage. What is your concern about ippiResizeSqrPixel? Does it take more time than ippiResize, or is the CPU usage high?

Additionally, regarding the parallel code: if you are using IPP 6.1 with the dynamic libraries, ippiResizeSqrPixel is threaded internally with OpenMP. So in most cases you do not need to write OpenMP parallel code to wrap it again. IPP will disable its internal threading if it detects an external parallel region. Is there any reason for creating the OpenMP threads yourself?

Best Regards,
Ying

Hi Ying Hu,

Thank you for your reply. I thought my concern would end up with no answer here.

The program I am working on uses IPP 6.1.1.035 and also links to OpenCV 1.0 for image retrieval. The OS is Windows XP 32-bit, running on an Intel Xeon E5430 at 2.66 GHz with 3 GB of memory. I observe the high CPU usage in Task Manager for the process running the example. The time measurements for the resize functions are calculated in the program.

My concern is the unusually high CPU usage of the ippiResizeSqrPixel function, which the ippiResize function does not produce. I tested both the OpenMP and non-OpenMP versions to check whether ippiResizeSqrPixel exhibits high CPU usage, and indeed it does.

I understand that ippiResizeSqrPixel replaces ippiResize because it offers a wider variety of resize techniques in one function. However, if ippiResizeSqrPixel exhibits high CPU usage, it is neither attractive nor practical to use. Image resizing is handy in many applications we are developing, but it is just one part of the whole set of operations running in parallel. High CPU usage for a single operation significantly slows down the other parallel operations in our design. In this case, we have to switch back to the deprecated ippiResize function.

As long as ippiResize is still supported in future IPP releases, this issue is just a concern. Do you see my point?

Regards,

I2R D&T Team

Best Reply

Hello,

As Ying mentioned before, high CPU usage usually corresponds to better performance. If you count the number of CPU clocks spent in ippiResize and ippiResizeSqrPixel, you may notice that the second function takes fewer clocks to do the same work. That basically leaves more time for other processing you may do in parallel.

Note: if you do threading on top of IPP (i.e. the application has several threads calling IPP functions in parallel), it may make sense to disable IPP internal threading with an ippSetNumThreads(1) call to avoid thread oversubscription.

Regards,
Vladimir

Hi Vladimir,

Thank you for re-emphasizing that high performance usually corresponds to high CPU usage.

If I may ask, which software did you use to get the number of CPU clocks executed by a specific function?

Regards,

I2R D&T Team

Hi I2R D&T Team,

We usually use the IPP function ippGetCpuClocks() to get CPU clocks.

For example,

Ipp64u start = ippGetCpuClocks();
for (int k = 0; k < 1000; k++)
{
    // ipplib->map_deprecatedCenter((byte*)img->imageData, img->widthStep, szSrc, destImg, nDestStride, szDest);
    ipplib->map_serial((byte*)img->imageData, img->widthStep, szSrc, destImg, nDestStride, szDest);
    // Sleep(99);
}
Ipp64u end = ippGetCpuClocks();

// clocks per output element: width x height x 3 channels x 1000 iterations
double ippCPE = double(end-start)/(szDest.width*szDest.height*3*1000.);
// pMhz is assumed to hold the CPU frequency in MHz (set elsewhere in the test program)
double ippClock = double(end-start)/(double(pMhz)*1000. *100.0);
printf( " IPP cpe = %g\n", ippCPE );
printf( " IPP time = %gms\n", ippClock );
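The arithmetic behind these two figures can be sketched independently of IPP. The helpers below are hypothetical (plain C, names chosen for illustration) and take the start and end counter values as inputs, for example from ippGetCpuClocks(). Note one assumption that differs from the snippet above: the time here is normalised per iteration, whereas the snippet divides by a fixed 100.0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: average CPU clocks per output element
   (pixel x channel), over all iterations of the timed loop. */
double clocks_per_element(uint64_t start, uint64_t end,
                          int width, int height, int channels, int iterations)
{
    assert(end >= start && width > 0 && height > 0 && channels > 0 && iterations > 0);
    double elements = (double)width * height * channels * iterations;
    return (double)(end - start) / elements;
}

/* Hypothetical helper: average time of one iteration in milliseconds,
   assuming the counter ticks at a constant cpu_mhz
   (1 MHz = 1000 clocks per millisecond). */
double ms_per_iteration(uint64_t start, uint64_t end,
                        double cpu_mhz, int iterations)
{
    assert(end >= start && cpu_mhz > 0.0 && iterations > 0);
    return (double)(end - start) / (cpu_mhz * 1000.0 * iterations);
}
```

Clocks per element is the more useful number for comparing two functions on the same data, because it does not depend on the clock frequency or on how many iterations were timed.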

I ran a test on my laptop with your test program.
Core 2 Duo T7300, 2.0 GHz, 2 cores, 1.96 GB RAM, linking the 32-bit dynamic libraries ippi.lib and ippcore.lib

ippiv8-6.1.dll 6.1 build 137.20 6.1.137.809

map_deprecatedCenter:
IPP cpe = 42.6039
IPP time = 338.239ms
calculating time = 338.413ms
CPU usage = 50%, takes one full core

map_serial:
IPP cpe = 32.0714
IPP time = 254.62ms
calculating time = 254.752ms
CPU usage = 100%, takes both cores fully

As we mentioned, ippiResizeSqrPixel is threaded internally with OpenMP; it will start 2 threads on a 2-core machine automatically. You do not need to write threading code to call the function. If you would like to reduce CPU usage and use only one thread, you can call ippSetNumThreads(1) before calling ippiResizeSqrPixel().

There are some errors in map_deprecatedCenter; I made a small modification and attach the modified code here for your reference.

(For the same conditions, we must specify the center coordinates in the middle of the dst ROI. Otherwise only part of the image will be processed. I changed

x = (int)(szDest.width / 2.0);

y = (int)(szDest.height / 2.0);)

Best Regards,
Ying

Attachments:
IPPConsole_m.cpp (2.48 KB)
IPPConsole_m.h (5.26 KB)

Hi Ying et al,

We will be using the non-threaded version because of higher-level threading in our software. What would the performance numbers be for ippiResizeSqrPixel ('map_serial') when using just one thread (maxing out one core)?

In my view, and for our usage pattern, it is still essential that new functions replacing older ones perform at least comparably to (preferably better than) the functions they replace in single-threaded use. Multi-threading should be an option (as it is) that enables higher performance in the right use cases.

Thanks.

- Jay

Hi Ying Hu,

Thank you for sharing your views on the issue. Now I have a better idea of how to compare similar IPP functions.

I share the same view as Jay. This means manually setting the number of CPU cores to use in order to get better overall performance.

Regards,

I2R D&T Team

Hi I2R D&T Team and Jay,

Right, you can set the number of IPP threads manually based on your application. For example, on a 4-core machine, call ippSetNumThreads(2) and leave 2 cores for your other work.

And in serial mode (ippSetNumThreads(1)), the ippiResizeSqrPixel function is comparable in performance to the deprecated functions.
For example,

ippiv8-6.1.dll 6.1 build 137.20 6.1.137.809
ippiResizeCenter:
IPP cpe = 42.1132
IPP time = 334.343ms
ippiResize:
IPP cpe = 62.7621
IPP time = 498.279ms
ippiResizeSqrPixel:
IPP cpe = 54.6182
IPP time = 433.622ms
ippiResizeSqrPixel should correspond to ippiResize (shift). If the processed region is the same, the calculations are almost the same as in ippiResizeCenter.

Regards,
Ying

Hi Mr. Ying,

As far as I understand, a single stepBytes parameter, as it is now, cannot work for images like YUV (YCbCr), where luma and chroma have different widths.

I suspect this is the reason I get a crash when I use ippiResizeSqrPixel().

Thanks

Gilad

Hi Gilad,

Right, the current ippiResizeSqrPixel function supports only a single stepBytes. It assumes the input image is a general image, such as RGB or YUV 4:4:4, whose channels all have the same width. As I understand it, you are handling an image like YUV (YCbCr, i.e. 4:2:0); in that case you need to call the function plane by plane, since stepBytes differs from plane to plane.

For example, to resize a video frame in 3-plane YUV 4:2:0, the resize function is called three times.

Please try this, check stepBytes1, stepBytes2, and stepBytes3, and let us know if there is any problem.
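As a rough illustration of why the three calls need three different sizes and steps, the sketch below computes the per-plane geometry of an 8-bit planar 4:2:0 frame. It is plain C with no IPP calls; the Plane struct and yuv420_planes function are hypothetical names, and the steps assume tightly packed rows (real buffers may carry padding, in which case the actual allocated step should be used).

```c
#include <assert.h>

/* Hypothetical description of one image plane. */
typedef struct {
    int width;   /* samples per row */
    int height;  /* rows */
    int step;    /* bytes per row; assumes tightly packed 8-bit samples */
} Plane;

/* Fill planes[0..2] (Y, Cb, Cr) for an 8-bit planar 4:2:0 frame.
   Chroma is subsampled by 2 in both directions, rounded up for odd sizes. */
void yuv420_planes(int width, int height, Plane planes[3])
{
    assert(width > 0 && height > 0);
    planes[0].width  = width;
    planes[0].height = height;
    planes[0].step   = width;
    for (int i = 1; i < 3; i++) {
        planes[i].width  = (width + 1) / 2;
        planes[i].height = (height + 1) / 2;
        planes[i].step   = planes[i].width;
    }
}
```

Each plane would then be resized on its own as a single-channel image, with the source and destination geometry computed separately for the source and destination frame sizes.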

Regards,
Ying

P.S. One more resize function, ippiResizeYUV422_8u_C2R(), can handle image formats such as YUV422 or YCrCb422.

For those who see high CPU usage when multi-threading is on, the article below may explain one possible cause:
High CPU usage and Intel IPP threaded function

Regards,
Ying