Ying H (Intel)
June 10, 2009 9:15 AM PDT
Regarding ippiResize* function usage

We have noticed that the ippiResize* functions are a common forum topic, so I would like to share the following information with forum participants. Please reply to this thread if you have any additional comments.

First, starting with Intel IPP v6.0, several APIs, including ippiResize(), are deprecated. Please see this article in the Intel IPP Knowledge Base for more details.

Second, there is a known issue with the new image resize function ippiResizeSqrPixel() in the current Intel IPP v6.0. Please see the article "Resize function ippiResizeSqrPixel() crashed for small image" for more information. It also includes a C code sample showing ippiResizeSqrPixel() usage.

Additionally, take care with the [src/dst]Step parameters: some unexpected errors come from a wrong step value. The step is measured in bytes and is not always equal to nChannel*Width. For example, for 8-bit data it may be ((nChannel*srcWidth+3)>>2)<<2 for a BMP image with 4-byte-aligned rows, or a multiple of 32 when the image is allocated with ippiMalloc.

Best Regards,
Ying

oxydius
June 15, 2009 8:11 PM PDT
#1
Could you provide more information on ippiResizeSqrPixel's use of an external buffer? The buffers seem large enough to require a separate allocation for every resized image, rather than using the destination buffer or the stack.
The deprecated ippiResize did not require one; did it allocate internally, or was it more efficient in this regard?


Yuri Tikhomirov (Intel)
June 16, 2009 2:41 AM PDT
#2 Reply to #1
Hi,

Each language (C++, C#, Delphi, others) and each platform has its own memory manager. With a function that takes an external buffer (ippiResizeSqrPixel), you control memory allocation in your application yourself. Therefore using such functions is preferable to using the old function (ippiResize), which allocates and frees memory internally.

Thanks,
  Beg


johnarg15yahoo.com
June 17, 2009 1:31 AM PDT
#3
Dear Mr. Ying,
I am new to the IPP community. I was using OpenCV for image processing software development, but because of its 8-bit limitation (my image data is 16-bit), I started trying IPP last week with the trial version. I am using the function ippiFilterMedian_16u_C1R for median filtering.

Regarding your suggestion to take care with the [src/dst]Step parameters (the step in bytes is not always equal to nChannel*Width), I have a question about the step for my 16-bit single-channel data, as set in my program:
int dstStep   = dstWidth * 2;        // 1 WORD size = 2 BYTE, where dstWidth  = srcWidth  - (nKernelWidth - 1);

Am I doing this correctly? Please advise.
The function builds OK and it is working fine.

Thanks in advance.
John

Ying H (Intel)
June 22, 2009 11:19 PM PDT
#4 Reply to #3

Hi John,

Could you tell me how you create/load or store your dst image data?
For example, if you use the OpenCV API cvCreateImage() to create a dst image:

  int dstWidth  = 7;
  int dstHeight = 1;
  CvSize tempSize = cvSize(dstWidth, dstHeight);
  IplImage *dst = cvCreateImage(tempSize, IPL_DEPTH_16U, 1);   // 16-bit, 1 channel

then each row of dst is 4-byte aligned, so dst->widthStep = 16, not 7x2 = 14. Passing dst->widthStep as the step is the safe choice here.
If you are using malloc():

  Ipp16u *dst = (Ipp16u *) malloc(dstWidth * dstHeight * sizeof(Ipp16u));

then the rows are packed and dstStep is dstWidth*2 (dstWidth * sizeof(Ipp16u)).

Best Regards,
Ying

I2R D&T Team
July 14, 2009 3:52 AM PDT
#6
Hi,

We have noticed that calling ippiResize or ippiResizeCenter repeatedly in sequence, with a small delay (< 100 ms) between calls, uses less CPU than ippiResizeSqrPixel. The comparison is based on resizing a 1280x1024 image to 877x602. We have tried both the serial and the parallel algorithms described in the documentation, but both still use a lot of CPU. We cannot figure out what's wrong with the ippiResizeSqrPixel function.

Can anyone please shed some light on this issue?


Ying H (Intel)
July 21, 2009 9:11 PM PDT
#7 Reply to #6
Hello,

One quick thought: the delay may be related to multi-threading (thread overhead). Could you please attach a small test program (especially showing the call order and time measurement) so we can reproduce the problem?

Thanks
Ying

I2R D&T Team
July 23, 2009 9:10 PM PDT
#8 Reply to #7
Hi Ying Hu,

Gladly! ;-) Please find the 2 attached C++ source files. I am using OpenCV 1.0 to load the image into a buffer and run the various resize functions in the IPPLibTester class. I am using the IPP 6.1.1.035 library to run this example.

Regards,

I2R D&T Team


 Attachments 
I2R D&T Team
August 20, 2009 10:52 PM PDT
#9 Reply to #8
Hi Ying Hu,

Did you manage to resolve the issue I raised?

Regards,

I2R D&T Team


Ying H (Intel)
August 25, 2009 3:55 AM PDT
#10 Reply to #9
Hello I2R D&T Team,

Sorry, I only just noticed your reply. Could you please give me more information, such as:
1. the IPP version and the library you are linking, and your OS and CPU type?
2. how you measure what you observed: CPU usage in Task Manager, or other timing functions?

Generally, though, high performance is expected to come with high CPU usage. What is your concern about ippiResizeSqrPixel: that it takes more time than ippiResize, or that its CPU usage is high?


Additionally, about the parallel code: if you are calling IPP 6.1 through the dynamic library, ippiResizeSqrPixel is threaded internally with OpenMP, so in most cases you don't need to wrap it in your own OpenMP parallel code. IPP disables its internal threading if it detects an external parallel region. Is there any reason you create OpenMP threads yourself?

Best Regards,
Ying


I2R D&T Team
August 25, 2009 7:34 PM PDT
#11 Reply to #10

Hi Ying Hu,

Thank you for your reply. I thought my concern would go unanswered here.

The program I am working on uses IPP 6.1.1.035 and also links to OpenCV 1.0 for image loading. The OS is Windows XP 32-bit, running on an Intel Xeon E5430 2.66 GHz with 3 GB of memory. I observe the high CPU usage in Task Manager for the process running the example. The timings for the resize functions are calculated in the program.

My concern is the unusually high CPU usage of ippiResizeSqrPixel, which ippiResize does not show. I tested both OpenMP and non-OpenMP versions to check whether ippiResizeSqrPixel exhibits high CPU usage, and indeed it does.

I understand that ippiResizeSqrPixel supersedes ippiResize because it offers more resize techniques in one function. However, if ippiResizeSqrPixel exhibits high CPU usage, it is neither attractive nor practical to use. Image resizing is handy in many of the applications we develop, but it is just one of many operations running in parallel. High CPU usage by a single operation significantly slows down the other parallel operations in our design. In that case, we would have to switch back to the deprecated ippiResize function.

As long as ippiResize is still supported in future IPP releases, this issue is just a concern. Do you understand my point?

Regards,

I2R D&T Team


Vladimir Dudnik (Intel)
August 25, 2009 11:10 PM PDT
Best Answer
#12 Reply to #11
Hello,

As Ying mentioned before, high CPU usage usually corresponds to better performance. If you count the number of CPU clocks spent in ippiResize and ippiResizeSqrPixel, you may notice that the second function takes fewer clocks to do the same work. That leaves more time for other processing you may do in parallel.

Note: if you do threading on top of IPP (i.e. the application has several threads calling IPP functions in parallel), it may make sense to disable IPP internal threading with an ippSetNumThreads(1) call to avoid thread oversubscription.

Regards,
  Vladimir


I2R D&T Team
August 26, 2009 8:09 PM PDT
#13 Reply to #12

Hi Vladimir,

Thank you for re-emphasizing that high performance usually corresponds to high CPU usage.

If I may ask, which software did you use to get the number of CPU clocks spent in a specific function?

Regards,

I2R D&T Team


Ying H (Intel)
August 27, 2009 12:51 AM PDT
#14 Reply to #13

Hi I2R D&T Team,

We usually use the IPP function ippGetCpuClocks() to get CPU clocks.

For example:

 Ipp64u start = ippGetCpuClocks();
 for (int k = 0; k < 1000; k++) {
     // ipplib->map_deprecatedCenter((byte*)img->imageData, img->widthStep, szSrc, destImg, nDestStride, szDest);
     ipplib->map_serial((byte*)img->imageData, img->widthStep, szSrc, destImg, nDestStride, szDest);
     // Sleep(99);
 }
 Ipp64u end = ippGetCpuClocks();

 // clocks per element: destination pixels x 3 channels x 1000 iterations
 double ippCPE   = double(end - start) / (szDest.width * szDest.height * 3 * 1000.);
 double ippClock = double(end - start) / (double(pMhz) * 1000. * 100.0);
 printf(" IPP cpe = %g\n", ippCPE);
 printf(" IPP time = %gms\n", ippClock);

I ran a test on my laptop with your test program:
Core 2 Duo T7300 2.0 GHz (2 cores), 1.96 GB RAM, linking the 32-bit dynamic library (ippi.lib, ippcore.lib)

ippiv8-6.1.dll 6.1 build 137.20 6.1.137.809

map_deprecatedCenter:
 IPP cpe = 42.6039
 IPP time = 338.239ms
 calculating time = 338.413ms
 cpu usage = 50% (one full core)

map_serial:
 IPP cpe = 32.0714
 IPP time = 254.62ms
 calculating time = 254.752ms
 cpu usage = 100% (both cores fully used)

As we mentioned, ippiResizeSqrPixel is threaded internally with OpenMP; it automatically starts 2 threads on a 2-core machine, so you don't need to write threading code to call it. If you'd like to reduce CPU usage and use only one thread, call ippSetNumThreads(1) before calling ippiResizeSqrPixel().

There are some errors in map_deprecatedCenter; I made a small modification and attached the modified code here for your reference.

Under the same conditions, the center coordinates must be specified at the middle of the dst ROI; otherwise only part of the image is processed. I changed them to:

 x = (int)(szDest.width / 2.0);
 y = (int)(szDest.height / 2.0);

Best Regards,
Ying



 Attachments 
j_miles
August 27, 2009 2:15 AM PDT
#15 Reply to #14
Hi Ying et al,

We would be using the non-threaded version because of higher-level threading in our software. What would the performance numbers be for ippiResizeSqrPixel ('map_serial') when using just one thread (maxing out one core)?

In my view, and for our usage pattern, it is still essential that new functions replacing older ones perform at least comparably to (preferably better than) the functions they replace in single-threaded use. Multi-threading should be an option (as it already is) that enables higher performance in the right usage scenarios.

Thanks.

- Jay


I2R D&T Team
August 27, 2009 10:59 PM PDT
#16 Reply to #14
Hi Ying Hu,

Thank you for sharing your views on the issue. Now I have a better idea of how to compare similar IPP functions.

I share Jay's view. It means manually setting the number of CPU cores to use in order to get better overall performance.

Regards,

I2R D&T Team


Ying H (Intel)
August 28, 2009 12:58 AM PDT
#17 Reply to #16
Hi I2R D&T Team and Jay,

Right, you can set the number of IPP threads manually based on how your application actually runs. For example, on a 4-core machine, call ippSetNumThreads(2) and leave 2 cores for your other work.

And in serial mode (ippSetNumThreads(1)), ippiResizeSqrPixel is comparable in performance to the deprecated functions.
For example:

ippiv8-6.1.dll 6.1 build 137.20 6.1.137.809
ippiResizeCenter:
 IPP cpe = 42.1132
 IPP time = 334.343ms
ippiResize:
 IPP cpe = 62.7621
 IPP time = 498.279ms
ippiResizeSqrPixel:
 IPP cpe = 54.6182
 IPP time = 433.622ms
ippiResizeSqrPixel corresponds to ippiResize with shift. If the processed region is the same, its calculations are almost the same as those of ippiResizeCenter.

Regards,
Ying

 




