June 10, 2009 9:15 AM PDT
Regarding ippiResize* function usage
We recently noticed that questions about the ippiResize* functions are a common topic on this forum. I would like to share the following information with forum participants; please reply to this thread if you have any additional comments.
First, starting with Intel IPP v6.0, several APIs are deprecated, including ippiResize(). Please see this article in the Intel IPP Knowledge Base for more details.
Second, there is a known issue with the new image resize function ippiResizeSqrPixel() in the current Intel IPP v6.0. Please see the article "Resize function ippiResizeSqrPixel() crashed for small image" for details. It also includes a C code sample showing ippiResizeSqrPixel() usage.
Additionally, take care with the [src/dst]Step parameters. Unexpected errors can come from a wrong step value. For example, the step in bytes is not always equal to nChannel*width. For an 8-bit BMP image with 4-byte-aligned rows it is ((nChannel*srcWidth+3)>>2)<<2, and it is a multiple of 32 when the image is allocated with ippiMalloc.
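The step arithmetic above can be sketched in C as follows. This is only an illustration: aligned_step_bytes is a hypothetical helper, not an IPP function, and the alignment to pass (4 for BMP rows, 32 for ippiMalloc as stated above) is an assumption about how the image was allocated. When the allocator reports the step itself, use that value instead.

```c
#include <assert.h>

/* Hypothetical helper (not part of IPP): round the length of one image row,
   in bytes, up to the next multiple of `alignment`.
   `alignment` must be a power of two, e.g. 4 for BMP rows. */
int aligned_step_bytes(int width, int channels, int bytes_per_sample, int alignment)
{
    assert(width > 0 && channels > 0 && bytes_per_sample > 0);
    assert(alignment > 0 && (alignment & (alignment - 1)) == 0);

    int row_bytes = width * channels * bytes_per_sample; /* unpadded row length */
    return (row_bytes + alignment - 1) & ~(alignment - 1);
}
```

For a 5-pixel-wide, 3-channel, 8-bit BMP row this gives 16 bytes rather than 15, the same value as ((3*5+3)>>2)<<2.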
Could you provide more information on ippiResizeSqrPixel's use of an external buffer? The buffers seem to be large enough to require a separate allocation for every resized image, as opposed to using the destination buffer or the stack. The deprecated ippiResize did not require one; did it allocate internally, or was it more efficient in this regard?
Each language (C++, C#, Delphi, and others) and each platform has its own memory manager. With a function that takes an external buffer (ippiResizeSqrPixel), the user controls memory allocation in the application itself. Such functions are therefore preferable to the old ones (ippiResize), which allocate and free memory internally.
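One way to use that control is to keep a single caller-owned work buffer and reuse it across calls, growing it only when a larger size is needed. The sketch below assumes the required size has already been obtained from the library's buffer-size query; WorkBuffer, work_buffer_reserve and work_buffer_free are hypothetical names for illustration, not IPP APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical caller-owned scratch buffer that is reused between calls. */
typedef struct {
    unsigned char *data;
    size_t capacity;   /* bytes currently allocated */
} WorkBuffer;

/* Ensure the buffer holds at least `needed` bytes.
   Returns 1 on success, 0 if the allocation failed. */
int work_buffer_reserve(WorkBuffer *wb, size_t needed)
{
    assert(wb != NULL);
    if (needed <= wb->capacity)
        return 1;                       /* already large enough: no allocation */

    unsigned char *p = realloc(wb->data, needed);
    if (p == NULL)
        return 0;                       /* the old buffer is still valid */

    wb->data = p;
    wb->capacity = needed;
    return 1;
}

/* Release the buffer and reset it to the empty state. */
void work_buffer_free(WorkBuffer *wb)
{
    assert(wb != NULL);
    free(wb->data);
    wb->data = NULL;
    wb->capacity = 0;
}
```

With this pattern, resizing many images of the same size costs one allocation in total instead of one per image. Each thread that resizes concurrently would need its own buffer.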
Dear Mr. Ying, I am new to the IPP community. I was using OpenCV for image processing software development, but because of its 8-bit limitation (my image data is 16-bit), I started trying IPP last week with the trial version. I am using the function ippiFilterMedian_16u_C1R for median filtering.
Regarding your suggestion, "Additionally, you may take care of the parameter [src/dst]Step. Some unexpected errors may come from a wrong step value. For example, the step in bytes may not always equal channel*Width. It may be ((nChannel*srcWidth+3)>>2)<<2 for a BMP image with 4-byte alignment, or a multiple of 32 when using ippiMalloc.", I have a question about my 16-bit single-channel data. In my program I set: int dstStep = dstWidth * 2; // 1 WORD = 2 bytes, where dstWidth = srcWidth - (nKernelWidth - 1);
Am I doing this correctly? Please advise. The function builds without errors and works fine.
Could you tell us how you create, load, or store your dst image data? For example, if you use the OpenCV API cvCreateImage() to create the dst image:
    int dstWidth = 7; int dstHeight = 1;
    CvSize tempSize = {dstWidth, dstHeight};
    dst = cvCreateImage(tempSize, 16, 1);
then each row of dst is 4-byte aligned, so dst->widthStep = 16, not 7x2 = 14, and it is safer to pass dst->widthStep as the step. If you are using malloc():
    dst = (short *) malloc(dstWidth*dstHeight*sizeof(short));
then dstStep is dstWidth*2.
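To make the difference between the two cases concrete, here is a minimal sketch of how a step in bytes is used to address rows of a 16-bit single-channel image. The helpers row_ptr_16u and fill_16u are hypothetical and use only standard C; they are not OpenCV or IPP functions. The point is that the row pointer advances by the step, not by width*2, so any padding at the end of a row is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pointer to row `y` of a 16-bit image whose rows
   start `step_bytes` apart (step_bytes may include padding). */
uint16_t *row_ptr_16u(void *base, int step_bytes, int y)
{
    assert(base != NULL && step_bytes > 0 && step_bytes % 2 == 0 && y >= 0);
    return (uint16_t *)((unsigned char *)base + (size_t)y * (size_t)step_bytes);
}

/* Hypothetical helper: set every pixel of a width x height region to `value`,
   leaving any padding bytes at the end of each row untouched. */
void fill_16u(void *base, int step_bytes, int width, int height, uint16_t value)
{
    assert(width > 0 && height > 0);
    assert(step_bytes >= width * (int)sizeof(uint16_t));
    for (int y = 0; y < height; ++y) {
        uint16_t *row = row_ptr_16u(base, step_bytes, y);
        for (int x = 0; x < width; ++x)
            row[x] = value;
    }
}
```

With width 7, the malloc() case uses step 14 and the cvCreateImage() case uses step 16. Passing 14 for the padded image would make every row after the first start 2 bytes too early.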
We have noticed that calling ippiResize or ippiResizeCenter repeatedly in sequence, with a small delay (< 100 ms) between calls, uses less CPU than ippiResizeSqrPixel. The comparison is based on resizing a 1280x1024 image to 877x602. We have tried both the serial and the parallel algorithm described in the documentation, but CPU usage is still high with both. We cannot figure out what is wrong with the ippiResizeSqrPixel function.
Can anyone please shed some light on this issue?
Hello,
One quick thought: this may be related to multithreading (thread overhead). Could you please attach a small test program (especially the call order and the time measurement) so we can reproduce the problem?
Gladly! ;-) Please find the 2 attached C++ source files. I use OpenCV 1.0 to load the image into a buffer and run the various resize functions in the IPPLibTester class. I am using the IPP 6.1.1.035 library to run this example.
Sorry, I only noticed your reply just now. Could you please give me more information: 1. the IPP version and the libraries you are linking, your OS, and your CPU type? 2. How do you measure what you observe: CPU usage in Task Manager, or some timing function?
In general, high performance is expected to go together with high CPU usage. What is your concern about ippiResizeSqrPixel: does it take more time than ippiResize, or is the CPU usage too high?
Additionally, about the parallel code: if you are using IPP 6.1 and the dynamic libraries, ippiResizeSqrPixel is threaded internally with OpenMP. So in most cases you do not need to wrap it in your own OpenMP parallel code. IPP disables its internal threading if it detects an external parallel region. Is there any reason you create the OpenMP threads yourself?
Thank you for your reply. I thought my question would end without an answer here.
The program I am working on uses IPP 6.1.1.035 and also links to OpenCV 1.0 for image loading. The OS is Windows XP 32-bit, running on an Intel Xeon E5430 2.66 GHz with 3 GB of memory. I observe the high CPU usage in Task Manager for the process running the example. The timing of the resize functions is measured inside the program.
My concern is the unusually high CPU usage of ippiResizeSqrPixel, which ippiResize does not show. I tested both the OpenMP and the non-OpenMP version to see whether ippiResizeSqrPixel really exhibits high CPU usage, and indeed it does.
I understand that ippiResizeSqrPixel replaces ippiResize because it offers a wider variety of resize techniques in one function. However, if ippiResizeSqrPixel exhibits high CPU usage, it is neither attractive nor practical to use. Image resizing is useful in many applications we are developing, but it is just one part of a set of operations running in parallel. High CPU usage for a single operation significantly slows down the other parallel operations in our design. In that case, we have to switch back to the deprecated ippiResize function.
As long as ippiResize is still supported in future IPP releases, this issue is just a concern. Do you see my point?
As Ying mentioned before, high CPU usage usually corresponds to better performance. If you count the number of CPU clocks spent in ippiResize and in ippiResizeSqrPixel, you may notice that the second function takes fewer clocks to do the same work. That leaves more time for other processing you may do in parallel.
Note: if you do threading on top of IPP (i.e. the application has several threads calling IPP functions in parallel), it may make sense to disable IPP's internal threading with an ippSetNumThreads(1) call to avoid thread oversubscription.
I ran a test on my laptop with your test program: Core 2 Duo T7300, 2.0 GHz, 2 cores, 1.96 GB RAM, linking the 32-bit dynamic libraries ippi.lib and ippcore.lib.
ippiv8-6.1.dll 6.1 build 137.20 6.1.137.809
map_deprecatedCenter: IPP cpe = 42.6039, IPP time = 338.239 ms, calculating time = 338.413 ms, CPU usage = 50% (one core fully used)
map_serial: IPP cpe = 32.0714, IPP time = 254.62 ms, calculating time = 254.752 ms, CPU usage = 100% (both cores fully used)
As we mentioned, ippiResizeSqrPixel is threaded internally with OpenMP, so it starts 2 threads automatically on a 2-core machine. You do not need to write threading code to call the function. If you would like to reduce CPU usage and use only one thread, call ippSetNumThreads(1) before calling ippiResizeSqrPixel().
There are some errors in map_deprecatedCenter; I made a small modification and attach the modified code here for your reference.
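The two measurements can also be compared in terms of total CPU time rather than elapsed time. The sketch below is simple arithmetic on the figures reported above; core_ms is a hypothetical helper, and it assumes that the Task Manager percentage is averaged over both cores and stays constant for the whole run, which is only an approximation.

```c
#include <assert.h>

/* Hypothetical helper: approximate CPU time consumed, in core-milliseconds,
   from elapsed time, average CPU usage (percent over all cores) and core count. */
double core_ms(double elapsed_ms, double usage_percent, int num_cores)
{
    assert(elapsed_ms >= 0.0);
    assert(usage_percent >= 0.0 && usage_percent <= 100.0);
    assert(num_cores > 0);
    return elapsed_ms * (usage_percent / 100.0) * num_cores;
}
```

Under these assumptions, core_ms(338.239, 50.0, 2) is about 338 core-ms for map_deprecatedCenter and core_ms(254.62, 100.0, 2) is about 509 core-ms for map_serial. The threaded call finishes sooner in elapsed time while occupying both cores, which is why limiting it to one thread matters when other work runs in parallel.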
(For the same conditions, the center coordinates must be set to the middle of the dst ROI.
Otherwise, only part of the image will be processed. I change
We would be using the non-threaded version because of higher-level threading in our software. What would the performance numbers be for ippiResizeSqrPixel ('map_serial') when using just one thread (maxing out one core)?
In my view, and for our usage pattern, it is still essential that new functions replacing older ones perform at least comparably to (preferably better than) the functions they replace in single-threaded use. Multi-threading should be an option (as it already is) that simply enables higher performance where it fits.
Right, you can set the number of IPP threads manually to suit your application; for example, on a 4-core machine, call ippSetNumThreads(2) and leave 2 cores for your other work.
And in serial mode (ippSetNumThreads(1)), the performance of ippiResizeSqrPixel is comparable to that of the deprecated functions. For example:
ippiv8-6.1.dll 6.1 build 137.20 6.1.137.809
ippiResizeCenter: IPP cpe = 42.1132, IPP time = 334.343 ms
ippiResize: IPP cpe = 62.7621, IPP time = 498.279 ms
ippiResizeSqrPix: IPP cpe = 54.6182, IPP time = 433.622 ms
ippiResizeSqrPixel should be compared with ippiResize (shift). If the processed region is the same as for ippiResizeCenter, the calculations are almost the same.