First of all, please excuse my English.
I am developing a graphics processing application, and one part of it compresses images with a JPEG codec.
I tried the UIC JPEG codec from the latest IPP 7.0.5 and got very good results on my mid-range desktop PC with an Intel i5-2300 CPU.
Testing JPEG compression performance on a 1920x1080 RGB24 source image shows the following results:
using 1 compression thread - approx. 55 fps
using 2 compression threads - approx. 95 fps
using 3 compression threads - approx. 95 fps
using 4 compression threads - approx. 91 fps
These results show very good compression performance, and they also show that there is no point in compressing with more than 2 threads (although in all cases Task Manager showed the CPU cores loaded to 100%, so with 3 and 4 threads the extra threads do useless work :-)).
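To put numbers on the scaling, here is a small sketch that computes speedup and parallel efficiency from measured fps values. The function names are only illustrative and not part of IPP. With the figures above, 2 threads give roughly a 1.73x speedup (about 86% efficiency), while 4 threads give roughly 1.65x (about 41% efficiency).

```cpp
#include <cassert>

// Speedup of an n-thread run relative to the single-thread run.
double speedup(double fps_n, double fps_1) {
    assert(fps_1 > 0.0);
    return fps_n / fps_1;
}

// Parallel efficiency: speedup divided by the number of threads used.
// 1.0 would mean perfect linear scaling.
double efficiency(double fps_n, double fps_1, int threads) {
    assert(threads > 0);
    return speedup(fps_n, fps_1) / threads;
}
```

For example, efficiency(95.0, 55.0, 2) corresponds to the 2-thread desktop measurement, and the same call with the Xeon figures (46.0, 37.0, 2) allows a direct comparison of how well each machine scales.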
But the problem begins when I run the tests on the target server with dual Xeon E5620 CPUs.
The same test program, run with the same source image, shows the following results (when all threads run on one of the CPUs):
using 1 compression thread - approx. 37 fps
using 2 compression threads - approx. 46 fps
using 3 compression threads - approx. 50 fps
using 4 compression threads - approx. 51 fps
Using more than 4 threads shows results slowly declining from 50 to 35-40 fps...
Also, if even one thread runs on the other CPU, the results get worse (a slowdown of approx. 10 fps).
Turning off Hyper-Threading in the BIOS improves the results slightly, by approx. 10 fps, but they are still 2 times worse than on the consumer i5 CPU... :-(
So my question is: are these results expected and normal, or am I doing something wrong?
I expected that a Xeon CPU that is 2.5 times more expensive would show better results than a regular i5...
I can understand a performance slowdown when the worker threads run on different physical CPUs (memory access issues and so on), but when I run the test under the same conditions on only one of the Xeon CPUs (using SetProcessAffinityMask), why is it 2 times slower than the i5?
The E5620 has a 12 MB cache and the 1920x1080 source image is only 6 MB, so the whole image can simply fit in the cache...
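The cache estimate can be checked with a few lines of arithmetic. This is only a sketch: it assumes tightly packed rows (no padding), and the `copies` parameter is a hypothetical way to account for more than one frame-sized buffer being live at once. It ignores the encoder's output buffer and intermediate data.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed for an uncompressed frame, assuming no row padding.
constexpr std::size_t frame_bytes(std::size_t width, std::size_t height,
                                  std::size_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

// True if `copies` frame-sized buffers fit into a cache of `cache_bytes`.
// `copies` is an illustrative knob, not something measured in the test.
bool fits_in_cache(std::size_t frame, std::size_t cache_bytes, int copies) {
    assert(copies > 0);
    return frame * static_cast<std::size_t>(copies) <= cache_bytes;
}
```

For 1920x1080 RGB24, frame_bytes(1920, 1080, 3) is 6,220,800 bytes, so one frame fits in 12 MB, two frames only just fit, and three do not.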
Thank you in advance.
PS: I compiled the UIC JPEG codec with the latest Intel C++ Compiler XE for applications running on IA-32, Version 184.108.40.2068 Build 20111011, with the /Ox /QaxSSE4.1 /QxSSE4.1 /Qparallel /Qopenmp switches.
My desktop PC runs Windows 7 Professional 32-bit.
My server runs Windows Server 2008 R1 32-bit.
I use the ippSetNumThreads() function to set the number of processing threads.
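As a rough illustration of the pinning setup described above, the sketch below builds an affinity mask and applies it with SetProcessAffinityMask. The helper names make_affinity_mask and pin_process are made up for this example. Which logical processor indices belong to which physical CPU depends on the system, so the range passed in is an assumption that has to be checked on the actual machine.

```cpp
#include <cassert>
#include <cstdint>
#ifdef _WIN32
#include <windows.h>
#endif

// Build a bit mask selecting `count` consecutive logical processors,
// starting at index `first` (bit i stands for logical processor i).
std::uint64_t make_affinity_mask(unsigned first, unsigned count) {
    assert(count > 0 && first + count <= 64);
    const std::uint64_t bits = (count == 64)
        ? ~std::uint64_t{0}
        : ((std::uint64_t{1} << count) - 1);
    return bits << first;
}

// Restrict the current process to the processors in `mask`.
// Returns true on success. On a 32-bit build DWORD_PTR is 32 bits wide,
// so only the low 32 bits of the mask are used.
// On non-Windows builds this is a stub that always returns false.
bool pin_process(std::uint64_t mask) {
#ifdef _WIN32
    return SetProcessAffinityMask(GetCurrentProcess(),
                                  static_cast<DWORD_PTR>(mask)) != 0;
#else
    (void)mask;
    return false;
#endif
}
```

Assuming, for illustration only, that logical processors 0-7 map to the first E5620, the test would call pin_process(make_affinity_mask(0, 8)) once at startup and then ippSetNumThreads() with the desired thread count.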