Let me give more data, organize them and provide more detail tosupport my conjecture about "Intel's problem".-Note: I report time ratio of 2 algorithms on different platforms andwith varying number of threads. The two algorithms are (parallel)SixSort and (parallel) Quicksort (my version of Quicksort because whatI found out there is not industrial strength). No, I do not do theMickey Mouse thing of sorting integers. Instead, I sort object;i.e. the arrays contain references to structs.-Single thread=========Comparison: SixSort versus QuicksortSingle thread / array size 16MMachine (Sixsort time)/(Quicksort time)AMD 64 X2 4000+ 2.1 GHz/ Linux 0.59Intel Celeron 1.07Ghz/ XP-Cygwin 0.79Intel quad core xeon 2.5Ghz/ Linux 0.81Intel Pentium M 1.73Ghz/ XP-Cygwin 0.83Intel MultiCore Testing Lab (MTL) 0.84Intel i3-2310M 2.1Ghz/ Win7-Cygwin 0.86-Two threads========Comparison: parallel SixSort versus single threaded Quicksorttwo threads / array size 16MAMD 64 X2 4000+ 2.1 GHz/ Linux 0.33quad core xeon 2.5Ghz/ Linux 0.50Intel i3-2310M 2.1Ghz/ Win7-Cygwin 0.51MTL 0.57-Many threads=========Comparison: Paralellized Quicksort against paralellized SixSortThe comparison is based on their time ratios against our sequentialQuicksortMachine: Intel's MTLArray size 16M# threads PQuicksort time ratio SixSort time ratio2 0.65 0.573 0.47 0.484 0.39 0.385 0.35 0.346 0.33 0.317 0.32 0.308 0.30 0.289 0.28 0.2710 0.27 0.27-This last test set is really bad because the superiority of (parallel)SixSort against (parallel) Quicksort is virtually wiped out.-Remember: theory predicts a SixSort/Quicksort single thread time ratioof 0.58.-My conjecture=========The only difference I see between (parallel) SixSort and (parallel)Quicksort is the size of their executables. Quicksort is tiny.SixSort (32 bit) has an executable of around 50Kb. Hence my guessthat the Intel boxes suffer from cache failures of the executable - tothe extend that this conjecture makes sense given my lack of knowledgeabout hardware.-If you folks want to test this conjecture, give me access to a (singlethreaded) box with a large enough cache for execute memory.-PS If you reply to me, please don't use the royal "we" because thatrubs us in the wrong way.
For more complete information about compiler optimizations, see our Optimization Notice.