Let me give more data, organize them and provide more detail to support my conjecture about "Intel's problem". - Note: I report time ratio of 2 algorithms on different platforms and with varying number of threads. The two algorithms are (parallel) SixSort and (parallel) Quicksort (my version of Quicksort because what I found out there is not industrial strength). No, I do not do the Mickey Mouse thing of sorting integers. Instead, I sort object; i.e. the arrays contain references to structs. - Single thread ========= Comparison: SixSort versus Quicksort Single thread / array size 16M Machine (Sixsort time)/(Quicksort time) AMD 64 X2 4000+ 2.1 GHz/ Linux 0.59 Intel Celeron 1.07Ghz/ XP-Cygwin 0.79 Intel quad core xeon 2.5Ghz/ Linux 0.81 Intel Pentium M 1.73Ghz/ XP-Cygwin 0.83 Intel MultiCore Testing Lab (MTL) 0.84 Intel i3-2310M 2.1Ghz/ Win7-Cygwin 0.86 - Two threads ======== Comparison: parallel SixSort versus single threaded Quicksort two threads / array size 16M AMD 64 X2 4000+ 2.1 GHz/ Linux 0.33 quad core xeon 2.5Ghz/ Linux 0.50 Intel i3-2310M 2.1Ghz/ Win7-Cygwin 0.51 MTL 0.57 - Many threads ========= Comparison: Paralellized Quicksort against paralellized SixSort The comparison is based on their time ratios against our sequential Quicksort Machine: Intel's MTL Array size 16M # threads PQuicksort time ratio SixSort time ratio 2 0.65 0.57 3 0.47 0.48 4 0.39 0.38 5 0.35 0.34 6 0.33 0.31 7 0.32 0.30 8 0.30 0.28 9 0.28 0.27 10 0.27 0.27 - This last test set is really bad because the superiority of (parallel) SixSort against (parallel) Quicksort is virtually wiped out. - Remember: theory predicts a SixSort/Quicksort single thread time ratio of 0.58. - My conjecture ========= The only difference I see between (parallel) SixSort and (parallel) Quicksort is the size of their executables. Quicksort is tiny. SixSort (32 bit) has an executable of around 50Kb. Hence my guess that the Intel boxes suffer from cache failures of the executable - to the extend that this conjecture makes sense given my lack of knowledge about hardware. - If you folks want to test this conjecture, give me access to a (single threaded) box with a large enough cache for execute memory. - PS If you reply to me, please don't use the royal "we" because that rubs us in the wrong way.
Reportez-vous à notre Notice d'optimisation pour plus d'informations sur les choix et l'optimisation des performances dans les produits logiciels Intel.