SixSort is a faster and higher quality in-place sorter than
Quicksort. A complexity analysis suggests that the SixSort/Quicksort
time use ratio is 0.58 on large arrays.
A test on a 16M array yields the ratio 0.59 (against a best in class
Quicksort). On an AMD box.
SixSort can be parallelized easily, like Quicksort. Parallel SixSort
against Quicksort with two threads yields an excellent time ratio of
0.33. On an AMD box.
Tests on multiple Intel boxes give disappointing results. Here what
was observed on MTL:
Comparison: Sixsort versus 3-layered Quicksort/ single thread
Intel's MTL machine:
array size (Sixsort time)/(Quicksort time)
1M 0.92
2M 0.91
4M 0.87
8M 0.86
16M 0.84
Comparison: Sixsort versus 3-layered Quicksort/ two threads
Machine: Intel's multi-core lab/ Linux
array size (Sixsort time)/(Quicksort time)
1M 0.74
2M 0.71
4M 0.68
8M 0.59
16M 0.57
I did many more tests on other Intel boxes and on MTL with threads
upto 10. Similarly disappointing results.
Clay Breshears has suggested to get into the guts of the (serial/
parallel) execution to pinpoint a/the bottleneck. That scares me
because I have only anecdotal knowledge/ opinions about cpu
architecture.
At this point my best conjecture is that there are persistent cache
failures on Intel boxes because:
-- SixSort has way more code than tiny Quicksort
-- SixSort nibbles at 4 locations in the input array in contrast with
Quicksort at 2.
How do we go from here?



