I have tested parallelhashlist(a parallel hashtable that i have implemented) with four threads on a quad core and it's giving a perfect scaling on both reads and writes.

Also when i have done those benchmarks on parallelhashlist, there was not enough/much items organized as a self-balancing tree in the individual chains of the hashtable, so , almost all the items was found and inserted in O(1) , so the parallel part in the Amdahl equation was not much bigger compared to to the serial part. But you will notice in pratice that as soon as you will have more items on the chains of the Hash table , organized as self-balancing tree, with a worst case log(n) , the parallel part will become bigger in the Amdahl equation and you will have better performance and scalability than the numbers in the graph of the benchmarks , This is Gustafson's Law.

Alo i have done some scalability tests on my parallelsort library and i have come to the conclusion that parallel heapsort is better on scalability than parallel quicksort cause the P part (of the Amdahl equation) is bigger in parallel heapsort than in parallel quicksort, the parallel heapsort is doing more on the parallel part, it's why it scales better than parallel quicksort, but parallel quicksort is still faster than parallel heapsort and parallel merge sort on my tests on a quad core processor.

And about lockfree_mpmc( a lockfree fifo queue), i have done some tests and it's not scaling cause when you are using a single thread some variable are updated on the L1 cache but using multiple threads those variables are loaded from the L2 cache and it's more expensive to load them from the L2 cache.

But even though lockfree_mpmc is not scalable, you can increase the P (parallel) part by doing more of the same: Increase the volume of data processed by the P part (and therefore the percentage p of time spent in computing). This is Gustafson's Law and you will get more scalability.

For example i have used the IntToStr() function on each of the four threads (on a quad core) on my lockfree_mpmc test programs to convert from and integer to a string, so i have increased the P (parallel) part and i have got more scalability, this is Gustafson's Law, and you have to remember Gustafson's Law , this is very important.

## The parallel sca,lability tests ...

I have tested parallelhashlist(a parallel hashtable that i have implemented)

with four threads on a quad core and it's giving a perfect scaling on both

reads and writes.

Also when i have done those benchmarks on parallelhashlist,

there was not enough/much items organized as a self-balancing

tree in the individual chains of the hashtable, so , almost all the items

was found and inserted in O(1) , so the parallel part in the Amdahl

equation was not much bigger compared to to the serial part. But

you will notice in pratice that as soon as you will have more items

on the chains of the Hash table , organized as self-balancing tree,

with a worst case log(n) , the parallel part will become bigger in the

Amdahl equation and you will have better performance and scalability

than the numbers in the graph of the benchmarks , This is Gustafson's Law.

Alo i have done some scalability tests on my parallelsort library and i have come

to the conclusion that parallel heapsort is better on scalability than parallel quicksort

cause the P part (of the Amdahl equation) is bigger in parallel heapsort than in parallel

quicksort, the parallel heapsort is doing more on the parallel part, it's

why it scales better than parallel quicksort, but parallel quicksort is still

faster than parallel heapsort and parallel merge sort on my tests on a

quad core processor.

And about lockfree_mpmc( a lockfree fifo queue), i have done some tests

and it's not scaling cause when you are using a single thread some variable

are updated on the L1 cache but using multiple threads those variables are

loaded from the L2 cache and it's more expensive to load them from the L2 cache.

But even though lockfree_mpmc is not scalable, you can increase

the P (parallel) part by doing more of the same: Increase the volume of

data processed by the P part (and therefore the percentage p of time spent

in computing). This is Gustafson's Law and you will get more scalability.

For example i have used the IntToStr() function on each of the four threads (on

a quad core) on my lockfree_mpmc test programs to convert from and integer

to a string, so i have increased the P (parallel) part and i have got more scalability,

this is Gustafson's Law, and you have to remember Gustafson's Law ,

this is very important.

You can download my parallel libraries from

http://pages.videotron.com/aminer/

Sincerely,

Amine Moulay Ramdane.