Hi guys! :)
I'm working on a parallel programming project: I have to solve the N-Queens problem in parallel.
I have two different versions of the parallel solution, both working.
But while I was testing them at the C.S. department of my university (University of Pisa) on a 2x quad-core Xeon E5400, I noticed some "strange" behavior in my code: when I set n in task_scheduler_init to a value larger than the total number of available hardware threads, performance improves compared to the run with n equal to the number of hardware threads.
In more detail, these are my results:
[.@.]$ ./nQueens -d man 16 32
134.097 seconds with man implemenation
[.@.]$ ./nQueens -d man 16 16
130.522 seconds with man implemenation
[.@.]$ ./nQueens -d man 16 8
150.418 seconds with man implemenation
where "-d man" means "use the man implementation", "16" is the chessboard size and the last integer is the number of tasks.
I was wondering whether there's something abnormal about this behavior.
My guess is that setting the number of tasks equal to the chessboard size leaves less work for the TBB task scheduler and hands the scheduling over to the OS. Is that possible?
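Taking the 8-thread run (2 x 4 cores) as the baseline, the timings above work out to these speedups:

```latex
S_{16} = \frac{T_{8}}{T_{16}} = \frac{150.418\,\mathrm{s}}{130.522\,\mathrm{s}} \approx 1.152
\qquad
S_{32} = \frac{T_{8}}{T_{32}} = \frac{150.418\,\mathrm{s}}{134.097\,\mathrm{s}} \approx 1.122
```

So 16 threads run about 15% faster than 8, while 32 give back part of that gain (134.097 s vs 130.522 s).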
Anyway, I've attached my current work in case you want to check it or test it on your machines.
You can compile it by typing:
g++ -o [outputfile] *.cpp -ltbb
and run it with:
./[outputfile] -d man [chessboard size] [# tasks]
You can also choose other implementations, which use a simple "parallel_for" template, by calling:
These two implementations are almost identical, except for the kind of mutex used.
Thanks in advance for any suggestions or confirmation of my hypothesis.