How accurate are your benchmark results? Do you get the same time measurements for the same program or do they differ much?
The times differ by up to 0.3 s on a 40-core machine with Hyper-Threading, using 40 worker threads for our solution. I want to know whether OpenMP or the test machine is the cause of this variation. Measuring small improvements is not easy when the results are this erratic.
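One way to separate the two causes is to run the same binary many times and compare the spread of the timings with and without OpenMP thread pinning. Below is a minimal sketch under stated assumptions. The binary name `./solution` is hypothetical. `OMP_NUM_THREADS`, `OMP_PROC_BIND` and `OMP_PLACES` are standard OpenMP environment variables. Whether pinning actually reduces the spread on your machine is an assumption to test, not a guaranteed result. `OMP_PLACES=cores` also assumes that "40 cores" means 40 physical cores.

```python
import os
import statistics
import subprocess
import sys
import time


def summarize(times):
    """Summarize wall-clock timings (seconds) of repeated runs."""
    return {
        "min": min(times),
        "median": statistics.median(times),
        "mean": statistics.mean(times),
        # Sample standard deviation needs at least two runs.
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "spread": max(times) - min(times),
    }


def time_command(cmd, runs=10, env_overrides=None):
    """Run cmd `runs` times and return the wall-clock time of each run."""
    env = dict(os.environ)
    if env_overrides:
        env.update(env_overrides)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical binary): python bench.py ./solution
    cmd = sys.argv[1:]
    configs = {
        "unpinned": {"OMP_NUM_THREADS": "40"},
        # Assumption: pinning one thread per physical core may reduce jitter.
        "pinned": {"OMP_NUM_THREADS": "40",
                   "OMP_PROC_BIND": "close",
                   "OMP_PLACES": "cores"},
    }
    for label, env in configs.items():
        stats = summarize(time_command(cmd, runs=10, env_overrides=env))
        print(label, {k: round(v, 3) for k, v in stats.items()})
```

Comparing the minimum or median rather than a single run is usually more robust for spotting small improvements. If the spread stays near 0.3 s even with a single thread (`OMP_NUM_THREADS=1`), the machine itself is a more likely source of the noise than OpenMP.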