I'm developing a multi-threaded version of our product, using pthreads as the threading mechanism. I tested my 32-bit binary on a 32-bit machine with 8 cores, and the performance improvement was almost as expected. However, when I tested the same 32-bit binary on a 64-bit machine with 16 cores, performance actually got worse with 2 threads (the same happens with a 64-bit binary on the 64-bit machine).

I then tried OpenMP for the same case (32-bit binary on the 64-bit machine), and it gave results about as good as my pthread benchmark did with the 32-bit binary on the 32-bit machine. So now I'm confused: the benchmark suggests that pthreads have much larger overhead on the 64-bit machine!?

How does that happen? Is there anything I should keep in mind when developing for a 64-bit machine? Is there any way I can further improve performance on the 64-bit machine? Any suggestions would be appreciated. Thanks a lot.