In my experience, the standard Linux memory allocators dramatically affect the scalability of parallel programs. This could become a barrier to scalable threaded applications on future many-core CPUs. I believe using a scalable allocator is a must for most applications, but it is pretty hard to introduce one explicitly into a large body of legacy code. Have you considered standardizing scalable memory allocators and encouraging OS developers to make such allocators part of the operating system?