Is your memory management multi-core ready?

I recently ran into a workload that would not scale beyond a few cores. The application uses one thread per user, so in theory 8 concurrent users on an 8-core machine should fully utilize it and deliver close to an 8x speedup over a sequential run.

That did not happen. At most two cores were utilized, and the speedup in query throughput was even smaller.

