About False Sharing

Anyone Know False Sharing?
Recently I have been trying to understand the relationship between Hyper-Threading and user-level optimization.
I tested the source below on an IBM xSeries 225, which has two 2.4 GHz Xeon processors.
I expected that avoiding cache false sharing would improve performance.
When I padded the data structure, that assumption held.
But when I turned on Hyper-Threading in the BIOS, performance went down.
To improve performance with Hyper-Threading, what do I need to use or change?
Should I increase the number of threads?
System spec:
H/W : IBM xSeries 225
OS : Red Hat Linux 9
Compiler : icc 8.0
Reference site for source :


#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

// 1024*1024
#define MAXLEN (1024*1024)
#define NUM_PROC 4

struct thread_param {
    // 4*4 = 16 bytes (with a 4-byte unsigned long, as on 32-bit x86)
    unsigned long thread_id;
    unsigned long v;
    unsigned long start;
    unsigned long end;
#ifdef FIX
    // expand to 128 bytes to avoid false sharing
    // (4 long + 28 padding)*4 = 128 bytes
    // compile with -DFIX to enable the padding
    int padding[28];
#endif
};

int array[MAXLEN];
int count = 0;

// example of false sharing: the loop counter p->v lives inside the
// per-thread struct, and unpadded structs share a cache line
void* thread_fn(void* arg) {
    struct thread_param *p = (struct thread_param*)arg;
    int i;
    for (i = 0; i < count; i++)
        for (p->v = p->start; p->v < p->end; p->v++)
            array[p->v] += 1;
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t tid[NUM_PROC];
    struct thread_param thread_struct[NUM_PROC];
    int i, interval;
    struct timeval start, end, result;

    if (argc < 2) {
        printf("usage: false_none count\n");
        return 0;
    }
    count = atoi(argv[1]);

    printf("False sharing testing begin... ");
#ifdef FIX
    printf("with FIX ");
#else
    printf("without FIX ");
#endif
    printf("total execution time for count %d: ", count);

    for (i = 0; i < MAXLEN; i++)
        array[i] = 1;

    // give each thread a contiguous slice of the array
    interval = MAXLEN/NUM_PROC;
    for (i = 0; i < NUM_PROC-1; i++) {
        thread_struct[i].thread_id = i;
        thread_struct[i].start = i * interval;
        thread_struct[i].end = thread_struct[i].start + interval;
    }
    // the last thread takes whatever remains
    thread_struct[NUM_PROC - 1].thread_id = NUM_PROC - 1;
    thread_struct[NUM_PROC - 1].start = (NUM_PROC - 1) * interval;
    thread_struct[NUM_PROC - 1].end = MAXLEN;

    // start the clock before the threads begin working
    gettimeofday(&start, NULL);
    for (i = 0; i < NUM_PROC; i++)
        pthread_create(&tid[i], NULL, thread_fn, &thread_struct[i]);
    for (i = 0; i < NUM_PROC; i++)
        pthread_join(tid[i], NULL);
    gettimeofday(&end, NULL);
    timersub(&end, &start, &result);
    printf("%ld sec, %ld usec\n", (long)result.tv_sec, (long)result.tv_usec);

    return 0;
}
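
Manual padding only fixes the size of each struct; it does not by itself guarantee that an element starts on a cache-line boundary. A minimal sketch of an alternative, assuming a C11 compiler: let the compiler align and pad the struct. The names `padded_param`, `CACHE_LINE`, and `same_cache_line` are hypothetical, and the value 128 is taken from the comment in the posted code, not measured on any particular CPU.

```c
#include <stdalign.h>
#include <stdint.h>

/* Assumed line size in bytes; check the target CPU before relying on it. */
#define CACHE_LINE 128

/* Aligning the first member raises the alignment of the whole struct,
   so sizeof is rounded up to a multiple of CACHE_LINE and consecutive
   array elements never share a line. */
struct padded_param {
    alignas(CACHE_LINE) unsigned long thread_id;
    unsigned long v;
    unsigned long start;
    unsigned long end;
};

/* Fail the build if the layout is not what the comment above claims. */
_Static_assert(sizeof(struct padded_param) % CACHE_LINE == 0,
               "struct size must be a multiple of the cache line");
_Static_assert(alignof(struct padded_param) == CACHE_LINE,
               "struct must be cache-line aligned");

/* Returns nonzero if two addresses fall in the same CACHE_LINE-sized block. */
int same_cache_line(const void *a, const void *b) {
    return (uintptr_t)a / CACHE_LINE == (uintptr_t)b / CACHE_LINE;
}
```

Declaring `struct padded_param thread_struct[NUM_PROC];` in place of the hand-padded struct should then keep each thread's `v` on its own line, and `same_cache_line(&thread_struct[0].v, &thread_struct[1].v)` can be used as a quick sanity check.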


If your performance was reduced by turning on HT while running the same test with 2 threads, it doesn't look like a false sharing issue. To get an advantage from HT, you usually do need to increase the number of threads to match the number of logical processors. A significant reduction in performance is likely to be a scheduling problem. I don't know whether schedulers that work better with HT on dual CPUs are likely to come with distros incorporating 2.6 kernels.
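
One way to match the thread count to the number of logical processors is to ask the OS at run time instead of hard-coding it. The following is a minimal sketch, assuming a Linux/glibc system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (it is a common extension rather than a strict POSIX requirement). The helper names `logical_cpu_count` and `chunk_bounds` are hypothetical and not part of the posted program.

```c
#include <unistd.h>

/* Number of logical processors currently online, as reported by the OS.
   With HT enabled, each physical HT-capable CPU normally shows up as two.
   Falls back to 1 if the query fails. */
long logical_cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Split `len` elements into `nthreads` contiguous ranges.
   Thread `t` (0-based) gets [*start, *end); the last thread
   also takes the remainder when len is not evenly divisible. */
void chunk_bounds(unsigned long len, long nthreads, long t,
                  unsigned long *start, unsigned long *end) {
    unsigned long interval = len / (unsigned long)nthreads;
    *start = (unsigned long)t * interval;
    *end = (t == nthreads - 1) ? len : *start + interval;
}
```

A test program could call `logical_cpu_count()` once at startup, allocate that many thread structs, and fill in each `start`/`end` pair with `chunk_bounds`.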

Red Hat EL3 Update 2 is supposed to be the first stock Linux distribution with improved dual-processor HT scheduling.

Persepone -

As Tim pointed out, if you kept the same two threads when running under HT, the OS may have scheduled both threads onto the same physical processor (its two logical HT processors). This would result in a performance drop compared to the dual-processor test without HT. Have you tried running this with four threads on a dual-processor, HT-enabled system?

-- clay
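
If the scheduler is suspected of placing two threads on the same physical processor, one way to test that theory is to pin each thread to a chosen logical CPU. This is a minimal sketch, assuming a Linux system with a glibc that provides the GNU extension `pthread_setaffinity_np`; older systems such as the one in this thread may not have it. The helper name `pin_self_to_cpu` is hypothetical. Which logical CPU numbers share a physical processor is system-specific and has to be checked separately (on current Linux kernels, for example, under `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`).

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin the calling thread to a single logical CPU.
   Returns 0 on success, or an error number (for example EINVAL
   if the CPU is not available to this process). */
int pin_self_to_cpu(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}
```

Each worker could call `pin_self_to_cpu` at the top of its thread function with a CPU number chosen so that the two threads land on different physical processors, and the timing could then be compared against the unpinned run.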
