I want to know the granularity of memory access tracking.

I wrote the following code.

================================================

#include <stdio.h>
#include <pthread.h>

#define LOOP_COUNT 10000000
#define NUM_THREADS 2 /* should be an even number */

int th0_job = 0;
int th1_job = 0;

void* thread_body(void* arg)
{
        int i;
        int tid = *((int*)arg);

        for (i = 0; i < LOOP_COUNT; ++i)
        {
                __tm_atomic
                {
                        if (tid == 0)
                        {
                                th0_job += 1;
                        }
                        else
                        {
                                th1_job += 1;
                        }
                }
                /* end of atomic section */
        } // end of for
        return NULL;
}

int main(int argc, char* argv[])
{
        int i;
        int tids[NUM_THREADS];
        pthread_t threads[NUM_THREADS];

        /* spawn threads */
        for (i = 0; i < NUM_THREADS; ++i)
        {
                tids[i] = i;
                pthread_create(&threads[i], NULL, thread_body, (void*)&tids[i]);
        }

        /* wait for all threads to finish */
        for (i = 0; i < NUM_THREADS; ++i)
        {
                pthread_join(threads[i], NULL);
        }

        /* print out global counters */
        printf("th0_job = %d, th1_job = %d\n", th0_job, th1_job);
        return 0;
}
==================================================

Then I compiled and ran it.

I got this itm.log:

==================================================

STATS REPORT
THREAD TOTALS

Thread 0 : Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 86 407
BytesRead : 4 4.00 348 40001628
BytesWritten : 4 4.00 4 40000000

Transactions for thread 0

Source is line 21 in function thread_body in /home/yu/stm/test.c

: Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 86 407
BytesRead : 4 4.00 348 40001628
BytesWritten : 4 4.00 4 40000000

Thread 1 : Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 5 34
BytesRead : 4 4.00 24 40000136
BytesWritten : 4 4.00 4 40000000

Transactions for thread 1

Source is line 21 in function thread_body in /home/yu/stm/test.c

: Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 5 34
BytesRead : 4 4.00 24 40000136
BytesWritten : 4 4.00 4 40000000

TRANSACTION TOTALS

Source is line 21 in function thread_body in /home/yu/stm/test.c

: Min Mean Max Total
Transactions : 20000000
Retries : 0 0.00 86 441
BytesRead : 4 4.00 348 80001764
BytesWritten : 4 4.00 4 80000000

GRAND TOTAL (all transactions, all threads)
: Min Mean Max Total
Transactions : 20000000
Retries : 0 0.00 86 441
BytesRead : 4 4.00 348 80001764
BytesWritten : 4 4.00 4 80000000
NUMBER OF STATS: 6

==================================================

Thread 0 and thread 1 each access their own global variable (th0_job or th1_job).

But as you can see, each thread performed many retries. Why?

I think there is no conflict in the __tm_atomic block, because each thread accesses only its own variable.

So I have an assumption:

Intel STM does not track memory accesses per object. It uses some other granularity (for example one byte, one word (4 bytes on 32-bit), or a block of several bytes (64 bytes?)).

How can I solve this problem?

I want conflicts to be tracked and detected per object (sometimes just one variable, sometimes a structure).

Best Reply

Hello,

The reason you get aborts in this case is that th0_job and th1_job are close together in memory. I guess the STM behind the Intel compiler protects a memory range (usually the size of a cache line) rather than individual variables.
By the way, in a concurrent program (even without STM) you should avoid this layout, because it causes a lot of false sharing.
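To make the "close together in memory" point concrete, here is a minimal sketch that tests whether two addresses fall into the same fixed-size block. The 64-byte block size and the helper names `block_index` and `same_block` are assumptions made for illustration; they are not part of the Intel STM runtime, and the granularity it really uses is only a guess here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed tracking granularity in bytes. 64 is a common cache line size,
   but the value the STM runtime actually uses is not confirmed. */
#define ASSUMED_BLOCK_SIZE 64u

/* Index of the ASSUMED_BLOCK_SIZE-byte block that contains address p. */
static uintptr_t block_index(const void *p)
{
    return (uintptr_t)p / ASSUMED_BLOCK_SIZE;
}

/* Returns 1 if both addresses fall into the same block, 0 otherwise. */
static int same_block(const void *a, const void *b)
{
    return block_index(a) == block_index(b);
}
```

Calling `same_block(&th0_job, &th1_job)` in the original program would show whether the two counters can collide under this assumed granularity. Two adjacent `int` globals are only 4 bytes apart, so they will usually (though not always) land in the same block.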

Here is your example with padding to avoid the conflicts:

#include <stdio.h>
#include <pthread.h>

#define LOOP_COUNT 10000000
#define NUM_THREADS 2    /* should be an even number */

union {
        int th0_job;
        char padding[64];
} th0_data;

union {
        int th1_job;
        char padding[64];
} th1_data;

void* thread_body(void* arg)
{
        int i;
        int tid = *((int*)arg);

        for (i = 0; i < LOOP_COUNT; ++i)
        {
                __tm_atomic

                {
                        if (tid  == 0)
                        {
                                th0_data.th0_job += 1;
                        }
                        else
                        {
                                th1_data.th1_job += 1;
                        }
                }
                /* end of atomic section */
        }    // end of for
        return NULL;
}

int main(int argc, char* argv[])
{
        int    i;
        int tids[NUM_THREADS];
        pthread_t threads[NUM_THREADS];

        th0_data.th0_job = 0;
        th1_data.th1_job = 0;

        /* spawn threads */
        for (i = 0; i < NUM_THREADS; ++i)
        {
                tids[i] = i;
                pthread_create(&threads[i], NULL, thread_body, (void*)&tids[i]);
        }

        for (i = 0; i < NUM_THREADS; ++i)
        {
                pthread_join(threads[i], NULL);
        }

        /* print out global counter */
        printf("th0_job = %d, th1_job = %d\n", th0_data.th0_job, th1_data.th1_job);
        return 0;
}
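
The padding above makes each object 64 bytes large, but it does not by itself force each object to start on a 64-byte boundary. As a possible variation, assuming a C11 compiler and a 64-byte cache line, the alignment can be requested explicitly. The names `padded_counter`, `counters`, `CACHE_LINE`, and `is_line_aligned` are hypothetical, and I have not checked whether the STM compiler accepts C11 alignment specifiers.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Hypothetical per-thread counter slot: aligned to one assumed cache line
   and padded so that it occupies exactly one line. */
struct padded_counter {
    alignas(CACHE_LINE) int value;
    char padding[CACHE_LINE - sizeof(int)];
};

/* Compile-time checks that the layout is what we intended. */
_Static_assert(sizeof(struct padded_counter) == CACHE_LINE,
               "slot must be exactly one cache line");
_Static_assert(alignof(struct padded_counter) == CACHE_LINE,
               "slot must be cache line aligned");

/* One slot per thread; counters[0] and counters[1] never share a line. */
struct padded_counter counters[2];

/* Returns 1 if p starts on a CACHE_LINE boundary, 0 otherwise. */
static int is_line_aligned(const void *p)
{
    return (uintptr_t)p % CACHE_LINE == 0;
}
```

With this layout, thread `tid` would increment `counters[tid].value` inside the atomic block instead of choosing between two separately named globals.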

Patrick Marlier

Thanks for your answer.

I had tested many times and found that an alignment of 64 bytes works, so I fixed my code, but I did not know the reason.

Now I agree with your opinion: the reason is the cache line size.

Thanks again.
