I want to know the scope (granularity) at which memory accesses are traced.

I wrote the following code:

================================================

#include <stdio.h>
#include <pthread.h>

#define LOOP_COUNT 10000000
#define NUM_THREADS 2 /* should be an even number */

int th0_job = 0;
int th1_job = 0;

void* thread_body(void* arg)
{
    int i;
    int tid = *((int*)arg);

    for (i = 0; i < LOOP_COUNT; ++i)
    {
        __tm_atomic
        {
            if (tid == 0)
            {
                th0_job += 1;
            }
            else
            {
                th1_job += 1;
            }
        }
        /* end of atomic section */
    } // end of for
    return NULL;
}

int main(int argc, char* argv[])
{
    int i;
    int tids[NUM_THREADS];
    pthread_t threads[NUM_THREADS];

    /* spawn threads */
    for (i = 0; i < NUM_THREADS; ++i)
    {
        tids[i] = i;
        pthread_create(&threads[i], NULL, thread_body, (void*)&tids[i]);
    }

    for (i = 0; i < NUM_THREADS; ++i)
    {
        pthread_join(threads[i], NULL);
    }

    /* print out global counter */
    printf("th0_job = %d, th1_job = %d\n", th0_job, th1_job);
    return 0;
}
==================================================

Then I compiled and ran it, and got this itm.log:

==================================================

STATS REPORT
THREAD TOTALS

Thread 0 : Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 86 407
BytesRead : 4 4.00 348 40001628
BytesWritten : 4 4.00 4 40000000

Transactions for thread 0

Source is line 21 in function thread_body in /home/yu/stm/test.c

: Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 86 407
BytesRead : 4 4.00 348 40001628
BytesWritten : 4 4.00 4 40000000

Thread 1 : Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 5 34
BytesRead : 4 4.00 24 40000136
BytesWritten : 4 4.00 4 40000000

Transactions for thread 1

Source is line 21 in function thread_body in /home/yu/stm/test.c

: Min Mean Max Total
Transactions : 10000000
Retries : 0 0.00 5 34
BytesRead : 4 4.00 24 40000136
BytesWritten : 4 4.00 4 40000000

TRANSACTION TOTALS

Source is line 21 in function thread_body in /home/yu/stm/test.c

: Min Mean Max Total
Transactions : 20000000
Retries : 0 0.00 86 441
BytesRead : 4 4.00 348 80001764
BytesWritten : 4 4.00 4 80000000

GRAND TOTAL (all transactions, all threads)
: Min Mean Max Total
Transactions : 20000000
Retries : 0 0.00 86 441
BytesRead : 4 4.00 348 80001764
BytesWritten : 4 4.00 4 80000000
NUMBER OF STATS: 6

==================================================

Thread 0 and thread 1 each access their own global variable (th0_job and th1_job, respectively).

But as you can see, both threads had to retry transactions (407 retries in thread 0 and 34 in thread 1). Why?

I don't think there should be any conflict in the __tm_atomic block, because each thread only accesses its own variable.

So I have a hypothesis:

Intel STM does not track memory accesses per object. It uses some other unit instead (for example one byte, one word (4 bytes on 32-bit), or a block of several bytes (64 bytes?)).

How can I solve this problem?

I want conflicts to be traced and detected per object (sometimes a single variable, sometimes a whole structure).

Best Reply

Hello,

You get aborts in this case because th0_job and th1_job are close together in memory. I guess the STM behind the Intel Compiler protects a memory range (usually the size of a cache line) rather than an individual variable.
By the way, even without STM you shouldn't lay out data like this in a concurrent program, because it causes a lot of false sharing.

Here is your example with padding to avoid conflicts:

#include <stdio.h>
#include <pthread.h>

#define LOOP_COUNT 10000000
#define NUM_THREADS 2    /* should be an even number */

union {
int th0_job;
char padding[64];
} th0_data;      

union {
int th1_job;
char padding[64];
} th1_data; 

void* thread_body(void* arg)
{
        int i;
        int tid = *((int*)arg);

        for (i = 0; i < LOOP_COUNT; ++i)
        {
                __tm_atomic

                {
                        if (tid  == 0)
                        {
                                th0_data.th0_job += 1;
                        }
                        else
                        {
                                th1_data.th1_job += 1;
                        }
                }
                /* end of atomic section */
        }    // end of for
        return NULL;
}

int main(int argc, char* argv[])
{
        int    i;
        int tids[NUM_THREADS];
        pthread_t threads[NUM_THREADS];

        th0_data.th0_job = 0;
        th1_data.th1_job = 0;

        /* spawn threads */
        for (i = 0; i < NUM_THREADS; ++i)
        {
                tids[i] = i;
                pthread_create(&threads[i], NULL, thread_body, (void*)&tids[i]);
        }

        for (i = 0; i < NUM_THREADS; ++i)
        {
                pthread_join(threads[i], NULL);
        }

        /* print out global counter */
        printf("th0_job = %d, th1_job = %d\n", th0_data.th0_job, th1_data.th1_job);
        return 0;
}

Patrick Marlier

Thanks for your answer.

I tested many times and found that the required alignment is 64 bytes, so I fixed my code, but I didn't know the reason.

Now I agree with you: the reason is the cache line size.

Thanks again.
