STM collision metrics

I converted a loop to use a __tm_atomic section within an OpenMP parallel for loop. The code actually ran significantly slower with 8 threads on an 8-core system than the original unthreaded loop, which suggests that a lot of collisions are happening.

It would be very useful to have a runtime diagnostic that tells me how many times a collision is detected and rolled back. Is there such a tool in existence or planned?

What happens if there is a collision? Does it explicitly serialize all the operations on retry, or does it requeue the operations for another parallel attempt?

One last question: if there is a partial collision, i.e. 8 threads modify values but only 2 of the threads happen to collide, is it able to commit the other 6 and retry just the two that collide, or are they all rolled back?

Thanks,
Martin


In case of conflicts, one of the threads will retry without serializing the other threads.

Threads which do not have conflicts continue to execute and commit.

Martin,

There are a number of possible reasons why your parallelized loop runs more slowly:

  • the cost of starting and committing transactions could dominate if the loop body is small
  • the overhead of executing inside a transaction (where memory accesses to shared variables are converted into calls to the run time library) could slow down the code so much that you can't regain performance until you run on more than the 8 cores you have
  • (as you pointed out) conflicts could cause wasted work

If you measure the performance of your loop using the TM version, but running on one core (put num_threads(1) on the omp parallel for or omp parallel), then you can measure the effect of the first two issues independently of any contention (since one thread can't contend with itself :-).
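A rough sketch of that measurement, in case it helps. USE_STM is a hypothetical build flag of my own; with it defined the block is meant to use __tm_atomic (assuming the `__tm_atomic { ... }` block form), and without it an OpenMP critical section stands in so the sketch builds with any OpenMP compiler. The loop body, the function names and the timing helper are illustrative, not taken from Martin's code.

```c
#include <time.h>

/* Assumption: USE_STM is a hypothetical flag. With the STM compiler it
   selects __tm_atomic; otherwise an OpenMP critical section is used as a
   stand-in (it has different costs, so only the method carries over). */
#ifdef USE_STM
#define ATOMIC_BLOCK __tm_atomic
#else
#define ATOMIC_BLOCK _Pragma("omp critical")
#endif

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Baseline: the original unthreaded loop, no atomic section. */
double time_plain(long n, long *sum_out) {
    long sum = 0;
    double t0 = now_seconds();
    for (long i = 0; i < n; i++) {
        sum += i % 7;
    }
    double t1 = now_seconds();
    *sum_out = sum;
    return t1 - t0;
}

/* Same loop with the atomic section, on a chosen number of threads.
   With nthreads == 1 there is nothing to contend with, so the extra
   time over time_plain() is start/commit cost plus instrumentation. */
double time_atomic(long n, int nthreads, long *sum_out) {
    long sum = 0;
    double t0 = now_seconds();
    #pragma omp parallel for num_threads(nthreads)
    for (long i = 0; i < n; i++) {
        ATOMIC_BLOCK
        {
            sum += i % 7;
        }
    }
    double t1 = now_seconds();
    *sum_out = sum;
    return t1 - t0;
}
```

Comparing time_plain(n, ...) with time_atomic(n, 1, ...) gives the single-thread overhead; comparing time_atomic(n, 1, ...) with time_atomic(n, 8, ...) then shows what threading and contention add on top.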

We do have a statistics-collecting version of the STM runtime (which would tell you about conflicts on a per-transaction basis); we need to work out if we can release that, but will certainly consider your post as a request to do so.

In terms of contention semantics, a contention only affects the dynamic transactions which have the conflict (so other concurrent executions of the same lexical transaction which have different read and write sets are not affected), and only one of the pair of contending transactions is aborted and retried. (If you aborted them both you could end up with a livelock...).

So in your example only one dynamic transaction would be retried.
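To make the "only the loser retries" behaviour concrete, here is a small stand-in. It is not how the STM runtime is implemented; it swaps in a plain compare-and-swap retry loop on a single shared value, using C11 atomics, and adds hand-rolled commit and abort counters of the kind a statistics build might report. All names here (counter_stats, optimistic_add, run_updates) are made up for the illustration.

```c
#include <stdatomic.h>

/* Shared value plus hand-rolled statistics (hypothetical names). */
typedef struct {
    atomic_long value;    /* data updated optimistically */
    atomic_long commits;  /* updates that succeeded */
    atomic_long aborts;   /* attempts that lost a conflict and retried */
} counter_stats;

void stats_init(counter_stats *s) {
    atomic_init(&s->value, 0);
    atomic_init(&s->commits, 0);
    atomic_init(&s->aborts, 0);
}

/* One optimistic update: read, compute, commit only if nobody else
   changed the value in between. A failed commit affects this caller
   only; the thread that won the race has already committed. */
void optimistic_add(counter_stats *s, long delta) {
    long seen = atomic_load(&s->value);
    for (;;) {
        long wanted = seen + delta;
        if (atomic_compare_exchange_strong(&s->value, &seen, wanted)) {
            atomic_fetch_add(&s->commits, 1);
            return;
        }
        /* Conflict: 'seen' now holds the winner's value. Count the
           abort and retry just this update. */
        atomic_fetch_add(&s->aborts, 1);
    }
}

/* Run n updates across nthreads OpenMP threads. */
void run_updates(counter_stats *s, long n, int nthreads) {
    #pragma omp parallel for num_threads(nthreads)
    for (long i = 0; i < n; i++) {
        optimistic_add(s, 1);
    }
}
```

After run_updates() the commits counter equals the number of updates, and aborts shows how much work was redone because of conflicts. Every failed attempt means some other update committed in between, so the system as a whole always makes progress, which is the point of aborting only one side of a conflict.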

HTH

-- Jim

Jim,

Thanks for the response. From further testing it is slower even single threaded, so it looks like one or both of the first two conditions is being hit. Then there is an additional slowdown with multiple threads, so I guess I hit the third case too. :-(

Would it be possible for someone to post a simple threaded example here that shows good performance improvements with the STM compiler?

Martin

Since _ITM_error() causes the application to exit, it seems clear that the time is not really in _ITM_error(), but rather that the profiler is mis-attributing time spent elsewhere.
