Software support for Transactional Memory

Last week I attended Intel's annual Software Enabling Summit in Anaheim. This is a worldwide gathering of Intel's software engineers charged with ensuring that the world's software takes best advantage of Intel processor and platform features.

(Sidebar: My wife thought it was really funny that we had a whole conference about "enabling", and suggested that I was now working with families and friends of those with addictions. No, not that kind of enabling, Deb.)

I was at SES 2007 to give a talk about OpenSolaris and help educate our enabling folks about what's available there. In the bargain, I decided to attend some of the other talks and learn about other useful technologies. One such talk was on Transactional Memory, or TM. Here are a few notes:

    • Usually when you thread your code, you need to protect access to data shared between threads. These critical sections are usually protected by acquiring a lock or mutex.

    • The problem is that you often acquire a lock when you don't need it: you are accessing memory which isn't shared between threads, or you acquire locks for readers which do not require them.

    • The idea behind TM is to change the threading paradigm: instead of taking locks, you wrap your critical sections in code which marks them as transactional.

    • TM is based on the idea that most locks are never actually contested, so the round trip to acquire and then release an uncontested lock is wasted. In other cases only readers access the data, so again any locks acquired and released are wasted effort. Wouldn't it be nice to have the system detect this automatically, as if the memory itself were transactional in nature?

    • To support this new threading model, the C/C++ language needs to be extended to add TM sections.

    • Semantics proposed for TM sections in code:

        • A TM section is atomic relative to other threads which touch the shared data

        • Locks which are not needed are not acquired; instead, transactions are backed out if there is a conflict

        • Functions can be marked as tm_callable, which means that their effects will be backed out if there is a conflict

        • Tm_waveable allows you to indicate that you don't care about backing out a function call (like a debug printf)

    • There is a prototype compiler implementing these semantics, and it's available in
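To make the contrast in the notes above concrete, here is a minimal sketch in standard C++. It does not use the prototype compiler's TM syntax, and it is not transactional memory: it only imitates the optimistic idea (do the work without a lock, commit if nobody interfered, otherwise back out and retry) on a single word with a compare-and-swap loop. LockedCounter, OptimisticCounter and example are hypothetical names made up for this illustration.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Pessimistic version: every caller pays for the lock round trip,
// whether or not another thread is actually competing for the data.
struct LockedCounter {
    std::mutex m;
    long value = 0;

    void add(long delta) {
        std::lock_guard<std::mutex> guard(m);
        value += delta;
    }
};

// Optimistic version: read a snapshot, compute the new value privately,
// then try to commit it. If another thread changed the value in the
// meantime, the commit fails, the attempt is discarded, and we retry.
struct OptimisticCounter {
    std::atomic<long> value{0};

    // Returns how many attempts had to be thrown away due to conflicts.
    int add(long delta) {
        int retries = 0;
        long seen = value.load();
        // On failure, compare_exchange_strong reloads `seen` with the
        // current value, so the next attempt starts from fresh data.
        while (!value.compare_exchange_strong(seen, seen + delta)) {
            ++retries;
        }
        return retries;
    }
};

// Single-threaded usage sketch: both counters reach the same total.
inline void example() {
    LockedCounter a;
    a.add(2);
    a.add(3);

    OptimisticCounter b;
    b.add(2);
    b.add(3);

    assert(a.value == 5);
    assert(b.value.load() == 5);
}
```

In the uncontested case the optimistic version never blocks and never retries, which is the saving the TM proposal is after. A real TM system would have to provide the same behavior for whole blocks of code touching many locations, not just one word.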

I applaud the engineers working on this idea for trying to improve the lot of programmers who struggle to get their code threaded, performant and correct. I do have a few observations about this approach:

    1. The biggest challenge with threading code is correctness: making sure that you have protected the right shared locations at the right times. It seems like this TM model doesn't materially affect this problem. If you forget to use locks around a critical section, won't you be just as likely to forget to put it in a TM section?

    2. The other big challenge in threading is scalable performance: how do I decompose my problem to thread it in the first place? Do I do functional decomposition or data decomposition? Make the wrong choice, or pick the wrong level of granularity, and your performance will not increase with added threads and could even get worse. Worse still, if your code is moved to a system with more or fewer hardware threads available, the performance could change in undesirable ways. Unfortunately, TM doesn't help you with this decision process either, though it might mitigate some of the worst problems.

    3. I'm often frustrated that our first implementation is in C/C++. I know that these are workhorse languages where performance matters, but it seems like Java or even Perl would be a good choice to reach more developers.

    4. I can see some savings in avoiding unnecessary locks, and that is where I see the opportunity for a performance gain.
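As a small illustration of the granularity point in the second observation, here is a sketch of a data decomposition that is sized from the machine it runs on rather than hard-coded. This is standard C++ only and has nothing to do with TM itself; worker_count and split_range are hypothetical helper names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Number of workers to use on this machine. The standard allows
// hardware_concurrency() to return 0 when the value is unknown,
// so fall back to a single worker in that case.
std::size_t worker_count() {
    unsigned hw = std::thread::hardware_concurrency();
    return hw == 0 ? 1 : hw;
}

// Split the index range [0, n) into contiguous, nearly equal chunks,
// one per worker. Never produces more chunks than there are items
// (except for n == 0, which yields one empty chunk).
std::vector<std::pair<std::size_t, std::size_t>>
split_range(std::size_t n, std::size_t parts) {
    assert(parts > 0);
    parts = std::min(parts, std::max<std::size_t>(n, 1));

    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    const std::size_t base = n / parts;   // minimum chunk length
    const std::size_t extra = n % parts;  // first `extra` chunks get one more
    std::size_t begin = 0;
    for (std::size_t i = 0; i < parts; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        chunks.emplace_back(begin, begin + len);
        begin += len;
    }
    return chunks;
}
```

Calling split_range(items, worker_count()) gives each hardware thread one chunk, so the same binary adapts when it is moved to a machine with a different thread count. Whether one chunk per thread is the right granularity still depends on the workload, which is exactly the decision TM leaves to the programmer.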

Check out the code at – maybe I am too pessimistic and this is the holy grail of threading.


1 comment


As Intel and AMD continue to increase the number of cores, mutex synchronization is just not going to scale. There has to be an alternative way of handling critical sections. I recently discussed some of my experience in this post:

In my experience with a complex SMT-based software system, scaling beyond 4 threads in a typical C++ application is difficult. Of course one can delegate work to other processes and achieve better scalability, but that adds a level of complexity which may not be desired in lots of applications.

I am not sure if STM is the holy grail, but on the surface it seems to have some promise.
