Recommendations for Intel(r) Transactional Synchronization Extensions have been published

Hi,

Chapter 12 of the most recent (June 2013) "Intel 64 and IA-32 Architectures Optimization Reference Manual" contains enabling and tuning recommendations for Intel(r) Transactional Synchronization Extensions in the 4th generation Intel(r) Core(tm) processor family.

Best regards,

Roman


>>Chapter 12 of the most recent (June 2013) "Intel 64 and IA-32 Architectures Optimization Reference Manual" contains
>>enabling and tuning recommendations for Intel(r) Transactional Synchronization Extensions in the 4th generation
>>Intel(r) Core(tm) processor family.

Thanks for the information, Roman!

Sergey,

You are welcome! You may find this list of Intel TSX-related resources useful: http://www.intel.com/software/tsx

Thanks,

Roman

Roman,

I wish to present a simplified example and ask whether the following code would function properly using RTM.

Assume you have a ring buffer of size size, with fill index fill and empty index empty.
Assume the size of the buffer is at least 1 larger than the number of entries that can be placed into the buffer.
Assume you wish to ensure that a push(x) appears to atomically insert data and advance the fill index.
Assume you wish to ensure that a pop() appears to atomically extract data and advance the empty index.

void* buffer[size];
__declspec( align( CACHE_LINE_SIZE) ) volatile size_t fill = 0;
__declspec( align( CACHE_LINE_SIZE) ) volatile size_t empty = 0;
void push(void* p) {
  while(true) {
    if(_xbegin() == _XBEGIN_STARTED) {
      buffer[fill++ % size] = p;
      _xend(); // commit
      return;
    }
    // here when (_xbegin() != _XBEGIN_STARTED)
    _mm_pause(); // N.B. design assures # pointers .lt. buffer size
  } //  while(true)
} // void push(p)
void* pop() {
  while(true) {
    if(_xbegin() == _XBEGIN_STARTED) {
      if(empty != fill) {
        void* p = buffer[empty++ % size]; // get data
        _xend(); // commit
        return p;
      } if(_xbegin() == _XBEGIN_STARTED)
      // here when (empty == fill), iow empty buffer
      _xend(); // commit (release the _xbegin doing no transaction)
    }
    // here when (_xbegin() != _XBEGIN_STARTED), or empty buffer
    _mm_pause(); // N.B. design assures # pointers .le. buffer size
    // or use sched_yield(), sleep(0), etc...
  } //  while(true)
} // void* pop()

Notes:

When the fill and empty indexes are cache aligned (in separate cache lines), push(p) cannot be aborted by a pop() or push(p) issued concurrently by another thread. Presumably the hardware is sophisticated enough that, when multiple concurrent push(p)'s are in progress, one thread will always succeed (provided no other activity messes with the referenced cache lines). pop() could be aborted by a concurrent push(p), and concurrent pop()'s without a concurrent push(p) would always have one winner (provided no other activity messes with the referenced cache lines).

When the fill and empty indexes are not cache aligned and are in the same cache line, a concurrent push(p) and pop() could abort either the push or the pop, but at least one would not abort (provided no other activity messes with the referenced cache lines).

Note, "messes with referenced cache lines" includes the cache line written to by "buffer[fill++] = p;" in push(p).

Are my notes correct?

Jim Dempsey

www.quickthreadprogramming.com

Follow-up question relating to pop() in above example:

As written, when the ring buffer is empty, pop() performs _xbegin() and _xend() but does not modify memory. Will this _xbegin()/_xend() pair disrupt a concurrent push(p) that issues its _xbegin() second?

Jim Dempsey

www.quickthreadprogramming.com

Hi Jim,

In general, RTM does not guarantee that any particular transaction will ever succeed, even if it is retried many times. For example, a page fault on the first access to your buffer aborts the transaction, and retries do not help because they hit the same fault. Therefore you always need a non-transactional fallback path (a lock, for example): http://software.intel.com/en-us/blogs/2012/11/06/exploring-intel-transactional-synchronization-extensions-with-intel-software
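To make the fallback idea concrete, here is a minimal sketch in C of the ring buffer with a lock-based fallback path. It is not the code from the linked blog post. The names ring_push, ring_pop, cs_enter, cs_leave, RING_SIZE and MAX_TX_ATTEMPTS are hypothetical, and the sketch assumes that a build with RTM enabled (for example with -mrtm) only runs on CPUs that actually support RTM; real code should check CPUID first. Without RTM enabled at compile time, every call simply takes the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__RTM__)
#include <immintrin.h>          /* _xbegin, _xend, _xabort, _mm_pause */
#endif

#define RING_SIZE 8u            /* hypothetical capacity */
#define MAX_TX_ATTEMPTS 4       /* hypothetical retry budget before falling back */

static void *ring[RING_SIZE];
static size_t fill_idx = 0;     /* next slot to write */
static size_t empty_idx = 0;    /* next slot to read */
static atomic_int fallback_lock = 0;   /* 0 = free, 1 = held */

/* Returns 1 if the caller is now running inside a transaction, else 0. */
static int tx_begin(void) {
#if defined(__RTM__)
    for (int i = 0; i < MAX_TX_ATTEMPTS; ++i) {
        if (_xbegin() == _XBEGIN_STARTED) {
            /* Reading the lock puts it in the read set, so a thread that
               later takes the lock aborts this transaction. */
            if (atomic_load_explicit(&fallback_lock, memory_order_relaxed) == 0)
                return 1;
            _xabort(0xff);      /* lock is held: abort and try again */
        }
        _mm_pause();
    }
#endif
    return 0;
}

/* Enter the critical section: transactionally if possible, else locked. */
static int cs_enter(void) {
    if (tx_begin())
        return 1;
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(&fallback_lock, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;           /* spin until the lock is acquired */
    return 0;
}

/* Leave the critical section entered by cs_enter(). */
static void cs_leave(int in_tx) {
#if defined(__RTM__)
    if (in_tx) { _xend(); return; }
#endif
    (void)in_tx;
    atomic_store_explicit(&fallback_lock, 0, memory_order_release);
}

/* Returns 1 on success, 0 if the buffer is full. */
int ring_push(void *p) {
    assert(p != NULL);          /* NULL is reserved as ring_pop's "empty" result */
    int ok = 0;
    int in_tx = cs_enter();
    if (fill_idx - empty_idx < RING_SIZE) {
        ring[fill_idx % RING_SIZE] = p;
        ++fill_idx;
        ok = 1;
    }
    cs_leave(in_tx);
    return ok;
}

/* Returns the oldest entry, or NULL if the buffer is empty. */
void *ring_pop(void) {
    void *p = NULL;
    int in_tx = cs_enter();
    if (empty_idx != fill_idx) {
        p = ring[empty_idx % RING_SIZE];
        ++empty_idx;
    }
    cs_leave(in_tx);
    return p;
}
```

The important detail is that the transactional path reads the lock word, so the locked path and the transactional path can never run the critical section at the same time.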

PS: is line 22 correct? "} if(_xbegin() == _XBEGIN_STARTED)" It seems that the xbegins do not match the xends.

Roman

Quote:

Follow-up question relating to pop() in above example:

As written, when the ring buffer is empty, pop() performs _xbegin() and _xend() but does not modify memory. Will this _xbegin()/_xend() pair disrupt a concurrent push(p) that issues its _xbegin() second?

It may abort because the fill variable belongs both to the write set of the push transaction and to the read set of the pop transaction.

Quote:

When the fill and empty indexes are cache aligned (in separate cache lines), push(p) cannot be aborted by a pop() or push(p) issued concurrently by another thread.

push can be aborted by pop because of, at the very least, the conflict on the fill variable (write-read).

push may also have at least a write-write conflict on the fill variable with a different push.
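Conflicts like these are transient, unlike the page-fault case mentioned earlier in the thread, and the abort status returned by _xbegin() gives a hint about which kind of abort occurred. Below is a small sketch in C of a retry decision based on that status. The helper name worth_retrying and the ABORT_* names are hypothetical; the bit positions mirror the architectural abort-status bits (the _XABORT_* macros in <immintrin.h>). The assumption that a persistent fault leaves the retry bit clear is a typical behavior, not a guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Abort-status bits reported in EAX after an RTM abort. */
enum {
    ABORT_EXPLICIT = 1 << 0,    /* aborted by an XABORT instruction */
    ABORT_RETRY    = 1 << 1,    /* the transaction may succeed on a retry */
    ABORT_CONFLICT = 1 << 2,    /* another logical processor touched the read/write set */
    ABORT_CAPACITY = 1 << 3     /* internal buffer overflowed */
};

/* Decide whether another transactional attempt is worthwhile.
   status is the value returned by a failed _xbegin(). */
static bool worth_retrying(unsigned status, int attempts_left) {
    assert(attempts_left >= 0);
    if (attempts_left == 0)
        return false;           /* retry budget exhausted: use the fallback path */
    /* Only retry when the hardware hints that a retry may succeed. */
    return (status & ABORT_RETRY) != 0;
}
```

With this policy, a read-write conflict between push and pop on fill is retried a few times, while an abort that does not set the retry bit goes straight to the non-transactional path.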

The "//" seems to have been clobbered after the } on line 22 (which turned the comment into a statement).

Good point about the page fault. The revised code should then have the while(true) loop touch the locations before the _xbegin() so that the pages are brought in if necessary (there is a small probability they could get paged out between the touch and the _xbegin()).

Jim Dempsey

www.quickthreadprogramming.com

Hi Roman,

>>Chapter 12 of the most recent (June 2013) "Intel 64 and IA-32 Architectures Optimization Reference Manual" contains
>>enabling and tuning recommendations for Intel(r) Transactional Synchronization Extensions in the 4th generation
>>Intel(r) Core(tm) processor family...

I've looked at it and there are about 8 pages in total. My question is: Are there some code examples related to Intel TSX technology?

Thanks in advance.

Hi Sergey,

I counted 28 pages in Chapter 12 :-) . Section 12.3 has code examples for lock elision with Intel TSX.

For further information you can look into www.intel.com/software/tsx (subscribe for page updates)

Best regards,

Roman

Hi Roman,

>>I counted 28 pages in Chapter 12 :-) . Section 12.3 has code examples for lock elision with Intel TSX
>>
>>For further information you can look into www.intel.com/software/tsx (subscribe for page updates)

Thanks, it looks like I missed something. I'll check which version of the Intel SDM I was using (possibly a different one).
