Is CMPXCHG16B supported?

From the Q&A at http://software.intel.com/en-us/articles/intelr-xeon-phitm-coprocessor-f... it appears that CMPXCHG16B is supported on the Xeon Phi.

However, when compiling I get the following:

/tmp/icpc8hU1ksas_.s: Assembler messages:
/tmp/icpc8hU1ksas_.s:42: Error: `cmpxchg16b' is not supported on `k1om'

If it's not supported, what alternatives are there for implementing lock-free algorithms on the Phi? (Can double-width CAS instructions be implemented?)

 

There is some confusion/conflicting information.

Appendix B.2 of the Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual (available on the Overview tab of the Intel® Xeon Phi™ Coprocessor Developer site) explicitly states that CMPXCHG16B is not supported.

Let me check and get clarification. Please stand by.

"If it's not supported, what alternatives are there for implementing lock-free algorithms on the Phi? (Can double-width CAS instructions be implemented?)"

I don't think it is possible to synthesize a 16-byte CAS easily from other instructions (one could clearly use an internal lock, but that rather destroys the point!), so you have to make do with the 8-byte version. I haven't implemented this (so take it as untested), but it seems as though it should be possible to use the 8-byte CAS to atomically handle two 32-bit offsets, so provided you are trying to deal with a pair of pointers into the same 4 GB region, that might suffice.
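
To make the two-offsets idea concrete, here is a minimal sketch, not something I have run on the coprocessor. It assumes both pointers live in one region of at most 4 GB starting at a known base, and that std::atomic<std::uint64_t> is lock-free on the target. The names OffsetPair, pack, unpack, to_offset and cas_pair are made up for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper type: two 32-bit offsets relative to a common base,
// updated together with a single 8-byte CAS.
struct OffsetPair {
    std::uint32_t first;
    std::uint32_t second;
};

// Pack the pair into one 64-bit word (first in the high half).
inline std::uint64_t pack(OffsetPair p) {
    return (static_cast<std::uint64_t>(p.first) << 32) | p.second;
}

// Split a 64-bit word back into the two offsets.
inline OffsetPair unpack(std::uint64_t v) {
    return OffsetPair{static_cast<std::uint32_t>(v >> 32),
                      static_cast<std::uint32_t>(v)};
}

// Convert a pointer inside the region to a byte offset from `base`.
// Assumption: the region is no larger than 4 GB.
inline std::uint32_t to_offset(const void* base, const void* p) {
    std::uintptr_t d = reinterpret_cast<std::uintptr_t>(p) -
                       reinterpret_cast<std::uintptr_t>(base);
    assert(d <= UINT32_MAX);
    return static_cast<std::uint32_t>(d);
}

// Replace both offsets atomically if the slot still holds `expected`.
inline bool cas_pair(std::atomic<std::uint64_t>& slot,
                     OffsetPair expected, OffsetPair desired) {
    std::uint64_t e = pack(expected);
    return slot.compare_exchange_strong(e, pack(desired));
}
```

If one of the two fields is used as a version counter rather than a second pointer, the same packing gives a 32-bit pointer offset plus a 32-bit ABA tag in a single 8-byte CAS.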

@Jim - Thank you!

@Matt - The information I cited from Appendix B.2 is current/correct. The Q&A was based on early documentation that has since been corrected, and so has the Q&A now. Thank you for asking this question.

Will Knights Landing support CMPXCHG16B?

With Knights Corner's limited RAM (16 GB), one can store each pointer as pointer >> 3 in 4 bytes (2x4 bytes for the pair) and stay within the memory capacity. However, this precludes using the extraneous bits in a pointer as flags (though one bit might be available).
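
A rough sketch of the pointer >> 3 compression, under assumptions I should state plainly: pointers are 8-byte aligned, and they are taken relative to an arena base (hypothetical here), since user-space virtual addresses are not guaranteed to fall below 32 GB even when physical RAM is 16 GB. The names compress and decompress are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

// Compress an 8-byte-aligned pointer inside an arena to 32 bits by
// dropping the three low bits, which are always zero. This reaches
// 2^32 * 8 bytes = 32 GB from `base`.
inline std::uint32_t compress(const void* base, const void* p) {
    std::uintptr_t d = reinterpret_cast<std::uintptr_t>(p) -
                       reinterpret_cast<std::uintptr_t>(base);
    assert((d & 7u) == 0);           // must be 8-byte aligned
    assert((d >> 3) <= UINT32_MAX);  // must fit in 32 bits after the shift
    return static_cast<std::uint32_t>(d >> 3);
}

// Recover the original pointer from its compressed form.
inline void* decompress(void* base, std::uint32_t c) {
    return static_cast<char*>(base) + (static_cast<std::uintptr_t>(c) << 3);
}
```

Two such 32-bit values fit in one 8-byte word, so a pair of them can be swapped with the ordinary 8-byte CAS. As noted above, the shift uses up the low alignment bits, so they are no longer free for flags.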

You could also store half of a 64-bit pointer plus a 32-bit ABA counter in the first QWORD, which is CMPXCHG8B'd, with the second QWORD containing the second half of the 64-bit pointer and a copy of the 32-bit ABA counter, which is handled with a mov (cmovz). The modified DCAS only attempts a CMPXCHG8B when both ABA counters were seen as equal. When they are equal, perform the CMPXCHG8B; on success, write the second 8 bytes, and on failure, loop.

Jim Dempsey
