Intel compilers lack an intrinsic to access the underlying rotate instruction. We are having trouble crafting a portable circular rotate that executes in near constant time, does not violate the standards and is guaranteed to reduce to one machine instruction.
First, the naive implementation for left rotate on a 32-bit unsigned int:
unsigned int y = (x << n) | (x >> (32 - n));
The code above suffers undefined behavior when n = 0. So a second attempt:
unsigned int y = n ? ((x << n) | (x >> (32 - n))) : x;
The above introduces a branch, which is less efficient. Additionally, it may offer an attacker the ability to perform timing analysis in security sensitive and high integrity software.
GCC identified the issue and decided to recognize the following pattern (cf., "Poor optimization of portable rotate idiom", http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57157):
unsigned int y = (x << n) | (x >> (-n & 31));
My question is, what circular rotate pattern does ICC/ICPC recognize to ensure the C/C++ statements are reduced to a single ASM instruction?