PDEP/PEXT in Fortran


Is there any extension available for using the PDEP/PEXT bit operations in Fortran? Is there any plan to add bit manipulations like these to the Fortran standard?




The Fortran standard has lots of intrinsics for bit manipulation. I haven't heard of PDEP/PEXT - what are these?

Retired 12/31/2016


Steve Lionel (Intel) wrote:

The Fortran standard has lots of intrinsics for bit manipulation. I haven't heard of PDEP/PEXT - what are these?

Google yielded this article about these bit-level scatter/gather instructions, made available on Haswell CPUs: http://www.randombit.net/bitbashing/2012/06/22/haswell_bit_permutations....

Perhaps one should expect the Intel C compiler or the IPP library to support these instructions rather than the Fortran compiler? 
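For readers who, like the poster above, have not met these operations: PEXT gathers the bits of a word selected by a mask into the low bits of the result, and PDEP scatters the low bits of a word into the positions selected by a mask. The following is only a minimal portable sketch of those semantics in C. The names pext_generic and pdep_generic are made up for illustration, and this is not the code from the linked article.

```c
#include <stdint.h>

/* Software PEXT (hypothetical name): for each set bit in mask, from low to
   high, copy the corresponding bit of src into the next low bit of the result. */
uint64_t pext_generic(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t out_bit = 1;                      /* next result bit to fill */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & lowest)
            result |= out_bit;
        out_bit <<= 1;
        mask &= mask - 1;                      /* clear lowest set mask bit */
    }
    return result;
}

/* Software PDEP (hypothetical name): for each set bit in mask, from low to
   high, copy the next low bit of src into that mask position. */
uint64_t pdep_generic(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    uint64_t in_bit = 1;                       /* next source bit to consume */
    while (mask != 0) {
        uint64_t lowest = mask & (~mask + 1);
        if (src & in_bit)
            result |= lowest;
        in_bit <<= 1;
        mask &= mask - 1;
    }
    return result;
}
```

With mask 0xF0F0, pext_generic(0xABCD, 0xF0F0) collects the two selected nibbles into 0xAC, and pdep_generic(0xAC, 0xF0F0) spreads them back out to 0xA0C0. Each loop runs once per set mask bit, which is the cost the hardware instructions would avoid.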

I found those pdep/pext instructions when looking for ways to scatter bits. Apparently they are available on certain Intel architectures (https://software.intel.com/en-us/node/514045). However, I did not find any equivalent Fortran intrinsic that performs this bit operation.

I'm using bit scattering operations inside an inner loop of a performance-critical section in my Fortran code, and I was wondering if using these instructions might speed up that section.

Are you running the program on a Haswell processor? Fortran doesn't have bit gather/scatter intrinsics. You could call out to an Intel C++ routine and use its instruction intrinsics (again for supported processors only.) 

Retired 12/31/2016

I'm not running the program on a Haswell processor myself, but large calculations are usually run on computer clusters, so I should allow for the possibility that the cluster nodes are Haswell processors, in which case the possible speed-up is definitely important.

So, if I understand correctly, I should declare a Fortran interface to a C function (using bind(C)), then write the C function, compile it with the Intel C compiler, and link it in. My only question is whether the function would be inlined in that case.
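The C side of that arrangement might look roughly like the sketch below. The function name scatter_bits64 is hypothetical, and the body is plain portable C rather than the hardware instruction. Fortran has no unsigned integers, so the sketch assumes both arguments are passed by value as 64-bit signed integers and reinterpreted as unsigned inside; a matching Fortran interface is shown in the comment.

```c
#include <stdint.h>

/* A matching Fortran interface (an assumption, not taken from the thread):
 *
 *   interface
 *     function scatter_bits64(src, mask) bind(C, name="scatter_bits64") result(r)
 *       use, intrinsic :: iso_c_binding, only: c_int64_t
 *       integer(c_int64_t), value :: src, mask
 *       integer(c_int64_t) :: r
 *     end function scatter_bits64
 *   end interface
 */

/* Deposit the low bits of src into the bit positions set in mask. */
int64_t scatter_bits64(int64_t src, int64_t mask)
{
    uint64_t s = (uint64_t)src;
    uint64_t m = (uint64_t)mask;
    uint64_t r = 0;

    for (uint64_t bit = 1; m != 0; bit <<= 1) {
        uint64_t lowest = m & (~m + 1);  /* lowest remaining mask position */
        if (s & bit)
            r |= lowest;
        m &= m - 1;                      /* drop that position from the mask */
    }
    /* Converting back to signed is implementation-defined before C23 when the
       top bit is set; mainstream compilers wrap to two's complement. */
    return (int64_t)r;
}
```

From Fortran this would be called like any other function, for example r = scatter_bits64(x, m) with both arguments of kind c_int64_t. Whether the call is inlined across the language boundary depends on the compilers and on interprocedural optimization being enabled.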

Best Reply

The only reason to use a C routine is if you want to use the Haswell instruction intrinsics on a Haswell processor. If you aren't running on a processor with that instruction set, you'd just get a runtime error trying to use these intrinsics.

One could write a C routine that used "manual CPU dispatch", executing the Haswell instruction(s) if running on a capable processor or generic code if not. If you built this using Intel C++ and the /Qipo option, the optimizer might inline the call. It IS possible, through an undocumented feature, to use many (but not all) instruction intrinsics from Intel Fortran, but I have not studied these bit intrinsics to see if that's possible. You'd still have to detect the instruction set and execute generic code (which you would have to write.) Given that the randombit.net article provided generic C code, doing this bit (!) in C would make more sense to me.
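As a rough illustration of "manual CPU dispatch", the sketch below checks for BMI2 at run time and falls back to generic code otherwise. It swaps in GCC/Clang facilities (__builtin_cpu_supports and the target attribute) rather than anything specific to Intel C++, so treat that as an assumption. The names pdep_soft, pdep_hw and pdep_dispatch are made up for illustration.

```c
#include <stdint.h>

/* Assumption: GCC or Clang on x86-64. Elsewhere only the generic path is built. */
#if defined(__GNUC__) && defined(__x86_64__)
#include <immintrin.h>
#define HAVE_BMI2_DISPATCH 1
#else
#define HAVE_BMI2_DISPATCH 0
#endif

/* Generic fallback: deposit the low bits of src at the set positions of mask. */
static uint64_t pdep_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (src & bit)
            result |= mask & (~mask + 1);  /* lowest set mask bit */
        mask &= mask - 1;                  /* clear it */
    }
    return result;
}

#if HAVE_BMI2_DISPATCH
/* Compiled with BMI2 enabled for this one function only, so the rest of the
   translation unit still runs on processors without the instruction. */
__attribute__((target("bmi2")))
static uint64_t pdep_hw(uint64_t src, uint64_t mask)
{
    return _pdep_u64(src, mask);
}
#endif

/* Use the hardware instruction when the running CPU reports BMI2 support. */
uint64_t pdep_dispatch(uint64_t src, uint64_t mask)
{
#if HAVE_BMI2_DISPATCH
    if (__builtin_cpu_supports("bmi2"))
        return pdep_hw(src, mask);
#endif
    return pdep_soft(src, mask);
}
```

A feature check on every call costs a branch, so in an inner loop one would probably hoist the check, for instance by selecting a function pointer once at start-up or by dispatching at the level of the whole loop.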

Retired 12/31/2016
