what is the inline-asm constraint for mask registers?

Hi,

simple question: what constraint does one use for mask registers in inline-asm? "k" does not work and "r" would be wrong.

More specific question: how does one efficiently test whether a mask returned from a __m512d compare is all true? ICC generates quite a lot of code both for the _mm512_kortestc intrinsic and for a simple compare to 0xff. That's why I wanted to wrap this into a function that does the right thing via inline-asm, but without the constraint... (I could still make use of the constraint even if there is a good solution here that doesn't involve inline-asm.)

Cheers,

  Matthias

Vc: SIMD Vector Classes for C++ http://code.compeng.uni-frankfurt.de/projects/vc

Are you saying that you cannot say:

kortest %k1, %k2

in your asm code? (or k3, k4, k5, k6, k7 but not k0)

On a related note, is it possible to post a complete assembler example, with asm and everything?

Thank you!

This is what I'm looking for:


bool isFull(__mmask8 k) {
  __mmask16 kk;
  asm("kmerge2l1l %[in],%[out]" : [out]"=k"(kk) : [in]"k"(k));
  return _mm512_kortestc(kk, kk);
}

In inline-asm, the constraints are the "k" or "=k" strings that you put in the operand list. Using %k1 explicitly in the asm string works just fine, of course.

Someone much more knowledgeable than I showed me how to do this:

bool isFull(__mmask8 k) {
  __mmask16 kk;
  asm("kmerge2l1l %[in],%[out]" : [out]"=k"(kk) : [in]"k"((__mmask16)k));
  return _mm512_kortestc(kk, kk);
}

Ouch, alright, this actually compiles. But the result is even worse than using the intrinsic for kmerge2l1l. Look at the code generated for a __m512d compare followed by isFull:

vcmpeqpd (%rsi),%zmm0,%k0
kmov   %k0,%edx
movzbl %dl,%edx
kmov   %edx,%k1
kmerge2l1l %k1,%k2
kmov   %k2,%ecx
kmov   %ecx,%k3
kortest %k3,%k3

What I want to have is this:
vcmpeqpd (%rsi),%zmm0,%k0
kmerge2l1l %k0,%k1
kortest %k1,%k1

It appears ICC won't let me do this. The problem is that ICC inserts the mov to a GPR and back to a mask register for any cast from __mmask8 to __mmask16. Now, since even inline asm doesn't grok __mmask8, there's nothing I can do...
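
For reference, the intent of isFull can be modelled in portable C++ without any intrinsics. This is only a sketch of the semantics under stated assumptions, not what ICC emits: the names mask8, mask16, replicateLowByte, kortestCarry and isFullModel are hypothetical, the mask types are assumed to behave like plain unsigned integers, and the merge step is assumed to serve only to copy the low 8 bits into the high 8 bits so that a 16-bit "all ones" test can succeed for an 8-lane mask.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the compiler's mask types
// (assumption: they behave like plain unsigned integers).
using mask8  = std::uint8_t;
using mask16 = std::uint16_t;

// Models the purpose of the merge step: duplicate the low byte into the
// high byte, so an 8-lane mask fills all 16 bits when every lane is set.
inline mask16 replicateLowByte(mask8 k) {
    return static_cast<mask16>((k << 8) | k);
}

// Models the carry result of kortest: true when the OR of both
// operands has all 16 bits set.
inline bool kortestCarry(mask16 a, mask16 b) {
    return static_cast<mask16>(a | b) == 0xffff;
}

// Portable model of isFull: true only if all 8 lanes of the mask are set.
inline bool isFullModel(mask8 k) {
    const mask16 kk = replicateLowByte(k);
    return kortestCarry(kk, kk);
}
```

In this model isFullModel(k) is equivalent to k == 0xff, which is the "simple compare" mentioned in the first post; the point of the asm version is to get the same answer without leaving the mask registers.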
