Bugs in Intrinsics Guide


The pseudo-code for _mm512_srli_epi64 shows that it only uses 8 bits of the imm8 argument (imm8[7:0]), but that doesn't seem accurate. If that were true, I would expect _mm512_srli_epi64(a, 1066) to have the same result as _mm512_srli_epi64(a, 42) (1066 & 255 == 42), but compilers will just zero the register (see https://godbolt.org/z/2thNFh).

If I provide the count as a command line argument, so the compiler can't know the value at compile time, the result for any value > 63 is all zeros. Here is a quick test:

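#include <immintrin.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Shift every 64-bit lane right by a count that is only known at run time. */
__m512i foo(__m512i bar, unsigned int j) {
    return _mm512_srli_epi64(bar, j);
}

int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <count>\n", argv[0]);
        return 1;
    }

    /* The count comes from the command line, so it cannot be constant-folded. */
    __m512i v = _mm512_set1_epi64(~UINT64_C(0));
    __m512i r = foo(v, (unsigned int) atoi(argv[1]));

    /* Store to memory instead of casting the vector pointer to uint64_t*. */
    uint64_t lanes[8];
    _mm512_storeu_si512(lanes, r);
    printf("0x%" PRIx64 "\n", lanes[0]);

    return 0;
}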

That makes sense to me, since _mm512_srl_epi64 says it uses count[63:0].
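
To make the discrepancy concrete, here is a minimal scalar sketch in plain C (no AVX-512 required) of the two readings of the count for a single 64-bit lane. The function names are made up for illustration and are not part of any Intel header. The first model follows the imm8[7:0] wording of the pseudo-code; the second follows the behaviour reported above, where the whole count is compared against 63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one lane if only imm8[7:0] were considered,
   as the pseudo-code is written. */
uint64_t srli_lane_imm8_model(uint64_t lane, unsigned int count) {
    unsigned int imm8 = count & 0xFFu;
    return imm8 > 63 ? 0 : lane >> imm8;
}

/* Hypothetical model of one lane matching the observed behaviour:
   the full count is compared against 63 before shifting. */
uint64_t srli_lane_full_count_model(uint64_t lane, unsigned int count) {
    return count > 63 ? 0 : lane >> count;
}

/* The two models agree for counts 0..255 and can diverge above that. */
void srli_models_demo(void) {
    uint64_t ones = ~UINT64_C(0);
    assert(srli_lane_imm8_model(ones, 42) ==
           srli_lane_full_count_model(ones, 42));
    assert(srli_lane_imm8_model(ones, 1066) == (ones >> 42));
    assert(srli_lane_full_count_model(ones, 1066) == 0);
}
```

With a count of 1066, the first model shifts by 42 while the second returns zero, which is the difference described in this post.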


@Evan N. The second argument of `_mm512_srli_epi64` must be an 8-bit immediate. This is a precondition, meaning that behavior is undefined if it is not met. This follows from the `vpsrlq` instruction, to which the intrinsic corresponds. A good compiler would issue a compile-time error if you specify a runtime value or an out-of-range constant.


It looks like there may be a typo in the latency for these instructions on Icelake:

_mm256_lddqu_si256

_mm256_loadu_si256

as it says the latency for the instruction is 7, whereas on pretty much all other processors (including AMD) the latency is ~1.


Hi, it looks like the Intrinsics Guide indicates a dependency on the AVX-512F CPUID flag only for F instructions and VL instruction variants. However, sections 15.2.1, 15.3, and 15.4 of the arch manual (Intel 64 and IA-32 Architectures Software Developer's Manual, volume 1) require software to check the F flag before checking the ER, PF, CD, DQ, BW, or VL flags.

Am I correct in thinking the Intrinsics Guide is missing a few thousand F dependencies and that this is maybe an incompletely implemented workaround for the way the guide's AVX-512 group checkboxes work? The guide also seems to be missing the required OSXSAVE check for AVX, AVX2, and AVX-512.

Figure 15-5 of the manual does refer to table 2-2, which I presume is a typo for table 15-2, and figures 15-4 and 15-5 appear to misspell OSXSAVE as OXSAVE. So the current manual probably isn't 100% correct either. I suspect 15.3 also needs updating for IFMA52, VPOPCNTDQ, BF16, BITALG, VBMI, VBMI2, VNNI, and VP2INTERSECT, since those instruction groups presumably also require checking the OSXSAVE, F, and (at 128 and 256 bit width) VL flags. 4FMAPS and 4VNNIW are also missing but might fit better in 15.2.1.
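
For reference, the check order discussed here (OSXSAVE first, then XCR0, then F, then the specific extension) can be sketched as a pure function over register values that have already been read. This is only an illustration: the function and macro names are invented for this example, and the caller is assumed to pass xcr0 as 0 when OSXSAVE is clear, since XGETBV must not be executed in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions in CPUID.1:ECX and CPUID.(EAX=7,ECX=0):EBX. */
#define CPUID1_ECX_OSXSAVE  (UINT32_C(1) << 27)
#define CPUID7_EBX_AVX512F  (UINT32_C(1) << 16)
#define CPUID7_EBX_AVX512DQ (UINT32_C(1) << 17)
#define CPUID7_EBX_AVX512BW (UINT32_C(1) << 30)
#define CPUID7_EBX_AVX512VL (UINT32_C(1) << 31)
/* XCR0 state bits: SSE (1), AVX (2), opmask (5), ZMM_Hi256 (6), Hi16_ZMM (7). */
#define XCR0_AVX512_STATE   UINT64_C(0xE6)

/* Hypothetical helper: returns true only if every prerequisite holds,
   checked in the order OSXSAVE, XCR0, AVX512F, requested extension bits. */
bool avx512_subfeature_usable(uint32_t leaf1_ecx, uint64_t xcr0,
                              uint32_t leaf7_ebx, uint32_t feature_mask) {
    assert(feature_mask != 0);
    if (!(leaf1_ecx & CPUID1_ECX_OSXSAVE))
        return false;                      /* OS has not enabled XSAVE/XGETBV */
    if ((xcr0 & XCR0_AVX512_STATE) != XCR0_AVX512_STATE)
        return false;                      /* OS does not save the ZMM/opmask state */
    if (!(leaf7_ebx & CPUID7_EBX_AVX512F))
        return false;                      /* foundation flag comes first */
    return (leaf7_ebx & feature_mask) == feature_mask;
}
```

On GCC or Clang the inputs could be obtained with `__get_cpuid_count` from `<cpuid.h>` and `_xgetbv(0)` from `<immintrin.h>`. A 256-bit DQ intrinsic would then pass `CPUID7_EBX_AVX512DQ | CPUID7_EBX_AVX512VL` as the mask.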


Quote:

albert, tomas wrote:

It looks like there may be a typo in the latency for these instructions on Icelake [] as it says the latency for the instruction is 7

Yes, that is incorrect. You can find the correct latency numbers here: https://software.intel.com/content/www/us/en/develop/download/10th-gener... . The Intrinsics Guide will be updated to match.


Quote:

Matthias Kretz wrote:

There's a bug either in ICC or the documentation. Consider https://godbolt.org/g/LYJjM2. The documentation for _mm_mask_mov_ps says "dst[MAX:128] := 0". The comments in the test case expect this behavior.

I don't think the test case shows this. It doesn't capture what _mm_mask_mov_ps does with the upper bits, because it tries to read them with _mm512_castps128_ps512, which is documented to leave the upper bits undefined. And I don't think there is any way to get at the dst[MAX:128] bits of a __m128 variable. Therefore it is irrelevant what _mm_mask_mov_ps does to the upper bits.


Quote:

Roland S. (Intel) wrote:

You can find the correct latency numbers here: https://software.intel.com/content/www/us/en/develop/download/10th-gener... .

That link doesn't seem to work.

 


Hi,

The website for the Intrinsics Guide seems to be broken (it's stuck on "loading" the intrinsics). I'm using Chrome 78. I'm not a web developer, but I think the problem is in the perf.json and perf2.json files, which cannot be executed as JavaScript files (due to the .json extension, I think). This is the error message when inspecting the website:

Refused to execute script from 'https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/perf2...' because its MIME type ('application/json') is not executable, and strict MIME type checking is enabled.

I think it can be solved by changing the extension of the files (perf and perf2) to .js and changing the .html file accordingly as well.

Best.


Hey, the guide is not working at all today. I checked Chrome & Edge. Development console contains the following error:

 

Refused to execute script from 'https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/perf....' because its MIME type ('application/json') is not executable, and strict MIME type checking is enabled.
 


https://software.intel.com/sites/landingpage/IntrinsicsGuide/ being broken is extremely annoying! Could someone please fix this ASAP?


The Intrinsics Guide does not work now. I can't search for intrinsics there, and no error message pops up. I remember it was working last week. My browser is Chrome 83.0.4103.61.

 


Still doesn't work today. It seems that no one is in charge.

Quote:

Osiv, Oleksiy wrote:

Hey, the guide is not working at all today. I checked Chrome & Edge. Development console contains the following error:

 

Refused to execute script from 'https://software.intel.com/sites/landingpage/IntrinsicsGuide/files/perf....' because its MIME type ('application/json') is not executable, and strict MIME type checking is enabled.
 


Hello.... Intel developers...!

The Intrinsics guide is STILL not functioning properly.  Yes, the javascript that was broken for a week has been fixed, so thank you for that, but the architecture performance figures are still not working, due to the strict MIME type checking failure on the .json files, as has previously been reported.  Please FIX this problem, and implement protocols to ensure that the guide is properly tested and not left hanging in a broken state for a week without anyone at Intel bothering to care...

Thank you!


Hello, Intel developers,

Thank you for fixing the JavaScript to restore partial functionality of the Intrinsics Guide.  Unfortunately, there are still errors with the JSON files due to browsers' strict MIME type checking (as previously reported) and thus the latency & throughput stats are still not working.  Please fix these ASAP. 

Thanks!
