Instruction format question

Reading the "Knights Corner Instruction Set Reference Manual", I see that there are 32 vector registers, whereas the x64 instruction set has only 16. The V register field and the R register field in the MVEX prefix are extended with an extra bit (V', R') to code the extra registers. But the B and X fields are not extended. How do you code registers zmm16-zmm31 in an instruction with three or more register operands? Is this impossible, or are you using some other bits, like the pp bits, which are mostly unused anyway, or the unused bit to the left of the pp bits? Maybe you are using the X bit, which is not needed anyway if there is no memory operand, to extend the B bits. Then the only limitation would be that registers zmm16-zmm31 cannot be used with VSIB addressing. Are the extra bits inverted?

I would like to update my disassembler (named "objconv") to cover this instruction set, so I need this info.

Reading the manual more carefully, I see that the B bits are extended with the X bit when there is no memory operand.

The X bits are extended with one of the V bits when there is a VSIB memory operand, but it is not clear which of the V bits. Pages 42 and 43 say VX; should that be V'X?

Page 42 says that the vector mask register is coded in the bits. Should that be MVEX.kkk?

The assembly syntax on p. 45 does not explain clearly how to indicate swizzle, etc. It says:
mnemonic vreg{masking modifier}, source1, transform_modifier(vreg/mem)
Perhaps that should be:
mnemonic vreg{masking modifier}, source1, vreg/mem{transform_modifier} ?

How are the JKZD and JKNZD instructions coded? No 0F escape code is indicated for the short jump version. Does that mean mmmmm=0? This is contradicted on page 44 saying mmmmm=0 will cause an exception. Is the mask register coded in the vvvv bits or is there a mod/reg/rm byte?

Thank you for catching these doc issues. I don't have hardware to test, but I believe:
1. In VSIB encoding, the index operand would be encoded with MVEX.V'X.
2. There was a latent notation change that led to two different notations expressing the same feature. is the correct notation that replaces MVEX.kkk.
3. I believe the notation convention of transform_modifier is consistent with the table listed on p. 47.
4. It turns out that the mmmm=0000 map was used to encode some of the scalar mask instructions.

Updated with corrections.

Thanks for your answers.

@3. The table on p. 47 refers to the kind of swizzle, not the actual value. Each entry in table 3.1 points to another table of 8 possible values. For example, Sf32 refers to table 2.2, listing 8 possible values for register operands, and table 2.4, listing 7 possible values for memory operands. So I think the value to list in assembly would be e.g. zmm3{cdab}. The notation Sf32(zmm3) gives only the kind of swizzle, not the chosen value.

@4. The codes for JKZD and JKNZD are identical to the codes for JZ and JNZ with a VEX prefix added. JZ and JNZ have a short version without 0F and a near version with 0F. Neither has a mod/reg/rm byte. I think, for the sake of decoder efficiency, that the instructions with a VEX prefix will have the same composition as the corresponding codes without a VEX prefix. This would mean no 0F escape code and no mod/reg/rm byte for the short version. A VEX-prefixed code without a 0F escape code is unprecedented, so we don't know what the value of mmmmm should be, but 0 is a logical guess. This information is missing in the manual.

And BTW, I have more questions:

5. Which CPUID bit indicates support for Knights Corner/MIC instructions? Does this instruction set have an official name yet?

6. How many zmm registers are there in 32-bit mode? I understand that the preferred mode is 64 bits, but the first line in chapter 2.4, page 36, says that 32-bit mode is also supported. The MVEX prefix is carefully designed to be compatible with 32-bit mode in the same way as the VEX prefixes. The R and X bits are not available in 32-bit mode because they are used for another instruction (BOUND) in 32-bit mode. So bit number 3 in the 5-bit register number is fixed at 0 in 32-bit mode, while bits 0, 1, 2 and 4 are free to be 0 or 1. So the possible register numbers in 32-bit mode are 0-7 and 16-23. This gives three possibilities in 32-bit mode:
a. 8 zmm registers named zmm0-zmm7
b. 16 zmm registers named zmm0-zmm7 and zmm16-zmm23
c. 16 zmm registers renamed to zmm0-zmm15
I would prefer c, but which one is correct?

7. There are rumors that Knights Corner instructions will be supported in mainline Intel chips in the future, perhaps in Broadwell, and that SSE-AVX will be supported in a later generation of Knights. Can you comment on this, or is it just unconfirmed rumors?

On item 3, your point is valid that some additional decorator will be needed by an assembler to fully convey the programmer's intent to the machine. I haven't seen how binutils implements that vs. the Intel tool chain. If past evolution sheds any light, I think the tool writers will make their own choice on the placement of that decorator in consideration of the evolution trail and historical legacy of their respective tools.

I think it suffices to infer from the CPUID section of this doc that any instruction set support in Knights Corner (additions or removals) that is not covered by feature flags is indicated by the family/model number.

The 32-bit mode question is treading into tech support scope outside of my interest. I should leave that for more qualified folks.

@5. Ooops! Back to the old days when there was no CPUID instruction and there were long discussions on the net about how to detect the CPU type safely! We can be pretty sure that the Knights Corner instructions will make it into the mainstream x86 ecosystem and be copied by other CPU vendors. Relying on CPU brand, family and model number is not gonna work, because software and operating systems will need a list of known CPUs that support this instruction set, and this list will never be up to date. I have done a lot of research on CPU dispatching, and I have seen many bad examples where the dispatcher relies on CPU family numbers. For example, Mathcad version 15 uses an old version of the Intel MKL library (version 7.2) that has CPU dispatching based on CPU family numbers. It gives the optimal code path for Intel processors with family 0x0F and an inferior path for family 6. As you probably know, the Intel Core processors have family number 6. Software is not updated as often as developers think, and a CPU dispatcher based on known CPUs is always gonna lag behind.

I consider it absolutely necessary that you implement a CPUID bit for the Knights Corner instruction set.

@6. Now I realize that it is not possible to have more than 8 zmm registers in 32-bit mode because of the dual function of the MVEX.X bit (used for bit #3 in some cases and bit #4 in other cases, and not accessible in 32-bit mode). The manual should state this, and also state whether the registers zmm8-zmm31 are disabled in 32-bit mode or are accessible in special cases, e.g. those cases where they are coded in the V bits.

The Knights Corner Instruction Set Reference Manual has been updated with a correction for the mmmm=0 question, but not the other questions.

I found one more possible typo in the manual:
On pages 640 - 654, the legacy instruction called PREFETCH0 should correctly be called PREFETCHT0, according to previous x86 manuals. Accordingly, vprefetch0, vprefetch1 and vprefetch2 might preferably be named vprefetcht0, vprefetcht1 and vprefetcht2 to match the names of the same instructions without VEX prefix.