Intel ISA Extensions

New extension needed for Maps and Sets


In current software, every app spends a lot of time walking Maps and Sets (besides arrays, those are the most often used data structures). I think this is a place where the CPU could provide enormous acceleration with a specialized design and instructions for these data structures. Here is one idea of how to do it:

Encodings for instructions with {sae} are unclear in the doc

Chapter 4.6 indicates that EVEX.L'L is encoded for the vector length, and that {sae} is supported for all vector lengths.

However, the various instruction pages, such as VCMPPD, only show {sae} for 512-bit vectors.  Furthermore, the E2 #UD equations indicate that EVEX.L'L must be 10b (VL=512).

Processor Trace decoding support library for Atom

Dear Intel guru,

Could I ask whether libipt on GitHub will support decoding small-core (Atom) Processor Trace (PT) packets?
Or is this already supported in another commercial product such as PAL (Platform Analysis Library)?

I found that the Intel SDM documents the PT packet format for IA cores, while Atom processors (Cherry Trail) use another packet format, documented in real-time-instruction-trace-atom-reference.pdf.

If I am wrong that there are two packet formats, one for IA cores and one for small cores, please correct me.



Thank you!

Ooops - wrong instruction description in volume 2 of the SDM

Looking at the new version of Volume 2 of the SDM (document 325383-055), I just noticed that the "Description" field for the VINSERTF128 instruction (page 4-514) is incorrect.  It appears to have been copied (with some modification) from the description of the VINSERTPS instruction (which is described with the INSERTPS instruction on page 3-422), but it should be almost identical to the description of the VINSERTI128 instruction on page 4-515.

MPX instructions not in the Appendix A opcode map


In the latest release (55) of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 2C, page A-11, the MPX instructions do not appear in the opcode map. I usually use the opcode maps to find instruction encodings. I am not sure this forum can be used to report typos like these. Just tell me if I am not in the right place.



small typo in Intel® 64 and IA-32 Architectures Software Developer’s Manual


It seems that there is a small typo in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (Order Number: 253665-054US, April 2015), page 3-149 (CMPSS instruction):

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 64-bit memory location.

It should be a 32-bit memory location.




Guaranteed atomic operation clarification


I'm trying to understand a line in the Intel Architecture manual. It's a description of a memory operation that is guaranteed to be atomic.

The line is at Chapter 8, Section 8.1.1 "Guaranteed Atomic Operations", second bullet list, second item:
>16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The way I interpret this (which must be wrong) is: Accesses to 16-bit regions of memory that are not currently cached and that fit within a data bus that transfers 32-bit values.
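One other possible reading, offered only as a sketch: the phrase may mean that the two bytes of the access do not straddle a naturally aligned 4-byte boundary, so they can travel in a single transfer on a 32-bit bus. The helper name `access_fits_in_bus` and its parameters are hypothetical and are not taken from the manual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when an access of access_bytes starting at addr
 * lies entirely inside one naturally aligned unit of bus_bytes.
 * Assumption: "fits within a 32-bit data bus" means "does not cross an
 * aligned 4-byte boundary". */
bool access_fits_in_bus(uintptr_t addr, size_t access_bytes, size_t bus_bytes)
{
    /* The offset computation below requires a power-of-two bus width. */
    assert(bus_bytes != 0 && (bus_bytes & (bus_bytes - 1)) == 0);

    uintptr_t offset = addr & (bus_bytes - 1);  /* byte offset inside the unit */
    return offset + access_bytes <= bus_bytes;  /* last byte stays in the unit */
}
```

Under this reading, a 16-bit access at byte offset 0, 1, or 2 of an aligned dword would qualify, while one at offset 3 would not, because its second byte falls into the next dword.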

APIC dropping MSI-X interrupts

Hello, I have a difficult problem. The scenario is as follows:

The hardware environment is an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz and an Altera FPGA board.

The OS is Linux debian-rss 3.16.7-ckt7.

The FPGA issues 32 DMA transfers to the CPU and generates one interrupt per transfer.

These 32 interrupts are distributed across 8 different MSI-X IRQs.

According to the APIC spec, for each vector one interrupt may be held in the ISR and one in the IRR; a third may be dropped.

But now I distribute only 2 interrupts to each IRQ, so why might interrupts still be dropped?
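To frame the question, here is a toy model of the "one in ISR, one in IRR" rule for a single vector. It is a simplification for illustration only, not real APIC or kernel code, and every name in it (`vector_state`, `msi_arrives`, `cpu_accepts`, `eoi`) is hypothetical. In the model, the ISR slot only helps if the CPU has already accepted the first interrupt when the second one arrives; if both arrive before the first is accepted, the second merges into the IRR bit that is already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one local-APIC vector: one IRR bit and one ISR bit. */
struct vector_state {
    bool irr;      /* requested, waiting to be accepted by the CPU */
    bool isr;      /* accepted, handler running, EOI not yet signalled */
    int  handled;  /* handler invocations seen by software */
    int  merged;   /* messages that arrived while IRR was already set */
};

/* An edge-triggered message (such as MSI-X) arrives for this vector. */
void msi_arrives(struct vector_state *v)
{
    if (v->irr)
        v->merged++;   /* bit already set: cannot be told apart from the first */
    else
        v->irr = true;
}

/* The CPU accepts the pending interrupt if none is in service on this vector. */
void cpu_accepts(struct vector_state *v)
{
    if (v->irr && !v->isr) {
        v->irr = false;
        v->isr = true;
        v->handled++;
    }
}

/* The handler signals end-of-interrupt. */
void eoi(struct vector_state *v)
{
    assert(v->isr);
    v->isr = false;
}
```

If the sequence is arrive, accept, arrive, EOI, accept, the model counts two handler runs. If it is arrive, arrive, accept, the model counts one handler run and one merged message. Whether the second ordering is what actually happens on my system is only a guess.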

Dynamic Shift


I am trying to achieve a dynamic shift. Let me explain the task. I process data with SSE and AVX. Data gets loaded and worked on, and the results are stored later. To support arbitrary lengths, I need some kind of maskload, but for SSE as well.

Suppose my length is 9 elements, and I work with int32 and SSE. The first and second loads are fine. The third load is also fine with respect to memory bounds, so that is no problem. But only element 0 in the vector register is valid; the others need to be zero. How do I best achieve this?
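One approach I am considering, as a minimal sketch and not necessarily the best answer: instead of a variable shift, load a mask from a small sliding-window table at an offset that depends on the number of valid lanes, then AND it with the data. The names `tail_mask_table` and `load_tail_epi32` are hypothetical. The sketch assumes SSE2 and, as stated above, that reading the full 16 bytes at the tail is safe.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Sliding-window mask table: four all-ones lanes followed by four zero lanes.
 * Reading 4 lanes starting at index (4 - valid) gives `valid` all-ones lanes
 * followed by zeros. */
static const int32_t tail_mask_table[8] = { -1, -1, -1, -1, 0, 0, 0, 0 };

/* Load 4 int32 lanes from p and zero all but the first `valid` lanes.
 * Note: this still reads all 16 bytes at p. */
__m128i load_tail_epi32(const int32_t *p, int valid)
{
    assert(valid >= 0 && valid <= 4);

    __m128i mask = _mm_loadu_si128((const __m128i *)(tail_mask_table + 4 - valid));
    __m128i data = _mm_loadu_si128((const __m128i *)p);
    return _mm_and_si128(data, mask);  /* keep valid lanes, clear the rest */
}
```

For the 9-element case, the third load would be `load_tail_epi32(data + 8, 1)`, which keeps element 0 and zeroes elements 1 to 3.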
