Intel® Advanced Vector Extensions

Small typo in Intel® 64 and IA-32 Architectures Software Developer’s Manual

Hi,

There seems to be a small typo in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (Order Number: 253665-054US, April 2015), page 3-149 (CMPSS instruction):

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 64-bit memory location.

It should be a 32-bit memory location.

Regards,

BeatriX

Guaranteed atomic operation clarification

Hello,

I'm trying to understand a line in the Intel Architecture manual. It's a description of a memory operation that is guaranteed to be atomic.

The line is at Chapter 8, Section 8.1.1 "Guaranteed Atomic Operations", second bullet list, second item:
>16-bit accesses to uncached memory locations that fit within a 32-bit data bus

The way I interpret this (which must be wrong) is: Accesses to 16-bit regions of memory that are not currently cached and that fit within a data bus that transfers 32-bit values.

The issue of the APIC dropping MSI-X interrupts

Hello, I have a difficult problem. The scenario is as follows:

The hardware environment is an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz and an Altera FPGA board.

The OS is Linux debian-rss 3.16.7-ckt7.

The FPGA creates 32 DMA transfers to the CPU and generates one interrupt per transfer.

These 32 interrupts are distributed across 8 different MSI-X IRQs.

According to the APIC spec, for each interrupt one may be held in the ISR and one in the IRR; a third may be dropped.

But now I distribute 2 interrupts to each IRQ, so why might interrupts still be dropped?

Dynamic Shift

Hello,

I am trying to achieve a dynamic shift. Well, let me explain the task. I process data with SSE, AVX. Data gets loaded, worked with and later results are stored. To support arbitrary lengths, I need some kind of maskload, but also for SSE.

Suppose my length is 9 elements, and I work with int32 and SSE. The first and second loads are fine. The third load is also fine with respect to memory bounds, so that is no problem. But only element 0 in the vector register is valid; the others need to be zero. What is the best way to achieve this?
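One common way to get this on plain SSE2 is to load a lane mask from a small sliding-window table and AND it with the loaded data. The following is only a sketch under the assumptions stated in the post (32-bit elements, and reading the full 16 bytes at the tail is safe); the names `tail_mask_table`, `tail_mask_epi32` and `load_tail_epi32` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Sliding window: four all-ones lanes followed by four zero lanes. */
static const int32_t tail_mask_table[8] = { -1, -1, -1, -1, 0, 0, 0, 0 };

/* Build a mask whose first n 32-bit lanes are all ones (0 <= n <= 4).
   Starting the unaligned load at offset (4 - n) slides the window so that
   exactly n of the all-ones entries end up in the low lanes. */
static __m128i tail_mask_epi32(int n)
{
    assert(n >= 0 && n <= 4);
    return _mm_loadu_si128((const __m128i *)(tail_mask_table + 4 - n));
}

/* Load four int32 values from p and zero every lane at index >= n.
   ASSUMPTION: all 16 bytes at p are readable, as stated in the post.
   If that does not hold, copy the tail into a zeroed temporary first. */
static __m128i load_tail_epi32(const int32_t *p, int n)
{
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    return _mm_and_si128(v, tail_mask_epi32(n));
}
```

With 9 elements, the third load would be `load_tail_epi32(data + 8, 1)`, which keeps lane 0 and zeroes lanes 1 to 3. The same table idea extends to AVX by doubling the table to eight ones and eight zeros; on AVX2 the resulting mask can also be passed to `_mm256_maskload_epi32`, which avoids touching the masked-out memory at all.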

Java* Application Performance Improvement with Intel® Xeon® Processor E7 v3

Background

Java [1, 2] is a programming language used for developing applications that can run on any operating system (OS). To do that, Java applications need to be compiled to bytecode [3]. This bytecode can then be run on any Java Virtual Machine (JVM) [4] without recompiling. To run Java applications on OSs like Windows* and Linux*, a Java Runtime Environment (JRE) [7] must be installed.

  • Linux*
  • Server
  • Java*
  • JVM
  • Intel® Xeon® Processor
  • TYDIC*
  • Intel® QPI
  • Intel® TSX
  • Intel® AVX2
  • Intel® Advanced Vector Extensions
  • Cloud computing
  • Data center
  • Enterprise
  • Financial services industry

Why is my AVX slower than SSE?

According to the description in "IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions", the AVX version should be faster than the SSE version. But my performance measurement gives the following:

     The computer supports AVX
    number CPU in the system = 4

     IIR Gaussian Filter Coefficients are:
    a0 = 0.021175, a1 = -0.017807, a2 = 0.021103, a3 = -0.017875, b1 = -1.837578, b2 = 0.844174, cprev = 0.510583, cnext = 0.489409

    image width = 1024, height = 1024

    Running multi threaded SSE code

    Running multi threaded AVX code
