Intel® Streaming SIMD Extensions

Intel 64 documentation bug

This concerns the Intel 64 and IA-32 Architectures Software Developer's Manual, Order Number 325462-053US, January 2015, page Vol. 1 5-29.

MOVZX (64-bits) Move doubleword to quadword, zero-extension

In fact, 32-bit to 64-bit zero extension is not supported, as per Vol. 2A 3-583; MOVZX is specified only for r/m8 and r/m16 sources.
There is a small chance that I am misinterpreting this, but then so is NASM, which rejects this form.
One of them is wrong, so I would like an official ruling.
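For context, a minimal sketch of why a doubleword-to-quadword MOVZX form is not needed in practice: in 64-bit mode, an instruction that writes a 32-bit register clears the upper 32 bits of the corresponding 64-bit register, so an ordinary 32-bit MOV already zero-extends. The function name `zext32_to_64` is a hypothetical one chosen for illustration, and the instruction a compiler emits for it is an expectation, not a guarantee.

```c
#include <stdint.h>

/* Zero-extend a 32-bit value to 64 bits.
   On x86-64, compilers commonly implement this with a plain 32-bit MOV
   (for example "mov eax, edi"), relying on the implicit clearing of the
   upper half of the destination register rather than on MOVZX. */
uint64_t zext32_to_64(uint32_t x)
{
    return (uint64_t)x;
}
```

Calling `zext32_to_64(0xFFFFFFFFu)` yields `0x00000000FFFFFFFF`, with the upper half zero.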

Is profiling information from code running on SDE accurate and trustworthy?


I am trying to look at AVX-512 performance. I wrote a simple function for evaluation (below), configured optimization and enabled AVX-512 in the project properties (VS2013 integrated with Intel Parallel Studio), and I can see that AVX-512 instructions are used in the asm files generated by the compiler.

void complexVectorConjMpy(float *inputPtr1, float *inputPtr2, float *outputPtr, int numData)
    int idxData;
    float data1Re, data1Im, data2Re, data2Im;

Intel® Parallel Studio XE 2016 Beta

Measuring Core Voltage

I am using an Atom N2600 processor. The Intel Software Developer's Manual says that a P-state can be requested by writing to MSR 0x199 and that the locked P-state can be seen in MSR 0x198. The core voltage is computed as MSR_PERF_STATUS[47:32] * (float) 1/(2^13).

The value I see in MSR_PERF_STATUS (MSR 0x198) is 0x62d104306001045. Bits [47:32] are always 0x1043, irrespective of the value that I write to MSR 0x199.

Applying the formula: 0x1043 = 4163, so the voltage is 4163/(2^13) ≈ 0.508 V, which is a really low voltage for the processor to operate stably at.
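The arithmetic above can be reproduced with a short sketch. It only applies the formula as quoted in the post (bits [47:32] scaled by 1/2^13) to a raw 64-bit value; whether that formula is valid for the Atom N2600 is exactly the open question, so the result should be read as "what the formula gives", not as a confirmed voltage. The names `perf_status_voltage_field` and `perf_status_core_voltage` are hypothetical helpers, and reading the MSR itself is left out.

```c
#include <stdint.h>

/* Extract bits [47:32] of a raw MSR_PERF_STATUS (0x198) value. */
uint16_t perf_status_voltage_field(uint64_t msr_value)
{
    return (uint16_t)((msr_value >> 32) & 0xFFFFu);
}

/* Apply the formula quoted in the post: field * 1 / 2^13.
   Assumption: this scaling applies to the processor in question. */
double perf_status_core_voltage(uint64_t msr_value)
{
    return (double)perf_status_voltage_field(msr_value) / 8192.0;
}
```

For the value reported above, `perf_status_voltage_field(0x062d104306001045)` is `0x1043` (4163 decimal), and `perf_status_core_voltage` returns 4163/8192, about 0.508.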

Why does _mm_mulhrs_epi16() always do biased rounding to positive infinity?

Does anyone know why the pmulhrsw instruction, or

_mm_mulhrs_epi16(x, y) := RoundDown((x * y + 16384) / 32768)

always rounds towards positive infinity? To me, this is terribly biased for negative numbers, because then a sequence like -0.6, 0.6, -0.6, 0.6, ... won't add up to 0 on average.

Is this behavior intentional or unintentional? If it is intentional, what could be the use? Is there an easy way to make it less biased?
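To check which inputs are actually affected, here is a minimal scalar sketch of one 16-bit lane that follows the formula quoted above, floor((x * y + 16384) / 32768). It is a model for experimentation, not the intrinsic itself, and `mulhrs_scalar` is a hypothetical name. In this model, results that are not exact halves round to the nearest integer in both directions; only exact halves (such as -0.5 or -1.5) are pushed toward positive infinity.

```c
#include <stdint.h>

/* Scalar model of one lane: floor((x * y + 16384) / 32768). */
int16_t mulhrs_scalar(int16_t x, int16_t y)
{
    int32_t t = (int32_t)x * (int32_t)y + 16384;

    /* C's '/' truncates toward zero, so subtract one when the remainder
       is negative to obtain a true floor for negative values. */
    int32_t q = t / 32768;
    if (t % 32768 < 0) {
        q -= 1;
    }

    /* Note: for x = y = -32768 the quotient is 32768, which does not fit
       in int16_t; the narrowing conversion is implementation-defined. */
    return (int16_t)q;
}
```

With y = 16384 (0.5 in Q15), `mulhrs_scalar(1, 16384)` gives 1 and `mulhrs_scalar(-1, 16384)` gives 0, so the ties +0.5 and -0.5 both move upward. With y = 3277 (about 0.1 in Q15), `mulhrs_scalar(6, 3277)` gives 1 and `mulhrs_scalar(-6, 3277)` gives -1, so values near ±0.6 round symmetrically.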

Luckily for me, I can just change the order of my operations to get a less biased result (my function is a signed geometric mean):
