Intel ISA Extensions

Haswell TLBs undefined in Intel cpu spec

I am currently upgrading my CPUID-based detection of Intel TLBs and have a Haswell i7-4770 CPU. In the four registers returned by CPUID with EAX=2, I see the descriptors 0xc1 and 0xb6, which are not defined in the Intel CPU spec for my released Intel i7-4770 CPU.

Can someone at Intel update the spec for TLB detection in leaf EAX=2 and let me know what is missing, please? I use this in my high-performance code for TLB detection and currently don't detect any 2nd-level TLB.
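
For readers following along, here is a minimal sketch of how the descriptor bytes of leaf EAX=2 are usually unpacked from the four registers. It assumes the layout documented for that leaf: the low byte of EAX is an iteration count rather than a descriptor, a register with bit 31 set carries no valid descriptors, and 0x00 is a null descriptor. The function name `unpack_leaf2` is made up for this illustration, and the register values are passed in by the caller so the sketch does not depend on any particular compiler's CPUID intrinsic.

```c
#include <stdint.h>

/*
 * Unpack the descriptor bytes of CPUID leaf 2.
 * regs holds {EAX, EBX, ECX, EDX} as returned by the instruction.
 * out must have room for 15 bytes (4 registers x 4 bytes, minus AL).
 * Returns the number of non-null descriptors written to out.
 * NOTE: unpack_leaf2 is a hypothetical helper name, not an Intel API.
 */
static int unpack_leaf2(const uint32_t regs[4], uint8_t out[15])
{
    int n = 0;
    for (int r = 0; r < 4; r++) {
        /* Bit 31 set means this register holds no valid descriptors. */
        if (regs[r] & 0x80000000u)
            continue;
        for (int b = 0; b < 4; b++) {
            /* The low byte of EAX is the iteration count, not a descriptor. */
            if (r == 0 && b == 0)
                continue;
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0x00)          /* 0x00 is the null descriptor */
                out[n++] = d;
        }
    }
    return n;
}
```

Each byte returned this way is then looked up in the descriptor table of the spec; a byte with no table entry, such as the 0xc1 and 0xb6 reported above, cannot be mapped to a TLB or cache description.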


Almost-unit-stride stores

Hi all

I have an AVX vector register reg containing 4 double values; let's call them (in order): 0 - 2 - 3 - 4
These values have to be added to distinct locations of an array A, namely to positions A[0], A[2], A[3], A[4]
In other words:

A[0] += reg[0], A[2] += reg[2] and so on

This situation recurs often in my program, i.e. sequences of load-add-stores that are "almost" unit-stride but actually are not.
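
One possible way to treat such an update as unit-stride, sketched here in plain C rather than with intrinsics, is to pad the four values with a zero at the skipped position and update A[0..4] as one contiguous range. This is only an illustration under stated assumptions, not the poster's method: the function name `add_almost_unit_stride` is hypothetical, the four lanes are passed as `v[0..3]` in register order, and it is assumed to be acceptable to read and rewrite A[1] (which is unsafe if another thread writes A[1] concurrently, and turns a -0.0 stored there into +0.0).

```c
#include <stddef.h>

/*
 * Add the four lanes v[0..3] to A[0], A[2], A[3], A[4].
 * The skipped slot A[1] is updated with +0.0 so that the loop runs
 * over the contiguous range A[0..4], which a compiler can vectorize.
 * NOTE: hypothetical helper; assumes touching A[1] is acceptable.
 */
static void add_almost_unit_stride(double *A, const double v[4])
{
    /* Insert a zero at index 1, the position the update skips. */
    const double padded[5] = { v[0], 0.0, v[1], v[2], v[3] };

    for (size_t i = 0; i < 5; i++)
        A[i] += padded[i];
}
```

The trade-off is one extra element of memory traffic per update in exchange for contiguous loads and stores; whether that is faster than four scalar updates depends on the surrounding code and would need to be measured.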

To use FPU

The following code is meant to exercise the FPU. I run it on an E5-2620. It only reaches up to 2 GFlops. If I want to reach 2*8 GFlops, how should I write the program?

Any help will be appreciated.

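#include <immintrin.h>
#include <stddef.h>
void* test_pd_avx()
{
  double x[4]={12.02,14.34,34.23,234.34};
  double y[4]={123.234,234.234,675.34,3453.345};
  /* x and y are not guaranteed to be 32-byte aligned, so use unaligned loads */
  __m256d mx=_mm256_loadu_pd(x);
  __m256d my=_mm256_loadu_pd(y);
  __m256d mz=_mm256_mul_pd(mx,my);   /* 4 double-precision multiplies */
  return NULL;
}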

The Compiler Option: icc test.c -O0

Haswell GFLOPS

Hi Intel Experts:

    I cannot find GFlops figures for the latest Intel Haswell CPUs. Could you please tell me where to find them?

    I want to understand the performance difference between Haswell and Ivy Bridge, for example between the i7-4700HQ and the i7-3630QM. From the Intel website, I know that the i7-3630QM's GFlops figure is 76.8 (Base). Could you please tell me the figure for the i7-4700HQ?

    I found the following information on the internet:

        Intel SandyBridge and Ivy-Bridge have the following floating-point performance: 16-SP FLOPS/cycle --> 8-wide AVX addition and 8-wide AVX multiplication.
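
As a rough sketch of how such a figure is usually derived, theoretical peak GFLOPS is the product of core count, clock frequency in GHz, and floating-point operations per cycle per core. The function name `peak_gflops` is made up for this illustration. The inputs in the comment are assumptions, not values from the post: 4 cores and a 2.4 GHz base clock for the i7-3630QM, and 8 double-precision FLOPs per cycle (half the 16 single-precision FLOPs per cycle quoted above). With those inputs the formula gives the 76.8 figure cited for that CPU.

```c
/*
 * Theoretical peak in GFLOPS = cores * GHz * FLOPs per cycle per core.
 * NOTE: hypothetical helper for illustration. It ignores turbo
 * frequencies and anything that keeps real code below peak.
 */
static double peak_gflops(int cores, double base_ghz, int flops_per_cycle)
{
    return cores * base_ghz * flops_per_cycle;
}

/*
 * Assumed inputs for the i7-3630QM (not stated in the post):
 *   peak_gflops(4, 2.4, 8)   double precision: 4 * 2.4 * 8
 *   peak_gflops(4, 2.4, 16)  single precision: 4 * 2.4 * 16
 */
```

The same function applies to a Haswell part once its core count, base clock, and FLOPs per cycle are known.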

Question about example on Optimization manual---AVX mask move to avoid branch penalty

Hi all,

I am trying to run an example from the optimization manual (June 2013), page 11-23, example 11-14. I wrote the function in a separate .s file and the main function in a main.c file. The code only runs correctly in debug mode. Please see the attachment for my code. The file cond_loop.c is actually cond_loop.s, but the forum won't accept that extension.

IPP causes invalid opcode exception at h9_ippsFFTGetSize_C_32fc

We are using IPP version on 4th generation (Haswell) Core i7 processor under INtime (5) operating system.

We are using static linkage (#include <ipp_h9.h> before #include <ipp.h>).

A call to ippsFFTInitAlloc_C_32fc causes an invalid opcode exception. This occurs inside h9_ippsFFTGetSize_C_32fc function when trying to execute the les esp,edx instruction.
