Intel® C++ Compiler

Parallelization of dyadic product

Hi,

I have two vectors (they may refer to the same vector) and I need to compute the product x[i]*y[j] for all i,j=1..n.

What is the best way to perform this operation in parallel? I've tried

cilk_for(h=0;h<n*n;h++)r[h]=x[h/n]*y[h%n];

but I guess it is only a naive attempt. Indeed, vec-report says it is inefficient.

Thanks.

Fabio
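One way to restructure the loop above, as a minimal sketch rather than a definitive answer: use two nested loops so that the inner loop is unit-stride and free of the division and modulo in the flattened index. The function name dyadic_product and the use of std::vector<double> are assumptions made for illustration; the result is stored row-major, as in the original r[h].

```cpp
#include <cstddef>
#include <vector>

// Outer (dyadic) product: r[i*n + j] = x[i] * y[j], stored row-major.
// x and y may be the same vector; the result is a fresh buffer.
// Hypothetical helper, assumes x.size() == y.size().
std::vector<double> dyadic_product(const std::vector<double>& x,
                                   const std::vector<double>& y)
{
    const std::size_t n = x.size();
    std::vector<double> r(n * n);

    // Each row is independent of the others, so this outer loop is the
    // natural candidate for parallelization (for example with cilk_for).
    for (std::size_t i = 0; i < n; ++i) {
        const double xi = x[i];          // constant across the whole row
        double* row = r.data() + i * n;

        // Unit-stride inner loop with no division or modulo: a simple
        // shape for the vectorizer to handle.
        for (std::size_t j = 0; j < n; ++j)
            row[j] = xi * y[j];
    }
    return r;
}
```

The idea is to parallelize over rows and leave the inner loop to the vectorizer. Whether this actually beats the flattened loop depends on n and on the compiler, so it is worth checking the vectorization report again.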

 

_mm_unpackhi_epi8 and _mm_unpacklo_epi8 to convert 16 signed chars into 2 signed short vectors

I am using _mm_unpacklo_epi16 and _mm_unpackhi_epi16 with a vector of 0s as the second argument to convert signed/unsigned short vectors into 2 signed/unsigned integer vectors, i.e.:

__m128i lowVec  = _mm_unpacklo_epi16(vecA,vec0);
__m128i highVec = _mm_unpackhi_epi16(vecA,vec0);

This works fine when converting a vector of 16 unsigned chars into 2 unsigned short vectors using _mm_unpacklo_epi8 and _mm_unpackhi_epi8, yet when the input vector holds 16 signed chars, the short values in the 2 result vectors are all 127 + the original values.
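Interleaving with a zero vector is a zero extension: the high byte of every 16-bit lane is 0, so a negative input byte comes out as a positive value (its value plus 256). A signed widening needs the high byte to be 0xFF for negative inputs. A minimal SSE2 sketch, in which the helper name widen_epi8_to_epi16 is hypothetical, builds that high byte with a signed compare against zero:

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

// Widen the 16 signed 8-bit lanes of v into two vectors of 8 signed
// 16-bit lanes each. Hypothetical helper name, SSE2 only.
inline void widen_epi8_to_epi16(__m128i v, __m128i& lo, __m128i& hi)
{
    // sign is 0xFF in every byte where v is negative, 0x00 elsewhere.
    const __m128i sign = _mm_cmpgt_epi8(_mm_setzero_si128(), v);

    // Interleave value bytes (low half of each lane) with sign bytes
    // (high half of each lane), which sign-extends every element.
    lo = _mm_unpacklo_epi8(v, sign);
    hi = _mm_unpackhi_epi8(v, sign);
}
```

The same compare-and-interleave approach should carry over to signed shorts with _mm_cmpgt_epi16 and the epi16 unpack intrinsics.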

simple vector addition

 

Hello,

I have a question about the following scenario on Intel Sandy Bridge.

For simple vector addition code in C,

If I allocate the arrays dynamically, the compiler vectorizes the main addition loop

    C[i] = A[i] + B[i]

even if I do not use the restrict keyword (icc 13).

But if I allocate the arrays statically, it does not vectorize the loop, nor does it say anything about it in the vectorization report.

Even if I declare the arrays with __declspec(align), it does not vectorize.

What could be the causes?

Thanks in advance,

  Chaitali
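A small self-contained reproducer usually makes a report like this easier to diagnose. The sketch below is an assumption about what the static case looks like, not the original code: the size N, the names init, add and checksum, and the use of the standard alignas in place of a compiler-specific alignment attribute are all hypothetical. One thing worth ruling out (a guess, not a diagnosis) is that the compiler removes or folds the loop when the results of a static array are never read, which would leave nothing for the vectorization report to mention. Reading the result back, as checksum does here, prevents that.

```cpp
#include <cstddef>

const std::size_t N = 1024;  // hypothetical problem size

// Statically allocated, 32-byte aligned arrays.
alignas(32) static float A[N];
alignas(32) static float B[N];
alignas(32) static float C[N];

// Fill the inputs with values the compiler cannot trivially ignore.
void init()
{
    for (std::size_t i = 0; i < N; ++i) {
        A[i] = static_cast<float>(i);
        B[i] = 2.0f * static_cast<float>(i);
    }
}

// The loop under investigation.
void add()
{
    for (std::size_t i = 0; i < N; ++i)
        C[i] = A[i] + B[i];
}

// Consume the result so the addition loop cannot be discarded as dead code.
float checksum()
{
    float s = 0.0f;
    for (std::size_t i = 0; i < N; ++i)
        s += C[i];
    return s;
}
```

Calling init(), add() and then printing checksum() from main, and compiling with the vectorization report enabled, gives a concrete case that can be compared against the dynamically allocated version.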

 

Memory leak?

Hi,

I have a C++ application which is coded this way:

  • The main program does not need much memory (just a few variables). But this main program runs a loop in which we call a function.
  • This function needs about 140 MB of memory to run. The memory is allocated in the function and then released (using RAII).
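The structure described in the two points above can be sketched roughly as follows. This is an illustration under assumptions, not the actual application: do_work, run_loop and the use of std::vector as the RAII owner are hypothetical, and the buffer size is a parameter so the sketch can be run with a small value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the function: allocates a working buffer,
// uses it, and releases it when the vector goes out of scope (RAII).
std::size_t do_work(std::size_t bytes)
{
    std::vector<unsigned char> buffer(bytes, 1);  // memory allocated here
    std::size_t sum = 0;
    for (std::size_t i = 0; i < buffer.size(); ++i)
        sum += buffer[i];                         // touch every byte
    return sum;
}   // buffer's destructor frees the memory here

// Hypothetical stand-in for the main loop, which itself holds almost no memory.
std::size_t run_loop(std::size_t iterations, std::size_t bytes)
{
    std::size_t total = 0;
    for (std::size_t k = 0; k < iterations; ++k)
        total += do_work(bytes);
    return total;
}
```

With this shape, every allocation is matched by a release at the end of the call. Note that an allocator may keep freed memory for reuse instead of returning it to the operating system right away, so the figures shown by system monitoring tools do not necessarily drop after each call.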

When I run this program overnight on OSX, here is the data I get from "Activity Monitor", or "top" in terms of memory consumption

Windows: CUDA 6.5 and Intel Compiler (2015)?

I've been trying to find an authoritative answer for this, but everything is a few years old.  Can the Intel compiler be used with CUDA 6.5 or 7.0 on Windows / Visual Studio 2013?

I did manage to get it to work for a few hours, but then Visual Studio crashed hard and had to be repaired, and that broke it. 

Is there support for using CUDA 6.5 or 7.0 with the Intel Compiler (2015)?

icc13 license

Hi, 

We have been using icc10 to build our binaries, and a few months ago we switched to the icc13 compiler. Build times with icc13 are longer than with icc10. Could this have something to do with the license file? I have read somewhere that using a trial license slows down the build. However, we are not using a trial version; we are using the same license for icc13 that we used for icc10.
Please share your inputs. 
-regards, 
Balaji
