Parallel Computing

Parallel Search With Cilk Plus

Hi everyone, I have a program that generates random numbers. If a number does not already exist in an array (allocated with a custom size), it is added to the array. But if the custom size is very large (1 million), the search slows down badly after a while. I have learned about cilk_for and reducers. I want to parallelize the search, but I could not decide which reducer is suitable for an array. Is there someone who can help me?

(Sorry for my English. If you do not understand my problem, you can write to my e-mail " ".)
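One way to look at the slowdown described above: every new number is compared against the whole array, so the cost grows with the number of stored values. The sketch below is not a Cilk Plus reducer. It swaps in a different technique, a hash set next to the array, so that each membership test takes roughly constant time on average. The function name `unique_random` and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the original post): returns `count` distinct
// values drawn uniformly from the range [0, max_value].
std::vector<int> unique_random(std::size_t count, int max_value, unsigned seed) {
    // Guard against an endless loop: the range must hold at least `count` values.
    if (max_value < 0 || count > static_cast<std::size_t>(max_value) + 1) {
        throw std::invalid_argument("range too small for requested count");
    }
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, max_value);

    std::unordered_set<int> seen;  // average O(1) membership test
    std::vector<int> out;          // keeps insertion order, like the original array
    seen.reserve(count);
    out.reserve(count);

    while (out.size() < count) {
        int v = dist(gen);
        // insert() reports whether v was new; only new values are appended.
        if (seen.insert(v).second) {
            out.push_back(v);
        }
    }
    return out;
}
```

A call such as `unique_random(1000000, 100000000, 42u)` would fill the array without a linear scan per candidate. If the requested count is close to the size of the range, rejections become frequent and shuffling the full range may be the better choice.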


Catastrophic error while using _mm512_extload_epi32

Dear experts,

I'm having some trouble using the _mm512_extload_epi32 intrinsic. I want to load 16 signed char elements and convert them to an int32 vector. The instruction is:

__m512i v = _mm512_extload_epi32(buffer, _MM_UPCONV_EPI32_SINT8, _MM_BROADCAST32_NONE, _MM_HINT_NONE); // buffer is aligned to 16 bytes

When I compiled it, icc said "catastrophic error: Invalid upconversion argument to intrinsic."

icc version 14.0.2 (gcc version 4.4.7 compatibility). MPSS version 3.1.4. 

Can someone tell me where the mistake is?
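For reference, the conversion described above (16 signed 8-bit values widened to 16 signed 32-bit integers) can be written in scalar form. This is only a sketch of the intended result, not of the intrinsic or of the fix for the compiler error. The function name `upconvert_sint8_to_int32` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference: widen 16 signed chars to 16 int32 values.
// `src` and `dst` must each point to at least 16 elements.
void upconvert_sint8_to_int32(const signed char* src, std::int32_t* dst) {
    for (int i = 0; i < 16; ++i) {
        // The cast sign-extends, so negative bytes stay negative.
        dst[i] = static_cast<std::int32_t>(src[i]);
    }
}
```

A reference like this can be used to check the vector result element by element once the intrinsic call compiles.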

VTune Analyze Active Power Consumption

I would like to analyze active power consumption using VTune.

However, after running a General Exploration analysis (with "collect stacks" selected), the PMU tab does not show energy consumption.

I set up my system as described here.

The needed drivers are up and running, and the output from amplxe-runss suggests that power analysis is available.




I am trying to use the gather function in the following code on MIC. When I ran it, it reported "Segmentation fault". Can someone tell me how to fix it?




#include <stdlib.h>
#include <math.h>
#include <stdio.h>
#include <time.h>
#include <immintrin.h>

#define N 32

int main(){

   double a[N], b[N], c[N];

   // Initialize all three arrays inside the loop body.
   for(int i = 0; i < N; i++){
         a[i] = 1.0*i;
         b[i] = 2.0*i;
         c[i] = -1.0;
   }

Documentation Error, IPP 8.1 Volume 2, ippiGradientVectorScharr

The entry for ippiGradientVectorScharr shows a kernel identical to that shown for ippiGradientVectorPrewitt with sample weights of magnitude [1, 1, 1].  I believe for the Scharr gradient filter the kernel weights should be of magnitude [3, 10, 3].  This appears to be a simple copy/paste error in the document.
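To make the difference concrete, here are the two 3x3 horizontal-derivative kernels as they are commonly given in the literature, with a small helper that applies one of them to a 3x3 patch. This is a sketch only: the sign and orientation conventions in the IPP manual may differ, and the names `kPrewittX`, `kScharrX` and `apply3x3` are made up for illustration.

```cpp
#include <cassert>

// Commonly cited horizontal-derivative kernels (conventions vary by reference).
const int kPrewittX[3][3] = {{-1, 0, 1},
                             {-1, 0, 1},
                             {-1, 0, 1}};   // weights of magnitude [1, 1, 1]
const int kScharrX[3][3]  = {{ -3, 0,  3},
                             {-10, 0, 10},
                             { -3, 0,  3}}; // weights of magnitude [3, 10, 3]

// Hypothetical helper: sum of element-wise products (correlation, no kernel flip).
int apply3x3(const int kernel[3][3], const int patch[3][3]) {
    int sum = 0;
    for (int r = 0; r < 3; ++r) {
        for (int c = 0; c < 3; ++c) {
            sum += kernel[r][c] * patch[r][c];
        }
    }
    return sum;
}
```

On a patch whose rows are all `{0, 1, 2}`, the Prewitt kernel gives 6 and the Scharr kernel gives 32, so the two filters are easy to tell apart in a quick test.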

Technical Presentation tomorrow on new Optimization Reporting in Compiler 15.0 beta

Hi all,

We've got a technical presentation coming tomorrow (9am Pacific) on one of the key new features in Intel® Composer XE 2015 (both C++ and Fortran). If you're interested in performance tuning involving vectorization, inlining, or other optimizations, I encourage you to attend. The full description and a link to register follow.

Memory not freed when using tbb::concurrent_hash_map

Hi everyone!

I have run into a problem with concurrent_hash_map. We are trying to use tbb::concurrent_hash_map to store key-value resources. The value struct contains a pointer to dynamically allocated memory. Later we traverse the map and delete objects that have timed out.

My test code is

Bug in SDE emulation of AVX-512 _mm512_permutevar_ps() ?


I have an issue with SDE emulating _mm512_permutevar_ps() [aka VPERMPS] in an unexpected way. I understand from the documentation that it should behave as the 512-bit variant of _mm256_permutevar8x32_ps() and be able to do cross-lane shuffling. So the attached file should reverse the content of the vector. It works with _mm256_permutevar8x32_ps(), but _mm512_permutevar_ps() clearly doesn't produce the expected results. Instead it performs an intra-lane shuffle:
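The two behaviours contrasted above can be modelled in scalar code. As far as I can tell from the current Intel Intrinsics Guide, _mm512_permutevar_ps() is listed under VPERMILPS (in-lane) and _mm512_permutexvar_ps() under VPERMPS (cross-lane), which would match the observation, but please check this against your own toolchain. The sketch below uses no intrinsics, and the names `permute_cross_lane` and `permute_in_lane` are made up for illustration.

```cpp
#include <array>
#include <cassert>

using Vec16 = std::array<float, 16>;
using Idx16 = std::array<int, 16>;

// Cross-lane model: any output element may come from any of the 16 inputs
// (only the low four bits of each index are used).
Vec16 permute_cross_lane(const Vec16& src, const Idx16& idx) {
    Vec16 dst{};
    for (int i = 0; i < 16; ++i) {
        dst[i] = src[idx[i] & 15];
    }
    return dst;
}

// In-lane model: each output element is picked from its own group of four
// floats (one 128-bit lane); only the low two bits of each index are used.
Vec16 permute_in_lane(const Vec16& src, const Idx16& idx) {
    Vec16 dst{};
    for (int i = 0; i < 16; ++i) {
        dst[i] = src[(i & ~3) + (idx[i] & 3)];
    }
    return dst;
}
```

With indices `15 - i`, the cross-lane model reverses the whole vector, while the in-lane model only reverses each group of four elements.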
