Vectorization

Is std::sort faster than the IPP sort ippsSortDescend_64f_I?

Hi,

I am trying to benchmark std::sort on vectors of double against the IPP sort (ippsSortDescend_64f_I). I am sorting 200 vectors of 2000 elements each. My test program is attached to this post. I see that std::sort consistently performs better (at least 10 times faster) than the IPP sort. Is this expected behavior? I would prefer to move to IPP, assuming it performs better, but I am not able to prove it. What am I doing wrong in my test program?
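The attached program is not reproduced here, so the following is only a minimal sketch of how such a comparison could be timed, assuming the workload described above (200 vectors of 2000 doubles, descending order). The names make_inputs, time_sort, std_sort_descending and all_descending are hypothetical helpers. The timing function takes any callable with a (double*, int) signature, so an IPP call could be wrapped in a lambda and timed on a fresh copy of the same unsorted input.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <functional>
#include <random>
#include <vector>

// Hypothetical helper: builds `count` vectors of `length` pseudo-random doubles.
// A fixed seed keeps the input identical between runs.
std::vector<std::vector<double>> make_inputs(int count, int length, unsigned seed = 42) {
    assert(count > 0 && length > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<std::vector<double>> data(count, std::vector<double>(length));
    for (auto& v : data)
        for (auto& x : v) x = dist(gen);
    return data;
}

// Times only the sorting calls (no allocation, no data generation).
// `sort_one` is any callable taking (double* data, int length) and sorting in place.
template <typename SortFn>
double time_sort(std::vector<std::vector<double>>& data, SortFn sort_one) {
    auto start = std::chrono::steady_clock::now();
    for (auto& v : data) sort_one(v.data(), static_cast<int>(v.size()));
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();  // seconds
}

// std::sort in descending order, matching the direction of ippsSortDescend_64f_I.
void std_sort_descending(double* p, int n) {
    std::sort(p, p + n, std::greater<double>());
}

// Verifies the result, so that a fast but wrong run is not mistaken for a win.
bool all_descending(const std::vector<std::vector<double>>& data) {
    for (const auto& v : data)
        if (!std::is_sorted(v.begin(), v.end(), std::greater<double>())) return false;
    return true;
}
```

A typical use would call make_inputs(200, 2000) once, copy the result for each candidate, and pass each copy to time_sort. Sorting a copy matters because an in-place sort leaves the data sorted, so a second timed pass over the same buffers would measure already-sorted input.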

How to set a 'rec' structuring element in ippiMorphCloseBorder_8u_C1R

Hello,

I am using IPP 9.0 Intel 64 to try to reduce the processing time of a morphological closing operation on a 2048x2048 image. I have used perfsys and am seeing a significant reduction in processing time when Parm5 is 'rec'. I'm assuming that this is a rectangular structuring element, which is what I'm using.

function | Parm1 | Parm2 | Parm3 | Parm4 | Parm5 | Parm6 | Parm7 | Parm8 | Comment | Clocks | per | Time (usec)
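For checking results independently of IPP, a closing with a rectangular structuring element can be sketched in plain C++ as a dilation (local maximum) followed by an erosion (local minimum) over the same rectangle. This is only a slow reference under stated assumptions: the type Image8u and the functions rect_filter and close_rect are hypothetical, the rectangle is assumed to have odd dimensions and be anchored at its centre, and borders are handled by replicating edge pixels. It says nothing about what the perfsys value 'rec' maps to in the IPP API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical 8-bit, single-channel image stored row-major.
struct Image8u {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;  // size == width * height
};

// Takes the max (dilation) or min (erosion) over a kw x kh rectangle centred
// on each pixel. Coordinates outside the image are clamped to the nearest
// edge pixel (border replication, an assumption of this sketch).
static Image8u rect_filter(const Image8u& src, int kw, int kh, bool take_max) {
    assert(kw > 0 && kh > 0 && kw % 2 == 1 && kh % 2 == 1);
    Image8u dst{src.width, src.height,
                std::vector<std::uint8_t>(src.pixels.size())};
    for (int y = 0; y < src.height; ++y) {
        for (int x = 0; x < src.width; ++x) {
            std::uint8_t acc = take_max ? 0 : 255;
            for (int dy = -kh / 2; dy <= kh / 2; ++dy) {
                for (int dx = -kw / 2; dx <= kw / 2; ++dx) {
                    int yy = std::min(std::max(y + dy, 0), src.height - 1);
                    int xx = std::min(std::max(x + dx, 0), src.width - 1);
                    std::uint8_t v = src.pixels[yy * src.width + xx];
                    acc = take_max ? std::max(acc, v) : std::min(acc, v);
                }
            }
            dst.pixels[y * src.width + x] = acc;
        }
    }
    return dst;
}

// Closing = dilation followed by erosion with the same rectangle.
Image8u close_rect(const Image8u& src, int kw, int kh) {
    return rect_filter(rect_filter(src, kw, kh, true), kw, kh, false);
}
```

The cost is proportional to width * height * kw * kh, so this is useful for comparing output on a small test image, not for timing against an optimized library.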

Accounting for CYCLE_ACTIVITY.CYCLES_NO_EXECUTE

Hi all,

I am using VTune's bandwidth profile to look at the fraction of time my software spends waiting on cache accesses on my Haswell (HSW) i7 processor. CYCLE_ACTIVITY.CYCLES_NO_EXECUTE gives this time. To break it down into the fraction of time waiting on L1, L2, and L3+memory, I am trying to use CYCLE_ACTIVITY.STALLS_L1D_PENDING, CYCLE_ACTIVITY.STALLS_L2_PENDING, and CYCLE_ACTIVITY.STALLS_LDM_PENDING. However, the sum of these three counts is always greater than the CYCLES_NO_EXECUTE count.
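One possible reading, which is an assumption here and not something the post establishes, is that the three stall events are nested and not disjoint: stall cycles with an L2 miss pending would also be counted as stall cycles with an L1D miss pending, which in turn would also be counted as stall cycles with any load pending. Under that assumption, adding the three counts counts the same cycles more than once, and a breakdown would use differences. The sketch below shows only the arithmetic; StallCounts, StallBreakdown and split_nested are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Raw event counts as read from the profiler.
struct StallCounts {
    std::uint64_t ldm_pending;  // CYCLE_ACTIVITY.STALLS_LDM_PENDING
    std::uint64_t l1d_pending;  // CYCLE_ACTIVITY.STALLS_L1D_PENDING
    std::uint64_t l2_pending;   // CYCLE_ACTIVITY.STALLS_L2_PENDING
};

// Disjoint shares derived under the nesting assumption.
struct StallBreakdown {
    std::uint64_t l1_bound;   // load pending, but no L1D miss pending
    std::uint64_t l2_bound;   // L1D miss pending, but no L2 miss pending
    std::uint64_t beyond_l2;  // L2 miss pending (L3 or memory)
};

// ASSUMPTION: l2_pending <= l1d_pending <= ldm_pending, i.e. each event
// counts a subset of the cycles counted by the previous one.
StallBreakdown split_nested(const StallCounts& c) {
    assert(c.ldm_pending >= c.l1d_pending);
    assert(c.l1d_pending >= c.l2_pending);
    return StallBreakdown{c.ldm_pending - c.l1d_pending,
                          c.l1d_pending - c.l2_pending,
                          c.l2_pending};
}
```

With made-up counts of 1000, 600 and 250, the shares come out as 400, 350 and 250, which sum to 1000 (the largest of the three inputs), whereas the naive sum of the inputs would be 1850.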

iconv issue

Hi all,

I'm trying to build something for the Phi that depends on iconv. The library routines are present, but the following application fails when run on the Phi:

#include <stdlib.h>
#include <iconv.h>

int main () {
  iconv_t cd;
  cd = iconv_open("latin1","UTF-8");
  if(cd == (iconv_t)(-1)) exit(1);
  iconv_close(cd);

  exit(0);
}

If I build this using "icc -o iconv_test iconv_test.c" and run it on the host, it returns no error (exit code 0).
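The test program above exits with 1 on failure without saying why. A minimal sketch of a variant that reports the reason is shown below; probe_iconv is a hypothetical helper name. It relies only on the documented behavior that iconv_open returns (iconv_t)(-1) and sets errno on failure, with EINVAL meaning the requested conversion is not supported by the implementation.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <iconv.h>

// Returns 0 if a conversion descriptor can be opened and closed, otherwise
// the errno value set by iconv_open (or -1 if errno was left at 0).
int probe_iconv(const char* to_code, const char* from_code) {
    assert(to_code != nullptr && from_code != nullptr);
    errno = 0;
    iconv_t cd = iconv_open(to_code, from_code);
    if (cd == (iconv_t)(-1)) {
        int err = errno;  // save before any other call can overwrite it
        std::fprintf(stderr, "iconv_open(\"%s\", \"%s\") failed: %s\n",
                     to_code, from_code, std::strerror(err));
        return err != 0 ? err : -1;
    }
    iconv_close(cd);
    return 0;
}
```

Calling probe_iconv("latin1", "UTF-8") on both the host and the Phi and comparing the printed messages would show whether the failure is an unsupported conversion or something else.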

Using MPI parMETIS with cluster_sparse_solver

Hello.

I am optimizing the `cluster_sparse_solver` runtime. In my case, the majority of the runtime is taken by phase `11`, symbolic factorization, with METIS. Additionally, only a single node is used in an otherwise `MPI`-enabled application.

I was wondering if there is a way to use `parMETIS` for fill-reducing ordering, in order to benefit from the cluster environment. One thing that would help tremendously is the source code for `cluster_sparse_solver`.

The version of MKL in question is mkl 11.2u3, which was bundled with composer_xe 2015 3.187.

Thanks!
