Error: unimplemented function(s) used

Good morning forum,

I ran the Shoc benchmark (http://ft.ornl.gov/doku/shoc) on Linux, on two different configurations:

1) AMD 64 X2 4400+ with Ubuntu Lucid
2) Intel Xeon E5620 @ 2.40GHz with RHEL6

For some problem sizes of the benchmark, I run into an SDK error:

For datasize=1: on the AMD, in Reduction, Scan, Sort and Spmv; on the Intel, in Spmv.
No errors for datasize=2.

The error reported in Shoc's Reduce is:

Error: unimplemented function(s) used:
allZero_v8_i32 in function __Vectorized_reduce
allOne_v8_i32 in function __Vectorized_reduce
allZero_i32 in function __Vectorized_reduce
allOne_v8_i32 in function __Vectorized_reduce
Log: Error: unimplemented function(s) used:
allZero_v8_i32 in function __Vectorized_reduce
allOne_v8_i32 in function __Vectorized_reduce
allZero_i32 in function __Vectorized_reduce
allOne_v8_i32 in function __Vectorized_reduce

In Shoc's Spmv, the error looks related:

Error: unimplemented function(s) used:
allZero_v4_i32 in function __Vectorized_spmv_csr_vector_kernel
allOne_v4_i32 in function __Vectorized_spmv_csr_vector_kernel

And I *think* (but I am not really sure) that these errors occasionally do not occur, so it might be a race condition.

What can I do to fix this? A gdb backtrace points me deep into the library, and of course we don't have its source code. If it helps, I can post the backtrace. I can also post the kernel code if that helps.

Rutger Hofman
VU Amsterdam


Hi,

Please confirm whether you have these issues on Intel Xeon / RHEL6.

Yes, the problem also occurs on the Xeon/RHEL. Do you want me to post the relevant kernels, or even the complete benchmark tarball? The Shoc site seems to be unreachable at the moment.

Rutger

Sure, thanks a lot.
Please provide the tarball.

Evgeny

Get it from here: http://www.cs.vu.nl/~rutger/shoc-1.1.0.tar.gz

Hmm. I was imprecise in my error report. The 'unimplemented function(s)' error only occurs on AMD. On Intel, the Spmv benchmark throws a segmentation violation deep in the OpenCL library.

The error on Intel Xeon/RHEL occurs with size=1 in the Spmv benchmark. To see the error quickly, run configure and make. Then cd to the bin/Serial/OpenCL directory and run:

$ ./Spmv -d 0 -s 1
(a number of tests pass)
Segmentation fault (core dumped)

Any idea what the 'unimplemented function(s)' error on the AMD CPU means, and how I can tackle it?

Rutger

Thanks for the additional info. The Intel SDK requires SSE4.1 support (this is the minimum requirement).
Please check whether your AMD machine supports it.

We will try to reproduce the failure on the Xeon machine.

Evgeny

It seems that Athlon 64 X2 Dual Core processors support only SSE, SSE2 and SSE3. That would explain the issue. Thanks.

Rutger Hofman
VU Amsterdam

Since last night I have also tried yet another system: an Intel Core i3 550 @ 3.20GHz running Ubuntu 10.04 x86_64. It shows the same segfault as the Intel Xeon. Of course, it could be a Shoc problem, even though it doesn't surface with the AMD OpenCL/CPU SDK or the NVIDIA OpenCL/GPU SDK. And it does seem to be a race condition, as it occasionally runs correctly to completion.
Rutger

Thanks for the info.
I have asked our QA team to reproduce the issue.

We've reproduced this bug. It is caused by a mistake in the SHOC source code. To fix it, add the following line:

memset(newcols, 0, paddedSize*sizeof(int));

in the file src/opencl/level1/spmv/util.h at line 478 (this line is empty), and recompile. That should prevent the crashes.

Well, that is fast debugging! Thanks a lot. I will notify the Shoc people.
Rutger

You're welcome! I think we've already notified the SHOC team, but you can remind them about this problem once more, as they haven't fixed it yet.

Hello,

On a Core 2 Duo E6850 (stepping 0b), I got the following error:

Error: unimplemented function(s) used:
allZero_i32 in function __Vectorized_.preillumin
allOne_i32 in function __Vectorized_.preillumin
CompilerException Failed to parse IR

It happens in a task (a work-group of 1 thread) that runs a housekeeping, non-optimized part of my code (listed at the end of this post).

Does this mean that I should avoid assignments such as i=0; or i=-1; ???

code:

__kernel __attribute__((reqd_work_group_size(1,1,1))) void preillumin(const int iazimut,const int bufoffset,const int buflongfilled,const float azimut,const __global float *altilook,const __global float *integstartband,const __global float *integstopband,const __global double *xtraj2,const __global double *ytraj2,const __global double *ztraj2,const __global float *psiatti2,const __global float *thetaatti2,const __global float *phiatti2,const __global float *lobexbuf,const __global float *t2,const __global float *hardpreintbuf,const __global float *rangefilterbuf,__global float *restrict compenstoband)
{
__private int i,j,n,p;
__private float w,w2,w4,d1,d2,xfocus,yfocus,zfocus,xtraj,ytraj,ztraj;
__private float c,s;
__private float x,y,z;
__private float x1,y1,z1;

d2=azimut*(3.316377475857735e-02f)+(-1.877929329872131e+00f);
i=floor(d2);
if (i<0)
 {
 i=0;
 d2=0.0f;
 }
if (i>132)
 {
 i=132;
 d2=133.0f;
 }
d2-=i;
d1=get_global_id(0)*(9.692779287317110e-01f)+(-3.377743680511874e-01f);
j=floor(d1);
if (j<0)
 {
 j=0;
 d1=0.0f;
 }
if (j>43)
 {
 j=43;
 d1=44.0f;
 }
d1-=j;
n=45*i+j;
zfocus=(1.0f-d2)*((1.0f-d1)*altilook[n]+d1*altilook[n+1])+d2*((1.0f-d1)*altilook[n+45]+d1*altilook[n+46])+(-2.252434373756267e+03f);
w=get_global_id(0)*(2.955416520878326e+01f)+(3.603700952450512e+03f);
d2=zfocus+azimut*(-1.010356079772958e-04f);
d1=w*(3.501031082123518e-03f)+d2*(-1.010356084929913e-04f);
d2=w*w-d2*d2-d1*d1;
if (d2<0.0f)
 {
 d1=w*(3.602066051945484e-03f);
 zfocus=azimut*(1.010356079772958e-04f)+w*(-9.999935018887185e-01f);
 d2=0.0f;
 }
else
 d2=sqrt(d2);
xfocus=azimut*(8.592837859064036e-01f)+d1*(8.592837902922715e-01f)+d2*(-5.114991375759568e-01f);
yfocus=azimut*(-5.114991349652156e-01f)+d1*(-5.114991375759568e-01f)+d2*(-8.592837902922715e-01f);
w2=0.0f;
for (n=0;n<=2;n++)
 {
 p=64*iazimut+get_global_id(0);
 p=floor(integstartband[p]+(integstopband[p]-integstartband[p])*(5.000000000000000e-01f)*n+0.5f);
 if (p<0)
  p=0;
 if (p>=buflongfilled)
  p=buflongfilled-1;
 i=(p+bufoffset)&2047;
 xtraj=xfocus-xtraj2[i];
 ytraj=yfocus-ytraj2[i];
 ztraj=zfocus-ztraj2[i];
 d2=sqrt(xtraj*xtraj+ytraj*ytraj+ztraj*ztraj);
 x1=(8.660254037844387e-01f);
 y1=(5.302876193624535e-17f);
 z1=(-4.999999999999999e-01f);
 c=cos(phiatti2[i]);
 s=sin(phiatti2[i]);
 x=x1*c+z1*s;
 z1=z1*c-x1*s;
 x1=x;
 c=cos(thetaatti2[i]);
 s=sin(thetaatti2[i]);
 y=y1*c-z1*s;
 z1=z1*c+y1*s;
 y1=y;
 c=cos(psiatti2[i]);
 s=sin(psiatti2[i]);
 x=x1*c+y1*s;
 y1=y1*c-x1*s;
 x1=x;
 w=(xtraj*x1+ytraj*y1+ztraj*z1)/d2;
 w=native_sqrt(1.0f-w*w);
 i=floor(w*4000.0f+0.5f);
 if (i<0)
  i=0;
 if (i>3999)
  i=3999;
 w4=lobexbuf[i];
 if (!(w4>1.000000047497451e-03f))
  w4=0.0f;
 i=p-1;
 if (i<0)
  i=0;
 i=(bufoffset+i)&2047;
 j=p+1;
 if (j>buflongfilled)
  j=buflongfilled;
 j=(bufoffset+j)&2047;
 x=xtraj2[j]-xtraj2[i];
 y=ytraj2[j]-ytraj2[i];
 z=ztraj2[j]-ztraj2[i];
 w=native_divide(xtraj*x+ytraj*y+ztraj*z,d2*(t2[j]-t2[i]));
 i=floor(w*(1.292359000000000e+06f)+(1.874149169921875e+03f));
 if (i<0)
  i=0;
 if (i>3999)
  i=3999;
 w=hardpreintbuf[i];
 w4*=w;
 i=floor(d2*(6.101493061542592e+00f)+(-2.613644720440822e+04f));
 if (i<-4097)
  i=-4097;
 if (i>4094)
  i=4094;
 w4=native_divide(w4,rangefilterbuf[i&8191]);
 i=1+(n&1);
 i=i+i;
 if (n==0 || n==2)
  i=1;
 w2+=i*w4;
 }
w2*=1.666666666666667e-01f;
p=64*iazimut+get_global_id(0);
i=floor(integstopband[p]);
if (i>=buflongfilled-1)
 i=buflongfilled-2;
w=integstopband[p]-i;
w=w*t2[(bufoffset+i+1)&2047]+(1.0f-w)*t2[(bufoffset+i)&2047];
i=floor(integstartband[p]);
if (i<0)
 i=0;
w4=integstartband[p]-i;
w-=w4*t2[(bufoffset+i+1)&2047]+(1.0f-w4)*t2[(bufoffset+i)&2047];
w2*=w;
compenstoband[get_global_id(0)]=w2;
}

Okay, I checked: the Core 2 E6850 does not have the SSE4.1 instructions!

My mistake (SSE4.1 is a prerequisite for the Intel OpenCL SDK).

On a Xeon X5570 (I just checked), it compiles with no error.
