Internal compiler error 010101_239

Hi guys,

I condensed our project down to a piece of code that lets you reproduce the following issue.
When I compile this in Release configuration (Debug works), I get this compiler error:

1>------ Build started: Project: ng-gtest, Configuration: Release x64 ------
1> CilkTest.cpp
1>" : error : 010101_239
1>
1> compilation aborted for General\CilkTest.cpp (code 4)
========== Build: 0 succeeded, 1 failed, 3 up-to-date, 0 skipped ==========

This is our compiler: Intel(R) C++ Intel(R) 64 Compiler XE for Intel(R) 64, version 14.0.3 Package ID: w_ccompxe_2013_sp1.3.202
OS: Windows 7, x64.

This is the code:

#include <math.h>

const int VecSize = 8;

const short* acdata;
const short* lowdata;

const unsigned short* meas_data;
const unsigned short* rdval;

short trident[2 * VecSize];
short speed[2 * VecSize];
float spdfact[VecSize];
float spdfact2[VecSize];
float tdat[VecSize];
float array1[VecSize];
float array2[VecSize];

const float *input_01;
const float *input_02;
float val0;
float vvv9;
float agn;

void get_g(float ag[VecSize], const float ae[VecSize], const float* pp)
{
    float a01[VecSize];
    a01[:] = ae[:];
    if (a01[:] >= 360.0f)
        a01[:] -= 360.0f;
    if (a01[:] >= 360.0f)
        a01[:] -= 360.0f;
    if (a01[:] < 0.0f)
        a01[:] += 360.0f;
    if (a01[:] < 0.0f)
        a01[:] += 360.0f;

    int i0[VecSize], i1[VecSize];
    i1[:] = static_cast<int>(a01[:]);
    i0[:] = i1[:] + 1;

    float g0[VecSize], g1[VecSize];
    g1[:] = pp[i1[:]];
    g0[:] = pp[i0[:]];

    ag[:] = g0[:] - (g1[:] - g0[:]) * (a01[:] - static_cast<float>(i0[:]));
}

void f(float prlo[VecSize], const int cntr)
{
    float cop[VecSize];
    short cod[2 * VecSize];
    short maxm[2 * VecSize];

    cod[0:VecSize:2] = acdata[cntr:VecSize];
    cod[1:VecSize:2] = lowdata[cntr:VecSize];
    maxm[0:VecSize:2] = lowdata[cntr:VecSize];
    maxm[1:VecSize:2] = acdata[cntr:VecSize];
    cop[:] = (1.0f / float(255 * 255)) * static_cast<float>(
        cod[0:VecSize:2] * trident[0:VecSize:2] + cod[1:VecSize:2] * trident[1:VecSize:2]);
    float music[VecSize];
    music[:] = (1.0f / float(16383 * 16383)) * static_cast<float>(
        maxm[0:VecSize:2] * speed[0:VecSize:2] + maxm[1:VecSize:2] * speed[1:VecSize:2]);

    float velo2[VecSize];
    float brigh[VecSize];
    velo2[:] = static_cast<float>(rdval[cntr:VecSize]) * spdfact[:];
    brigh[:] = asinf(velo2[:]);

    float denom[VecSize];
    denom[:] = cop[:] * array1[:] + array2[:] / brigh[:];

    float accel[VecSize];
    accel[:] = atanf(music[:] / denom[:]);
    accel[:] = atan2f(accel[:], velo2[:]);

    bool haMask[VecSize];
    haMask[:] = accel[:] < 0.0f;
    if (haMask[:] & (music[:] > 0.0f))
        accel[:] += float(9.81 / 2);
    if (!haMask[:] & (music[:] < 0.0f))
        accel[:] -= float(9.81 / 2);

    float diff[VecSize];
    diff[:] = array1[:] * brigh[:] - array2[:] * cop[:] * velo2[:];
    float prod[VecSize];
    prod[:] = sinf(accel[:]) * diff[:];

    float accel2[VecSize];
    accel2[:] = atanf(prod[:] / music[:]);

    float halter[VecSize] = { 0.0f };
    if (cop[:] <= 0.0f)
        halter[:] = 2.8182963f;
    float valter[VecSize];
    valter[:] = cop[:] > 0 ? velo2[:] - tdat[:] : velo2[:] + tdat[:];

    if (music[:] == 0)
    {
        accel[:] = halter[:];
        accel2[:] = valter[:];
    }
    accel[:] *= float(9.81);
    accel2[:] *= float(9.81);

    float v8[VecSize];
    v8[:] = 25.7385f - accel2[:];
    float hxx[VecSize];
    hxx[:] = fabsf(accel[:]) * float(1.38e-23);
    float hnn[VecSize];
    hnn[:] = -accel[:];

    float vgg[VecSize], gx2[VecSize], hms[VecSize], sv[VecSize];
    get_g(vgg, accel2, input_02);
    get_g(gx2, v8, input_02);
    get_g(hms, hnn, input_01);
    sv[:] = ((1.0f - hxx[:]) * (val0 - vgg[:])) + (hxx[:] * (vvv9 - gx2[:]));

    float p0[VecSize];
    p0[:] = hms[:] - sv[:];

    prlo[:] = static_cast<float>(meas_data[cntr:VecSize]) * spdfact2[:] - p0[:] - agn;
}

I've been unable to reproduce the problem. I extracted the code into a file, and issued the following command:

bash-3.2$ icl test.cpp
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0 Build 20140303
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

test.cpp
Microsoft (R) Incremental Linker Version 9.00.30729.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:test.exe
test.obj
LIBCMT.lib(crt0.obj) : error LNK2019: unresolved external symbol main referenced in function __tmainCRTStartup
test.exe : fatal error LNK1120: 1 unresolved externals

So the file was successfully compiled and failed in the linker. The problem may already be fixed. I'm using a nightly compiler build from March 3 which may be newer than the compiler you've got. To be sure, I'll need a Visual Studio build log to reproduce the command line options.

  - Barry

I'm using the same compiler with Microsoft Visual Studio* 2013, and I'm not seeing a problem:

 

1>------ Build started: Project: test, Configuration: Release x64 ------
1> icl /Qvc12 "/Qlocation,link,C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64" /Zi /W3 /O2 /Oi /Qipo /Qftz- -D __INTEL_COMPILER=1400 -D WIN32 -D NDEBUG -D _CONSOLE -D _LIB -D _UNICODE -D UNICODE /EHsc /MD /GS /Gy /Zc:wchar_t /Zc:forScope /Fox64\Release\ /Fdx64\Release\vc120.pdb /TP test.cpp
1>
1> Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.3.202 Build 20140422
1> Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
1>
1> test.cpp
========== Build: 1 succeeded, 0 failed, 0 up-to-date, 0 skipped ==========

 

Can you turn off the "Suppress Startup Banner" option (/nologo) and see if your command line matches mine above? And are you using a different Visual Studio version?

Brandon Hewitt
Technical Consulting Engineer

For 1:1 technical support: http://premier.intel.com

Software Product Support info: http://www.intel.com/software/support

Hi

I think you need vectorization to reproduce the problem.
I use Visual Studio Premium 2013.
This is my output:

1>------ Build started: Project: ng-gtest, Configuration: Release x64 ------
1> icl /Qvc12 "/Qlocation,link,D:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\bin\amd64" /Zi /W3 /MP /O3 /Oi /Qip /Qftz -D __INTEL_COMPILER=1400 -D WIN32 -D NDEBUG -D _CONSOLE -D NG_EXPORTS -D NG_DLL_ID=1 -D USE_TBB_PARALLEL -D USE_TBB_RWLOCK -D NOMINMAX -D BOOST_FILESYSTEM_NO_DEPRECATED -D _WINDLL -D _SCL_SECURE_NO_WARNINGS -D BOOST_MULTI_INDEX_DISABLE_SERIALIZATION -D NOMINMAX -D _WINDLL -D _VARIADIC_MAX=10 -D _UNICODE -D UNICODE /EHsc /MD /GS /fp:precise /QxSSE3 /Zc:wchar_t /Zc:forScope /Qstd=c++11 /Qrestrict /Fo.\tmp\msvc_x64_ur\ /Fd.\tmp\msvc_x64_ur\vc120.pdb /TP General\CilkTest.cpp
1>
1> Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 14.0.3.202 Build 20140422
1> Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
1>
1> CilkTest.cpp
1> *** Compiling Cilk test code in Debug (fails to compile in Release with w_ccompxe_2013_sp1.3.202.
1>" : error : 010101_239
1>
1> compilation aborted for General\CilkTest.cpp (code 4)
========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

/fp:precise seems to be the critical option.  I've submitted CQ256573 on this problem.

Thank you for reporting it.

   - Barry

You are right, /fp:precise and /fp:strict don't work. /fp:fast and /fp:fast=2 both work.

Thanks, Barry, for your investigations.

Cheers,

Martin

I see it too. Looks like something that broke between the 12.1 and 13.0 compilers. Unfortunately, I don't see any easy workarounds beyond not using /fp:precise, which isn't ideal. When there's progress made on the investigation here, I'll update the thread.

Brandon Hewitt

Hi Brandon,

unfortunately it's worse than I thought. Although /fp:fast compiles successfully, the resulting code is buggy.

I was able to get the code to compile with /fp:precise by splitting the function into smaller pieces and using __declspec(noinline), and this binary passed our tests. The binary built with /fp:fast caused an access violation whether or not I split the function.

From that I would conclude that there is an issue with Cilk Plus that leads to either an internal error or buggy code.

Unfortunately I cannot provide test data at this stage. It's hard to extract this from our test environment.

Regards,
Martin
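
The workaround described above (splitting the function and marking the pieces __declspec(noinline)) could look roughly like the sketch below. The actual split is not shown in the thread, so the function names, the split point, and the NOINLINE macro are all hypothetical, and plain loops stand in for the Cilk Plus array notation so the sketch builds with any C++ compiler.

```cpp
#include <cmath>

// Hypothetical portability macro: __declspec(noinline) is the spelling used in
// the post (MSVC and Intel on Windows); GCC and Clang use an attribute instead.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

const int VecSize = 8;

// First piece (hypothetical split point): one trigonometric step per lane.
NOINLINE void compute_angle(float out[VecSize],
                            const float num[VecSize],
                            const float den[VecSize])
{
    for (int i = 0; i < VecSize; ++i)
        out[i] = std::atan(num[i] / den[i]);
}

// Second piece (hypothetical): the scaling that followed in the big function.
NOINLINE void scale_lanes(float data[VecSize], float factor)
{
    for (int i = 0; i < VecSize; ++i)
        data[i] *= factor;
}

// The former monolithic function now only chains the pieces together.
void f_split(float out[VecSize],
             const float num[VecSize],
             const float den[VecSize])
{
    compute_angle(out, num, den);
    scale_lanes(out, 9.81f);
}
```

Calling f_split(out, num, den) with all lanes of num and den set to 1.0f yields atan(1) * 9.81 in every lane. The noinline marker presumably matters because it stops the optimizer from merging the pieces back into one large function; that reading is an assumption, not something confirmed in the thread.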

Martin,

It may be a good idea to check what kind of access violation it is. If it's a bad address, then we're probably stuck until we can get a test case from you, but if it's a stack overflow or something along those lines, it might be easier to work around and reproduce.

Brandon Hewitt

Hi Brandon,

the stack seems to be corrupt. The line "get_g(gx2, v8, input_02);" is compiled into these instructions.

000007FEE36DA28A  movsxd      rbp,dword ptr [rsp+20h]  
000007FEE36DA28F  movsxd      r10,dword ptr [rsp+30h]  
000007FEE36DA294  movsxd      rdi,dword ptr [rsp+24h]  
000007FEE36DA299  movsxd      r11,dword ptr [rsp+34h]  
000007FEE36DA29E  movsxd      r8,dword ptr [rsp+28h]  
000007FEE36DA2A3  movsxd      r12,dword ptr [rsp+38h]  
000007FEE36DA2A8  vmovss      xmm2,dword ptr [rax+rbp*4]  
000007FEE36DA2AD  vmovss      xmm15,dword ptr [rax+rbp*4+4]  
000007FEE36DA2B3  vmovss      xmm9,dword ptr [rax+r10*4]  
000007FEE36DA2B9  vmovss      xmm4,dword ptr [rax+r10*4+4]  
000007FEE36DA2C0  movsxd      r9,dword ptr [rsp+2Ch]  
000007FEE36DA2C5  movsxd      r14,dword ptr [rsp+3Ch]  
000007FEE36DA2CA  vinsertps   xmm7,xmm2,dword ptr [rax+rdi*4],10h  
000007FEE36DA2D1  vinsertps   xmm5,xmm15,dword ptr [rax+rdi*4+4],10h  
000007FEE36DA2D9  vinsertps   xmm8,xmm9,dword ptr [rax+r11*4],10h  
000007FEE36DA2E0  vinsertps   xmm2,xmm4,dword ptr [rax+r11*4+4],10h  
000007FEE36DA2E8  vinsertps   xmm6,xmm7,dword ptr [rax+r8*4],20h  
000007FEE36DA2EF  vinsertps   xmm3,xmm5,dword ptr [rax+r8*4+4],20h  
000007FEE36DA2F7  vinsertps   xmm1,xmm8,dword ptr [rax+r12*4],20h  
000007FEE36DA2FE  vinsertps   xmm7,xmm2,dword ptr [rax+r12*4+4],20h  
000007FEE36DA306  vinsertps   xmm10,xmm6,dword ptr [rax+r9*4],30h  
000007FEE36DA30D  vinsertps   xmm6,xmm3,dword ptr [rax+r9*4+4],30h  
000007FEE36DA315  vinsertps   xmm14,xmm1,dword ptr [rax+r14*4],30h  
000007FEE36DA31C  vinsertps   xmm9,xmm7,dword ptr [rax+r14*4+4],30h 

 

r11 holds the value 0xffffffff80000000, so the instruction at address 000007FEE36DA2D9 (vinsertps   xmm8,xmm9,dword ptr [rax+r11*4],10h) causes an access violation.

As you can see, r11 is loaded from the stack with movsxd      r11,dword ptr [rsp+34h], and the four bytes on the stack at [rsp+34h] are: 00 00 00 80.

 

Regards,

Martin
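
The register value reported above is what sign extension of those four stack bytes would produce: read as a little-endian dword, 00 00 00 80 is 0x80000000, and movsxd widens it to 0xffffffff80000000. A small sketch in standard C++ illustrates the arithmetic; the helper names sign_extend32 and byte_offset are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Mimics movsxd: reinterpret 32 bits as a signed int, then widen to 64 bits.
std::int64_t sign_extend32(std::uint32_t bits)
{
    std::int32_t narrow;
    std::memcpy(&narrow, &bits, sizeof narrow);  // bit-for-bit copy of the dword
    return static_cast<std::int64_t>(narrow);    // widening replicates the sign bit
}

// Byte displacement implied by the addressing mode [rax + index*4].
std::int64_t byte_offset(std::int64_t index)
{
    return index * 4;
}
```

With bits = 0x80000000 the index is -2147483648, so the load targets an address 8 GiB below rax, which explains the access violation. Note that 0x80000000 is also the "integer indefinite" result that x86 SSE float-to-int conversions such as cvttss2si return for NaN or out-of-range input, so an invalid float reaching static_cast<int>(a01[:]) in get_g would be one plausible source of the value; that link is an assumption at this point in the thread.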

Hi again Brandon,

after some investigation I can say that the reason for the wrong values on the stack is a mathematical instability of the algorithm caused by the trigonometric functions and divisions which obviously only occurs with /fp:fast but not with /fp:precise. Actually the algorithm was designed to be robust against mathematical instabilities.

 

So we do need to get that code compiled with /fp:precise. Did you raise a bug?

 

Regards,

Martin
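
If a non-finite intermediate value is what ends up as a table index, one defensive option is to validate the angle before the float-to-int conversion. The scalar sketch below is only an illustration: safe_angle_index is a hypothetical helper, it uses fmod instead of the repeated +/- 360 steps in get_g, and it assumes the caller treats -1 as "no valid index".

```cpp
#include <cmath>

// Hypothetical helper: wrap an angle in degrees into [0, 360) and return its
// integer part, or -1 if the input is NaN or infinite.
int safe_angle_index(float angle_deg)
{
    if (!std::isfinite(angle_deg))
        return -1;                                  // caller must handle this case

    float wrapped = std::fmod(angle_deg, 360.0f);   // result lies in (-360, 360)
    if (wrapped < 0.0f)
        wrapped += 360.0f;                          // now in [0, 360]
    if (wrapped >= 360.0f)                          // tiny negatives can round up to 360
        wrapped = 0.0f;

    return static_cast<int>(wrapped);               // 0..359, conversion is well defined
}
```

For example, safe_angle_index(370.5f) gives 10 and safe_angle_index(-10.0f) gives 350. One caveat: some fast floating-point modes (GCC's -ffinite-math-only, for instance) allow the compiler to assume NaNs never occur and may weaken an isfinite check, so whether this guard survives under /fp:fast would need to be verified for the compiler in use.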

Quote:

after some investigation I can say that the reason for the wrong values on the stack is a mathematical instability of the algorithm caused by the trigonometric functions and divisions which obviously only occurs with /fp:fast but not with /fp:precise. Actually the algorithm was designed to be robust against mathematical instabilities.

Well, perhaps.  However, I suggest that you would be better off attending to its robustness.  In one place, you have:

    float hxx[VecSize];
    hxx[:] = fabsf(accel[:]) * float(1.38e-23);
    ...
    sv[:] = ((1.0f - hxx[:]) * (val0 - vgg[:])) + (hxx[:] * (vvv9 - gx2[:]));

That raises alarm bells in my head!  At best, you have a very poorly scaled problem, and poor scaling is a classic cause of numerical problems.  Despite modern belief, the use of floating-point (as distinct from fixed-point) is NOT a solution to all scaling problems, though you would have to look at some very serious (and probably 1960s or 1970s) textbooks to see discussions of the issues.
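
The scaling concern can be made concrete for a single lane. A weight on the order of 1e-23 is far below float's machine epsilon (about 1.19e-7), so 1.0f - h rounds to exactly 1.0f and the second term of the blend has no effect on values of ordinary magnitude. The function names below are hypothetical, and the sketch uses scalar floats rather than the array notation of the original.

```cpp
// Same shape as the quoted expression, for one lane:
// (1 - h) * a + h * b
float blend(float h, float a, float b)
{
    return (1.0f - h) * a + h * b;
}

// True when the weight h is too small to change 1.0f at float precision.
bool weight_is_absorbed(float h)
{
    volatile float w = 1.0f - h;   // volatile forces the result to be stored as a float
    return w == 1.0f;
}
```

With h = 1.38e-23f, weight_is_absorbed(h) is true, and blend(h, 5.0f, 1000.0f) returns exactly 5.0f: the contribution h * b is about 1e-20 and disappears when added to 5. Whether that is intended in the original algorithm is not stated in the thread.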
