Segfault on a raytracing benchmark from the PARSEC benchmark suite

Zhunping Zhang

Hello, I think I have encountered a bug in the Intel compiler 12.1.0: with the same settings, the GCC build runs smoothly but the ICC build produces a segfault. Here are the steps to reproduce the bug:
Test machine:
Linux: Linux 2.6.32-5-amd64 #1 SMP Mon Jan 16 16:22:28 UTC 2012 x86_64 GNU/Linux
CPU: 12 cores
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel Xeon CPU X5650 @ 2.67GHz
stepping : 2
cpu MHz : 2666.995
cache size : 12288 KB
physical id : 1
siblings : 6
core id : 10
cpu cores : 6
apicid : 52
initial apicid : 52
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 5333.50
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
ICC: icc version 12.1.0 (gcc version 4.4.5 compatibility)

Steps to run it with GCC:
Please download Parsec-2.1 from the website http://parsec.cs.princeton.edu/. After the download has finished, cd to the directory pkgs/apps/raytrace/src. Please make sure that cmake and gcc are installed. Compile it with:
$ cmake .
$ make
This is the version compiled with gcc. To run it, please do the following:
$ cd ../inputs
$ tar xvf input_simsmall.tar
$ ../src/bin/rtview happy_buddha.obj -nodisplay -automove -nthreads 1 -frames 1 -res 480 270
This runs smoothly and finishes on my machine. Now to reproduce the bug with ICC.
Steps to reproduce the bug in ICC:
Please go to the directory pkgs/apps/raytrace/src (continuing from the GCC run):
$ cd ../src
Edit CMakeCache.txt and change the following two lines:
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/c++
CMAKE_C_COMPILER:FILEPATH=/usr/bin/gcc
into:
CMAKE_CXX_COMPILER:FILEPATH=/usr/bin/icc
CMAKE_C_COMPILER:FILEPATH=/usr/bin/icc
Then go back to the command line and type:
$ make
This compiles the same app with ICC. To run it:
$ bin/rtview ../inputs/happy_buddha.obj -nodisplay -automove -nthreads 1 -frames 1 -res 480 270
This produces a segfault on my machine.

Some analysis

A colleague helped trace the execution, and he thinks the program probably has a corrupted heap. One trace showed that the segfault happens at a movaps instruction whose address is not aligned. However, an earlier movaps instruction with an unaligned address passed, so it might be that the address points to an invalid location. We then added some printf calls, and the segfault happened at the call to printf. So it looks like something more fundamental is wrong, but we are not sure. A valgrind run with a smaller data set (inputs/inputs_simdev.tar) reveals many errors for the ICC build but only four errors for the GCC build. Hope this helps.
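One way to test the alignment theory is to check suspect pointers before they reach SSE code, since movaps with a memory operand faults when that operand is not on a 16-byte boundary. The following is only a minimal sketch: `is_aligned` and `check_sse_operand` are hypothetical helpers written for illustration and are not part of PARSEC or of either compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of PARSEC: returns true when p sits on an
// `alignment`-byte boundary. `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical guard to place in front of code that is expected to use
// aligned SSE loads or stores on p. Aborts in debug builds if p is misaligned.
inline void check_sse_operand(const float* p) {
    assert(is_aligned(p, 16) && "operand is not 16-byte aligned");
}
```

Calling `check_sse_operand` on the box and vector members just before the crashing code would show whether the address is merely misaligned or actually invalid.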

Justin

Zhunping Zhang

Hello, I wonder if anyone can take a look at this?
Many many thanks!

Georg Zitzlsberger (Intel)

Hello Justin,

I saw that this thread has been open for too long without a response, so I'm taking a look. So far I can reproduce the SEGV, and I will come back to you once I have analyzed it.

Regards,

Georg Zitzlsberger

Georg Zitzlsberger (Intel)

Hello Justin,

I've analyzed the problem and it seems to be a severe issue with unaligned access. Hence I've escalated it to compiler engineering (DPD200294372). Unfortunately I cannot provide you with a workaround because it seems to be a general issue.
As soon as I learn more I'll let you know.

Best regards,

Georg Zitzlsberger

jcebrian

Hi.

I had a similar problem with Raytrace, but using GCC 4.7 and real SSE, not Emulated SSE. After tracing the problem for a while I found out that it was the initialization that was wrong.

For me, RTBox.hxx:299


        /// Treat this as aligned box of 3D vectors (which it is really is).
        _INLINE float volume() const {
            return ((RTBox_t<3, float, 16>*)this)->volume();
        }
        _INLINE float area() const {
            float a3 = ((RTBox_t<3, float, 16>*)this)->area();
            //float a4 = ((RTBox_t<4, float, 16>*)this)->area();
            //cout << a3 << "\t" << a4 << endl;
            return a3;
        }

This does not seem to work properly. The area() call is then expanded from this inline function:


        /// Box area. Valid for 2D and 3D, for other dimensions first 3 components will be used.
        _INLINE DataType area() const {
            DataType a = (m_max[0]-m_min[0]) * (m_max[1]-m_min[1]);
            if (N >= 3) {
                a = 2 * (a +
                         (m_max[0]-m_min[0]) * (m_max[2]-m_min[2]) +
                         (m_max[1]-m_min[1]) * (m_max[2]-m_min[2]));
            }
            return a;
        }

But the values of m_max[0], m_max[1] and m_max[2] are not read properly: m_max[0] reports "0", m_max[1] reports the value of m_max[0], and m_max[2] reports the value of m_max[1].
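A shift of exactly one component is what one would see if the reinterpreted view expected its data one float earlier than where it is actually stored. The sketch below only reproduces that symptom under this assumption. `Stored` and `View` are hypothetical layouts invented for illustration, not the real PARSEC types, and the actual cause in RTBox.hxx may be different.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layouts, NOT the real PARSEC types.
// Stored keeps one float in front of the three components ...
struct Stored { float lead; float v[3]; };
// ... while View expects the three components at offset 0.
struct View { float v[3]; float tail; };

// Reinterpret the bytes of a Stored object as a View. memcpy is used so the
// sketch stays free of the undefined behaviour a pointer cast would have.
inline View reinterpret_as_view(const Stored& s) {
    static_assert(sizeof(Stored) == sizeof(View), "layouts must have equal size");
    View out;
    std::memcpy(&out, &s, sizeof(out));
    return out;
}
```

With `lead` set to 0 and components 1, 2, 3, the view reads 0, 1, 2: index 0 reports 0, index 1 reports the old index 0, and index 2 reports the old index 1.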
I "fixed" this by using the "Emulated SSE" code with 16 alignment:


        _INLINE float volume() const {
            return RTBox_t<3, float, 16>(min3f(), max3f()).volume();
        }
        _INLINE float area() const {
            return RTBox_t<3, float, 16>(min3f(), max3f()).area();
        }
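The idea behind this workaround is to build a fresh, properly aligned temporary from the component values instead of casting `this` to a type with a different layout. Here is a minimal standalone sketch of that pattern. `AlignedBox3` and `PackedBox` are hypothetical stand-ins for the PARSEC classes, and the area formula is the 3D case of the one quoted above.

```cpp
#include <cassert>

// Hypothetical stand-in for a 16-byte-aligned 3D box; NOT the PARSEC RTBox_t.
struct alignas(16) AlignedBox3 {
    float lo[4];
    float hi[4];

    // Copy three components from each corner; the fourth lane is padding.
    AlignedBox3(const float* mn, const float* mx) {
        for (int i = 0; i < 3; ++i) { lo[i] = mn[i]; hi[i] = mx[i]; }
        lo[3] = 0.0f;
        hi[3] = 0.0f;
    }
    float volume() const {
        return (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2]);
    }
    float area() const {
        const float dx = hi[0] - lo[0], dy = hi[1] - lo[1], dz = hi[2] - lo[2];
        return 2.0f * (dx * dy + dx * dz + dy * dz);
    }
};

// Hypothetical box with a different (packed) layout. It copies its corners
// into an aligned temporary rather than casting `this` to AlignedBox3*.
struct PackedBox {
    float mn[3];
    float mx[3];
    float volume() const { return AlignedBox3(mn, mx).volume(); }
    float area() const { return AlignedBox3(mn, mx).area(); }
};
```

The copy costs a few extra moves per call, but the result no longer depends on the two types sharing a layout or an alignment.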

However, there should be a better way to solve this. I'm not completely sure but later on that file, RTBoxSSE is defined as RTBox3a, and RTBox3a as RTBox_t<1, sse_f>, thus leaving RTVec as align 0 by default?

typedef RTVec_t RTVec;

And then

RTVec m_min;
RTVec m_max;

Also with align 0?
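One way to settle this kind of question without reading through every typedef is to ask the compiler directly with `alignof` and a `static_assert`. A minimal sketch follows, where `PlainVec`, `AlignedVec` and `sse_safe` are hypothetical names used only for illustration. In the real code the check would be applied to RTVec and RTBoxSSE.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical vector types, NOT the PARSEC RTVec_t.
struct PlainVec { float v[4]; };               // no alignment requested
struct alignas(16) AlignedVec { float v[4]; }; // explicitly 16-byte aligned

// True when objects of type T are guaranteed to sit on a 16-byte boundary,
// which is what aligned SSE loads and stores require.
template <typename T>
constexpr bool sse_safe() {
    return alignof(T) >= 16;
}

// Compile-time check: the build fails if the alignment is ever lost.
static_assert(sse_safe<AlignedVec>(), "AlignedVec must be 16-byte aligned");
```

A similar `static_assert` next to the typedefs would turn a silent misalignment into a compile error.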

Correct me if I'm wrong, this code is a little bit complex :)

Jm.

Georg Zitzlsberger (Intel)

Hello Justin,

I have just been informed that a fix for the above problem is part of Intel(R) Composer XE 2013 SP1 (and higher).

Best regards,

Georg Zitzlsberger
