Binary instrumentation with itcpin fails with segmentation violation

Hi,
I want to use the Intel Trace Collector on a 64-bit Windows HPC Server 2008 (2x Xeon E5450) to instrument a binary that uses Intel MPI 3.2.2.
Package ID: w_itac_p_8.0.3.008
Build Number: w_itac_p_8.0.3.04_intel64
Package Contents: Intel Trace Analyzer and Collector for Windows*

On a command line, I initialize the ITAC variables by executing "itacvars.bat". Then I start the application using the following command:

mpiexec.exe -genv VT_DLL_DIR "%VT_DLL_DIR%" -genv VT_MPI_DLL "%VT_MPI_DLL%" -genv VT_LOGFILE_PREFIX "" -genv VT_FLUSH_PREFIX "" -genv VT_STF_PROCS_PER_FILE 2 -n 2 itcpin --verbose 3 --insert VT --run -- myProgram.exe

My application then fails after a couple of seconds with a segmentation violation (signal 11). If I run the application without itcpin, everything works fine.
During the very short run with itcpin, the Intel Trace Collector either writes 0-byte files into the given directory or sometimes two 130 MB .dat files.

Then I tried to replace the VT library with VTmc (using the --insert flag) and got the following output:

   [..]

New thread #5, stack at 000000002484FAD8, flags 0x0.

New thread #5, stack at 000000002483FAD8, flags 0x0.

[1] ERROR: Signal 3 caught in ITC code section.

[1] ERROR: Either ITC is faulty or (more likely in a release version)

[1] ERROR: the application has corrupted ITC's internal data structures.

[1] ERROR: Giving up now...

   [..]

[0] WARNING: EXCEPTION_ACCESS_VIOLATION occurred

[0] ERROR: Signal 3 caught in ITC code section.

[0] ERROR: Either ITC is faulty or (more likely in a release version)

[0] ERROR: the application has corrupted ITC's internal data structures.

[0] ERROR: Giving up now...

  [..]
Now I don't know what to do. I don't have access to the sources. Is there another way of doing binary instrumentation?
What does the error mean? Can I change ITAC parameters (environment variables) to get it to work? I've already tried to decrease the memory block sizes and counts, but without any success...

Any help is appreciated.
Sandra

PS: Joe had a similar problem (http://software.intel.com/en-us/forums/showthread.php?t=101217&o=a&s=lr) some time ago. But in contrast to me, he had source access.


Hi Sandra,

The command line looks correct. You could remove '--verbose 3', but that should not affect itcpin or Trace Collector behavior.
The issue might be related to your application. It seems to me that your application is multithreaded, and perhaps that leads to the incorrect behavior.

Can you send me the binary? (You can answer with a private message and attach it.) You can also submit a tracker at premier.intel.com.

Regards!
---Dmitry

Dear Dmitry,

Removing the verbose flag does not make any difference.

Unfortunately, I cannot send you the binary since we can only execute it with a valid (commercial) license.

Is there any other possibility to trace the MPI behaviour of the application under Windows?

We also have a Linux version of the binary. Under Linux, we could do the tracing just by specifying "mpiexec -trace [...]". There, the application did not corrupt the ITC internal data structures.

But, we still need the Windows traces as well.

Concerning multithreading, I am not totally sure. Under Linux, I see that the pthread library is linked to the application. However, the CPU utilization never gets above 100% during the computation. When I run Intel Amplifier (Concurrency Analysis) on the first minute of the application, I actually see 2 threads in total. However, thread 0 only seems to do anything at the beginning of the application, while thread 1 waits in the meantime. Afterwards only thread 1 is running/computing (as far as I can see).

Sandra

Hi Sandra,

Well, first of all, I don't understand the purpose of tracing if you don't have the source files.
Maybe light-weight MPI statistics will be enough for you? Just set I_MPI_STATS=N (where N=1...10) and you'll get statistical information about the MPI functions.
If you need to know exactly which functions were called at a particular time, then you need to trace the application. But it seems to me that MPI behavior under Windows should be absolutely the same as under Linux, so you can investigate there.
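To make the statistics suggestion concrete, here is a minimal sketch of building such a run. It only assembles the environment and command line; the launcher and program names are assumptions for illustration, and the actual launch is left commented out since it needs an Intel MPI installation.

```python
# Hypothetical sketch: preparing an MPI run with I_MPI_STATS enabled.
# "mpiexec" and "myProgram.exe" are placeholders, not taken from the thread.
import os
import subprocess

def run_with_stats(level: int, nprocs: int = 2):
    """Build the environment and command for an I_MPI_STATS run."""
    if not 1 <= level <= 10:
        raise ValueError("I_MPI_STATS levels range from 1 to 10")
    env = dict(os.environ, I_MPI_STATS=str(level))
    cmd = ["mpiexec", "-n", str(nprocs), "myProgram.exe"]
    # subprocess.run(cmd, env=env)  # uncomment on a machine with Intel MPI
    return env, cmd

env, cmd = run_with_stats(4)
```

The statistics then land in stats.txt in the working directory, as described above.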

Unfortunately, ITC under Windows cannot exploit the LD_PRELOAD feature, so you need to use itcpin. And the error message:
"the application has corrupted ITC's internal data structures"
probably means that ITC's internal data structures were destroyed by itcpin, not by your application. But without your application I cannot reproduce the issue or suggest a solution.

As an alternative, you can try to use Intel Amplifier XE in the following way:
mpiexec -n 1 amplxe-cl -result-dir hotspots -collect hotspots -- MyProgram : -n 1 MyProgram
OR
mpiexec -n 1 amplxe-cl -result-dir LWhotspots -collect lightweight-hotspots -- MyProgram : -n 1 MyProgram
And analyze the results with Amplifier XE.

Regards!
---Dmitry

You can try to use the VTfs library instead of VT; even if the application crashes, you will still get the whole picture.

Regards!
---Dmitry

Hi Dmitry,
Let's say it is our project to verify that the Linux and Windows versions have the same MPI behavior.
I tried out I_MPI_STATS=1-10 and I got at least some of the times spent in the different MPI routines. Is there a way to see these MPI stats in a figure? I read about the ipm format. However, I could not get it to work:
a) if I use I_MPI_STATS=1-10, I get only the stats.txt with the corresponding information
b) if I use I_MPI_STATS=ipm, a file stats.ipm is created. However, it is empty after the simulation. Probably because the default behavior for summaries is level 0.
c) if I use I_MPI_STATS=1-10,ipm no ipm file is created

Can you tell me how I can combine these two possibilities?

However, I don't get the dependencies between MPI calls (so, e.g., you cannot figure out whether the reduce/broadcast algorithm used is good).

I also tried the Amplifier way; however, I hardly see any MPI function calls in the results. I can find just 3 MPI routines, whose amount of time does not matter for the application. The biggest portion here is "Unknown" and probably contains the relevant MPI calls.

I have also tried the VTfs library, but as the application crashes really early in its run time, it does not make any sense to look at those results.

Sandra

Sandra,

Sorry, I forgot that you are using Intel MPI 3.2.2. The Intel MPI Library supports statistics output in IPM format in version 4.0.3 and higher, so you cannot get it until you use Intel MPI 4.0.3.
It looks very strange that you got stats.ipm - could you send me the MPI version? Just run a simple program with I_MPI_DEBUG=5.

BTW: you can run your application with I_MPI_DEBUG=10 (or even 100). Running it with the debug version (impid.dll) instead of impi.dll, you can get even more information.

Knowing nothing about your application it's hardly possible to understand what's going wrong. Unfortunately!

Regards!
---Dmitry

Hi Sandra,

Just one note: when running itcpin with the VTmc library, could you please try to set VT_PCTRACE=0? Something like:

mpiexec.exe -genv VT_PCTRACE 0 -genv VT_DLL_DIR "%VT_DLL_DIR%" -genv VT_MPI_DLL "%VT_MPI_DLL%" -genv VT_LOGFILE_PREFIX "" -genv VT_FLUSH_PREFIX "" -genv VT_STF_PROCS_PER_FILE 2 -n 2 itcpin --insert VTmc --run -- myProgram.exe

Please let me know whether it works or not.

Regards!
---Dmitry

Hi Dmitry,
I tried using VTmc and VT_PCTRACE=0. First I get warnings from each process (this time 8 processes):

[7] WARNING: LOCAL:MEMORY:OVERLAP: warning

[7] WARNING:    New receive buffer overlaps with currently active receive buffer at address 0000000027EC8640.

[7] WARNING:    Control over active buffer was transferred to MPI at:

[7] WARNING:       MPI_IRECV(*buf=0x0000000027ec8640, count=7098, datatype=MPI_INTEGER, source=3, tag=10, comm=0xffffffff84000002 CART_CREATE COMM_WORLD  [0:7], *request=0x00000000004f6558, *ierr=0x00000000004f6574)

[7] WARNING:    Control over new buffer is about to be transferred to MPI at:

[7] WARNING:       MPI_IRECV(*buf=0x0000000027ec8640, count=7098, datatype=MPI_REAL, source=3, tag=10, comm=0xffffffff84000002 CART_CREATE COMM_WORLD  [0:7], *request=0x00000000004f6528, *ierr=0x00000000004f6548)

[6] INFO: LOCAL:MEMORY:OVERLAP: reported 10 times, limit CHECK-SUPPRESSION-LIMIT reached => not reporting further occurrences

[7] INFO: LOCAL:MEMORY:OVERLAP: reported 10 times, limit CHECK-SUPPRESSION-LIMIT reached => not reporting further occurrences
After seeing the warning for each process a couple of times, I get the message "reported 10 times [..]" (see the last lines above).
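The LOCAL:MEMORY:OVERLAP warnings above flag two nonblocking receives posted into the same buffer. As an illustration only (not ITC's actual implementation), such a check can be sketched by remembering which address ranges are still owned by pending receives and flagging any new receive that intersects one of them:

```python
# Illustrative sketch of an overlap check for nonblocking receive buffers.
# The addresses and counts mirror the MPI_IRECV warning above; the checker
# logic itself is an assumption for illustration.

active = []  # (start, end) byte ranges currently handed over to MPI

def post_irecv(addr: int, nbytes: int) -> bool:
    """Record the buffer range; return True if it is overlap-free."""
    overlapping = any(addr < end and addr + nbytes > start
                      for start, end in active)
    active.append((addr, addr + nbytes))
    return not overlapping

# Two receives into the same buffer, as in the warning above:
first_ok = post_irecv(0x27EC8640, 7098 * 4)   # MPI_INTEGER, count=7098
second_ok = post_irecv(0x27EC8640, 7098 * 4)  # same buffer again -> overlap
```

Until the first request completes (e.g. via MPI_Wait), MPI still owns that buffer, which is exactly why the checker complains about the second receive.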

After some seconds, I get:

[7] ERROR: GLOBAL:MSG:DATA_TRANSMISSION_CORRUPTED: error

[7] ERROR:    Data was corrupted during transmission.

[7] ERROR:    Data was sent by process [0] at:

[7] ERROR:       MPI_Isend(*buf=0x000000002ec946bc, count=17086, datatype=MPI_FLOAT, dest=7, tag=100, comm=MPI_COMM_WORLD, *request=0x00000001408e89c4)

[7] ERROR:    Receive request activated at:

[7] ERROR:       MPI_Irecv(*buf=0x00000000271e0040, count=17086, datatype=MPI_FLOAT, source=0, tag=100, comm=MPI_COMM_WORLD, *request=0x00000001408e87c0)

[7] ERROR:    Data was received by process [7] at:

[7] ERROR:       MPI_Waitany(count=1, *array_of_requests=0x00000001408e87c0, *index=0x00000000004f6384, *status=0x00000000004f6388)

[7] INFO: 1 error, limit CHECK-MAX-ERRORS reached => aborting

[0] WARNING: starting premature shutdown
[0] INFO: LOCAL:MEMORY:OVERLAP: found 2713 times (0 errors + 2713 warnings), 2633 reports were suppressed

[0] INFO: GLOBAL:MSG:DATA_TRANSMISSION_CORRUPTED: found 1 time (1 error + 0 warnings), 0 reports were suppressed

[0] INFO: Found 2714 problems (1 error + 2713 warnings), 2633 reports were suppressed.
ERROR: signal 3 (???) caught, stopping process


So, it didn't work :-(
Sandra

PS: I will run with high DEBUG level next time.

Okay, I also tried I_MPI_DEBUG=5 and I_MPI_STATS=ipm:

[0] MPI Startup(): syntax error in I_MPI_STATS=ipm  , allowed value should be non-negative integer
[..]
[0] Init(): I_MPI_DEBUG=5

[0] Init(): I_MPI_STATS=ipm

I get the syntax error message for each process. Nevertheless, an empty ipm file is created:
>dir

[..]

07/13/2012  02:54 PM                 0 stats.ipm

[..]

Hi Sandra,

Well, the Message Checker works, and it reports corrupted buffers. It just compares 2 buffers: before send and after receive. They are not the same. The reason is unclear, and with no source files it's hardly possible to identify the failure.
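The comparison described above can be sketched in a few lines. This is only an illustration of the idea behind GLOBAL:MSG:DATA_TRANSMISSION_CORRUPTED, not ITC's actual mechanism: checksum the buffer handed to the send call, checksum what arrives on the receiver, and report a mismatch.

```python
# Minimal sketch: detect in-transit corruption by comparing checksums of the
# send buffer and the received buffer. The payload is a stand-in, not the
# application's real 17086-element message.
import zlib

def checksum(buf: bytes) -> int:
    return zlib.crc32(buf)

send_buf = bytes(range(256))      # what the sender hands to MPI_Isend
recv_buf = bytearray(send_buf)    # what arrives at the receiver
recv_buf[10] ^= 0xFF              # simulate one corrupted byte in transit

corrupted = checksum(send_buf) != checksum(bytes(recv_buf))
```

If the checksums differ even though no rank touched the buffers in between, the data changed somewhere between send and receive, which is what the checker reported.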

Could you please run your application with I_MPI_DEBUG=9 and send me the Intel MPI library version you see?
It's very strange that you see the stats.ipm file.

Thanks!
---Dmitry

[0] MPI startup(): Intel MPI Library, Version 3.2.2  Build 20090827

[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation.  All rights reserved.

[0] MPI Startup(): process is pinned to CPU00 on node XXX

[0] MPI Startup(): syntax error in I_MPI_STATS=ipm  , allowed value should be non-negative integer

[0] Rank    Pid      Node name                           Pin cpu

[0] 0       7756     XXXX  0

[0] Init(): I_MPI_ADJUST_BCAST=0

[0] Init(): I_MPI_ADJUST_REDUCE=0

[0] Init(): I_MPI_DEBUG=9

[0] Init(): I_MPI_STATS=ipm

[0] Init(): NUMBER_OF_PROCESSORS=8

[0] Init(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 23 Stepping 6, GenuineIntel

[0] MPI startup(): shared memory data transfer mode

[0] MPI startup(): Intel MPI Library, Version 3.2.2  Build 20090827

[0] MPI startup(): Copyright (C) 2003-2009 Intel Corporation.  All rights reserved.

[1] MPI startup(): shared memory data transfer mode

[3] MPI startup(): shared memory data transfer mode

[2] MPI startup(): shared memory data transfer mode

[1] MPI Startup(): process is pinned to CPU02 on node XXX

[2] MPI Startup(): process is pinned to CPU04 on node XXX

[3] MPI Startup(): process is pinned to CPU06 on node XXX

[1] MPI Startup(): syntax error in I_MPI_STATS=ipm  , allowed value should be non-negative integer

[2] MPI Startup(): syntax error in I_MPI_STATS=ipm  , allowed value should be non-negative integer

[0] MPI Startup(): process is pinned to CPU00 on node XXX

[3] MPI Startup(): syntax error in I_MPI_STATS=ipm  , allowed value should be non-negative integer

[0] MPI Startup(): syntax error in I_MPI_STATS=ipm  , allowed value should be non-negative integer

[0] Rank    Pid      Node name  Pin cpu

[0] 0       12116    XXX              0

[0] 1       8432      XXX              2

[0] 2       12420    XXX              4

[0] 3       11160    XXX              6

[0] Init(): I_MPI_ADJUST_BCAST=0

[0] Init(): I_MPI_ADJUST_REDUCE=0

[0] Init(): I_MPI_DEBUG=9

[0] Init(): I_MPI_STATS=ipm

[0] Init(): NUMBER_OF_PROCESSORS=8

[0] Init(): PROCESSOR_IDENTIFIER=Intel64 Family 6 Model 23 Stepping 6, GenuineIntel 
This is the output for MPI debug level 9. So: Intel MPI Library 3.2.2.

Hi Sandra,

Your application was built with Intel MPI 3.2.2, but you are trying to use Intel Trace Analyzer and Collector 8.0 Update 3, which was compiled with Intel MPI 4.0 Update 3. This mix of libraries might cause the error you see. itcpin also depends on the MPI version.
Can you try Intel Trace Analyzer and Collector version 7.2.2?

Version 3.2.2 of the Intel MPI Library doesn't support I_MPI_STATS=ipm, which is why you see the error message.

Regards!
---Dmitry

Hi Dmitry,
after a long time, I finally could try out your suggestion. And indeed, that was it! Now I can trace my application with ITAC 7.2.2 and MPI 3.2.2. Thank you very much.
Sandra
