Problems with Intel Trace Collector

paolobur@libero.it


Hi all,
I've installed Intel Trace Collector 6 on Red Hat Linux, and I use Intel Fortran 9 and MPICH 2.
After compiling the sample program in intel/mpi/1.0.2/test with

mpif90 test.f90 -c

and linking with

mpif90 test.o -L${VT_ROOT}/lib -lVT -ldwarf -lelf -lnsl -lm -lpthread -o ftest

I get the following error message:

aborting job:
Fatal error in MPI_Comm_dup: Invalid communicator, error stack:
MPI_Comm_dup(171): MPI_Comm_dup(comm=0x5b, new_comm=0xbfffc250) failed
MPI_Comm_dup(93): Invalid communicator
rank 0 in job 31 {host_name}_33927 caused collective abort of all ranks
exit status of rank 0: return code 13

The program runs OK if I don't use ITC.


If I use Intel MPI 1 instead, there is no error message,
but there is no output from the program either.

Any help would be much appreciated.
Paolo

Clay Breshears (Intel)

Paolo -


Are you running on an Itanium 2 system? If so, you need to add a "-lvtunwind" flag after the "-lVT" in the linking step. This is noted on page 7 (Chapter 3) of the User's Guide for the 5.0 version.
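As a minimal sketch of the link-line change described above, the helper below prints the library list from the original link command and inserts -lvtunwind immediately after -lVT only when the machine architecture is ia64 (the value `uname -m` reports on Itanium Linux). The function name itc_link_libs is hypothetical, and the usage line assumes VT_ROOT is set to the ITC installation directory.

```shell
#!/bin/sh
# Hypothetical helper: print the ITC link libraries used in the original
# post, adding -lvtunwind right after -lVT on Itanium (ia64) systems.
itc_link_libs() {
    arch="$1"
    libs="-lVT"
    if [ "$arch" = "ia64" ]; then
        # Itanium needs the unwind library directly after -lVT
        libs="$libs -lvtunwind"
    fi
    echo "$libs -ldwarf -lelf -lnsl -lm -lpthread"
}

# Usage (assumes mpif90 is on PATH and VT_ROOT points at the ITC install):
# mpif90 test.o -L"${VT_ROOT}/lib" $(itc_link_libs "$(uname -m)") -o ftest
```

On an IA-32 node such as a Xeon, `uname -m` typically reports i686, so the helper produces exactly the link line from the first post.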


Otherwise, can you run the application by itself, without the Trace Collector, using MPICH 2 (or Intel MPI)? That is, is the problem only when you try to run using Trace Collector or is there something going wrong at a more basic level with MPI on your system?


--clay
paolobur@libero.it


Clay,
I'm using IA-32 (a cluster of 16 dual-Xeon processors).
No, I don't think the problem is with the program.
This is the 'hello world' example, and it runs with
either MPICH2 or IMPI 1 without ITC.
I also tried other, more complex cases, and they fail
only when I use ITC.
I guess my problem
must be with the installation of ITC 6, although
I followed the instructions in the user guide.
Are there any checks I can run to test the
installation?
Should I try v5?

Thank you
Paolo

Clay Breshears (Intel)


Paolo -


I agree, there seems to be some problem with the installation of ITC. Are the libraries visible (or loaded) on each node of the cluster? If you run 'ldd' on the binary, where will the application be looking for the shared library objects?
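One way to carry out the per-node ldd check suggested above is sketched below, assuming password-less ssh to every node and a host file with one hostname per line. The function names missing_libs and check_nodes, and the host file itself, are hypothetical. The sketch runs ldd on each node and prints any shared library that ldd reports as "not found", plus any node where ssh or ldd itself fails.

```shell
#!/bin/sh
# Hypothetical filter: read ldd output on stdin and print the name of
# every library that ldd marks as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical driver: run ldd on the same binary path on every node
# listed in a host file (one hostname per line). Assumes password-less
# ssh and that the binary is on a shared (e.g. NFS) path.
check_nodes() {
    hostfile="$1"
    binary="$2"
    status=0
    while read -r node; do
        [ -z "$node" ] && continue
        # </dev/null keeps ssh from consuming the rest of the host file
        if ! out=$(ssh "$node" ldd "$binary" </dev/null 2>&1); then
            echo "$node: ssh or ldd failed: $out"
            status=1
            continue
        fi
        missing=$(printf '%s\n' "$out" | missing_libs)
        if [ -n "$missing" ]; then
            echo "$node: missing $missing"
            status=1
        fi
    done < "$hostfile"
    return $status
}
```

For example, `check_nodes hosts.txt ~/ftest` prints nothing and returns 0 when every node resolves all libraries. Note that ldd reports "not a dynamic executable" (and fails) for a statically linked binary, so this check only applies to the dynamically linked build.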


Can you create a statically linked version of the app and run this on the cluster nodes? Will the program run if you restrict the processes to the node that you installed ITC on?


If none of the above works and you have the libraries available on the cluster nodes, you should report the error to the Intel Premier Support site.


--clay


paolobur@libero.it

Clay
I've installed ITC on an NFS share. If I run 'which VTserver' on the master or any of the nodes, I get ~/libraries/itc/bin/VTserver.
If I run ldd on the app, I get:
libnsl.so.1 => /lib/libnsl.so.1 (0xb75c9000)
libimf.so => /home/paolo/intel/fc/9.0/lib/libimf.so (0xb73ed000)
libm.so.6 => /lib/tls/libm.so.6 (0xb73cb000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb73bb000)
librt.so.1 => /lib/tls/librt.so.1 (0xb73a7000)
libc.so.6 => /lib/tls/libc.so.6 (0xb7270000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb75eb000)


I've built a statically linked version of the app, but it does not run on the nodes.


Paolo
