Intel Adaptive Spike-Based Solver questions

I have a couple of questions about the Intel Adaptive Spike-Based Solver.

1) In the hello_world.f90 example posted in the users' manual, where does one need to set 'nbprocs' and 'rank'? In the source code or as environment variables? Isn't 'mpirun -np N ./hello_world.exe' (with N the number of processes) sufficient, so that the variables need not be set manually?

2) Do I need to redirect the hello_world.exe to read the input matrix?

3) My hello_world.f90 compiles but fails at run time with a segfault; the output includes the message 'Unable to parse input as legal command or C expression'.

I appreciate your suggestions.

Please find the answers to your questions below:

1- In the hello_world example, the global variables 'nb_procs' and 'rank' are set automatically by the MPI initialization calls. These values have to be entered in the "pspike" data structure using the parameter fields "%nbprocs" and "%rank" (instruction line: "pspike%nbprocs=nb_procs; pspike%rank=rank"). It is true that we could have set pspike%nbprocs and pspike%rank automatically (e.g. inside SPIKE_DEFAULT()); this modification will be made in a future version. Thank you for your comment.

2- In the hello_world driver example, the matrix is created inside the code. We are not sure what you mean by redirecting the driver, or which input matrix you are referring to.

3- Could you provide us more detail about the problem you encountered? It would be great if you could tell us:
The version of the Fortran compiler and MPI library you used.
The commands you used to compile and run the program.
The output of your failed run.
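
The items requested above can be gathered with a few shell commands. The following is only a minimal sketch: the helper names report_tool and report_lib are hypothetical, and the library path is the one that appears later in this thread, so it may differ on other systems.

```shell
#!/bin/sh
# report_tool NAME: print where NAME is found on PATH, or "not found".
# (Hypothetical helper, not part of the solver package.)
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: $(command -v "$1")"
    else
        echo "$1: not found"
    fi
}

# report_lib PATH: describe a library file with file(1) if it exists.
report_lib() {
    if [ -e "$1" ]; then
        file "$1"
    else
        echo "$1: no such file"
    fi
}

report_tool ifort                 # which Fortran compiler is on PATH
report_tool mpirun                # which MPI launcher is on PATH
report_lib /usr/lib/libmpi.so     # architecture of the MPI library
```

The exact compile and run commands, plus the full output of the failed run, still need to be copied by hand.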

1. Thank you for the information.

2. Please ignore my question about the input matrix.

3. Provided below are compiler, compilation, and output from the failed run on an Altix4700:

compiler version: fc/10.1.015

mpi library: file /usr/lib/libmpi.so
/usr/lib/libmpi.so: ELF 64-bit LSB shared object, IA-64 (Intel 64 bit architecture), version 1 (SYSV), not stripped

Compilation method:

ifort -o hello hello_world.f90 $SPIKE_INC $SPIKE_LIB -lmpi

where

$SPIKE_INC is: spike-1.0/include and

$SPIKE_LIB is: spike-1.0/lib/64 -lspike -lspike_mpi_comm -lspike_adapt -lspike_adapt_de -lspike_adapt_grid_f -lmkl_solver -lmkl_lapack -lmkl -lguide -lpthread

Failed output (only the relevant information is shown):

.....

setenv MPI_DSM_DISTRIBUTE 1
setenv OMP_NUM_THREADS 1

.........

mpirun -np 4 ./hello
MPI: On host ..., Program spike/examples/hello, Rank 3, Process 28805 received signal SIGSEGV(11)

MPI: --------stack traceback-------
line: 2 Unable to parse input as legal command or C expression.
The "backtrace" command has failed because there is no running program.
MPI: Intel Debugger for applications running on IA-64, Version 10.1-35 , Build 20080310

MPI: -----stack traceback ends-----
MPI: On host ..., Program spike/examples/hello, Rank 3, Process 28805: Dumping core on signal SIGSEGV(11) into directory spike/examples
MPI: MPI_COMM_WORLD rank 3 has terminated without calling MPI_Finalize()
MPI: aborting job
MPI: Received signal 11

Please let me know if further information is needed.

From the information you provided, it seems to me that you are using SGI's default MPI implementation rather than Intel MPI. In the current release, we support only four MPI implementations (Intel MPI, MPICH1, MPICH2 and Open MPI). Could you try linking your program with Intel MPI?

Also, I notice that you did not use -I and -L to specify the header and library directories. I suggest you try the following steps:

  1. Set $SPIKE_INC to "-Ispike-1.0/include"
  2. Set $SPIKE_LIB to "-Lspike-1.0/lib/64 -lspike -lspike_mpi_comm -lspike_adapt
    -lspike_adapt_de -lspike_adapt_grid_f -lmkl_solver -lmkl_lapack -lmkl
    -lguide -lpthread"
  3. Compile the program using the command "ifort -o hello hello_world.f90 $SPIKE_INC $SPIKE_LIB -L$MPI_LIB_DIR -lmpi"

where $MPI_LIB_DIR stands for the directory where the Intel MPI Library files are located.
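
The three steps above can be sketched as a small shell script. This is only an illustration: SPIKE_ROOT and MPI_LIB_DIR are hypothetical variable names, and the MPI path is a placeholder rather than a real install location. The script prints the compile command instead of running it, so the flags can be checked first.

```shell
#!/bin/sh
# Assumed locations; adjust both to your installation.
SPIKE_ROOT="spike-1.0"
MPI_LIB_DIR="/path/to/intel-mpi/lib"   # placeholder, not a real path

# Step 1: header directory, passed with -I.
SPIKE_INC="-I${SPIKE_ROOT}/include"

# Step 2: library directory, passed with -L, followed by the libraries.
SPIKE_LIB="-L${SPIKE_ROOT}/lib/64 -lspike -lspike_mpi_comm -lspike_adapt"
SPIKE_LIB="${SPIKE_LIB} -lspike_adapt_de -lspike_adapt_grid_f"
SPIKE_LIB="${SPIKE_LIB} -lmkl_solver -lmkl_lapack -lmkl -lguide -lpthread"

# Step 3: assemble the compile command and print it for inspection.
COMPILE_CMD="ifort -o hello hello_world.f90 ${SPIKE_INC} ${SPIKE_LIB} -L${MPI_LIB_DIR} -lmpi"
echo "${COMPILE_CMD}"
```

Once the printed command looks right, it can be pasted into the shell or run with eval.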

Let us know if you still have problems compiling and running your program.

If you don't have access to the Intel MPI Library, you can create your own MPI compatibility library. This is described in Appendix C of the Intel Adaptive Spike-Based Solver User Guide.

The problem is still not resolved, even with the corrected links to the Spike libraries and -lmpi from the SGI default library. Which Intel trial MPI library would you recommend that I download and install in place of the SGI default MPI? Your recommendation of building the Intel MPI library, as suggested in Appendix C of the Spike User Guide, is not equivalent to having an MPI library and cannot solve the run-time error on an Altix system.

I also built the Spike package on another system, an Intel Xeon. On this system, compilation fails with the following error:

mpif90 -o hello hello_world.f90 -I/SPIKE/1.0/include -L/SPIKE/1.0/lib/64 -lspike -lspike_mpi_comm -lspike_adapt -lspike_adapt_de -lspike_adapt_grid_f -lmkl_solver -lmkl_lapack -lmkl -lguide -lpthread
hello_world.f90(40): (col. 4) remark: LOOP WAS VECTORIZED.
ld: skipping incompatible /SPIKE/1.0/lib/64/libspike.a when searching for -lspike
ld: cannot find -lspike

mpif90 is a wrapper for ifort, and ifort is Intel 64, Version 10.1. The libspike.a file does exist in that directory.

Since the source is not provided in the Spike package, I cannot rebuild the Spike libraries. What would you suggest for solving the ld error on the Intel Xeon system?

We tested the solver using Intel MPI Library version 3.1, so we recommend that you try that library.

Regarding building the package on the other system, I notice that you have the following -I and -L options: "-I/SPIKE/1.0/include -L/SPIKE/1.0/lib/64". Did you install the solver at /SPIKE? If so, could you try using "-L/SPIKE/1.0/lib/em64t" instead of "-L/SPIKE/1.0/lib/64"?
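
The "skipping incompatible" message from ld means that the library was built for a different architecture than the one being linked. As a rough sketch, the choice of library directory could be tied to the machine type reported by uname -m. The function name spike_libdir is hypothetical, and the mapping covers only the two cases that appear in this thread (lib/64 on the IA-64 Altix, lib/em64t on the Intel 64 Xeon); other directories in the package are not covered.

```shell
#!/bin/sh
# spike_libdir ARCH ROOT: print the library directory for ARCH under ROOT.
# The mapping is inferred from this thread only (assumption).
spike_libdir() {
    arch="$1"
    root="$2"
    case "$arch" in
        ia64)   echo "${root}/lib/64" ;;      # Itanium, e.g. SGI Altix
        x86_64) echo "${root}/lib/em64t" ;;   # Intel 64, e.g. Xeon
        *)      echo "unsupported architecture: ${arch}" >&2
                return 1 ;;
    esac
}

# Example with a fixed machine type; on a real system, pass "$(uname -m)".
spike_libdir x86_64 /SPIKE/1.0
```

Running file on libspike.a (or on an object extracted from it) is another way to confirm which architecture a given directory was built for.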

The compilation problem on our Intel Xeon is resolved by using the Spike libraries from lib/em64t instead of lib/64.

However, hello_world.f90 crashes both on our Altix and Intel Xeon systems, even on 1 processor.

We use the SGI default MPI on our Altix and mvapich2 on the Intel Xeon, and we are unable to obtain and install Intel MPI Library version 3.1 at this time. Is there still a chance of making the Spike package work on our Altix system?

Provided below is some debugging information:

On Altix:

............................................

Core was generated by `./hello'.
Program terminated with signal 11, Segmentation fault.
#0 0x2000000000c21bc0 in MPI_SGI_barrier () from /usr/lib/libmpi.so
(gdb) bt
#0 0x2000000000c21bc0 in MPI_SGI_barrier () from /usr/lib/libmpi.so
#1 0x2000000000c22270 in PMPI_Barrier () from /usr/lib/libmpi.so
#2 0x2000000000c22300 in pmpi_barrier__ () from /usr/lib/libmpi.so
#3 0x40000000001e72a0 in spike_barrier_ ()
#4 0x4000000000023370 in spike_algo_mp_spike_begin_ ()
#5 0x4000000000022f80 in spike_algo_mp_spike_ ()
#6 0x400000000000a140 in spike_ ()
#7 0x4000000000008ca0 in MAIN__ ()
#8 0x4000000000008950 in main ()
(gdb)

On Intel Xeon:

compilation:

$ mpif90 -o hello hello_world.f90 -I/SPIKE/1.0/include -L/SPIKE/1.0/lib/em64t -lspike -lspike_mpi_comm -lspike_adapt -lspike_adapt_de -lspike_adapt_grid_f -lmkl_solver -lmkl_lapack -lmkl -lguide -lpthread
hello_world.f90(40): (col. 4) remark: LOOP WAS VECTORIZED.

Execution (with some manual print out):
$ mpirun -np 1 ./hello

after MPI_COMM_RANK
before call to SPIKE_DEFAULT
before mat%format
iside the rank0 if statement
before the call to SPIKE, rank and info are 0 0
UNSUCCESSFUL RUN FOR SPIKE - INFO EXIT -1
SPIKE_CORE ERROR CODE -320
after the call to SPIKE, rank and info are 0 -1
before the last if statement, my rank and info are: 0 -1

>However, hello_world.f90 crashes both on our Altix and Intel Xeon systems, even on 1 processor.

For the Xeon system, it looks like the code is running fine and you are just getting a documented error code for Spike_Core. The documentation (page 53) states:
"Spike_Adapt cannot be selected if only one processor". The hello world example should therefore give you results on more than one processor. For the one-processor case, you first need to change the flag "pspike%autoadapt" to "false" (different from its default value), as has been done in all the other source code examples in the documentation. However, without Spike_Adapt on, it is up to the user to select their own %RSS and %DFS options if they should differ from the default values (section 2.4 and Table 2.4). Spike_Adapt will be upgraded to support the one-processor case in a future release.

It's great that you have resolved the compilation problem on the Intel Xeon. Regarding the Altix system, could you follow the suggestion above to build your own spike_mpi_comm.a file and link that file to your program instead of the one we shipped with the package? The instructions are given in Appendix C of the User's Guide.

Please let us know if you have any problems doing that.

I followed the instructions in Appendix C for building spike_mpi_comm.a on the Altix, but the execution failed with the same segfault as before. The gdb trace is provided below, FYI:

Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ia64-suse-linux"...
Using host libthread_db library "/lib/libthread_db.so.1".

warning: Can't read pathname for load map: Input/output error.

warning: .dynamic section for "/lib/libc.so.6.1" is not at the expected address (wrong library or version mismatch?)
Reading symbols from /usr/lib/libmpi.so...done.
Loaded symbols for /usr/lib/libmpi.so
Reading symbols from /SPIKE/lib/64/libspike_adapt_de.so...done.
Loaded symbols for /SPIKE/lib/64/libspike_adapt_de.so
Reading symbols from /SPIKE/lib/64/libmkl_intel_lp64.so...done.
Loaded symbols for /SPIKE/lib/64/libmkl_intel_lp64.so
Reading symbols from /SPIKE/lib/64/libmkl_intel_thread.so...done.
Loaded symbols for /SPIKE/lib/64/libmkl_intel_thread.so
Reading symbols from /SPIKE/lib/64/libmkl_core.so...done.
Loaded symbols for /SPIKE/lib/64/libmkl_core.so
Reading symbols from /opt/intel/fc/10.1.015/lib/libguide.so...done.
Loaded symbols for /opt/intel/fc/10.1.015/lib/libguide.so
Reading symbols from /lib/libpthread.so.0...done.
Loaded symbols for /lib/libpthread.so.0
Reading symbols from /opt/intel/fc/10.1.015/lib/libimf.so.6...done.
Loaded symbols for /opt/intel/fc/10.1.015/lib/libimf.so.6
Reading symbols from /lib/libm.so.6.1...done.
Loaded symbols for /lib/libm.so.6.1
Reading symbols from /lib/libc.so.6.1...done.
Loaded symbols for /lib/libc.so.6.1
Reading symbols from /lib/ld-linux-ia64.so.2...done.
Loaded symbols for /lib/ld-linux-ia64.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libunwind.so.7...done.
Loaded symbols for /lib/libunwind.so.7
Reading symbols from /lib/librt.so.1...done.
Loaded symbols for /lib/librt.so.1
Reading symbols from /usr/lib/libbitmask.so...done.
Loaded symbols for /usr/lib/libbitmask.so
Reading symbols from /usr/lib/libcpuset.so...done.
Loaded symbols for /usr/lib/libcpuset.so
Reading symbols from /usr/lib/libxpmem.so...done.
Loaded symbols for /usr/lib/libxpmem.so
Core was generated by `./hello'.
Program terminated with signal 11, Segmentation fault.
#0 0x20000000000e1bc0 in MPI_SGI_barrier () from /usr/lib/libmpi.so

#0 0x20000000000e1bc0 in MPI_SGI_barrier () from /usr/lib/libmpi.so
#1 0x20000000000e2270 in PMPI_Barrier () from /usr/lib/libmpi.so
#2 0x20000000000e2300 in pmpi_barrier__ () from /usr/lib/libmpi.so
#3 0x40000000001e72a0 in spike_barrier_ ()
#4 0x4000000000023370 in spike_algo_mp_spike_begin_ ()
#5 0x4000000000022f80 in spike_algo_mp_spike_ ()
#6 0x400000000000a140 in spike_ ()
#7 0x4000000000008cb0 in MAIN__ ()
#8 0x4000000000008950 in main ()

We were able to reproduce the error and found the reason. Thank you for letting us know about it.

It is due to some restrictions on SGI Altix: you can only use ifort with -lmpi (i.e. there is no mpiifort).

Here is a quick solution:

1) After you follow the instructions in Appendix C, a file named "spike_mpi_comm_data.mod" is created. Please copy it to the directory where you compile and link the hello world program.

2) Add the following to the hello world program: use spike_mpi_comm_data

3) Add the following lines right before "call SPIKE ...":

SPIKE_STATUS_SIZE = MPI_STATUS_SIZE
SPIKE_DOUBLE_PRECISION = MPI_DOUBLE_PRECISION
SPIKE_INTEGER = MPI_INTEGER
SPIKE_CHARACTER = MPI_CHARACTER
SPIKE_LOGICAL = MPI_LOGICAL
SPIKE_COMM_WORLD = MPI_COMM_WORLD
SPIKE_MIN = MPI_MIN
SPIKE_MAX = MPI_MAX
SPIKE_SUM = MPI_SUM
SPIKE_MPI_SUCCESS = MPI_SUCCESS
SPIKE_COMM_WORLD = MPI_COMM_WORLD

4) Recompile, link, and run (e.g. mpirun -np 2 ....).

Please let us know if this solves the problem.

Best regards,

Murat

PS. We will provide a more elegant solution for sgi-altix in the next release of SpikePACK.

After applying your suggested patches, I succeeded in running hello_world.f90 and the other Fortran examples. Now the C examples fail to run with the same error on the Altix. How should I adapt the Fortran patches (use, mod, ...) for the C examples?

Compilation method, FYI:

mpicc -o example1 example1.c help.c -I/SPIKE/include -L/SPIKE/lib/64 -lspike -lspike_mpi_comm -lspike_adapt -lspike_adapt_de -lspike_adapt_grid_f -lmkl_solver -lmkl_lapack -lmkl -lguide -lpthread

Thanks for reporting the C example problem on Altix. Since the quick fix involves Fortran statements, it is not directly applicable to the C examples. Please note that the next release of the Intel Adaptive Spike-Based Solver (to be released soon) will fix the C example failures as well.

The Intel Adaptive Spike-Based Solver 1.0 Release 2 is available for download now. Your problem should be fixed by using this release. Could you try it and let us know if you still have the problem?

Thanks.
