mpitune critical errors

Bryan C.

I'm trying to use mpitune to get optimized Intel MPI (IMPI) environment settings for an application.

I ran it with the following command:

mpitune -d -hf $nodelist -od $pwd -avd min -pm hydra -a "mpirun -ppn 16 -np 1001 ./myapplication"

During the tuning, I got the following critical errors:

27'Dec'12 16:00:24     | MTWTN_0 : Starting range cycle...
27'Dec'12 16:00:24     | MTWTN_0 : Starting iteration cycle...
27'Dec'12 16:07:32     | MTWTN_0 : Complete.
27'Dec'12 16:07:32     | MTWTN_0 : No one iteration was completed successfully.
27'Dec'12 16:07:32     | MTWTN_0 : Ranges' loop finished.
27'Dec'12 16:07:32     | MTWTN_0 : List of threshold times is:
{
}
27'Dec'12 16:07:32 ERR | Error in thread 'MTWTN_0': list index out of range
27'Dec'12 16:07:32 CER | A critical error has occurred!
--
27'Dec'12 16:07:32     | MTWTN_1 : Starting range cycle...
27'Dec'12 16:07:32     | MTWTN_1 : Starting iteration cycle...
27'Dec'12 16:13:59     | MTWTN_1 : Complete.
27'Dec'12 16:13:59     | MTWTN_1 : No one iteration was completed successfully.
27'Dec'12 16:13:59     | MTWTN_1 : Ranges' loop finished.
27'Dec'12 16:13:59     | MTWTN_1 : List of threshold times is:
{
}
27'Dec'12 16:13:59 ERR | Error in thread 'MTWTN_1': list index out of range
27'Dec'12 16:13:59 CER | A critical error has occurred!
--
27'Dec'12 16:14:00     | MTWTN_2 : Starting range cycle...
27'Dec'12 16:14:00     | MTWTN_2 : Starting iteration cycle...
27'Dec'12 16:20:26     | MTWTN_2 : Complete.
27'Dec'12 16:20:26     | MTWTN_2 : No one iteration was completed successfully.
27'Dec'12 16:20:26     | MTWTN_2 : Ranges' loop finished.
27'Dec'12 16:20:26     | MTWTN_2 : List of threshold times is:
{
}
27'Dec'12 16:20:26 ERR | Error in thread 'MTWTN_2': list index out of range
27'Dec'12 16:20:26 CER | A critical error has occurred!

27'Dec'12 17:11:34     | Attention! No results have been obtained during current tuning process. It may be caused by:
- Tuning process has not been completed at all due to one of follow reasons:
        * Time limitations
        * Critical errors in process
        * Abort of the process by user
        * Other

Does anyone know what these errors mean, or how to fix them? I believe they are the reason I couldn't get any usable output from mpitune.
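Because the same failure pattern repeats for every MTWTN thread, it can help to pull only the ERR and CER lines out of a long mpitune log. The following is a minimal Python sketch (it should run under Python 2.7 or 3); it assumes only the line layout visible in the log above (timestamp, optional three-letter level, `|`, message), and `summarize_errors` is a hypothetical helper, not part of mpitune.

```python
import re
from collections import defaultdict

# Matches mpitune log lines such as:
# 27'Dec'12 16:07:32 ERR | Error in thread 'MTWTN_0': list index out of range
LINE_RE = re.compile(r"^(?P<stamp>\S+ \S+)\s+(?P<level>[A-Z]{3})?\s*\| (?P<msg>.*)$")
THREAD_RE = re.compile(r"(MTWTN_\d+)")


def summarize_errors(lines):
    """Group ERR/CER messages by the most recently mentioned MTWTN thread."""
    summary = defaultdict(list)
    current_thread = "global"
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip continuation lines such as '{', '}' and '--'
        msg = m.group("msg")
        t = THREAD_RE.search(msg)
        if t:
            current_thread = t.group(1)
        if m.group("level") in ("ERR", "CER"):
            summary[current_thread].append((m.group("level"), msg))
    return dict(summary)
```

For example, `with open("mpitune.log") as f: print(summarize_errors(f))`, where `mpitune.log` stands for whatever log file your run produced. A CER line that names no thread is attributed to the thread mentioned just before it, which matches how the log above is laid out.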

Thanks!

Bryan C.

I also see these errors in the log:

ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha512

There is no /p/pdsd/Intel_MPI... directory structure on this system, though.
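The tracebacks show Python's hashlib falling back to `__get_builtin_constructor`, which it does when its OpenSSL-backed module cannot be imported; the built-in fallbacks then failed as well. The /p/pdsd/... path is most likely where the bundled Python was originally built, recorded in its compiled files, which would explain why it does not exist on this system. A quick check is to ask the interpreter mpitune runs under which algorithms it can construct. This is a minimal sketch that uses only the standard `hashlib.new` call and is written to run under both Python 2.7 and Python 3; `check_hash_support` is a hypothetical helper name.

```python
import hashlib

# The algorithms named in the "code for hash ... was not found" messages.
ALGORITHMS = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")


def check_hash_support(algorithms=ALGORITHMS):
    """Return {name: True/False} depending on whether hashlib can build it."""
    result = {}
    for name in algorithms:
        try:
            hashlib.new(name, b"probe")
            result[name] = True
        except ValueError:  # hashlib raises this for unsupported hash types
            result[name] = False
    return result


if __name__ == "__main__":
    support = check_hash_support()
    for name in ALGORITHMS:  # fixed order; dicts are unordered on Python 2.7
        print("%s: %s" % (name, "ok" if support[name] else "MISSING"))
```

Run it with the same Python executable that mpitune uses; running it with a different system Python would only tell you about that interpreter.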

Gergana Slavova (Intel)

Hey Bryan,

Unfortunately, I haven't seen these errors before, so I'll try to reproduce them on my side. First off, what version of the Intel MPI Library are you using? I would recommend upgrading to the latest Intel MPI 4.1 (you can grab it from the Intel Registration Center).

I see you're using application-specific tuning. Does this also happen when you run cluster-specific tuning? That will help narrow down whether the problem is in the mpitune script itself or in how mpitune calls your application. For cluster-only tuning, omit the -a flag and everything after it ("mpitune -d -hf $nodelist -od $pwd -avd min -pm hydra"). Let me know how that goes.
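The two modes differ only in whether -a and the application's launch line are passed. A minimal Python 3 sketch that builds both command lines from the flags used in this thread, so they can be handed to `subprocess.call` as an argument list without shell-quoting problems; `mpitune_argv` and `as_shell` are hypothetical helpers, and the empty-host-file guard is an addition of this sketch rather than a documented mpitune rule.

```python
import shlex


def mpitune_argv(hostfile, outdir, app_cmd=None):
    """Build an mpitune argument list using only the flags shown in this thread.

    app_cmd=None leaves out -a and everything after it (cluster-specific tuning);
    otherwise the whole application launch line is passed as ONE argument to -a.
    """
    if not hostfile:
        # Guard added by this sketch: avoid passing an empty value to -hf.
        raise ValueError("hostfile must not be empty")
    argv = ["mpitune", "-d", "-hf", hostfile, "-od", outdir,
            "-avd", "min", "-pm", "hydra"]
    if app_cmd is not None:
        argv += ["-a", app_cmd]
    return argv


def as_shell(argv):
    """Render an argument list as a correctly quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


# mpitune -d -hf hosts.txt -od tune_out -avd min -pm hydra
print(as_shell(mpitune_argv("hosts.txt", "tune_out")))
# mpitune -d -hf hosts.txt -od tune_out -avd min -pm hydra -a 'mpirun -ppn 16 -np 1001 ./myapplication'
print(as_shell(mpitune_argv("hosts.txt", "tune_out",
                            "mpirun -ppn 16 -np 1001 ./myapplication")))
```

Keeping the launch line as a single list element mirrors the quoting in the original command: mpitune receives `mpirun -ppn 16 -np 1001 ./myapplication` as one argument to -a. `hosts.txt` and `tune_out` are placeholder names.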

Also, what does your $nodelist file look like?

Looking forward to hearing back soon.

Regards,
~Gergana

Gergana Slavova
Technical Consulting Engineer
Intel® Cluster Tools
E-mail: gergana.s.slavova_at_intel.com
Bryan C.

Hi Gergana,

Thanks for the quick response!

I am using the most recent MPI library, IMPI 4.1.

Our batch system is PBS Pro, so the $nodelist file lists one node entry per task, like this:

node1
node1
...
node2
node2
...
etc.
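PBS writes one line per task to the node file, so each host appears many times. If you want a host file that lists each node only once (whether mpitune's -hf option needs this is an assumption, not something confirmed in this thread), here is a minimal Python sketch that collapses the list while keeping order. `unique_hosts` and `write_host_file` are hypothetical names, `PBS_NODEFILE` is the environment variable PBS sets to the job's node file, and `hosts.txt` is an arbitrary output name.

```python
import os


def unique_hosts(lines):
    """Collapse a per-task node list into unique host names, keeping first-seen order."""
    seen = set()
    hosts = []
    for line in lines:
        host = line.strip()
        if host and host not in seen:  # skip blank lines and repeated entries
            seen.add(host)
            hosts.append(host)
    return hosts


def write_host_file(nodefile, out_path):
    """Read a PBS-style node file and write each host once to out_path."""
    with open(nodefile) as src:
        hosts = unique_hosts(src)
    with open(out_path, "w") as dst:
        dst.write("\n".join(hosts) + "\n")
    return hosts


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")  # only set inside a PBS job
    if nodefile:
        hosts = write_host_file(nodefile, "hosts.txt")
        print("%d unique hosts written to hosts.txt" % len(hosts))
```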

I have a cluster-mode tuning job in the queue. I'll post the results once it completes.

- Bryan

Bryan C.

I grepped the mpitune log file for CER and don't see any critical errors from the cluster-mode run.

However, the .conf files that mpitune wrote (mpiexec_shm-dapl_nn_32_np_1024_ppn_32.conf and mmpiexec_shm-dapl_nn_32_np_1024_ppn_32.conf) are both 0 bytes. I'm not sure whether that is expected.
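To check an output directory for the zero-byte symptom described above, here is a small Python sketch that lists empty *.conf files. It assumes only that the tuned settings are written as *.conf files into the -od directory, as in this run; `empty_conf_files` is a hypothetical helper.

```python
import os


def empty_conf_files(directory):
    """Return the names of *.conf files in `directory` whose size is 0 bytes."""
    empty = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name.endswith(".conf") and os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(name)
    return empty


if __name__ == "__main__":
    import sys
    target = sys.argv[1] if len(sys.argv) > 1 else "."  # pass the -od directory
    for name in empty_conf_files(target):
        print("empty: " + name)
```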

Bryan C.

Gergana,

Have you had a chance to look into this any further?

Thanks!

Ariel B.

Hello,

I seem to be facing a similar problem. I tried mpitune for the first time and got the same errors:

ERROR:root:code for hash md5 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type md5
ERROR:root:code for hash sha1 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha1
ERROR:root:code for hash sha224 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha224
ERROR:root:code for hash sha256 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha256
ERROR:root:code for hash sha384 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha384
ERROR:root:code for hash sha512 was not found.
Traceback (most recent call last):
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 139, in <module>
  File "/p/pdsd/Intel_MPI/Software/Python/python-2.7.2-linux-intel64-rhel5.7/lib/python2.7/hashlib.py", line 91, in __get_builtin_constructor
ValueError: unsupported hash type sha512
TERM environment variable not set.

Here is the mpitune log. Note that I ran it in cluster mode, as suggested above:

27'Nov'13 11:01:34     | MPITune started at  27 November'13 (Wednesday) 09:01:34
27'Nov'13 11:01:34     | MPITune has been started by: ariel
27'Nov'13 11:01:34     | Preparing tuner's components...
27'Nov'13 11:01:34 DBG | Session's ID is 1385542893
27'Nov'13 11:01:34 DBG | MPITuner has been executed by follow command: ' /usr/local/intel/impi/4.1.1.036/bin64/tune/mpitune -d -hf -od odr -avd min -pm hydra'
27'Nov'13 11:01:34     | Initialization of signals handlers...
27'Nov'13 11:01:34     | Start catching signal with code 15 (SIGTERM) ...
27'Nov'13 11:01:34     | Success.
27'Nov'13 11:01:34     | Start catching signal with code 2 (SIGINT) ...
27'Nov'13 11:01:34     | Success.
27'Nov'13 11:01:34     | Initialization of signals handlers completed.
27'Nov'13 11:01:34 DBG | Extracted tuner's executable part of run line: '/usr/local/intel/impi/4.1.1.036/bin64/tune/mpitune'
27'Nov'13 11:01:34 DBG | Parsed command line arguments' dictionary:
{
        'avd' : 'min'
        'hf' : ''
        'pm' : 'hydra'
}
27'Nov'13 11:01:34 DBG | Initialization of configurator object...
27'Nov'13 11:01:34 WRN | Invalid default value ('<redacted>/config.xml') of argument ('config-file').
27'Nov'13 11:01:34 CER | Invalid default value ('<redacted>/options.xml') of argument ('options-file').
27'Nov'13 11:01:34 CER | A critical error has occurred!
Details:
--------------------------------------------------------------------------------
Type  : <type 'exceptions.Exception'>
Value : Invalid default value ('<redacted>/options.xml') of argument ('options-file').
--------------------------------------------------------------------------------
27'Nov'13 11:01:34     | Time of work automatic tuning utility is 0h:0m:0s:19ms
27'Nov'13 11:01:34 CER | Error while terminating child processes. Description: 'NoneType' object has no attribute 'DestroyAllChildProcesses'
27'Nov'13 11:01:34 INF | Safe application's termination completed.
27'Nov'13 11:01:34 DBG | Deleting temp files...
27'Nov'13 11:01:34 DBG | Deleting temp files completed.
27'Nov'13 11:01:34     | Time of work automatic tuning utility is 0h:0m:0s:20ms

Could the error be related to creating the default config.xml and options.xml files?
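mpitune rejected the default values of both config-file and options-file, which suggests (but does not prove) that it could not find or read config.xml and options.xml at the redacted location. Here is a minimal Python sketch to test that assumption; pass it the directory that appears in place of <redacted> in your own log. `check_tuner_files` is a hypothetical helper.

```python
import os


def check_tuner_files(directory, names=("config.xml", "options.xml")):
    """Report whether each expected file exists and is readable in `directory`."""
    report = {}
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            report[name] = "missing"
        elif not os.access(path, os.R_OK):
            report[name] = "not readable"
        else:
            report[name] = "ok"
    return report
```

If both files are present and readable, the cause lies elsewhere and this check can be ruled out.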


Michael M.

I get exactly the same Python-related error messages as Bryan C. (post #2, Fri, 12/28/2012 - 06:28) when I launch mpitune in application-specific mode.

Additionally, and perhaps related to these error messages, mpitune seems to refuse to launch the application on more than one node, regardless of the hosts file and command-line options I specified to run on multiple nodes.

Have any fixes been identified at this point?
