This is the way I typically submit batch jobs:
qsub -l select=1:ncpus=40 rl-myjob
Ever since the Memorial Day weekend maintenance, jobs submitted this way have been running about 5 times slower than they do on the login node. I traced the problem to ncpus values greater than 32. For example, on a small test that uses 64 threads and normally runs in under 30 seconds:
qsub -l select=1:ncpus=32 rl-myjob
# Finishes in about 24 seconds
qsub -l select=1:ncpus=33 rl-myjob
# Takes over 120 seconds
Where rl-myjob looks something like this:
#!/bin/sh
#PBS -N myjob
#PBS -j oe
#PBS -l walltime=0:15:00
export OMP_NUM_THREADS=40
cd ~/threading
./myprogram 64
For a clue about what might be going wrong, see Mike Pearce's March 15 announcement of the upgrade to 40 cores:
Quoting Mike Pearce (Intel)
On the Linux side, we have added a 40-core batch node to our existing cluster. To run jobs on this node, you should include the following arguments to your qsub command:
qsub -l select=1:ncpus=xx
Replace xx with the number of CPUs that you want to test with. If it is greater than 32, your job will be scheduled on the new 40-core batch node (acano04).
Note: all batch nodes are currently configured with Hyperthreading off.
So the problem could be specific to acano04: request 32 CPUs or fewer and your job lands on a different batch node.
Also, has Hyperthreading really been off from the beginning? The MTL was advertised as a "40-core (80-thread) development environment". The C function sysconf(_SC_NPROCESSORS_CONF) now returns 40 on acano04. If I recall correctly, it returned 80 last month. No problem either way. I just want to choose an appropriate number of threads for the configuration.