Problem running jobs on batch node

Hello all,

I've been trying to get a simple batch job to run and produce results, and I'm not having much luck.
My job script (a file called "myjob") is as follows:

#PBS -l walltime=0:00:30
#PBS -l mem=100mb,ncpus=1

The problem is, when I submit it using "qsub myjob", no output file is produced. If I run "tracejob xxx.acaad0x", I see the following.

05/18/2010 11:13:59 L Considering job to run
05/18/2010 11:13:59 S enqueuing into workq, state 1 hop 1
05/18/2010 11:13:59 S Job Queued at request of yufvb@acano01, owner =
yufvb@acano01, job name = myjob, queue = workq
05/18/2010 11:13:59 S Job Run at request of Scheduler@acaad01 on exec_vnode
05/18/2010 11:14:00 L Job run
05/18/2010 11:14:00 S Obit received momhop:1 serverhop:1 state:4 substate:41
05/18/2010 11:14:00 S Request invalid for state of job, state=5
05/18/2010 11:17:08 S Post job file processing error
05/18/2010 11:17:08 S Exit_status=0 resources_used.cpupercent=0
resources_used.cput=00:00:00 resources_used.mem=1152kb
resources_used.ncpus=1 resources_used.vmem=10792kb
05/18/2010 11:17:08 S dequeuing from workq, state 5

The line that concerns me is the one that says "Post job file processing error". It looks to me like the job runs to completion (Exit_status=0), but there is a problem writing the result to the working directory. (I've been operating under the assumption that the working directory is the directory from which I executed the qsub command.)
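
For reference, here is a sketch of a job script that does not rely on the default working directory. It assumes a bash shell and uses PBS's PBS_O_WORKDIR variable, which holds the directory qsub was run from. The result file name myjob_result.txt is hypothetical. As far as I understand, PBS Pro normally starts the job in the user's home directory on the execution host, and the stdout/stderr files are copied back to the submission directory only after the job ends. That copy-back is the step the "Post job file processing error" line appears to refer to.

```shell
#!/bin/bash
#PBS -N myjob
#PBS -l walltime=0:00:30
#PBS -l mem=100mb,ncpus=1
#PBS -j oe
# -j oe merges stderr into the stdout file, so only one file is staged back.

# Change to the submission directory. Outside PBS, PBS_O_WORKDIR is unset,
# so fall back to the current directory (useful for testing interactively).
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# This line goes to the job's stdout file (myjob.o<sequence number>).
echo "Running on ${HOSTNAME:-unknown} in $(pwd)"

# Hypothetical result file, written directly into the submission directory.
echo "done" > myjob_result.txt
```

Submit it as before with "qsub myjob". If the output file still never shows up but myjob_result.txt does, the job itself ran and the problem is limited to staging the .o file back.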

Any thoughts as to what I'm missing?
Thank you in advance.

I am having the same problem. From what I have read in the PBS Pro User's Guide, my guess is that PBS is not able to rcp/scp the output file from the staging directory (on the execution host) to the working directory.

Since the home directories are shared across the machines, I wonder whether PBS can be told to simply copy the file locally rather than trying to scp it.
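
For what it's worth, PBS Pro's MoM has a configuration directive for this case, $usecp, which makes it use a plain local copy instead of rcp/scp when the destination path is on a shared filesystem. A sketch, assuming home directories are mounted at /home on every node and the hosts live in a domain called example.com (a placeholder; substitute your own), would be a line like this in PBS_HOME/mom_priv/config on each execution host:

```text
$usecp *.example.com:/home/ /home/
```

The MoM then has to reread its configuration (for example via a HUP signal or a restart), so this is something the system administrator would need to set up.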

Thanks for pointing out the problem. As you surmised, it had to do with the post-job processing of any results file.

I believe it has been fixed now. You were not the only one affected (the problem was system-wide), but you were the only one to report it, so thanks for being so proactive. Sorry for the inconvenience.
