File locking error on Polyserve file system ADIOI_Set_lock

Hi,

I'm running an MPI application on a Polyserve file system and I'm getting the following error:

File locking failed in ADIOI_Set_lock(fd 11,cmd F_SETLK/6,type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 25.
If the file system is NFS, you need to use NFS version 3, ensure that the lockd daemon is running on all the machines, and mount the directory with the 'noac' option (no attribute caching).
ADIOI_Set_lock:: No locks available
ADIOI_Set_lock:offset 640, length 160
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 2
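
A note on reading the message above: the return value FFFFFFFF looks like -1 printed in hexadecimal, so the errno field is probably hexadecimal too. That is an assumption drawn from the message format, not something stated in the thread. If it holds, errno 25 is 0x25 = 37, which on Linux is ENOLCK and matches the "No locks available" text. A minimal Python sketch of the decoding (the helper name `decode_errno` is made up for illustration):

```python
import errno
import os
import re

# Error line copied from the post above.
MESSAGE = ("File locking failed in ADIOI_Set_lock(fd 11,cmd F_SETLK/6,"
           "type F_WRLCK/1,whence 0) with return value FFFFFFFF and errno 25.")


def decode_errno(message):
    """Return (number, symbolic name) for the errno field of the message."""
    match = re.search(r"errno ([0-9A-Fa-f]+)", message)
    if match is None:
        raise ValueError("no errno field found")
    # Assumption: the field is hexadecimal, like the FFFFFFFF return value.
    number = int(match.group(1), 16)
    # errno names are platform specific; look the number up on this machine.
    return number, errno.errorcode.get(number, "unknown")


if __name__ == "__main__":
    number, name = decode_errno(MESSAGE)
    print(number, name, os.strerror(number))
```

The symbolic name depends on the platform where the script runs, so it is best run on the cluster node that produced the error.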

This error happens when I try to write to a file with:

CALL MPI_FILE_WRITE(ifileHandle,str_buf,160,MPI_CHARACTER,istatus,ierrmpi)

I'm using Intel MPI 4.0.0.025. I also noticed that I had to modify the file OPEN command, since the file system wouldn't let me open files with 'shared' permission. Any help would be much appreciated.

Thanks,

GK

Hi Gandharv,

Does this error occur with the current version of the Intel® MPI Library?  What version of NFS are you using?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Hi James,

The NFS version is nfs-utils-1.0.7-36.29. It happens with the latest Intel MPI 4.1 as well.

- GK

Hi Gandharv,

Please use nfsstat to determine the version of NFS that is actually in use, as different NFS versions can be in use with a single installation.

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

'nfsstat' doesn't work on this machine. Any other command I can try?

- GK

I ran '/usr/sbin/nfsstat -m' and here is the output I get:

/app from polyserv:/app
Flags: rw,v3,rsize=32768,wsize=32768,hard,intr,lock,proto=tcp,sec=sys,addr=polyserv

/users from polyserv:/users
Flags: rw,v3,rsize=32768,wsize=32768,hard,intr,lock,proto=tcp,sec=sys,addr=polyserv

/project from polyserv:/project
Flags: rw,v3,rsize=32768,wsize=32768,hard,intr,lock,proto=tcp,sec=sys,addr=polyserv

/scratch from polyserv:/scratch
Flags: rw,v3,rsize=32768,wsize=32768,hard,intr,lock,proto=tcp,sec=sys,addr=polyserv

So, it is v3 then?

- GK
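
The flags in the output above can be compared with the three conditions named in the error message (NFS version 3, locking, and the 'noac' option). A minimal sketch, assuming the flag string is copied from one of the `nfsstat -m` entries; the function name and result keys are hypothetical:

```python
# Flag string copied from the nfsstat -m output in the post above.
FLAGS = ("rw,v3,rsize=32768,wsize=32768,hard,intr,lock,"
         "proto=tcp,sec=sys,addr=polyserv")


def check_mount_flags(flags):
    """Report which of the options named in the error message are present."""
    options = set(flags.split(","))
    return {
        "nfs_v3": "v3" in options,
        "locking_enabled": "lock" in options and "nolock" not in options,
        "noac": "noac" in options,
    }


if __name__ == "__main__":
    for key, present in check_mount_flags(FLAGS).items():
        print(f"{key}: {'yes' if present else 'no'}")
```

For these mounts the sketch reports v3 and lock as present and noac as absent. It cannot tell whether the lockd daemon is running, which has to be checked on each machine.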

Hi Gandharv,

It appears that the NFS settings are correct.  We have no known issues with Polyserve, but we have also not tested on it.  Let me check with our developers and see if they have any additional information.  Do you have a small reproducer program?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools

Hi James,

Here is a small test program that gives the same error as described in the first post on a Polyserve file system:

      program main
      implicit none
      include 'mpif.h'

      integer size, rank, namelen, ierr
      integer ifileHandle ! returned by MPI_FILE_OPEN
      character*(MPI_MAX_PROCESSOR_NAME) name
      character*160 filename
c     The status argument must be an array of MPI_STATUS_SIZE integers
      integer stat(MPI_STATUS_SIZE)

      call MPI_INIT (ierr)

      call MPI_COMM_SIZE (MPI_COMM_WORLD, size, ierr)
      call MPI_COMM_RANK (MPI_COMM_WORLD, rank, ierr)
      call MPI_GET_PROCESSOR_NAME (name, namelen, ierr)

      filename='partition.inp'
c     Open file
      call MPI_FILE_OPEN(MPI_COMM_WORLD, filename,
     &     MPI_MODE_CREATE+MPI_MODE_WRONLY,
     &     MPI_INFO_NULL, ifileHandle, ierr)
c     Write node names to file (stat replaces the scalar istatus)
      call MPI_FILE_WRITE(ifileHandle,name,MPI_MAX_PROCESSOR_NAME,
     &     MPI_CHARACTER,stat,ierr)
c     Close file
      call MPI_FILE_CLOSE(ifileHandle,ierr)

      call MPI_FINALIZE (ierr)

      end

Please let me know if you find out anything about the error.

Thanks,

Gandharv

Hi James,

Any luck with the test program? Were you able to reproduce the error, or is it something specific to my cluster?

- GK

Hi Gandharv,

I am unable to reproduce the problem here.  Can you produce the error on a different filesystem?  What happens if you use the latest version of the Intel® MPI Library, 4.1 Update 1?

Sincerely,
James Tullos
Technical Consulting Engineer
Intel® Cluster Tools
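
One way to compare file systems without involving MPI is to request the same kind of byte-range write lock directly. The sketch below is only an illustration, assuming Linux and Python 3: it uses `fcntl.lockf`, which wraps the `fcntl()` locking calls, with the offset and length taken from the error message in the first post. The function name and the default file name `lock_probe.tmp` are hypothetical.

```python
import fcntl
import os
import sys


def try_write_lock(path, offset=640, length=160):
    """Try a non-blocking exclusive byte-range lock on path.

    Returns None if the lock was granted, or the OSError that was raised.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB,
                        length, offset, os.SEEK_SET)
        except OSError as exc:
            return exc
        # Release the lock again before closing the file.
        fcntl.lockf(fd, fcntl.LOCK_UN, length, offset, os.SEEK_SET)
        return None
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Pass a file on the file system under test as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "lock_probe.tmp"
    error = try_write_lock(target)
    if error is None:
        print(f"lock granted on {target}")
    else:
        print(f"lock failed on {target}: errno {error.errno} ({error.strerror})")
```

Running it once against a file on a Polyserve mount and once against a local directory would show whether the lock failure follows the file system. It does not reproduce what the MPI library does internally.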
