random error 173 on streaming I/O to NFS disk

I've got some Fortran codes (ifc v7.0 on RH 8.0) that fail with random error 173 and crash after 12 to 15 hours of processing. I am using formatted output with advance=no to an NFS-mounted RAID array. Individual file sizes are kept below 4 GB, with a total of about 17 GB per 3-day run.

Writes to the RAID array that do NOT use advance=no are very slow, which is why we switched to advance=no.

The code does write to local disk in good time and without error when NOT using advance=no. (I should also try writing to local disk with advance=no.)
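For readers unfamiliar with the technique, non-advancing formatted output looks roughly like the sketch below. It is only an illustration, not the actual production code: the unit number 20, the file name advance_demo.out, and the f10.4 edit descriptor are all hypothetical. Note that advance='no' requires an explicit format, so it cannot be combined with list-directed (*) output.

```fortran
program advance_no_demo
  implicit none
  integer :: i

  ! Hypothetical unit number and file name, chosen only for this sketch.
  open(unit=20, file='advance_demo.out', status='replace', &
       form='formatted', action='write')

  ! Each write appends to the current record instead of ending it,
  ! so the five values end up on a single line.
  do i = 1, 5
     write(20, '(f10.4)', advance='no') 0.5 * real(i)
  end do

  ! An advancing write with an empty string terminates the record.
  write(20, '(a)') ''

  close(20)
end program advance_no_demo
```

Pointing the file name at a local path and then at the NFS mount would be one way to run the local-disk comparison mentioned above.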

Thoughts?

Having rerun the tests, I find that error 173 also happened in a "normal" write, so it appears that the problem is in how Fortran (ifc) interacts with NFS-mounted files.

So will the "new" RH8-supported ifc cure this problem? If not, what will (and how do I get a copy)?

Regards, Ethan

Line 3340 is:

write(16,*)'done solving w...'

And the error is:

Input/Output Error 173: Input/output error

In Procedure: main program
At Line: 3340

Statement: List-Directed WRITE
Unit: 16
Connected To: mirror.out
Form: Formatted (contains List-Directed records)
Access: Sequential
Records Read : 0
Records Written: 43654

Current I/O Buffer:

done solving w...
!
End of diagnostics

58277.328u 316.316s 16:31:57.21 98.4% 0+0k 0+0io 243pf+0w

That doesn't really sound like a Fortran problem in particular, e.g. it could have happened with C as well.

I don't know for sure, but I would imagine the Fortran IO libraries don't particularly do anything different for NFS versus local files.

Did you check:

1) the NFS mounted filesystem has plenty of free space?
2) any error logs (e.g. /var/log/messages) on either client or server?
3) the networking reliability? (is it dropping packets under high load?)
4) NFS mounting parameters?

One more thing.

I've seen Input/Output errors when there are bad blocks on the disk. Did you check your RAID volume thoroughly?
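While the checks above are being run, it may help to stop a single failed write from killing a multi-hour job. The sketch below is one possible approach, not a confirmed fix: it adds iostat= to the failing list-directed write so the runtime returns an error code instead of aborting, and retries a few times. The file name retry_demo.out and the retry count of 3 are hypothetical; unit 16 and the message text are taken from the error report. It assumes the failure is transient, which has not been established in this thread.

```fortran
program retry_write_demo
  implicit none
  integer, parameter :: max_tries = 3   ! hypothetical retry budget
  integer :: ios, attempt

  ! Hypothetical file name; the real run writes to mirror.out on NFS.
  open(unit=16, file='retry_demo.out', status='replace', &
       form='formatted', action='write', iostat=ios)
  if (ios /= 0) then
     print *, 'open failed, iostat = ', ios
     stop
  end if

  do attempt = 1, max_tries
     ! With iostat= present, an I/O error sets ios to a positive value
     ! and execution continues instead of terminating the program.
     write(16, *, iostat=ios) 'done solving w...'
     if (ios == 0) exit
     print *, 'write failed, iostat = ', ios, ' attempt ', attempt
  end do

  if (ios /= 0) print *, 'giving up after ', max_tries, ' attempts'

  ! Buffered data may only reach the server at close time,
  ! so the close is checked as well.
  close(16, iostat=ios)
  if (ios /= 0) print *, 'close failed, iostat = ', ios
end program retry_write_demo
```

After a failed write the state of the current record is not guaranteed, so a retried line may appear twice or partially in the output. Logging the iostat value and the time of each failure also gives something to correlate with the client and server logs.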

Hi,

Did you ever find a solution to this problem? I also experienced random errors (Input/Output Error 173) writing to NFS partitions. I then tried upgrading to v7.1 of the Intel Fortran Compilers and the error changed to an Input/Output Error 176.

If anyone out there has a fix for this problem or knows what is causing it, please let me know.

Regards,

soulde.
