Asynchronous IO problems (compiler bug?)

The problem:
I wrote a fairly straightforward test program to begin examining the benefits of performing asynchronous IO while post-processing CFD flow fields, or even while performing the actual (DNS) CFD computations. The test program times some IO operations because I wanted to explore two potential techniques for overlapping IO with computation. The data is separated into different volumes for each time realization, and each volume is further separated into wall-normal (or k) planes, each of which is stored in its own unformatted file. So my two strategies are to either read in an entire volume ahead of the one I am currently processing, or to read in a single k plane in advance of the data currently being processed. I have attached all the relevant files needed to compile the test program (3 files). If you like I can even send you the data (maybe > 1GB though...) if you want to test it.

The problem is that the serial/non-parallel case works fine, and each of the two parallel strategies works fine as long as I don't read in any more than 13 files (k planes). If I read 14 or more (106 in total per volume) the code simply hangs: no error messages, no crashes. If I use strace -p to look at the process while it is executing I see no activity (but I see plenty when I look during the serial execution without asynchronous IO). I have tried this on two different machines (although the data is in the same place, mounted with NFS I think) and the same thing happens. I have not yet opened it in a debugger since totalview doesn't forward X windows to my home machine properly, but I plan to do this when I get back to work on Monday. Below is some perhaps optional motivational info for the two strategies, then some contextual info about my background, and finally some background on each of the 3 files I have attached.

Motivation:
The first strategy requires reading multiple unformatted files asynchronously and concurrently. In the past I have encountered stupidly parallel problems which nevertheless require performing IO on data stored in different files but on the same device, and attempted a similar technique: I had n sets of files, each of which could be processed independently. So, option 1: write a script which processes them in series. Option 2: write a script which, using PBS, runs each case on an individual node concurrently. The problem was that the parallel case took longer than the serial case. I suspect this was because the processing was almost entirely IO and all the data was stored on the same device, so each cluster node would ask the same device/data server for a different file all at once; the server would find file one, start reading it and sending it to the node running my program, then before it could finish would find file 2, start reading it, etc., and then have to go back and find where it was in file 1, in effect causing more disk-head seeks than the serial case. By my logic, depending on how asynchronous IO has been implemented, I could encounter a similar situation using strategy 1 of my test program.
As an alternative to this I devised strategy 2: only one file is asynchronously opened and read at a time, but I can still perform calculations lagging the data currently being read.
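In rough outline, strategy 2 looks something like the following (an illustrative sketch only; plane_name stands in for the file-name construction, and the real logic and error handling are in the attached test program):

   do k = 1, kedg
      open(furd+k, file=plane_name(k), form='unformatted', &
           action='read', asynchronous='yes')
      read(furd+k, asynchronous='yes') ((u(i,j,k), i=1,imax), j=1,jmax)
      if (k > 1) then
         close(furd+k-1)              ! implied wait: plane k-1 is now safe to use
         ! ... compute on plane k-1 while plane k is still being read ...
      end if
   end do
   close(furd+kedg)
   ! ... compute on the last plane ...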

Where I am coming from: contextual background information about my programming abilities
I am a graduate student researching supersonic and hypersonic compressible turbulence, which can require the use of some very large data sets (~1 TB sometimes). I do not necessarily have the best background in computer architecture/science, etc. Since I am not a professional developer, I write codes mainly for readability, portability, generality (where possible) and speed (when necessary), so please keep all of this in mind when replying.

Files:
testasynchIO.f90: the main test program. Takes 4 command-line arguments: case, i-dimension, j-dimension, k-dimension.
case is s|p|b, corresponding to the serial, parallel, and block (or stencil) strategies. The i-dimension and j-dimension are integers corresponding to the size of the data arrays in the first and second dimensions, and are fixed by the content of each k-plane file. You may vary k-dimension from 1 to 106 to set the number of k planes to read in, and hence the extent of the variables in the third dimension. (An example invocation is given after the file list.)

IOutils.f90:
A module with some very basic utilities for getting command line arguments, handling errors related to them, and printing progress bars, spinners, etc.

types.f90:
a module containing derived types/data structures, as well as various scientific constants and integer constants corresponding to various data types.

testasyncIO.x:
a binary executable which I compiled; it may or may not work on your machine.
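For example, a typical invocation would look like this (the i and j values here are made up; the real ones are fixed by the k-plane files):

   ./testasyncIO.x b 256 256 14

(14 or more k planes is where the asynchronous cases hang for me; 13 or fewer work.)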

Thanks in advance for any help. If you think you have a better way to do what I am trying to do I am happy to discuss it with you.

-Zaak


Zaak,

I'll take a look at this today. What OS and version are you using, and what version of the Intel compiler? I'd like to try to replicate your environment as much as possible.

thanks

ron

Zaak,

I think you have a general misconception about asynchronous I/O.

When you issue an asynchronous I/O command such as READ or WRITE, the I/O statement returns while the I/O operation is enqueued into the I/O system. While the I/O is pending your program may continue to execute. Then later, at an appropriate time (or position) in your code, you are required to determine if (or when) the I/O is complete and whether an error occurred. INQUIRE with PENDING= can be used to determine if I/O is pending (without waiting for completion), or WAIT can be used to wait for each asynchronous READ (or WRITE), or for all asynchronous reads or writes.
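For example (a minimal self-contained sketch with a hypothetical file name, not your test program):

   program async_wait_demo
      implicit none
      real, asynchronous :: buf(1000)
      integer :: rid, ios
      logical :: busy

      open(10, file='plane.dat', form='unformatted', action='read', &
           asynchronous='yes')
      read(10, asynchronous='yes', id=rid) buf    ! returns once the transfer is queued

      ! ... computation that does not touch buf may run here ...

      inquire(unit=10, id=rid, pending=busy)      ! non-blocking completion check
      if (busy) wait(unit=10, id=rid, iostat=ios) ! block until this transfer completes
      close(10)                                   ! CLOSE also performs a wait operation
   end program async_wait_demo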

The code sample you gave has no corresponding WAIT (after the inserted execution code). I think there is an upper limit of 511 pending I/O requests.

Additionally, and this was not clear in your code, the arrays that you are reading into appear to be the working-set arrays.

I suggest you create a user-defined type to hold your working-set arrays. Then create two instances of this type, A and B. Add I/O result variables to the type for ERROR, EOF, etc., as well as PENDING and DONE.

Initialize each to PENDING.

Use OpenMP to create a parallel region with two threads: one thread for computation, one thread for I/O. Assuming you use OpenMP, thread team member number 0 is compute and 1 is I/O.
The parallel region calls either the compute subroutine or the I/O subroutine.

CALL OMP_SET_NUM_THREADS(2)
!$OMP PARALLEL
if(OMP_GET_NUM_THREADS() .ne. 2) call ThisWontWork()
if(OMP_GET_THREAD_NUM() .eq. 0) then
call ComputeThread
else
call IOThread
endif
!$OMP END PARALLEL

subroutine ComputeThread
   use YourModule
100 continue
   DO WHILE (A%PENDING)
      CALL SLEEP(0)        ! yield while waiting for the I/O thread to fill A
   END DO
   IF (A%ERROR) RETURN
   CALL PROCESS(A)
   A%PENDING = .TRUE.      ! inform I/O thread to read ahead into A
   DO WHILE (B%PENDING)
      CALL SLEEP(0)
   END DO
   IF (B%ERROR) RETURN
   CALL PROCESS(B)
   B%PENDING = .TRUE.      ! inform I/O thread to read ahead into B
   GOTO 100
end subroutine ComputeThread

subroutine IOThread
   use YourModule
100 continue
   DO WHILE (.not. A%PENDING)
      CALL SLEEP(0)        ! yield until the compute thread asks for A again
   END DO
   CALL DOREAD(A)
   IF (A%ERROR) RETURN
   A%PENDING = .FALSE.
   DO WHILE (.not. B%PENDING)
      CALL SLEEP(0)
   END DO
   CALL DOREAD(B)
   IF (B%ERROR) RETURN
   B%PENDING = .FALSE.
   GOTO 100
end subroutine IOThread

Vary the SLEEP amount, and flesh out the error code.
The above does not use asynchronous I/O. When this is working, consider adding event code to avoid the SLEEP(0) calls.

Jim Dempsey

Quoting - Ronald Green (Intel)
What OS and version are you using, and what version of the Intel compiler?

I used a non-commercial version of ifort (installed on my home machine) to build the binary and test it. (The version we currently run at work is below):

My home machine is a Lenovo/thinkpad T60 with ubuntu intrepid ibex:
07:17 PM (1) POSTPROCESS $ uname -a
Linux ************ 2.6.27-11-generic #1 SMP Thu Jan 29 19:24:39 UTC 2009 i686 GNU/Linux

Running:
06:42 PM (1) POSTPROCESS $ ifort -V
Intel Fortran Compiler Professional for applications running on IA-32, Version 11.0 Build 20090131 Package ID: l_cprof_p_11.0.081
Copyright (C) 1985-2009 Intel Corporation. All rights reserved.
FOR NON-COMMERCIAL USE ONLY

I tested the code on two machines. Both should be running some sort of RHEL derivative (probably PU_IAS, see www.elders.princeton.edu)
Info on the first is listed below:
[************@********* ~]$ uname -a
Linux ********* 2.6.18-92.1.22.el5 #1 SMP Tue Dec 16 11:44:36 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

info on the 2nd:
*********@**********:~> uname -a
Linux ******** 2.6.16.54-0.2.12-smp #1 SMP Fri Oct 24 02:16:38 UTC 2008 x86_64 x86_64 x86_64 GNU/Linux
SGI ALTIX ICE

Intel Harpertown 2.83 GHz, 12 MB cache, 4 core/CPU, 8 cores/node Infiniband

At work I run:
07:17 PM (0) MACH5_TOTE5_DNS_DP2M_MP $ ifort -V
Intel Fortran Intel 64 Compiler Professional for applications running on Intel 64, Version 11.0 Build 20081105
Copyright (C) 1985-2008 Intel Corporation. All rights reserved.

on a repacked version of RHEL (PU_IAS, see www.elders.princeton.edu):
07:17 PM (0) MACH5_TOTE5_DNS_DP2M_MP $ uname -a
Linux ********.Princeton.EDU 2.6.18-128.1.1.el5 #1 SMP Tue Feb 10 18:09:15 EST 2009 x86_64 x86_64 x86_64 GNU/Linux

(I have obscured the hostnames here to protect the machines of interest)

If you need more hardware info I may be able to get it to you; please let me know.


Quoting - jimdempseyatthecove
Zaak,

I think you have a general misconception about asynchronous I/O.

When you issue an asynchronous I/O command such as READ or WRITE, the I/O statement returns while the I/O operation is enqueued into the I/O system. While the I/O is pending your program may continue to execute. Then later, at an appropriate time (or position) in your code, you are required to determine if (or when) the I/O is complete and whether an error occurred. INQUIRE with PENDING= can be used to determine if I/O is pending (without waiting for completion), or WAIT can be used to wait for each asynchronous READ (or WRITE), or for all asynchronous reads or writes.

The code sample you gave has no corresponding WAIT (after the inserted execution code). I think there is an upper limit of 511 pending I/O requests.

Additionally, and this was not clear in your code, the arrays that you are reading into appear to be the working-set arrays.

Jim, I am aware of the WAIT syntax. According to the standard, a CLOSE statement is equivalent to a WAIT, with the added functionality that it closes the open file. See Fortran 95/2003 Explained by Metcalf et al.

And I am not sure what you mean by saying I am reading into the working-set arrays. A portion of each file gets read into the corresponding sub-section of an array. I agree that this is perhaps not necessarily best practice, but the code does no actual work on the arrays (i.e. their values are never modified, either by assignment statements or by reads into the same array subsection), and the entries are not accessed until the IO for that array subsection is complete (guaranteed by the close statement). In an actual implementation I would put these arrays in a module along with the IO routines and give them the PROTECTED attribute, so that they are read-only as far as the rest of the code is concerned. Right now my code just ends after reading in the arrays from the different files, but in practice I will compute some secondary values (derivatives, swirl, vorticity, etc.) from array sub-sections which have already been read in asynchronously and whose corresponding files have been closed (i.e. waited for).

Let me try to put the basic idea into words:
Read the first file, or set of files, and close them (wait for them).
Start reading the second file or set of files asynchronously while performing computations using the values from the first file/set of files.
Then close the open file(s) (i.e. wait for them).
Open/read the next file/set of files asynchronously,
and perform computations using the data taken from the previous file/set of files while the new ones are being read in.

The 2003 standard does not specify any need to use OpenMP or MPI to perform asynchronous IO; indeed, abstraction, generality, etc. are the whole reason we have high-level languages like Fortran. At the moment I have no need for managing threads at such a low level, and I will let Intel's fine compilers do this for me. If you look carefully at my source, I believe you will note that it is leveraging asynchronous IO as the standard intended: values which are currently being read in are not accessed until after they are done being read (i.e. the file is closed). Further, the "working arrays", as you put it, are never modified, and indeed should never be modified, and will ultimately be PROTECTED in the final implementation. Besides stylistic concerns I see no need to create any sort of buffer variable or structure, provided I don't try to access memory locations which are being asynchronously read or written.
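In the eventual implementation the layout would be something like this (names are illustrative, not the attached code):

   module flow_fields
      implicit none
      private
      public :: u, v, alloc_fields, read_plane
      ! defined only by the IO routines in this module; read-only everywhere else
      real, allocatable, protected, asynchronous :: u(:,:,:), v(:,:,:)
   contains
      subroutine alloc_fields(imax, jmax, kedg)
         integer, intent(in) :: imax, jmax, kedg
         allocate(u(imax,jmax,kedg), v(imax,jmax,kedg))
      end subroutine alloc_fields

      subroutine read_plane(unit, k, imax, jmax)
         integer, intent(in) :: unit, k, imax, jmax
         integer :: i, j
         ! asynchronous read of one k plane into the rank-2 sections u(:,:,k), v(:,:,k);
         ! the unit is assumed to have been opened with asynchronous='yes'
         read(unit, asynchronous='yes') ((u(i,j,k), i=1,imax), j=1,jmax), &
                                        ((v(i,j,k), i=1,imax), j=1,jmax)
      end subroutine read_plane
   end module flow_fields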

By the way, SLEEP is not part of the standard, and is therefore non-portable.

>>Let me try to put the basic idea into words:
Read the first file, or set of files, and close them (wait for them).
Start reading the second file or set of files asynchronously while performing computations using the values from the first file/set of files.
Then close the open file(s) (i.e. wait for them).
Open/read the next file/set of files asynchronously,
and perform computations using the data taken from the previous file/set of files while the new ones are being read in.
<<
This was the reason for the A and B user-defined types. The code snippet you provided had all the reads going into the same arrays, i.e. your next reads are reading over the top of the prior reads (the current working set).

Outline of what I think you want to do

Read A
loop:
Close A (implied WAIT A)
(caution: Close A may not necessarily capture an error on Read A)
if error A bail out
Read B
compute A
Close B
if error B bail out
Read A
compute B
goto loop

It is not entirely clear from the IVF documentation whether the iostat variable referenced on an asynchronous I/O statement will receive the completion code from the eventual completion of the I/O operation (i.e. whether its contents are volatile after return from the asynchronous READ/WRITE). If so, then the variable receiving the I/O status must not be destroyed prior to completion (i.e. it must not be a stack-local variable of a subroutine that returns before completion).
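If in doubt, the conservative thing is to take the completion status from the WAIT (or CLOSE) rather than from the asynchronous READ itself, along these lines (hypothetical file name, not your code):

   program wait_status_demo
      implicit none
      real, asynchronous :: buf(100)
      integer :: rid, ios_start, ios_done

      open(10, file='plane.dat', form='unformatted', action='read', &
           asynchronous='yes')
      read(10, asynchronous='yes', id=rid, iostat=ios_start) buf  ! status of initiating the transfer
      ! ... overlapped work would go here ...
      wait(unit=10, id=rid, iostat=ios_done)                      ! status of the completed transfer
      print *, 'initiation iostat =', ios_start, ' completion iostat =', ios_done
      close(10)
   end program wait_status_demo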

Jim Dempsey

Quoting - jimdempseyatthecove

This was the reason for the A and B user-defined types. The code snippet you provided had all the reads going into the same arrays, i.e. your next reads are reading over the top of the prior reads (the current working set).

Outline of what I think you want to do

Read A
loop:
Close A (implied WAIT A)
(caution: Close A may not necessarily capture an error on Read A)
if error A bail out
Read B
compute A
Close B
if error B bail out
Read A
compute B
goto loop

Jim, thanks for your response. If you look at the code you will see that the arrays into which I read the data are rank 3. Also, in each read statement there is an implied do loop over i and j, but NOT k, for each variable. Equivalently, each individual file contains a rank-2 section of each of the arrays. Each unique k index therefore corresponds both to an individual file (containing rank-2 sections of the data) and to a rank-2 array section of the same dimensions as those contained in the file. So the data in its totality, taken from all of the files, forms a rank-3 volume of flow data, while each file contains only a plane (rank 2) of the data. Essentially what I need to do is reconstruct the volume (rank-3 data) from the set of planes (rank-2 data). The height of this volume, and hence the number of k-plane files which need to be read in, varies from case to case. So my code is equivalent to yours, but rather than declare two variables A and B I have one variable whose rank = rank(data_in_file) + 1; i.e. file 1 contains a(i,j), file 2 b(i,j), file 3 c(i,j), etc., corresponding to u(i,j,1), u(i,j,2), u(i,j,3), etc. Because I know how many files I am dealing with (the number of k planes, or the height of my volume) at run time but not at compile time, I need a means of creating k variables. Since I am in scientific computing, linked lists etc. would be silly and inefficient, so instead of creating k variables I create an array with an extra dimension of extent k.
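In other words, instead of k separate variables, something like this (a stripped-down illustration; in the real program the dimensions are command-line arguments handled by IOutils):

   program volume_alloc_demo
      implicit none
      real, allocatable, asynchronous :: u(:,:,:)
      integer :: imax, jmax, kedg
      character(len=32) :: arg

      ! the dimensions are known only at run time
      call get_command_argument(1, arg); read(arg, *) imax
      call get_command_argument(2, arg); read(arg, *) jmax
      call get_command_argument(3, arg); read(arg, *) kedg
      allocate(u(imax, jmax, kedg))   ! plane (file) k is read into the section u(:,:,k)
      print *, 'allocated u with shape', shape(u)
   end program volume_alloc_demo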

Also, I agree that my error handling is not terribly rigorous or robust, but:
1) The code is not an end-user code; it is only run by people like me who develop it.
2) I know the files exist, where they are located, and what they contain, because I wrote them.
3) Finally, if the files exist, are not corrupt, and are in the format I wrote them in, then I expect the Fortran compiler to compile my code in a manner which will successfully read them. This is indeed the case and has been verified many times over. If you supply 's' as the first argument to my code, for the synchronous/serial read case, no problems arise and the files are read successfully. Additionally, no problems occur for fewer than 14 files with the asynchronous IO ('p' or 'b' as the first argument).

I hope this helps you understand my code, and that we can get to the issue at hand as to why it hangs when more than 13 files are to be read.

Not sure if it will help, but here are two core dumps taken using gcore while the code appears to be hung. Each file is gzipped so it was small enough to upload. The file extensions correspond to the command-line arguments used. I also tried using kill -6 to make core dumps, but I got some sort of error.

Also, I tried to open the code and run it under the totalview debugger, and it always hangs there, even for cases which do not hang when executing on their own. Totalview is supposed to be able to handle MPI and threaded codes. One of the attached files is the text output from totalview to the terminal (stderr, probably) while it is trying to execute the code.

Finally, I have uploaded a single k-plane data file. You should just be able to copy this file and change the extension to create a set of k planes; i.e. I have included the first one, old_plane.101.gz; unzip it to old_plane.101, then copy this file to old_plane.102, old_plane.103, ... etc.


Just for "kicks", insert in front of the CLOSE, the number of WAITs necessary to consume the pending and/or completed asychronous I/O requests (i.e. 1 WAIT for each asychronous read on that unit). There may be a case of (missing) garbage collection going on in the IVF runtime system.

Jim

Quoting - jimdempseyatthecove
Just for "kicks", insert in front of the CLOSE, the number of WAITs necessary to consume the pending and/or completed asychronous I/O requests (i.e. 1 WAIT for each asychronous read on that unit). There may be a case of (missing) garbage collection going on in the IVF runtime system.

WAIT statements wait for all pending IO on the file unit, so there is no need for multiple wait statements per file, unless you also specify id= in the wait statement, corresponding to the id= of an individual asynchronous IO statement. So only one per file is needed. Also, this is not IVF; it is the Intel compiler for Linux, not under Windows.
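For reference, the two forms look like this (hypothetical unit and file name, with two asynchronous reads pending on the same unit):

   program wait_forms_demo
      implicit none
      real, asynchronous :: rec1(100), rec2(100)
      integer :: id1, id2

      open(10, file='plane.dat', form='unformatted', action='read', &
           asynchronous='yes')
      read(10, asynchronous='yes', id=id1) rec1
      read(10, asynchronous='yes', id=id2) rec2

      wait(10, id=id1)   ! waits only for the transfer tagged id1
      wait(10)           ! waits for everything still pending on unit 10
      close(10)
   end program wait_forms_demo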

I put wait statements appropriately before the close statements and saw no change in behavior. Since the code is compiled with the -trace flag, it tells me the line number where it is when I kill it with Ctrl-C, which is always a line where a file is being opened asynchronously.

Try something like this:

     ! CALL parallel_read() ! inlined
     DO k = 1,kedg
        WRITE(ftag, '(I3)') k + 100
        INQUIRE( file=hdir//'H/REST/old_plane.'//ftag, exist=fliv)
        IF (fliv) THEN
           OPEN( furd+k, file=hdir//'H/REST/old_plane.'//ftag, &
                &form='unformatted', action='read', asynchronous='YES' )
        ELSE
           OPEN( furd+k, file=hdir//'H/REST/old_planeNEW.'//ftag, &
                &form='unformatted', action='read', asynchronous='YES' )
        ENDIF
        
        READ(furd+k, asynchronous='YES')
        READ(furd+k, asynchronous='YES') hold, hold, xlen, ylen ! Hold is just a garbage can
        READ(furd+k, asynchronous='YES') xloc, yloc, zloc, dzdk
        READ(furd+k, asynchronous='YES') ((  u(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  v(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  w(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  p(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  T(i,j,k),i=1,imax),j=1,jmax), &
                                         ((rho(i,j,k),i=1,imax),j=1,jmax)
       if(k .gt. 1) then
            close(k-1)
            call DoWorkOnVolume(k-1)
       endif
     END DO
    close(kedg)
    call DoWorkOnVolume(kedg)

Jim Dempsey

Quoting - jimdempseyatthecove

Try something like this:

     ! CALL parallel_read() ! inlined
     DO k = 1,kedg
        WRITE(ftag, '(I3)') k + 100
        INQUIRE( file=hdir//'H/REST/old_plane.'//ftag, exist=fliv)
        IF (fliv) THEN
           OPEN( furd+k, file=hdir//'H/REST/old_plane.'//ftag, &
                &form='unformatted', action='read', asynchronous='YES' )
        ELSE
           OPEN( furd+k, file=hdir//'H/REST/old_planeNEW.'//ftag, &
                &form='unformatted', action='read', asynchronous='YES' )
        ENDIF

        READ(furd+k, asynchronous='YES')
        READ(furd+k, asynchronous='YES') hold, hold, xlen, ylen ! Hold is just a garbage can
        READ(furd+k, asynchronous='YES') xloc, yloc, zloc, dzdk
        READ(furd+k, asynchronous='YES') ((  u(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  v(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  w(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  p(i,j,k),i=1,imax),j=1,jmax), &
                                         ((  T(i,j,k),i=1,imax),j=1,jmax), &
                                         ((rho(i,j,k),i=1,imax),j=1,jmax)
       if(k .gt. 1) then
            close(k-1)
            call DoWorkOnVolume(k-1)
       endif
     END DO
    close(kedg)
    call DoWorkOnVolume(kedg)

Jim Dempsey

Jim, this is essentially equivalent to my second test case, corresponding to specifying 'b' as the first argument, although somewhat simplified. For "kicks" I'll give it a shot quickly, even though, as I stated above, this is more or less what I am trying to test in the second scenario ('b').

...

Alright, I have commented out my original code for the 'p' case and copied in Jim's verbatim, except that I have commented out the calls to DoWorkOnVolume, since these are purely notional now and all I want to do is compare execution times using the different strategies and test for a penalty (from seeks, most likely) when asynchronously reading multiple files concurrently. As suspected, the code hangs for 14 files but not for 13 or fewer, with the added bonus that now we have an unexplained segmentation fault which occurs right before the code exits for 13 or fewer asynchronous IO reads. (I have placed a print statement right before "end program testasyncIO" which successfully prints, since I cannot run under totalview at all.) The code was compiled with -g -trace -O0.

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
testasyncIO.x 0000000000482461 Unknown Unknown Unknown
testasyncIO.x 0000000000481435 Unknown Unknown Unknown
testasyncIO.x 0000000000446CAA Unknown Unknown Unknown
testasyncIO.x 000000000040EA92 Unknown Unknown Unknown
testasyncIO.x 0000000000411AD3 Unknown Unknown Unknown
libpthread.so.0 0000003B49A0E4C0 Unknown Unknown Unknown
testasyncIO.x 000000000048A980 Unknown Unknown Unknown

Fixing the issue with the close statement, noted below, eliminates this segmentation fault. See Jim's post below.

I have a strong suspicion that this is compiler-related, since the same results occur across different machines and I am fairly confident there are no syntax or logic errors, unless some of the finer points of the standard are evading me.

I hope you fixed the error in my close statements. They should have included your unit offset "furd":

close(furd+k-1)

Jim

Is anyone from Intel going to look at this again, ever? It would be nice to get this resolved.

>>Fixing the issue with the close statement, noted below, eliminates this segmentation fault.

So does this mean that the code reads through all your files correctly?

If not, then is there something else to pursue?

If yes, then the explanation might be this:

For each file (slice) you have 4 asynchronous I/O requests in flight. The compiler does not assume that you will never issue an INQUIRE or WAIT (in this case you can see from your code that you won't, but the compiler's optimizer is not analyzing the I/O statements that way).

For each asynchronous I/O request the I/O system uses one of a fixed number of preallocated objects (completion nodes) which contain, or will contain, the progress and/or results of the I/O. This object is held in use until it is disposed of by an appropriate WAIT or CLOSE.

If the CLOSE is not reclaiming the completion nodes (the code blocks or crashes after some number of asynchronous I/O requests) but inserting the appropriate WAITs fixes the problem, then insert the WAITs and file a complaint with Premier Support (use the WAITs until resolution of the incident).
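i.e., in the loop from the earlier snippet, something along these lines (same unit naming as before; shown with a single WAIT per unit, which should cover all pending transfers on it -- the "for kicks" experiment above simply repeats it once per READ):

   if (k .gt. 1) then
      wait(furd+k-1)            ! drain pending asynchronous reads before closing
      close(furd+k-1)
      call DoWorkOnVolume(k-1)
   endif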

Jim

Jim,
The original problem still persists, with or without wait statements. I suspect, as you stated in one of your earlier posts, that it is some sort of garbage cleanup which is not happening correctly. As stated originally, the code will read (asynchronously) 13 files, but hangs without producing any sort of error when trying to read 14 or more. Using idb for the first time, and compiling with -g -trace -O0, it seems that perhaps it is creating some sort of extra threads and/or not killing the ones it is done with, but I may be interpreting things erroneously. The only other type of issue I can think of, with my VERY limited knowledge of such matters, is that some sort of race condition or cache thrashing is occurring. All of this, beyond my description of the actual problem, is admittedly uninformed speculation. (On this subject, do you have any thoughts on how asynchronous IO is implemented? The only way I can think of is by creating some thread for the IO operations, but as I said my knowledge of this type of thing is quite limited.)

Since I am not much of a computer scientist, I have no idea what you are talking about vis-a-vis completion nodes. As far as I can tell, all of my syntax and constructs are correct and compliant with the standard.

How does one file bug reports etc.?

Use https://premier.intel.com for software support.

If the asynchronous reading of your data will buy you significant wall-clock time, then I suggest you consider this the motivating opportunity to learn how to incorporate OpenMP into this routine to perform your reads.

The only catch is that the type of programming required in this case is NOT the typical programming you learn first when learning OpenMP.

I will take a quick look at your code and make some suggestions.

Jim Dempsey

Sandwich this into your test program

  integer, volatile :: LastRecordRead, ioStatus
  . . .
  CASE('o')
     ! CALL OpenMP_read()   ! inlined
     LastRecordRead = 0
     ioStatus = 0
!$OMP PARALLEL SECTIONS PRIVATE(k)
     DO k = 1,kedg
        WRITE(ftag, '(I3)') k + 100
        INQUIRE( file=hdir//'H/REST/old_plane.'//ftag, exist=fliv)
        IF (fliv) THEN
           OPEN( furd, file=hdir//'H/REST/old_plane.'//ftag, &
                &form='unformatted', action='read', iostat=ioStatus, err=100 )
        ELSE
           OPEN( furd, file=hdir//'H/REST/old_planeNEW.'//ftag, &
                &form='unformatted', action='read', iostat=ioStatus, err=100 )
        ENDIF
        REWIND(furd, iostat=ioStatus, err=100)
        READ(furd, iostat=ioStatus, err=100)
        READ(furd, iostat=ioStatus, err=100) hold, hold, xlen, ylen
        READ(furd, iostat=ioStatus, err=100) xloc, yloc, zloc, dzdk
        READ(furd, iostat=ioStatus, err=100) ((  u(i,j,k),i=1,imax),j=1,jmax), &
                   ((  v(i,j,k),i=1,imax),j=1,jmax), &
                   ((  w(i,j,k),i=1,imax),j=1,jmax), &
                   ((  p(i,j,k),i=1,imax),j=1,jmax), &
                   ((  T(i,j,k),i=1,imax),j=1,jmax), &
                   ((rho(i,j,k),i=1,imax),j=1,jmax)
        CLOSE(furd, iostat=ioStatus, err=100)
        LastRecordRead = k
     END DO
100  continue
!$OMP SECTION
     ! Work on current volume, in seperate parallel section.
     DO k = 1,kedg
         do while(k .gt. LastRecordRead)
            if(ioStatus .ne. 0) goto 200
            call sleepqq(0)
         end do
         call DoWork(k)
     END DO
200  continue
!$OMP END PARALLEL SECTIONS
    if(ioStatus .ne. 0) goto 999
! end of case

Jim Dempsey

(implied in post - compile with OpenMP options)
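For example, one plausible command line (module sources first; the exact flag spellings may vary by compiler version):

   ifort -openmp -g -O0 types.f90 IOutils.f90 testasyncIO.f90 -o testasyncIO.x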

Jim

Quoting - zbeekman
The problem:
I wrote a fairly straightforward test program to begin examining the benefits of performing asynchronous IO while

...

on two different machines (although the data is in the same place, mounted with NFS I think) and the same thing happens. I have not yet opened it in a debugger since totalview doesn't forward X windows to my home machine properly, but I plan to do this when I get back to work on Monday. Below is some perhaps optional motivational info for the two strategies, then some contextual info about my background, and finally some background on each of the 3 files I have attached.

...

-Zaak

Zaak,

If you are having specific trouble with X forwarding with TotalView that you aren't with other X11 apps let us know. We might be able to help you out. Please contact: support@totalviewtech.com.

In general we've found that X forwarding over a long distance (high latency is the challenge; pretty much anything with a longer round trip than a LAN) isn't as fast as users would like. We don't believe this is anything specific to TotalView; it is really more of a general observation about interactive tools and X11.

Since there are lots of reasons folks might need to debug on machines located at remote sites (collaboration and distributed user communities), we've added a remote display client to our latest version (TotalView 8.6). It sets up a secure and fast "virtual X11 server" kind of connection via SSH tunnels. See http://www.totalviewtech.com/support/documentation/totalview/remote_disp... for more info.

Let me know (chris.gottbrath) (totalviewtech.com) if this helps or if you have any trouble.

Cheers,
Chris
