How can I write to a binary file faster?


I have the following code for writing to a binary file:

CALL system_clock(Time1, rate)
DO 275 I=1,NDOF
  DO 274 J=1,UBW
    IF (S(I,J).NE.0) THEN
      WRITE (1) I
      WRITE (1) J+I-1
      WRITE (1) (S(I,J))
    END IF
274 CONTINUE
275 CONTINUE
CALL system_clock(Time2)
print *, "elapsed time: ", real(Time2-Time1) / real(rate)

I know that using fewer WRITE statements makes it faster. So inside the loop I am now using the following code, and it is faster:

IF (S(I,J).NE.0) THEN
  WRITE (1) I, J+I-1, (S(I,J))
END IF

Is there any way to get rid of the loop (since it is time consuming), or any other change that would make the code more efficient?

Please note that I want to keep the order I, J+I-1, S(I,J) (non-zero values only) in the output. Also, since I am using a C++ program to read the binary file, I have to use stream access.

Any suggestions are greatly appreciated. 


I converted your code sample into a test of a number of file OPEN options:
 OPEN ( UNIT=11, FILE='Test2.bin', STATUS='UNKNOWN', FORM='UNFORMATTED')     ! 3 writes
 OPEN ( UNIT=11, FILE='Test3.bin', STATUS='UNKNOWN', FORM='UNFORMATTED')     ! 1 write
 OPEN ( UNIT=11, FILE='Test5.txt', STATUS='UNKNOWN')

I ran them on Win7_64 with an SSD drive and got very poor write times using:
Intel(R) Visual Fortran Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version Build 20120612

For most cases the elapsed time was very poor, although BUFFERED='YES' was reasonable:
 elapsed time: 0 Define S      0.1090000   
 elapsed time: 1 stream         27.78400   
 elapsed time: 2 unformatted    28.08000   
 elapsed time: 3 unformatted    9.719000   
 elapsed time: 4 direct         9.796000   
 elapsed time: 5 formatted      12.21500   
 elapsed time: 6 buffered       1.435000   

I tried an alternative compiler for some of the options and got:
Program entered 
 elapsed time: 0 Define S        0.192800   
 elapsed time: 1 transparent      1.16340   
 elapsed time: 2 unformatted      1.22750   
 elapsed time: 3 unformatted     0.732800   
 elapsed time: 4 direct           8.56580   
 elapsed time: 5 formatted        2.02770   

Quite a surprise ?  Perhaps I've missed something in the Intel OPEN ?

Both compilers were poor for DIRECT, so I tried a single type variable for the I/O list, but still did not work. Again a surprise.
I would have expected option 4 to be the best, then option 3 (ie 6), although I typically use DIRECT with much bigger records.

The code is attached, so if anyone has an alternative OPEN or WRITE, let us know.


ps: I tried DIRECT and BUFFERED and got 1.061 seconds. Thankfully that makes sense.
Why is BUFFERED='YES' not the default ? It should not be too hard to test when a subsequent READ or REWIND occurs, although wouldn't they use the buffer also ?


Attachment: stream-test.zip (2.62 KB)

On my PC (without an SSD) I obtained slightly better results than in case 6 with


Too many things to check !

I corrected the ifort compile options in the attached batch file I was using, and introduced BUFFERED=NO/YES for all cases and now the results are more as expected. Again BUFFERED='YES' should be used and probably should be the default, unless I am missing something else. The latest attachment gives the revised results.

 Tests for BUFFERED=NO
     elapsed time: 1 stream           24.14700
     elapsed time: 2 unformatted      24.64700
     elapsed time: 3 unformatted      8.704000
     elapsed time: 4 direct/buffer    8.595000
     elapsed time: 5 formatted        21.45000
  Tests for BUFFERED=YES
     elapsed time: 1 stream          0.8730000
     elapsed time: 2 unformatted      1.139000
     elapsed time: 3 unformatted     0.7180000
     elapsed time: 4 direct/buffer   0.4050000
     elapsed time: 5 formatted        1.810000   

In summary, the expected result is:
binary is better than text (if the program to read the file is generated by the same compiler)
fewer big records are better than lots of small ones
access='DIRECT' is best if fixed length records can be adopted.
BUFFERED='YES' should be selected (although this is non-standard Fortran)
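
As a rough sketch of that summary applied to the original question (stream access, one WRITE per non-zero entry, buffering switched on), something like the following could be tried. The array sizes, the file name sketch.bin and the assumption that S is double precision are all made up for illustration. BUFFERED='YES' is an Intel Fortran extension, so this OPEN statement is not expected to compile with other compilers.

```fortran
program buffered_stream_write
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: ndof = 1000, ubw = 100   ! hypothetical sizes
  real(real64), allocatable :: s(:,:)
  integer :: i, j, unit_no, t1, t2, rate

  allocate (s(ndof, ubw))
  call random_number(s)
  where (s < 0.5_real64) s = 0.0_real64   ! make roughly half the entries zero

  ! BUFFERED='YES' is an Intel extension, not standard Fortran.
  open (newunit=unit_no, file='sketch.bin', status='replace', &
        form='unformatted', access='stream', buffered='yes')

  call system_clock(t1, rate)
  do i = 1, ndof
    do j = 1, ubw
      ! one WRITE per non-zero entry, in the order I, J+I-1, S(I,J)
      if (s(i,j) /= 0.0_real64) write (unit_no) i, j+i-1, s(i,j)
    end do
  end do
  close (unit_no)
  call system_clock(t2)

  print *, 'elapsed time: ', real(t2 - t1) / real(rate)
end program buffered_stream_write
```

With Intel Fortran the same effect can reportedly be had without touching the source by compiling with /assume:buffered_io, as mentioned elsewhere in this thread.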

One of the goals of Fortran 90/95 was to improve portability of standard conforming code and a significant focus was file I/O. Defaulting BUFFERED='NO' is probably one step backwards.


ps : as this is a "intel-visual-fortran-compiler-for-windows" note windows; why can't cut and paste accept windows format text ?
This appears to be the most active of the Intel software development forums, but we continue to be subjected to a forum environment which tells us we should be Linux C users.


Attachment: streamb.zip (2.01 KB)

You might try separating your sorting loop from the output activity, and then use the much faster direct API calls to write the file (see my post to your previous forum thread).

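TYPE triplet
    INTEGER :: i
    INTEGER :: ij
    INTEGER :: sij      ! declare sij with the same type as s
END TYPE triplet
TYPE(triplet), ALLOCATABLE   :: output(:)
INTEGER         :: i, j, k, ihandl
ALLOCATE (output(ndof*ubw))   ! worst case: every entry is non-zero
k = 0
! gather the non-zero entries first, with no I/O inside the loop
DO i = 1, ndof
    DO j = 1, ubw
        IF (s(i,j) /= 0) THEN
            k = k + 1
            output(k)%i   = i
            output(k)%ij  = i+j-1
            output(k)%sij = s(i,j)
        END IF
    END DO
END DO
! open_the_file, rw_file and close_file are the API wrapper routines from my post in the previous thread
ihandl = open_the_file ('test.bin'c, 'W')
CALL rw_file ('W', ihandl, k*SIZEOF(output(1)), LOC(output(1)))
CALL close_file (ihandl)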
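
If the API wrapper routines are not available, the same gather-then-write idea can be sketched in standard Fortran with stream access and a single WRITE of the whole array. This is only an illustration, not the code above: the small test matrix, the file name triplets.bin and the choice of a 32-bit integer / 64-bit real layout are assumptions. The byte layout of a derived type in unformatted output is strictly processor dependent, so the file should be checked against what the C++ reader expects.

```fortran
program stream_write_triplets
  use iso_fortran_env, only: int32, real64
  implicit none
  type :: triplet
    integer(int32) :: i
    integer(int32) :: ij
    real(real64)   :: sij
  end type triplet
  integer, parameter :: ndof = 4, ubw = 3      ! hypothetical sizes
  real(real64) :: s(ndof, ubw)
  type(triplet), allocatable :: output(:)
  integer :: i, j, k, unit_no

  ! small made-up test matrix with three non-zero entries
  s = 0.0_real64
  s(1,1) = 2.0_real64
  s(2,3) = -1.5_real64
  s(4,2) = 7.0_real64

  ! gather the non-zero entries, with no I/O inside the loop
  allocate (output(count(s /= 0.0_real64)))
  k = 0
  do i = 1, ndof
    do j = 1, ubw
      if (s(i,j) /= 0.0_real64) then
        k = k + 1
        output(k) = triplet(i, i+j-1, s(i,j))
      end if
    end do
  end do

  ! a single WRITE transfers all triplets; stream access adds no record markers
  open (newunit=unit_no, file='triplets.bin', status='replace', &
        form='unformatted', access='stream')
  write (unit_no) output
  close (unit_no)

  print *, 'wrote', k, 'triplets'
end program stream_write_triplets
```

On the C++ side each entry would then typically correspond to two 4-byte integers followed by one 8-byte double, in the order I, J+I-1, S(I,J).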

Setting  BUFFERED to YES is a key point. Thanks John Campbell!

You can thank Steve for recommending BUFFERED. It certainly has some effect.

If you are going to read the information back into the same program, binary is the best approach.
However, if you are going to read the information into another program, I would recommend using text. I have found that intermediate text files are much more convenient, as they are easier to read or import into Excel, especially if you need to check the contents.

The examples above show that there is not a big time penalty for this. The attached example is of a simple .csv format with maximum precision and minimum size. If you run it you should see a minimal penalty for the .csv formatting.
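
For readers without the attachment, a minimal sketch of this kind of .csv output might look as follows. It is not the attached stream-text.f90: the test values, the file name triplets.csv and the edit descriptors are assumptions. I0 prints integers with no padding, and ES23.16 prints 17 significant digits, which is enough to round-trip a 64-bit real.

```fortran
program csv_write_triplets
  use iso_fortran_env, only: real64
  implicit none
  integer, parameter :: ndof = 4, ubw = 3      ! hypothetical sizes
  real(real64) :: s(ndof, ubw)
  integer :: i, j, unit_no

  ! small made-up test matrix with three non-zero entries
  s = 0.0_real64
  s(1,1) = 2.0_real64
  s(2,3) = -1.5_real64
  s(4,2) = 7.0_real64

  open (newunit=unit_no, file='triplets.csv', status='replace', &
        form='formatted', action='write')
  do i = 1, ndof
    do j = 1, ubw
      ! one text line per non-zero entry: I, J+I-1, S(I,J)
      if (s(i,j) /= 0.0_real64) &
        write (unit_no, '(i0,",",i0,",",es23.16)') i, i+j-1, s(i,j)
    end do
  end do
  close (unit_no)
end program csv_write_triplets
```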



Attachment: stream-text.f90 (1.24 KB)

The default choice of buffering for some cases of direct access was changed with the xe2013 compilers.  Now it's necessary to set buffered='yes' or one of the other alternatives such as /assume buffered_io if you want the faster performance.

I guess the behavior without buffered_io is a hold-over from the time when there was no Fortran standard FLUSH, so this was the only portable way to make each record visible outside the program immediately after a write.

It's a bit more complicated than that. The way buffering was done changed - that change helped some and hurt others. The release notes have a discussion of this. But buffering was not the default before.
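
For reference, the standard FLUSH statement (Fortran 2003) can be used as in the short sketch below. The file name progress.log and the loop are made up for illustration. The standard only asks the runtime to make the data available to other processes, and how far that goes (for example whether the operating system has committed it to disk) is processor dependent.

```fortran
program flush_sketch
  implicit none
  integer :: unit_no, step

  open (newunit=unit_no, file='progress.log', status='replace', &
        form='formatted', action='write')
  do step = 1, 3
    write (unit_no, '(a,i0)') 'finished step ', step
    ! ask the runtime to push any buffered data for this unit out now
    flush (unit_no)
  end do
  close (unit_no)
end program flush_sketch
```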

Retired 12/31/2016
