Very slow writing to text file

Hi all,

I am running the 64 bit version of ifort on Ubuntu 12.04 (actually a virtual machine hosted on OS X).

I notice that the limiting step in my code is output, specifically writing a sequence of integers to a text file.  Writing ~400,000 integers to a ~6 MB text file takes 3 minutes.

I tried reproducing the problem with the simple code below:

program test

    implicit none

    integer(8) :: j, s1, s2, c

    open(12, file='test.out', status='replace')

    call system_clock(s1)

    do j = 1, 100000
        write(12,*) j
    end do

    call system_clock(s2, c)

    write(6,*) dble(s2-s1)/dble(c)

end program test

The above takes a little under 20 seconds. If I use write(6,*) and print to the terminal it takes 2 seconds. If I compile using gfortran (writing to a file) it takes 0.1 seconds.

So it seems that this is specifically about ifort and specifically writing to text files as opposed to the terminal. It doesn't matter if I use a different extension (.txt instead of .out, say) for the output file, nor if I set the status to 'new' instead of 'replace' in the open command.

Thanks in advance for any help you may have on this issue.



Do you mean that you are comparing gfortran buffered file I/O against ifort where you didn't set buffered_io?  The different defaults of these compilers and the facilities for changing this behavior are well documented.

It looks like I am. If I compile the test in my original post with ifort -assume buffered_io test.f90, then I get the same time as with gfortran. Thanks!

I am now trying to use the -assume buffered_io option in my original code. Where would this go in a makefile? (Sorry for the elementary question; still very new at this.)

You might typically add your -assume flags to the Fortran flags variable in your Makefile, e.g.

FFLAGS = -O3 -align array32byte -assume protect_parens,buffered_io,byterecl,minus0

assuming your Fortran compile rules use $(FFLAGS).
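Since the question was where this goes in a makefile, here is a minimal sketch of such a Makefile; the target and file names are illustrative, not from the thread:

```make
# Minimal Makefile sketch: FFLAGS carries the -assume options,
# and the compile rule passes $(FFLAGS) to ifort.
FC     = ifort
FFLAGS = -O3 -assume buffered_io

test: test.f90
	$(FC) $(FFLAGS) -o test test.f90
```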

You will note that all those assume options, plus others, are included in the -standard-semantics option.

You also have run-time environment variable options and OPEN keyword options to control buffering; for example, you can change the default for units other than stdin, stdout, and stderr by setting buffered_io, but still choose unbuffered behavior by the other means.
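As a sketch of the per-unit OPEN mechanism (these keywords are Intel extensions, not standard Fortran, and the file name and sizes here are illustrative):

```fortran
! Sketch: controlling buffering on a single unit via Intel's OPEN extensions.
! BUFFERED, BLOCKSIZE, and BUFFERCOUNT are ifort extensions.
program buffered_open
    implicit none
    integer(8) :: j
    open(12, file='test.out', status='replace', &
         buffered='yes', blocksize=1048576, buffercount=2)
    do j = 1, 100000
        write(12,*) j
    end do
    close(12)
end program buffered_open
```

Alternatively, setting the environment variable FORT_BUFFERED=yes before running the program enables buffering for all eligible units without touching the source.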


Tim Prince wrote:

You also have run-time environment variable options and OPEN keyword options to control buffering; for example, you can change the default for units other than stdin, stdout, and stderr by setting buffered_io, but still choose unbuffered behavior by the other means.


The OPEN keywords for buffering are Intel extensions right? (not standard Fortran)

If one is reading and writing a lot of sequential access or stream access data, in general buffering should be faster, right? Is it worth using multi-buffered IO? Are there any rules of thumb or strategies to use to determine how to perform the fastest IO?


Buffering is an extension. There's a tradeoff between memory use and time when looking at buffering.

Retired 12/31/2016

Several compilers, including gfortran, ifort, and pgf90, have some combination of environment variables, compile options,  non-standard OPEN keywords, or C calls to the I/O library, to control buffering, as well as differing defaults.

While memory usage may have been a consideration in the past, the usual reason for not buffering (or for setting line buffering rather than a larger buffer) is to enable checking the progress of the output file or to improve diagnosis of a crash. You're unlikely to find a system or device where larger-than-line-sized buffering isn't faster.

ifort and gfortran are not among the more extreme compilers as to the default for stdout; it's often considered preferable, when writing to stdout (*), to make each line appear immediately. This consideration was more important before flush became a standard Fortran operation. As the compiler doesn't know where stdout will be directed, buffering that output can lead to perplexing situations. The presence of buffered_io in the standard-semantics option is an acknowledgment that buffering is the usual expectation for files other than stdout. In Fortran it's reasonable to make line buffering a default for stdout (until you try advance='no'), but that may not work for C.

In the past, where files were often used as temporary data storage, besides invoking buffering, unformatted or stream files would be usual recommendations for improved performance.  Direct access files could be quite slow on some systems unless you found out the optimum record sizes for each operating system.  It was something odd such as 360 36-bit words on the system I used 90% of the time for my first 20 years of full-time employment.
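To illustrate that recommendation, here is a sketch of writing the same integers as unformatted stream data instead of formatted text (standard Fortran 2003; the file name is illustrative):

```fortran
! Sketch: unformatted stream output avoids the cost of formatting
! each integer as text, which is typically much faster.
program stream_demo
    implicit none
    integer(8) :: j
    open(12, file='test.dat', access='stream', form='unformatted', &
         status='replace')
    do j = 1, 100000
        write(12) j
    end do
    close(12)
end program stream_demo
```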


Thanks so much for the detailed response. I guess I should just write some tests with different buffering schemes (different block size, etc.) to try to find good defaults for each system I run on. As Steve pointed out, at some point the buffer size will obviously be constrained by memory considerations.

Thanks again,

One small correction: although there are a number of I/O-related options that are affected by -standard-semantics, assume buffered_io is not one of them.

This is cut from the ifort -help output:

          sets assume keywords to conform to the semantics of the
          Fortran standard.  May result in performance loss.
          assume keywords set by -standard-semantics:
            byterecl, fpe_summary, minus0, noold_maxminloc,
            noold_unit_star, noold_xor, protect_parens, realloc_lhs,
            std_intent_in, std_mod_proc_name, std_minus0_rounding,
          also sets -fpscomp logicals


I see that the current compilers show that as the standard-semantics list with no effect on buffering.

A relevant change made in the latest compiler is the additional option assume buffered_stdout, as buffered_io doesn't apply to unit (*).  Maybe the standard-semantics list differs among compiler versions.

Since you've quoted the bit about "May result in performance loss," as far as I know that refers to old_maxminloc being required for vectorization of maxloc and minloc (no definition of result for zero length vector).  I don't know of any other such performance concerns with standard-semantics, nor why maxminloc should have this problem.

Actually, the switch that has the biggest impact on performance is "assume realloc_lhs".

That requires code to look at the left-hand side of an assignment and perform any automatic allocation (or reallocation, which might also involve finalization routines being called and/or recursive deallocations) before the assignment is even started.
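A small sketch of the Fortran 2003 semantics that -assume realloc_lhs enables (variable names are illustrative):

```fortran
! With realloc_lhs semantics, assigning to an allocatable array
! automatically (re)allocates it to the shape of the right-hand side.
program realloc_demo
    implicit none
    integer, allocatable :: a(:)
    a = [1, 2, 3]       ! a is automatically allocated to size 3
    a = [a, 4, 5]       ! a is automatically reallocated to size 5
    print *, size(a)    ! prints 5
end program realloc_demo
```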


Regarding maxloc and minloc, the difference is the requirements of the standard should any of the values be a NaN. The default code, in the presence of a NaN, might return the location of a NaN. With the option, extra code is generated to protect against this.

Retired 12/31/2016
