Slowness when writing many records to a direct access file

When writing records to a direct access file I am finding that the Intel Fortran compiler is much slower than my 17-year-old Watcom Fortran compiler running exactly the same code. In fact the Intel compiler is around 10 times slower!! This doesn't seem right to me. Can anyone see what I am doing wrong in the code segment below? Note that if CASES=1970 and BMS=5167, it is writing over 10 million records and this is taking around 8 minutes on my i7 laptop, whereas the ancient Watcom compiler does it in less than a minute.

You can see that I have a PeekMessage statement in the loop so that it can process Windows messages; however, I am finding that the speed is the same with or without it.

Help!

      INTEGER*4  RCD,CASES,BMS,CNUM,NMP,MEM,IO

      STRUCTURE /MACTIONS/
         INTEGER*2  NTCN,BKL
         REAL*8     FM(12),Extra(6),Diff(6)
      END STRUCTURE
      RECORD /MACTIONS/ MACT(BMS,CASES)

      TYPE(T_MSG)    Msg

      OPEN (MALUN,FILE='FRED.DAT'
     c        ,STATUS='NEW'
     c        ,ACCESS='DIRECT'
     c          ,FORM='UNFORMATTED'
     c        ,ACTION='READWRITE'
     c          ,RECL=200
     c        ,IOSTAT=IO)

      RCD=0
      DO CNUM=1,CASES
         PeekMessage (Msg,0,0,0,PM_REMOVE)
         DO NMP=1,BMS
            RCD=RCD+1
            WRITE (MALUN,REC=RCD,IOSTAT=IO) NMP
     c      ,(MACT(NMP,CNUM).FM(I),I=1,12)
     c      ,MACT(NMP,CNUM).NTCN,MACT(NMP,CNUM).BKL
     c      ,(MACT(NMP,CNUM).Extra(I),I=1,6)
     c      ,(MACT(NMP,CNUM).Diff(I),I=1,6)
         ENDDO
      ENDDO

      CLOSE (MALUN)

I forgot to mention that in my actual code, the PeekMessage line is actually a call to a function that removes all messages in the queue rather than just a single PeekMessage call. In any case, it doesn't seem to matter if the PeekMessage call is there or not.

I should also mention that while the program is executing, everything else on my computer becomes extremely sluggish including the mouse movements, and that's why I initially thought it might be a Windows messaging problem.

Add to your OPEN statement:

BUFFERED='YES'

and

BLOCKSIZE=

and

BUFFERCOUNT=

That helped a bit, but it's still a lot slower than my old Watcom compiler. Everything else about the Intel compiler is super-fast.

 

Hi, Schulze

Direct access I/O allows the user to write or read records in any order (i.e. not sequentially) within the file. As an optimization, the ifort runtime library enables Windows operating system caching (buffering) of the data it writes to or reads from a direct access file. This means that an extra chunk of the system's memory is used to maintain the cache. As more records are written, memory usage rises until it is exhausted, at which point program execution becomes extremely slow.

To disable the caching when memory is valuable, please follow the instructions below:

Set the environment variable FOR_DO_NOT_CACHE_DIRECT_FILE to the unit number(s) of the file(s) being processed. For example, if three files with unit numbers 8, 9 and 10 are being operated on, then execute:
set FOR_DO_NOT_CACHE_DIRECT_FILE=8,9,10

General syntax: set FOR_DO_NOT_CACHE_DIRECT_FILE=ulist (where ulist is a comma-separated list of the unit numbers of the files)

Then run the executable again. Please try this and see whether it helps.

Thank you.

Yolanda Chen Intel Developer Support Tools Knowledge Base: http://software.intel.com/en-us/articles/tools

Yolanda,

Is the environment variable FOR_DO_NOT_CACHE_DIRECT_FILE consulted (read) once at program start, or on each OPEN?

If the latter, you could then programmatically use SETENVQQ to set/modify FOR_DO_NOT_CACHE_DIRECT_FILE at run time.

Schulzy,

You could also consider using USEROPEN, then do your own call to CreateFile with the appropriate flags

Jim Dempsey

www.quickthreadprogramming.com

What are the sizes of the files you are writing? This might explain the problem.
Check the value of RECL=, as its units are system dependent.

For ifort, NMP appears to be a 4 byte integer, while each record is 200 x 4 bytes long, because by default ifort measures RECL for unformatted files in 4-byte units.
What size record does Watcom write?

Your OS might also explain some of the problem, as might how full the disk is or what type of disk is being used (local or network).
If it is very slow, are there any system errors being returned? Are you testing IOSTAT=IO?

Basically I would doubt that the problem is ifort, but rather some problem with the file system or record length.

John

Thanks Yolanda and Jim, I will try your suggestions shortly.

John, the code segment I posted was a rough and simplified version of my real code. The record length is actually passed into a "FileOpen" routine and in this case is 200 bytes. NMP is just a 4 byte integer that I am writing to the file along with the MACT structure. Together they total exactly 200 bytes in size which equals the record length that the file is opened with.
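One way to check the record length without depending on the compiler's RECL units is the standard INQUIRE (IOLENGTH=) statement, which returns the length of an output list in the same units that RECL= uses. The following is only a sketch: the program and variable names are made up for illustration, and the components are declared as separate variables rather than through the MACTIONS structure. With ifort and /assume:byterecl the reported length should be in bytes; without that switch it should be in 4-byte units.

```fortran
program recl_check
  implicit none
  ! Kinds chosen to match INTEGER*2, INTEGER*4 and REAL*8 in the posted code
  integer, parameter :: i2 = selected_int_kind(4)
  integer, parameter :: i4 = selected_int_kind(9)
  integer, parameter :: r8 = selected_real_kind(15, 307)
  integer(i4) :: nmp
  integer(i2) :: ntcn, bkl
  real(r8)    :: fm(12), extra(6), diff(6)
  integer     :: reclen

  ! Same item order as the WRITE statement in the original post
  inquire (iolength=reclen) nmp, fm, ntcn, bkl, extra, diff

  print '(a,i0)', 'Record length in RECL units: ', reclen
end program recl_check
```

The value in reclen can be passed straight to RECL= in the OPEN statement, so the record size stays correct whichever units the compiler uses.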

I am paranoid about testing IOSTAT=IO and always do it after each write even though I haven't included that in my code segment (I was trying to keep it simple for this posting). The disk is local and has plenty of space (313 GB free). The write loop works perfectly and doesn't return any errors. It is just a speed issue, nothing more. The Watcom compiler uses 100% identical code.

John, I just realised what you were getting at with your 200 x 4 byte record length comment. I am using the /assume:byterecl compiler switch that lets me specify RECL in bytes.

I always try to avoid system dependent settings like BUFFERED=, BLOCKSIZE= or BUFFERCOUNT=
These should not explain slowness.
I also noted a significant difference in file system performance when moving from Windows XP to Windows 7, as the Windows O/S file buffers became much more effective.
Unexplained slowness usually relates to file system problems, such as access rights, while non-standard buffering settings in ifort are less relevant with Windows 7.
I have not tested Windows 8 to see if this is another aspect of numerical computation that Microsoft has seen as less important.

Be more lateral in looking for your problems, as they probably will not relate to Fortran I/O performance.

John

If the slowness is related more to the operating system and environment rather than the compiler then why would the Watcom Fortran be so much faster? It is using the same source code and running on the same machine. I am using Windows 8.

By the way, adding BUFFERED='YES' doubled the speed and I didn't use BLOCKSIZE or BUFFERCOUNT because they didn't improve the speed at all when I initially played with them a bit.
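For reference, an OPEN statement with these specifiers might look roughly like the sketch below. BUFFERED=, BLOCKSIZE= and BUFFERCOUNT= are Intel Fortran extensions, so this is not expected to compile with other compilers. The unit number, the BLOCKSIZE and BUFFERCOUNT values, and the dummy payload are assumptions for illustration only, not tuned recommendations.

```fortran
program buffered_direct_write
  implicit none
  integer, parameter :: r8 = selected_real_kind(15, 307)
  integer, parameter :: malun = 10     ! hypothetical unit number
  integer  :: io, rcd
  real(r8) :: payload(24)              ! dummy data, 192 bytes

  payload = 0.0_r8

  ! RECL=200 is in bytes only when compiling with /assume:byterecl
  open (malun, file='FRED.DAT', status='replace', access='direct', &
        form='unformatted', action='readwrite', recl=200,          &
        buffered='yes',    &   ! Intel extension: buffer in the runtime library
        blocksize=65536,   &   ! hypothetical value, in bytes
        buffercount=4,     &   ! hypothetical value
        iostat=io)
  if (io /= 0) stop 'open failed'

  do rcd = 1, 1000
     write (malun, rec=rcd, iostat=io) rcd, payload
     if (io /= 0) stop 'write failed'
  end do

  close (malun)
end program buffered_direct_write
```

Leaving out BLOCKSIZE= and BUFFERCOUNT= and keeping only BUFFERED='YES' corresponds to the configuration described above.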

Hi, Jim

The environment variable is consulted on each Open statement. Yes, you may use SETENVQQ to modify it at runtime.
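A minimal sketch of setting the variable from inside the program, assuming Intel Fortran and its IFPORT module, is shown below. The unit number 8 and the file name are placeholders taken from earlier in the thread, and RECL=200 assumes /assume:byterecl. Since the variable is consulted at each OPEN, the SETENVQQ call has to come before the OPEN of the unit concerned.

```fortran
program nocache_demo
  use ifport                           ! Intel portability module, provides SETENVQQ
  implicit none
  integer, parameter :: malun = 8      ! hypothetical unit number
  logical :: ok
  integer :: io

  ! SETENVQQ takes a single 'name=value' string and returns .TRUE. on success
  ok = setenvqq('FOR_DO_NOT_CACHE_DIRECT_FILE=8')
  if (.not. ok) print *, 'Could not set FOR_DO_NOT_CACHE_DIRECT_FILE'

  ! The runtime reads the variable when this OPEN executes
  open (malun, file='FRED.DAT', status='replace', access='direct', &
        form='unformatted', recl=200, iostat=io)
  if (io /= 0) stop 'open failed'

  write (malun, rec=1, iostat=io) 42
  close (malun)
end program nocache_demo
```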

Thank you.

Yolanda Chen Intel Developer Support Tools Knowledge Base: http://software.intel.com/en-us/articles/tools

There are a few possible issues

* alignment of I2 and R8 variables in the data structure could increase the in-memory record to more than 200 bytes.
* size of the file is about 2 GB, which could exceed the memory buffer capacity.
* conflict between Windows 7 and ifort file buffering is the most likely.

You did not indicate the run times you were achieving or the type of disk you were using, but the version I tested takes about 60 seconds to run on my notebook (which has an SSD).
This was using your values quoted for BMS and CASES.
I think the big difference between ifort and Watcom (if there is a difference for the same conditions) would be how the file buffers are being flushed.
If you run Task Manager as the program is running, you will see that the file writes are being buffered.
Closing the file could flush the buffers, which would result in a delay from completion of the writes to exit.
Using the attached program, writing 2 GB in 60 seconds is about 30 MB per second, which is not a fast rate for an SSD.

I'd recommend using task manager and check the file buffer sizes.
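The attached program is not reproduced in this thread. A rough sketch of this kind of timing test, written in standard Fortran, could look like the following. The file name, the record count and the payload layout are placeholders, and the record here is 196 bytes rather than the 200 in the original code. The CLOSE is inside the timed region so that any flush delay at close is included.

```fortran
program time_direct_write
  implicit none
  integer, parameter :: r8 = selected_real_kind(15, 307)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: nrec = 100000   ! placeholder; the thread uses 1970*5167 records
  real(r8)    :: payload(24)
  integer     :: lun, reclen, rcd, io, recbytes
  integer(i8) :: t0, t1, rate
  real(r8)    :: secs, mbytes

  payload = 1.0_r8

  ! Ask the compiler for the record length in its own RECL units
  inquire (iolength=reclen) rcd, payload

  open (newunit=lun, file='timing_test.dat', status='replace', &
        access='direct', form='unformatted', recl=reclen, iostat=io)
  if (io /= 0) stop 'open failed'

  call system_clock(t0, rate)
  do rcd = 1, nrec
     write (lun, rec=rcd, iostat=io) rcd, payload
     if (io /= 0) stop 'write failed'
  end do
  close (lun, status='delete')          ! timed, so a flush at close is counted
  call system_clock(t1)

  secs     = real(t1 - t0, r8) / real(rate, r8)
  recbytes = (storage_size(rcd) + size(payload)*storage_size(payload)) / 8
  mbytes   = real(nrec, r8) * real(recbytes, r8) / 1.0e6_r8

  print '(a,f10.2,a,f10.3,a)', 'Wrote ', mbytes, ' MB in ', secs, ' s'
  if (secs > 0.0_r8) print '(a,f10.2,a)', 'Rate: ', mbytes/secs, ' MB/s'
end program time_direct_write
```

For scale, the case in the thread is 1970 x 5167 = 10,178,990 records of 200 bytes, or roughly 2.04 GB.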

John

Attachment: mactions.f90 (3.37 KB)

John,
I have used the /align:rec1byte switch which gets around the byte alignment problem and allows each record to fit into 200 bytes. I realise that this is probably frowned upon a bit, but I am doing it for backwards compatibility with old code that I will eventually upgrade.

I have an ASUS R550C notebook running Windows 8 with an Intel i7 2.0GHz, 8 GB RAM, 1 TB hard drive and 24 GB SSD.

Yolanda and Jim,
By setting FOR_DO_NOT_CACHE_DIRECT_FILE I have been able to get the write time down to a bit over 60 seconds, however it takes another 20-30 seconds to exit after I close, presumably while it is flushing the buffers.

I have also added a % counter to see how the write is going, and it shows that the first 33% runs in less than 10 seconds, followed by the rest being written a lot slower. There are also small sections after 33% where it briefly speeds up a bit and then slows down again. I assume that this is due to Windows doing buffering and other memory management things.

Yolanda,
In the Intel documentation, the OPEN: BUFFERED section says that if BLOCKSIZE and BUFFERCOUNT are omitted then the default buffer size is 8192 bytes, whereas the OPEN: BLOCKSIZE section says that if BLOCKSIZE is omitted then the default buffer size is 128 KB. Which one is correct?

If it is possible to defer the CLOSE, this could allow the program to continue while the buffering catches up. Alternatively, if you have access to the SSD drive, this would also solve the flush delay, as my run on the SSD did not show a close delay or any variation in the write speed.

John

In my comment #13, I made the observation that "conflict between Windows 7 and ifort file buffering is the most likely."

While it has been my experience that the change in buffering of large files with Windows 7 has been a significant improvement over XP, I was surprised that no one challenged my suggestion that ifort buffering was less effective with Windows 7.

Is this because it is the case? Or is there limited experience of this aspect of ifort's buffering performance since Windows 7 was introduced?

John

We made changes to how buffering works in the past year or so - it is detailed in the release notes. In the 15.0 release we have made more improvements that should reduce the chance of poor performance.

Steve - Intel Developer Support

Hi, Schulze

The BLOCKSIZE and the buffer size are two distinct things.

The size of the internal buffer is the initial size of the buffer (default 8 KB) times BUFFERCOUNT. At one time BLOCKSIZE and the buffer size may have been the same value, but this has not been true for a while, so the documentation should be updated.

Sorry for the late update. Hope this helps.

Yolanda Chen Intel Developer Support Tools Knowledge Base: http://software.intel.com/en-us/articles/tools
