Initializing large array with "data /" - maximum number of continuation lines

wolfpackNC's picture

Disclaimer: this is for F77 code, but if there is a compelling reason to make a Fortran 95 module, that is possible.

I am trying to initialize several arrays of size (91,181) containing floating-point [real*4] values. I have the data in an ASCII text file, but we need to avoid the I/O runtime penalty of reading files on each run.

So my idea was to create a block data subprogram and initialize the arrays there, so that I could avoid reading in the data on each run.

The problem is that I need a massive number of continuation lines [>1400] to do it this way, which I clearly can't do.

Are there any thoughts on better ways to do this?

Thanks!

Steve Lionel (Intel)'s picture

I would recommend reading them in from a file. Really. I can't imagine that the time to do that is significant. For best performance, put as many values in one record as you can.

Intel Fortran supports 511 continuation lines, which far exceeds the Fortran standard's minimum.

Steve
wolfpackNC's picture

Well, I could always try that and measure the penalty.

You're right though. The time to read in the file is insignificant for one "run" of this subprogram. However, this subprogram gets called by another program over 62,000 times to finish the simulation, so all the time delays add up. Currently a full simulation takes up to 6 hours, which we are trying to cut down. Plus we would have to carry around the extra data file, which we try to minimize as much as possible.

I do realize that 511 is usually plenty of continuation lines, so I was trying to find another method to initialize the array.

Thanks for your reply. Any other suggestions?

Tim Prince's picture

It may take longer to set this up at load time than to read it from a file. It looks like you want to set it up outside of some loop, which might be done by making it a module array, USEing it early in the program to set it up, and again where the data are required (or, if, like some of ours, your customer prohibits f90, a labeled COMMON is possible).
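
A minimal sketch of the module-array idea, written in free-form Fortran 90 rather than the F77 of the original code. The module name table_data, the array a, the flag loaded and the routine load_table are all hypothetical, and the fill loop only stands in for whatever READ the real code would perform.

```fortran
! Hypothetical module: none of these names come from the thread.
module table_data
  implicit none
  integer, parameter :: nrows = 91, ncols = 181
  real, save    :: a(nrows, ncols)
  logical, save :: loaded = .false.
contains
  subroutine load_table()
    integer :: i, j
    if (loaded) return          ! already set up, nothing to do
    ! Placeholder fill; a real code would READ the values from a file here.
    do j = 1, ncols
      do i = 1, nrows
        a(i, j) = real(i) + 0.001 * real(j)
      end do
    end do
    loaded = .true.
  end subroutine load_table
end module table_data

program demo
  use table_data
  implicit none
  call load_table()   ! set up once, outside the main loop
  call load_table()   ! later calls return immediately
  print *, a(1, 1), a(nrows, ncols)
end program demo
```

Any routine that needs the data would USE the same module; the array is filled once per process, however many times load_table is called.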

rreis's picture
I assume ASCII is used for portability. Couldn't the file be converted to binary before the simulation (it would then be guaranteed to be in the machine's own representation), so that all the reads come from this binary.tmp file? This would ensure portability and also speed up I/O, no?
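
A rough sketch of such a one-off conversion step, assuming a 91 x 181 real array stored in column-major order in a text file. The file names table.txt and table.bin and the unit numbers are placeholders, not names from the thread. Unformatted files are generally specific to the compiler and machine that wrote them, which is why the conversion would be run on the target machine.

```fortran
program ascii_to_binary
  implicit none
  integer, parameter :: nrows = 91, ncols = 181
  real    :: a(nrows, ncols)
  integer :: ios

  ! Read the whole array with one list-directed READ (column-major order).
  open (unit=10, file='table.txt', status='old', action='read', &
        form='formatted', iostat=ios)
  if (ios /= 0) stop 'cannot open table.txt'
  read (10, *, iostat=ios) a
  if (ios /= 0) stop 'error reading table.txt'
  close (10)

  ! Write it back out as a single unformatted record.
  open (unit=11, file='table.bin', status='replace', action='write', &
        form='unformatted', iostat=ios)
  if (ios /= 0) stop 'cannot create table.bin'
  write (11) a
  close (11)
end program ascii_to_binary
```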

Ricardo Reis 'Non Serviam' @ http://www.lasef.ist.utl.pt @ http://www.radiozero.pt @ http://rreis.tumblr.com @ http://www.flickr.com/photos/rreis
wolfpackNC's picture
Yeah, I guess I am running out of options here. I'll probably go with your suggestion, or just read the ASCII file and compare the time difference.

Thanks for all the suggestions.

jimdempseyatthecove's picture
You read the file and populate the array(s) in common once at program initialization time then call your subprogram in your process loop the 62,000 times.

program YourProgram
  call Init    ! read common array data from file, plus other initialization
  call DoWork
  call Fini    ! close files
end program YourProgram

subroutine DoWork
  do i = 1, 62000
    call subprogram
  end do
end subroutine DoWork

Jim Dempsey

www.quickthreadprogramming.com
wolfpackNC's picture
Actually, the subprogram *is* the program that has to read the data files. When I say subprogram, I really mean that it is being called by an external program that I can't change. I know what you suggested would be the best way to do this, but it's just not an option right now.

I ended up just creating a subroutine that initializes the common block using assignments in the style

A(1,1)=X;A(1,2)=Y;......

Thanks for everyone's suggestions.

hirchert728's picture
If you really want to use BLOCK DATA, note that instead of a statement of the general form

DATA A /
  ... values ... /

you could use a series of statements of the form

DATA (A(I,1),I=LBOUND(A,1),UBOUND(A,1))/
  ... values ... /
DATA (A(I,2),I=LBOUND(A,1),UBOUND(A,1))/
  ... values ... /

I don't know what your column size is, but my guess is that they would fit within Intel's continuation limit (and probably also the limit in the Fortran standard). If not, you could change the implied DO-loop to only do a fraction of a column in each statement.

I don't know whether it makes a difference with the Intel compiler, but I know that with some compilers I have used, the load-time behavior of the program is orders of magnitude slower if you initialize in row order instead of column order.

[Note: I used LBOUND and UBOUND above because you didn't tell us what your bounds are. If you have literal or symbolic constants for your bounds, you can use them in place of the LBOUND and UBOUND subexpressions.]
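
As a small illustration of the one-DATA-statement-per-column form, here is a sketch for a 3 x 2 array with literal bounds. It is written in free-form source for brevity, and the program name and the values are made up for the example.

```fortran
program column_data
  implicit none
  integer, parameter :: nrows = 3, ncols = 2
  real    :: a(nrows, ncols)
  integer :: i
  ! One DATA statement per column. The first subscript varies fastest,
  ! which matches Fortran's column-major storage order.
  data (a(i, 1), i = 1, nrows) / 1.0, 2.0, 3.0 /
  data (a(i, 2), i = 1, nrows) / 4.0, 5.0, 6.0 /
  print *, a(1, 1), a(3, 2)
end program column_data
```

With 91 rows, each statement would carry 91 values, so the number of continuation lines per statement stays small.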

Having said all that, I will agree with other responders that on many systems it can be faster to read data from a file (especially if it is a binary file) than to initialize it with DATA initialization. To state what should be obvious, but might not be to you, even if the subroutine is being executed thousands of times, you only want to execute the READ statements the first time. In other words, you want to do something like the following:

LOGICAL :: FIRST_TIME = .TRUE.   ! this goes in your declarations; the initialization implies SAVE

IF (FIRST_TIME) THEN
   FIRST_TIME = .FALSE.
   ! Put the statements to read in the data here.
   ! I would probably use something like
   !    CALL DO_THE_DATA_READS
   ! so I could separate the reading from the rest of this subroutine.
END IF
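
A self-contained sketch of this first-call pattern. The routine name work, the counter nloads and the driver loop are hypothetical, and the array assignment stands in for the real READ statements.

```fortran
subroutine work(nloads_so_far)
  implicit none
  integer, intent(out) :: nloads_so_far
  logical       :: first_time = .true.   ! initialization implies SAVE
  integer, save :: nloads = 0
  real, save    :: a(91, 181)             ! must be saved to survive between calls

  if (first_time) then
    first_time = .false.
    a = 0.0                ! placeholder for the real READ statements
    nloads = nloads + 1
  end if
  nloads_so_far = nloads
end subroutine work

program driver
  implicit none
  integer :: k, n
  do k = 1, 5
    call work(n)
  end do
  print *, 'loads performed:', n
end program driver
```

After five calls the driver reports a single load. This only helps when all the calls happen inside one running process; saved variables do not persist across separate executions of a program.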

-Kurt

wolfpackNC's picture
"To state what should be obvious, but might not be to you, even if the subroutine is being executed thousands of time, you only want to execute the READ statements the first time."

Yes, I realize this, but it's just not possible: I am writing a program that gets called from someone else's, and I have no way of changing that.

So every time my program runs it will have to reload the data. I know this is not the best way to do it but I can't change the driver program.

Now, if you say that reading a binary file would be faster than initializing a block data statement then that is something I really need to consider.

Steve Lionel (Intel)'s picture

The suggestion to use multiple DATA statements is a good one, but I'll warn you that the current Intel compiler does not implement this in an optimal fashion and you may find compile, link and load times very long. An unformatted read of a "binary" file, reading the whole array, will be quick.
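
A minimal round-trip sketch of a whole-array unformatted write and read, assuming the 91 x 181 shape from the question. The file name table.bin, the unit numbers and the fill values are placeholders.

```fortran
program unformatted_roundtrip
  implicit none
  integer, parameter :: nrows = 91, ncols = 181
  real    :: a(nrows, ncols), b(nrows, ncols)
  integer :: i, j, ios

  ! Made-up data, standing in for the real table.
  do j = 1, ncols
    do i = 1, nrows
      a(i, j) = real(i) + 0.001 * real(j)
    end do
  end do

  ! Write the whole array as one unformatted record.
  open (unit=20, file='table.bin', form='unformatted', status='replace', &
        action='write', iostat=ios)
  if (ios /= 0) stop 'cannot create table.bin'
  write (20) a
  close (20)

  ! Read it back with a single READ of the whole array.
  open (unit=21, file='table.bin', form='unformatted', status='old', &
        action='read', iostat=ios)
  if (ios /= 0) stop 'cannot open table.bin'
  read (21) b
  close (21)

  if (all(a == b)) then
    print *, 'round trip ok'
  else
    print *, 'mismatch'
  end if
end program unformatted_roundtrip
```

Because no text conversion is involved, the values come back bit for bit, and the read is one transfer of the whole array rather than thousands of formatted conversions.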

Steve
jimdempseyatthecove's picture

Not seeing the application, we can only throw suggestions at you and hope some stick.

Here is an alternative approach to consider, assuming your application is called by a program you cannot modify and assuming the data set is constant across all the subsequent calls (it seems to be, from your description).

Split your application into two parts, a stub and a DLL. The stub checks whether the DLL is loaded and, if so, simply calls it to perform the function. If not (i.e. on the first call of the stub), it loads the DLL and calls the initialization routine to load the data into an array contained within the DLL. The DLL will have to be set up so that it is not unloaded on each call, but only when appropriate.

One way:
Run your shell program, which calls the DLL and initializes the data, and then have the DLL perform a SYSTEM call to launch the external application and wait for it to complete.
The external application calls the DLL to perform the function (62,000 times) and then exits, thus completing the SYSTEM call of your shell.

Jim Dempsey

www.quickthreadprogramming.com
wolfpackNC's picture
This sounds interesting. I've never done anything like that before, but I'll look into it.

^^ OK, so it sounds like I should really just read from a binary file to start.

I really appreciate all of your help.

Steve Lionel (Intel)'s picture

Jim, remember which forum we're in. No DLLs here. Shared Objects (.so), yes. I don't know how to do on Linux with .so files the various DLL tricks I can do on Windows.

Steve
hirchert728's picture
Are you writing a program that is being invoked by another program (say by a call to SYSTEM) or a subroutine being called as part of a program where you are unable to change the caller (perhaps because it has been provided to you in binary form)?

If you are writing a program that will be invoked 62000 times, there's not much you can do to avoid the cost of loading the data 62000 times -- all you can do is evaluate whether it is faster to do this as part of the program load or by an explicit unformatted read in the program.

On the other hand, if you are writing a subroutine that will be called 62000 times, it should be possible to do the initialization just once by techniques like the one I presented at the end of my previous post. [In case it was not clear, I was showing code that might be embedded in your subroutine so it could do something extra in the first call that it does not do in the other 61999 calls.] You claim not to be able to do this, but thus far you have presented no clear indication of why you think that to be the case.

-Kurt

P.S. Since Steve Lionel says that ifort handles DATA statements with implied DO loops less than optimally, I'll offer one other variation.

You started with something like

REAL A(nrows,ncols)
DATA A/
  ... values ... /

My previous suggestion was, in effect,

DATA (A(I,1),I=1,nrows)/
  ... values ... /
DATA (A(I,2),I=1,nrows)/
  ... values ... /
...
DATA (A(I,ncols),I=1,nrows)/
  ... values ... /

An alternative way of doing this might look like

REAL A1(nrows),A2(nrows),...,Ancols(nrows)
EQUIVALENCE (A(1,1),A1),(A(1,2),A2),...,(A(1,ncols),Ancols)
DATA A1/
  ... values ... /
DATA A2/
  ... values ... /
...
DATA Ancols/
  ... values ... /

This eliminates the implied DO loops, so the resulting data initialization might be closer to optimal, at the cost of being a bit more verbose and of needing EQUIVALENCE. (If A is in a common block, you simply put A1 through Ancols into the common block in place of A in the version of the common block in the BLOCK DATA, eliminating the EQUIVALENCEs.)
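
A tiny worked instance of the EQUIVALENCE variant, for a 3 x 2 array. The program name and the values are made up, and free-form source is used only for brevity.

```fortran
program equivalence_columns
  implicit none
  integer, parameter :: nrows = 3
  real :: a(nrows, 2)
  real :: a1(nrows), a2(nrows)
  ! Each column vector shares storage with one column of a.
  equivalence (a(1, 1), a1), (a(1, 2), a2)
  ! Whole-array DATA statements, with no implied DO loops.
  data a1 / 1.0, 2.0, 3.0 /
  data a2 / 4.0, 5.0, 6.0 /
  print *, a(2, 1), a(2, 2)
end program equivalence_columns
```

Here a(2,1) takes the value 2.0 from a1 and a(2,2) takes 5.0 from a2, since each column vector is overlaid on the corresponding column of a.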
