hi!
I am using a Fortran program that merges many files (about 26000 in total) into one file.
The input files are opened one by one and closed after reading.
With IVF the executable just stops without an error after opening, reading and closing about 16000 files. At the moment my workaround is to compile the code with Compaq Visual Fortran, where I don't have any issues.
What can I do to make the code work when compiled with IVF?
thank you!
executable crashes / stops without error
It is a console program and it crashes without an error message in the console, just as I wrote in my first post!
We are just moving from XP to W7. In XP there is no error message from the operating system. I have now tried the program in W7 and I get an error from the OS saying something like "executable does not work anymore. Windows can search for a solution online". You can then choose between "search online for a solution and close program" or "close program". No error messages appear in the console.
Sorry Jim, I overlooked your post. I have now added the diagnostic code: the program stops after printing IFILE = 15944. I checked the file that should be opened next (as I already did before) and it is OK.
Clearly you need to add some diagnostic code to help to trace where the problem lies. I would humbly suggest opening a text file and printing a confirmatory message to it after each successful file open and each successful file close, for a start and then see if it is one particular file that causes problems.
Are you properly closing files after use?
Are you re-using unit numbers or just incrementing the unit number?
If you want help from here, you should post some code showing your Open statement(s) and give some idea what data is being read in and what you do with it once you have read the data in.
I second the advice that Anthony Richards gave you. An incorrect program may work and give the mistakenly expected results -- that's covered by the Fortran Standard's phrase "undefined behavior". When such a program "works" with one compiler and fails with another, that may be the first sign that the program is faulty and needs to be corrected.
The following program copies its source code 26000 times to a single output file, and works as expected using the current release of the Intel compiler.
program wrmany
  integer, parameter :: fout=11, fin=12
  character(len=132) :: line
  integer :: kount
  ! FILE= is the standard spelling; NAME= is a vendor extension
  open(fout,file='mulfil.txt',action='write')
  do kount=1,26000
    open(fin,file='katse.f90',action='read')
    do
      read(fin,'(A)',end=100)line
      write(fout,'(A)')trim(line)
    end do
100 close(fin)
  end do
  close(fout)
end program wrmany
The resulting file has the expected length:
$ ls -l mulfil.txt
-rw-r--r-- 1 mece users 11024000 2010-10-08 06:43 mulfil.txt
I am fully aware that the problem could be caused by bad code. I have checked the input files and they seem to be OK. They are output files written by another Fortran program. Here is my code, which I have commented a little. I cannot post all the input files because there are so many. I am doing hydrological modelling, and the input files I am reading are system states, which are written every timestep: all in all ~26000 files/timesteps. Thank you!
[fxfortran]
      program cdr_zoneoutput_converter
c***** declarations *****
      integer MAXCOL, MAXROW, MAXFILE
      parameter (MAXCOL=30)
      parameter (MAXROW=1000)
      parameter (MAXFILE=50)
      parameter (MAXDAY = 30000)
      character FILENAME*(MAXFILE)(MAXDAY)
      character INFILE_GRIDS*(MAXFILE)
      integer NCOL, NROW, ICOL, IROW, IFILE, NFILE
      character VALUES(MAXCOL,MAXROW)*13
c***** open output files *****
      open(unit=1,file='output/BFZON.txt')
      open(unit=2,file='output/BWOZON.txt')
      open(unit=3,file='output/BW3ZON.txt')
      open(unit=4,file='output/DELTASZON.txt')
      open(unit=5,file='output/ETATZON.txt')
      open(unit=6,file='output/ETP0ZON.txt')
      open(unit=7,file='output/ETPEZON.txt')
      open(unit=8,file='output/ETPRZON.txt')
      open(unit=9,file='output/MELTZON.txt')
      open(unit=10,file='output/PRAINSOILZON.txt')
      open(unit=11,file='output/PSNOWZON.txt')
      open(unit=12,file='output/PZON.txt')
      open(unit=13,file='output/QAB1ZON.txt')
      open(unit=14,file='output/QAB2ZON.txt')
      open(unit=15,file='output/QAB3ZON.txt')
      open(unit=16,file='output/QABZON.txt')
      open(unit=17,file='output/QEX2ZON.txt')
      open(unit=18,file='output/QVS0ZON.txt')
      open(unit=19,file='output/SCOVZON.txt')
      open(unit=20,file='output/SMELTZON.txt')
      open(unit=21,file='output/SWWZON.txt')
      open(unit=22,file='output/TOTALSZON.txt')
      open(unit=23,file='output/TZON.txt')
c***** user specifications *****
c     prompt: 'How many zones were computed (1-2000)?'
c     (some definition of zones)
      print*, 'Wieviel Zonen wurden berechnete (1-2000)? '
      read*, NROW
      NFILE=0
      NCOL=24
      do 900 IFILE=1, MAXDAY
c ** read FILENAME **
c     list.txt contains the files to be opened (~26000)
      open (120, file='list.txt', status='old')
      NFILE=NFILE+1
      read (120, fmt=*, end=100) FILENAME(IFILE)
  900 enddo
  100 continue
      close (120)
c*** read in cdr-outputfile ***
      do 1000 IFILE=1, NFILE-1
c     each file is opened (and closed later, before the next one
c     is opened)
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',err=300)
      read (98,*)
      do 200, IROW=1, NROW
      read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
  200 continue
c*** write output_files ***
      ICOL = 2
      do IROW = 1, NROW
      write (1, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (1,*)
      ICOL = 3
      do IROW = 1, NROW
      write (2, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (2,*)
      ICOL = 4
      do IROW = 1, NROW
      write (3, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (3,*)
      ICOL = 5
      do IROW = 1, NROW
      write (4, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (4,*)
      ICOL = 6
      do IROW = 1, NROW
      write (5, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (5,*)
      ICOL = 7
      do IROW = 1, NROW
      write (6, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (6,*)
      ICOL = 8
      do IROW = 1, NROW
      write (7, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (7,*)
      ICOL = 9
      do IROW = 1, NROW
      write (8, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (8,*)
      ICOL = 10
      do IROW = 1, NROW
      write (9, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (9,*)
      ICOL = 11
      do IROW = 1, NROW
      write (10, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (10,*)
      ICOL = 12
      do IROW = 1, NROW
      write (11, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (11,*)
      ICOL = 13
      do IROW = 1, NROW
      write (12, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (12,*)
      ICOL = 14
      do IROW = 1, NROW
      write (13, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (13,*)
      ICOL = 15
      do IROW = 1, NROW
      write (14, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (14,*)
      ICOL = 16
      do IROW = 1, NROW
      write (15, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (15,*)
      ICOL = 17
      do IROW = 1, NROW
      write (16, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (16,*)
      ICOL = 18
      do IROW = 1, NROW
      write (17, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (17,*)
      ICOL = 19
      do IROW = 1, NROW
      write (18, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (18,*)
      ICOL = 20
      do IROW = 1, NROW
      write (19, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (19,*)
      ICOL = 21
      do IROW = 1, NROW
      write (20, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (20,*)
      ICOL = 22
      do IROW = 1, NROW
      write (21, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (21,*)
      ICOL = 23
      do IROW = 1, NROW
      write (22, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (22,*)
      ICOL = 24
      do IROW = 1, NROW
      write (23, fmt='(A$)') VALUES (ICOL,IROW)
      enddo
      write (23,*)
      close (98)
 1000 enddo
      close (1)
      close (2)
      close (3)
      close (4)
      close (5)
      close (6)
      close (7)
      close (8)
      close (9)
      close (10)
      close (11)
      close (12)
      close (13)
      close (14)
      close (15)
      close (16)
      close (17)
      close (18)
      close (19)
      close (20)
      close (21)
      close (22)
      close (23)
      print*,'finished normally'
      stop
  300 Print*, 'ERROR opening ', FILENAME(IFILE)
      goto 302
  301 Print*, 'ERROR reading ', FILENAME(IFILE)
      Print*, 'Aborted... '
  302 stop
      end
[/fxfortran]
>>open(unit=6,file='output/ETP0ZON.txt')
Look for your error message in the above file
I suggest changing that unit number (and 5) to something else
Jim Dempsey
Blog: The Parallel Void
www.quickthreadprogramming.com
These days the general consensus is to avoid using unit numbers less than 10, and certainly to avoid 5 and 6.
Also, I would suggest moving the close(98) to just after the "200 continue" and maybe adding an "err=" clause (this keeps everything about unit 98 together), and consider replacing all those write loops with something like:
write (1, fmt='(A$)') (VALUES (2,IROW), irow=1,nrow)
write (1,*)
(Also replacing the unit numbers "1"..."9" with whatever he uses in accordance with my first sentence.)
I know it's a style thing, but to my mind is much more readable than the original code.
Les
Your program is overrunning the default maximum record length (132 bytes) for formatted I/O, since you are writing output files with records as large as NROW*13 in length. To correct this error, you need to use the RECL=... option in the OPEN statements for the output files.
You should probably use a format of (A,1x,$) instead of (A,$) in the WRITE statements so that you can separate the fields.
In general, using unit numbers 10 and above for external files has a better chance of avoiding clashes with special files (standard input, output, error, punch, printer, etc.).
So, instead of using units 1 to 23, you could use units 11 to 33 and see if the error persists.
Re Jim Dempsey's reply: what he pointed out was that since you had used the standard output unit (6) to open an external file, I/O and other error messages generated during the run would have been directed to the file rather than being displayed at the console.
Thank you for your hints. I am a "free time" programmer, so some of this is new to me.
I adopted the suggested code (write (1, fmt='(A$)') (VALUES (2,IROW), irow=1,nrow)) and changed the unit numbers. I also set RECL to 1024, but I have no clue whether this is right. I didn't change the output formats, since a blank is fine for me. The executable still crashes.
I have the same problem with another program I wrote to rewrite some ASCII grids. Maybe you can see similarities between the two codes that I don't. The similarity I see is that here, too, many files are opened (~100000). Here is the code:
program rewrite_inca
c***** declaration *****
integer MAXCOL, MAXROW, MAXFILE
parameter (MAXCOL=1000)
parameter (MAXROW=1000)
parameter (MAXFILE=40)
character FILENAME*(MAXFILE), FOLDERNAME*(8)
integer NCOL, NROW, ICOL, IROW, IFILE, I, I1, FLIP, ZEILE
integer stitch, flag
real VAL(MAXCOL,MAXROW), NODATA_VALUES
character XLLCORNER*(MAXFILE), YLLCORNER*(MAXFILE)
character CELLSIZE*(MAXFILE)
flag = 0
open (36, file='RR_RWERROR_Log.txt')
do 500 IFILE=1, 150000
c ** read FILENAME **
open (20, file='list_folder.txt', err=150)
read (20, fmt=*, err=150, end=120) FOLDERNAME
goto 121
120 continue
flag = 1
121 continue
print*, 'DATE: ', FOLDERNAME
c ** read INCA-file
do 501 I1=1, 96
open (21, file='list_files_RR.txt', err=151)
read (21, fmt=*, err=151) FILENAME
print*, 'File: ', FILENAME
open (unit=30,file='RR/'//FOLDERNAME//'/'//FILENAME,
NCOL = 601
NROW = 351
FLIP = 0
do IROW=1, NROW
ZEILE=NROW-FLIP
STITCH = 0
do I=1, 60
read (30,*,err=600) (VAL((ICOL+STITCH),ZEILE), ICOL=1, 10)
STITCH = STITCH + 10
enddo
read (30,*,err=600) VAL(601,ZEILE)
FLIP = FLIP + 1
enddo
c***** create ascii grids *****
c ** INCA domain
XLLCORNER = '99500'
YLLCORNER = '249500'
CELLSIZE = '1000.'
NODATA_VALUES = -9999.
open (unit=89,file='d:tempdest
+/'//FOLDERNAME//FILENAME)
write (89,fmt=*) 'ncols', NCOL
write (89,fmt=*) 'nrows', NROW
write (89,fmt='(A,A)') ' xllcorner ', XLLCORNER
write (89,fmt='(A,A)') ' yllcorner ', YLLCORNER
write (89,fmt='(A,A)') ' cellsize ', CELLSIZE
write (89,fmt=*) 'NODATA_value', NODATA_VALUES
do IROW = 1, NROW
do ICOL = 1, NCOL
write (89, fmt='(F7.2$)') VAL (ICOL,IROW)
enddo
write (89, fmt=*)
enddo
close (30)
close (89)
501 continue
goto 602
600 continue
write (36,fmt=*) FOLDERNAME, ' - ', FILENAME
602 continue
close (21)
if (flag.eq.1) goto 100
500 continue
goto 100
150 continue
print*, 'ERROR opening/reading "list_folder.txt"...'
goto 100
151 continue
print*, 'ERROR opening/reading "list_files.txt"...'
goto 100
152 continue
print*, 'ERROR opening/reading Orig. INCA-file...'
100 continue
close(36)
print*,'finished normally'
end
i also changed the RECL to 1024, but i have no clue, if this is right.
This is probably not enough, unless NROW is less than 79. You need to increase RECL to the length of the longest formatted line output. If NROW were 2000, for example, RECL would need to be over 26000 (add to this to allow for field separators, end-of-line).
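The sizing rule above can be sketched as follows. This is only an illustration: the 13-character field width comes from the posted VALUES declaration, while the one-blank separator, the margin, the unit number and the file name are assumptions chosen for the example.

```fortran
program recl_sketch
  implicit none
  integer, parameter :: FIELD_WIDTH = 13   ! matches VALUES(...)*13 in the posted code
  integer, parameter :: MARGIN = 16        ! assumed slack for the record terminator
  integer :: nrow, irecl

  nrow = 791                               ! example value; read it from the user in practice
  ! one field plus one separating blank per zone, plus some slack
  irecl = nrow * (FIELD_WIDTH + 1) + MARGIN

  ! hypothetical output file, opened with a record length large enough for one line
  open (unit=11, file='BFZON.txt', action='write', recl=irecl)
  write (*,*) 'RECL used: ', irecl
  close (11, status='delete')              ! the sketch leaves no file behind
end program recl_sketch
```

Computing RECL from NROW at run time avoids having to guess a fixed value such as 1024.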
Use an old standby practice of inserting trace code into your program
do 1000 IFILE=1, NFILE-1
write(*,*) 'IFILE=', IFILE
open (unit=98,file='input/'//FILENAME(IFILE),status='old',err=300) !each file is opened (and closed later, before the next one is opened)
read (98,*)
write(*,*) 'read rows'
do 200, IROW=1, NROW
read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
200 continue
write(*,*) 'read complete'
...
ICOL = 22
write(*,*) 'ICOL = ', ICOL
do IROW = 1, NROW
write (21, fmt='(A$)') VALUES (ICOL,IROW)
enddo
write(*,*) 'done'
write (21,*)
...
Jim Dempsey
Blog: The Parallel Void
www.quickthreadprogramming.com
If your program crashes, then clearly it starts and runs for a time and so it is not a compilation problem at present.
If your program 'Crashes', then there will be one or more informative error messages output to the console (assuming it is a console program).
If you want to diagnose the problem, showing us the error message you get when the program 'crashes', in your words, is the minimum information we need. So please can you oblige? Otherwise it's blind guessing, which is a waste of everyone's time.
If you insert diagnostic code into your program, as is highly recommended by posters who are trying to help, then at least you should discover where in your programmed loop the program fails. Then you can start delving into the code, with the help of execution error messages, to try and pin down the exact cause of the program to fail.
Assuming you also have
read (98,*)
write(*,*) 'read rows'
and you didn't see the 'read rows' message then
I suggest you change the read to
read(98,*,iostat=ier,err=399)
Then at label 399 you print out the iostat error number
-2 means end-of-record condition for nonadvancing read
-1 means end-of-file condition
A positive integer (> 0) means an error occurred. See the list of Run-Time Error Messages in the help.
(I thought there was a subroutine we could call to get the text from an iostat code but a quick skim of the help didn't show it. Maybe I didn't look hard enough)
Anyway now you know it occurs when IFILE is 15944 you can add more debug statements along the lines of
if (IFILE==15944) then
  ! prove that you have the correct file name and
  ! check it doesn't contain special characters, for example
  print *, "Filename = ",FILENAME(IFILE),"#"
endif
etc.
Les
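On the question of getting the text for an IOSTAT code: Fortran 2003 added the IOMSG= specifier, which returns the run-time message in a character variable. A minimal sketch, assuming a compiler version that supports IOMSG= (check your documentation); the file name is hypothetical and chosen so that the OPEN fails.

```fortran
program iomsg_sketch
  implicit none
  integer :: ier
  character(len=256) :: msg

  msg = ''
  ! deliberately try to open a file that should not exist
  open (unit=98, file='no_such_file.txt', status='old', &
        iostat=ier, iomsg=msg)
  if (ier /= 0) then
     ! IOSTAT gives the numeric code, IOMSG the matching text
     write (*,*) 'OPEN failed, IOSTAT = ', ier
     write (*,*) trim(msg)
  else
     close (98)
  end if
end program iomsg_sketch
```

The same pair of specifiers can be added to READ and WRITE statements, so the message is available without looking up the code in the help.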
I also implemented the code Les advised:
[fxfortran]
      do 1000 IFILE=1, NFILE-1
      write(*,*) 'IFILE ', IFILE
      if (IFILE.gt.15000) write(*,*) 'FILE before open ',FILENAME(IFILE)
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',
     +      IOSTAT=IER,err=300)
      if (IFILE.gt.15000) write(*,*) 'FILE after open ', FILENAME(IFILE)
      if (IFILE.gt.15000) write(*,*) 'IOSTAT - Open ', IER
      read(98,*,iostat=ier,err=399)
[/fxfortran]
The exe doesn't necessarily stop at IFILE = 15944. It also happened at 15942 and 15901, so it cannot be a problem with the input files, as they were read before (except 15944). IOSTAT is 0 after opening; unfortunately the program does not jump to error label 399, as it should.
my command line looks like this:
Can I suggest that you compile, build and run a version of the exe with all of the check options on?
i.e. /warn:all /check:all
The first may catch compile problems (if any) and the second catch any run-time problems with array/string bounds exceeded, uninitialised variables etc.
There is definitely something strange going on.
Les
I was already using /warn:all /check:all. No errors or warnings are shown.
Opening the W7 task manager and looking at the "harddisk properties" (I have a German version, so I don't know what it is called in English), where one can see the files in use, shows that many files are listed. The closing of the files seems to be rather slow. So I tried idling the CPU for some time after 15000 files, so that the files could be closed. The closing worked (or at least the task manager showed so), but the exe still stopped.
Just a stab in the dark...
Are any of your paths network mapped drives (e.g. D:)?
The reason I ask is that I have an old (legacy) Win32 app (non-Fortran) that performs a very large number of file directory search/open/copy/close operations. It works fine from XP to XP but has problems from XP to Vista (writing to Vista): several thousand files into the run, a network resource limitation is reached. Apparently the OS is trying to throttle down the activity with an error that the application ought to retry. Resuming (several times) completes the run. I did not add code to test for this error, pause for a while, and then resume.
Did you use Les's suggestion for collecting io status code in addition to taking error dispatch to READ (and WRITE)?
Jim
Blog: The Parallel Void
www.quickthreadprogramming.com
With respect to my above note, try this:
In your main loop add something like
if(mod(ifile,10000) == 0) then
write(*,*) 'Sleeping 30 seconds...'
call sleepqq(30000) ! 30 second wait (milliseconds); needs USE IFPORT
endif
Jim
Blog: The Parallel Void
www.quickthreadprogramming.com
The drive I am running the program on is a local one.
I had already tried delaying the program, but with sleep(60). I have now tried the sleepqq(30000) you suggested, but unfortunately it doesn't help.
I implemented the I/O status code. The status of the last opened file is 0. At the first read statement the program crashes without jumping to the error label (see reply #15).
Are you on a workstation or server?
If it is a server, does the system have a policy to detect and kill a runaway program?
Users on Linux get a nasty surprise if the OOM killer is running on the system and decides to kill their application.
Other than trying to duplicate your situation we are running out of options here.
Last thing to try this: Create a batch file (CMD script)
: foo.bat
yourProgramHere yourArgsHere
IF ERRORLEVEL 1 ECHO Error level of 1 or greater %ERRORLEVEL%
Then run the batch
If you see the error message, something is causing your program to exit abnormally.
The error level value may or may not print out depending on CMD option /E:ON
Jim
Blog: The Parallel Void
www.quickthreadprogramming.com
I am working on a workstation with W7. I can't imagine that another application is killing my Fortran exe, because the exe compiled with CVF works.
I created the batch file: the executable still crashes and I get the error level -1073741819.
i have attached 3 files:
cdr_output_to_timeseries_modMathew.f
list.txt
output_200701010100.txt
list.txt should be in the root-folder
output_200701010100.txt should be in \input
a folder \output is also needed.
maybe you can duplicate my situation.
thank you!
Attachments:
| Attachment | Size |
|---|---|
| Download | 9.1 KB |
| Download | 642.16 KB |
| Download | 184.82 KB |
Running test program now. Using IVF 11.0.66 on WinXP Pro x64 in 64-bit Debug build. With full debug checking this will take a while to reach the problem point, ~5 ifiles per second. At 1700 now.
Jim
Blog: The Parallel Void
www.quickthreadprogramming.com
>>created the batch file: the executable still crashes and i get the error level -1073741819
This is C0000005 STATUS_ACCESS_VIOLATION
In this situation it would mean file access violation.
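The mapping from the decimal error level to the hexadecimal status code can be checked with a few lines of Fortran. This is a small sketch, assuming a 32-bit integer kind and a compiler whose Z edit descriptor prints the two's-complement bit pattern of a negative value (Intel Fortran and gfortran are expected to, giving C0000005).

```fortran
program exit_code_hex
  implicit none
  integer, parameter :: i4 = selected_int_kind(9)   ! 32-bit integer kind
  integer(i4) :: code

  code = -1073741819_i4                             ! error level reported by the batch file
  ! Z8.8 prints the 32-bit pattern as eight hexadecimal digits
  write (*,'(A,Z8.8)') 'exit code in hex: ', code
end program exit_code_hex
```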
Let's see if it occurs on my system too
Jim
at 9000 now
Blog: The Parallel Void
www.quickthreadprogramming.com
I run the program on XP 32-bit and W7 64-bit.
My About dialog says I am using this compiler:
Intel Visual Fortran Compiler Integration Package ID: w_cprof_p_11.1.054
Intel Visual Fortran Compiler Integration for Microsoft Visual Studio 2008, 11.1.3469.2008
Mathew,
I can reproduce the problem here. It crashes at file 16,150 with 0xC0000005
RSP = 0x30FF0
That is just below the top of an unmapped page, in other words where the local variables of the current stack frame would live. The call then fails because the return address is written to non-existent memory. As to how it got into this situation:
a) something modified the stack frame pointer on the stack
b) something caused "infinite" recursion (an error in an error recovery/reporting routine)
So the error occurs in both 32-bit and 64-bit applications on Windows XP x64 and Windows 7 x64 using IVF 11.0.066 and IVF 11.1.054
Premier Support should be able to reproduce this problem given your test program and test files.
Jim Dempsey
Blog: The Parallel Void
www.quickthreadprogramming.com
Here is a shorter program based on the posted program; this also shows the same buggy behavior. The shorter program also runs about 100 files/second on a 2 GHz Athlon-X2, using Intel 11.1.067 32 or 64 bit compilers. The same input/output files are used as given by MR-KATSE. The errors do not occur on Linux using the Intel compiler 11.1.073.
[fxfortran]
program cdr_zoneoutput_converter
integer MAXCOL, MAXROW, MAXFILE, MAXDAY
parameter (MAXCOL=30)
parameter (MAXROW=1000)
parameter (MAXFILE=50)
parameter (MAXDAY = 30000)
parameter (IRL=10400)
character FILENAME*(MAXFILE)(MAXDAY)
character*12 onames(23)
data onames/'BFZON','BWOZON','BW3ZON','DELTAZON','ETATZON',
1 'ETAP0ZON','ETPEZON','ETPRZON','MELTZON',
2 'PRAINSOILZON','PSNOWZON','PZON','QAB1ZON',
3 'QAB2ZON','QAB3ZON','QABZON','QEX2ZON','QVS0ZON',
4 'SCOVZON','SMELTZON','SWWZON','TOTALSZON','TZON'/
integer NCOL, NROW, ICOL, IROW, IFILE, NFILE, IER
character VALUES(MAXCOL,MAXROW)*13
integer tval(8)
do iunit=11,33
open(unit=iunit,
1 file='output/' // trim(onames(iunit-10)) // '.txt',
2 ACCESS='sequential', RECL=IRL)
end do
NROW = 791
Print*, 'NROW = ', NROW
NFILE=0
NCOL=24
do 900 IFILE=1, MAXDAY
c ** read FILENAMES **
open (120, file='list.txt', status='old')
NFILE=NFILE+1
read (120, fmt=*, end=100) FILENAME(IFILE)
900 enddo
100 continue
close (120)
nfile=nfile-1
write(*,*)' nFile = ',nfile
c*** read in cdr-outputfile ***
do 1000 IFILE=1, NFILE
if(mod(ifile,100).eq.0)then
write(*,*) 'IFILE ', IFILE
endif
open (unit=98,file='input/'//FILENAME(IFILE),
+ status='old', IOSTAT=IER,err=300)
if(ier.ne.0) then
write(*,*)' Ifile, IOSTAT ',ifile,ier
endif
read(98,*,iostat=ier,err=399)
do 200, IROW=1, NROW
read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
200 continue
close (98)
c*** write output_files ***
DO ICOL=2,24
write (icol+9, fmt='(791A)')
+ (VALUES (ICOL,IROW), IROW=1,NROW) !Format repeat count=NROW value
end do
1000 enddo
do ICOL=2,24
close (icol+9)
end do
print*,'finished normally'
stop
300 Print*, 'ERROR opening ', FILENAME(IFILE)
goto 302
301 Print*, 'ERROR reading ', FILENAME(IFILE)
Print*, 'Aborted... '
399 Print*, 'IOSTAT - read =', ier
302 stop
end
[/fxfortran]
The problem is caused by this line:
[fxfortran]
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',
     +      IOSTAT=IER,err=300)
[/fxfortran]
The compiler creates a stack temporary for the file= expression but does not remove it from the stack. After a long while, the stack is exhausted but apparently this is not detected with a normal stack overflow message.
A workaround is to assign the value 'input/'//FILENAME(IFILE) to a character variable and then pass the variable as the file= value. The program seems to work on Linux because the default stacksize is larger there.
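The workaround can be sketched as follows. The variable names, lengths and file names here are illustrative and not taken from the posted program; the only point is that the concatenation happens in an assignment, so that FILE= receives a plain variable.

```fortran
program open_workaround
  implicit none
  character(len=64) :: fname      ! longer than 'input/' plus the longest file name
  character(len=50) :: listed     ! stands in for FILENAME(IFILE)
  integer :: i, ier

  do i = 1, 3
     write (listed,'(A,I5.5,A)') 'output_', i, '.txt'   ! hypothetical names
     fname = 'input/' // listed                          ! concatenate into a variable first
     open (unit=98, file=fname, status='old', iostat=ier)
     if (ier /= 0) then
        write (*,*) 'could not open ', trim(fname)
        cycle
     end if
     ! ... read the file here ...
     close (98)
  end do
end program open_workaround
```

Trailing blanks in the FILE= value are ignored, so the variable only needs to be long enough, not exactly sized.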
I will report this to the developers. Issue ID is DPD200161714.
Here is a simple (and quicker) reproducer.
character(1000) padding
do i=1,2000
  write (padding,'(I5.5,A)') i,'.txt'
  open (unit=1,file='input'//padding,disp='delete')
  close (1)
end do
end
Steve
Good diagnosis, Steve.
I guess that this same bug would be present with any statement that generates a temp for a dummy argument, such as a function or subroutine call with a concatenated argument.
Would you know if
MyFile = 'input'//something
generates a temp or concatenates directly into MyFile?
Jim
Blog: The Parallel Void
www.quickthreadprogramming.com
You just have to know what you're looking at. I've been doing this a long time...
Jim, the compiler pops temps off the stack in many cases, including assignment. Most of the time. The way the compiler works is that it looks for specific cases to do this, as usually it's not worth the bother - the stack will get popped when the routine exits. But I've seen a fair number of cases like this one where there's a large loop that builds up stack and eventually blows.
Steve
As long as the variable you pick is longer than the longest possible filename, it will be fine. You can probably make it shorter than I had it.
Steve
Quoting Steve Lionel (Intel)
You just have to know what you're looking at. I've been doing this a long time...
Jim, the compiler pops temps off the stack in many cases, including assignment. Most of the time. The way the compiler works is that it looks for specific cases to do this, as usually it's not worth the bother - the stack will get popped when the routine exits. But I've seen a fair number of cases like this one where there's a large loop that builds up stack and eventually blows.
So this is an old problem that resurfaces from time to time. Time to get it fixed.
If you are looking to reduce the number of stack cleanups of these temps, then consider having the compiler determine whether a loop contains the creation of such temps. If so, create a hidden local variable that saves the stack pointer for that loop, copy the stack pointer to this variable immediately before the loop starts, and then, at the front of the loop body, restore the stack pointer before the first statement in the scope of the loop. Loops without such temporaries will not incur this additional overhead. Also, if the number of iterations of the loop is known to be small, then you could bypass the save/restore code (provided the temp is also known to be small).
What you would be doing is trading off
Code that has known problem to potentially cause stack overflow given enough iterations.
against
Code that if passes first iteration is known to not (directly) cause stack overflow
at the expense of
mov esp,[ebp+offsetToHiddenSaveStackPointer]
A fair trade-off IMHO.
For anyone else reading this I suggest we take a straw poll and reply to this thread with your vote/comment.
Jim Dempsey
Blog: The Parallel Void
www.quickthreadprogramming.com
Hello,
I am happy that you were able to find the reason for a problem that I have had for many years.
(http://software.intel.com/en-us/forums/showthread.php?t=42116&o=a&s=lr)
Please find a good solution.
Thanks in advance
Frank
Jim's proposal for fixing the problem with stack temporaries overrunning stack limits is reasonable. However, I am inclined to consider means to fix the problem, more specifically how much stack growth is permitted before cleaning up, as implementation issues related to optimization.
In fact, if debug/check options have been specified, or optimization level zero has been requested, the compiler ought not to permit this stack overrun to occur or, if that cannot be avoided, the runtime should provide a clear message and a traceback.
With my shorter example code, I tried to get a traceback, but after stack overflow the program simply quit with no hint that anything went wrong. Rerunning the program with Cygwin/GDB made me note the stack overflow.
A user should not have to stoop to assembler level and monitor the ESP register if stack overflow is suspected.
Such behavior is what had Frank "tropfen" stumped for four years, and needs to be rectified. His thread ends with a reference to "Issue 356587". It is not clear if anything was done to resolve Issue 356587 between 2006 and now.
mecej4,
Other than for optimization related differences, if the problem is systemic in Release Build it should be systemic in Debug Build - Otherwise you will have less of a chance in finding the problem.
RE: Debug and Stack Overflow.
Debug mode should have a stack guard page at the bottom of the stack. If the stack ever encroaches into this guard page, then a debug exception should be raised. The compiler team can decide how to implement this. Had this feature been available to the original poster, this problem would have been identified quickly by either the original poster or any of the others monitoring this thread.
Jim Dempsey
Blog: The Parallel Void
www.quickthreadprogramming.com
I tried a slightly modified version of Steve Lionel's 7-line reproducer on Win-7. The Fortran run-time on this OS detects the stack overflow and prints a message, but does not give a traceback.
program blowstack
character(len=1000) padding
iesp0=iesp() ! initial stack pointer
do i=1,2000
write (padding,'(I5.5,A)') i,'.txt'
open (unit=1,file='input'//padding)
close (1,status='delete')
write(*,'(1x,I4,2x,Z08)')i,iesp0-iesp() ! stack consumed
end do
end program blowstack
The code for the utility function iesp() is, for 32-bit Windows:
.686P
.model flat
PUBLIC _IESP
_TEXT SEGMENT
_IESP PROC
lea eax, dword ptr [esp + 4]
ret
_IESP ENDP
_TEXT ENDS
END
and, for 64-bit Windows:
PUBLIC IESP
_TEXT SEGMENT
IESP PROC
lea rax, qword ptr [rsp + 8]
ret
IESP ENDP
_TEXT ENDS
END
For the default stack of 0x100000, the program crashes with the last few lines of output (this is for the 32-bit version; the numbers are slightly different for the 64-bit version):
1018 FA860
1019 FAC50
1020 FB040
forrtl: severe (170): Program Exception - stack overflow
Stack trace terminated abnormally.
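The numbers in this output allow a rough estimate of how many OPEN statements fit into the default stack. The sketch below only redoes that arithmetic; the decimal constants are the hexadecimal values 0xFAC50, 0xFB040 and 0x100000 quoted above.

```fortran
program stack_estimate
  implicit none
  integer, parameter :: stack_size = 1048576      ! 0x100000, the default quoted above
  integer, parameter :: used_1019 = 1027152       ! 0xFAC50 from the output above
  integer, parameter :: used_1020 = 1028160       ! 0xFB040 from the output above
  integer :: per_iter

  per_iter = used_1020 - used_1019                ! bytes left on the stack per OPEN
  write (*,*) 'bytes per iteration:     ', per_iter
  write (*,*) 'iterations to fill stack:', stack_size / per_iter
end program stack_estimate
```

This gives 1008 bytes per iteration, slightly more than the 1005 characters of 'input'//padding, and about 1040 iterations for a 1 MB stack, close to the observed failure after 1020. By the same reasoning, and assuming the 56-character temporary 'input/'//FILENAME(IFILE) of the original program is padded to 64 bytes, a 1 MB stack would last for roughly 16000 files, which is in the range reported earlier in the thread.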
The compiler does generate stack checking code in most cases, which will give a reasonable error. I'm not sure what happened here. It may be that the stack check is done only for "automatic" variables, allocated at the beginning of the routine. I will ask.
Steve
Hello,
Looking at your suggestions, I would prefer an automatic cleanup of the stack. While programming I do not know how many files will be opened during execution. Increasing the stack just in case (which is what I do currently) is not a good solution for me.
Frank
mecej4,
You can stay in FORTRAN and get the stack pointer
use, intrinsic :: iso_c_binding, only: C_INTPTR_T
integer(C_INTPTR_T) :: StackLoc
...
StackLoc = LOC(StackLoc)
Place that at the start of PROGRAM, then copy StackLoc to a global InitialStackLoc variable.
Then in subroutines, insert the above code. You can then check for stack consumed, but this will not tell you if you are getting close to stack overflow.
There is a C runtime library call that you can call from FORTRAN to obtain the remaining stack. Look in MSDN for the Windows version or in ??? for Linux.
On Windows, single-threaded, there will be a fixed floor such as 0x10000 (verify this). For multi-threaded programs this will not be the case, as each thread has a separate stack and each stack has a lowest mapped location with a guard page below that. On Linux (*ux) your stack grows until it reaches some other limiting value.
Jim
Blog: The Parallel Void
www.quickthreadprogramming.com
This issue was fixed in 12.0 Update 2.
Steve




