executable crashes / stops without error

Hi!
I am using a Fortran program that rewrites many files (about 26000 in total) into one file.
The input files are opened one by one and closed after reading.
With IVF, the executable just stops without an error after opening, reading and closing about 16000 files. At the moment my workaround is to compile the code with Compaq Visual Fortran, where I don't have any issues.
What can I do to get the code working when compiled with IVF?
Thank you!


It is a console program, and it crashes without an error message in the console, just as I wrote in my first post!

We are just moving from XP to W7. On XP there is no error message from the operating system. I have now tried the program on W7, and I get an error from the OS saying something like "executable has stopped working. Windows can check online for a solution". You can then choose between "Check online for a solution and close the program" or "Close the program". No error messages appear in the console.

Sorry Jim, I overlooked your post. I have now added the diagnostic code: the program stops after printing IFILE = 15944. I checked the file that should be opened (as I had already done before), and it's OK.

Clearly you need to add some diagnostic code to help trace where the problem lies. I would humbly suggest, for a start, opening a text file and writing a confirmation message to it after each successful file open and each successful file close, and then seeing whether one particular file causes the problem.

Are you properly closing files after use?

Are you re-using unit numbers or just incrementing the unit number?

If you want help here, you should post some code showing your OPEN statement(s) and give some idea of what data is being read and what you do with it once you have read it.
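A minimal sketch of the logging idea above, in free-form Fortran 2003 (most listings in this thread are fixed form). `open_traced` and `close_traced` are hypothetical helpers, not library routines. They use IOSTAT=/IOMSG= instead of ERR= so the outcome can be recorded either way, and FLUSH so each log line reaches disk even if the program dies right afterwards. The caller is assumed to have opened the log unit already.

```fortran
module trace_io
  implicit none
  private
  public :: open_traced, close_traced
contains
  ! Open PATH for reading on UNIT and record the outcome on LOGUNIT.
  ! IOS receives the IOSTAT value (0 on success).
  subroutine open_traced(unit, path, logunit, ios)
    integer, intent(in) :: unit, logunit
    character(len=*), intent(in) :: path
    integer, intent(out) :: ios
    character(len=256) :: msg

    msg = ''
    open (unit=unit, file=path, status='old', action='read', &
          iostat=ios, iomsg=msg)
    if (ios == 0) then
      write (logunit, '(A,A)') 'OPENED ', trim(path)
    else
      write (logunit, '(A,A,A,I0,A,A)') 'FAILED ', trim(path), &
            ' iostat=', ios, ' ', trim(msg)
    end if
    flush (logunit)   ! push the line to disk before anything else happens
  end subroutine open_traced

  ! Close UNIT and record the outcome on LOGUNIT.
  subroutine close_traced(unit, logunit, ios)
    integer, intent(in) :: unit, logunit
    integer, intent(out) :: ios

    close (unit, iostat=ios)
    write (logunit, '(A,I0,A,I0)') 'CLOSED unit ', unit, ' iostat=', ios
    flush (logunit)
  end subroutine close_traced
end module trace_io
```

The last line in the log file then shows the last operation that completed before the program stopped.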

I second the advice that Anthony Richards gave you. An incorrect program may work and give the mistakenly expected results -- that's covered by the Fortran Standard's phrase "undefined behavior". When such a program "works" with one compiler and fails with another, that may be the first sign that the program is faulty and needs to be corrected.

The following program copies its source code 26000 times to a single output file, and works as expected using the current release of the Intel compiler.

program wrmany
      integer, parameter :: fout=11, fin=12
      character(len=132) :: line
      integer :: kount
      open(fout,file='mulfil.txt',action='write')
      do kount=1,26000
          open(fin,file='katse.f90',action='read')
          do
              read(fin,'(A)',end=100)line
              write(fout,'(A)')trim(line)
          end do
  100     close(fin)
      end do
      close(fout)
end program wrmany
 

The resulting file has the expected length:

$ ls -l mulfil.txt 
-rw-r--r-- 1 mece users 11024000 2010-10-08 06:43 mulfil.txt

I am fully aware that the problem could be caused by bad code. I have checked the input files; they seem to be OK. They are output files written by another Fortran program. Here is my code, which I have commented a little. I cannot post all the input files, because there are so many. I am doing hydrological modelling, and the input files I am reading are system states, which are written every time step: about 26000 files/time steps in all. Thank you!

[fxfortran]      program cdr_zoneoutput_converter
      

c***** declarations *****
      integer MAXCOL, MAXROW, MAXFILE
      parameter (MAXCOL=30)
      parameter (MAXROW=1000)
      parameter (MAXFILE=50)
      parameter (MAXDAY = 30000)      
      character FILENAME*(MAXFILE)(MAXDAY)      
      character INFILE_GRIDS*(MAXFILE)      
      integer NCOL, NROW, ICOL, IROW, IFILE, NFILE
      character VALUES(MAXCOL,MAXROW)*13
      

c***** open output files *****
      
       open(unit=1,file='output/BFZON.txt')
       open(unit=2,file='output/BWOZON.txt')
       open(unit=3,file='output/BW3ZON.txt')
       open(unit=4,file='output/DELTASZON.txt')
       open(unit=5,file='output/ETATZON.txt')
       open(unit=6,file='output/ETP0ZON.txt')
       open(unit=7,file='output/ETPEZON.txt')
       open(unit=8,file='output/ETPRZON.txt')
       open(unit=9,file='output/MELTZON.txt')
       open(unit=10,file='output/PRAINSOILZON.txt')
       open(unit=11,file='output/PSNOWZON.txt')
       open(unit=12,file='output/PZON.txt')
       open(unit=13,file='output/QAB1ZON.txt')
       open(unit=14,file='output/QAB2ZON.txt')
       open(unit=15,file='output/QAB3ZON.txt')
       open(unit=16,file='output/QABZON.txt')
       open(unit=17,file='output/QEX2ZON.txt')
       open(unit=18,file='output/QVS0ZON.txt')
       open(unit=19,file='output/SCOVZON.txt')
       open(unit=20,file='output/SMELTZON.txt')
       open(unit=21,file='output/SWWZON.txt')
       open(unit=22,file='output/TOTALSZON.txt')
       open(unit=23,file='output/TZON.txt')

  
c***** user specifications *****      
      print*, 'Wieviel Zonen wurden berechnete (1-2000)? ' !some definition of zones
	read*, NROW
     

      NFILE=0
      NCOL=24

      do 900 IFILE=1, MAXDAY

c  ** read FILENAME **
      open (120, file='list.txt', status='old') !list.txt contains the files to be opened (~26000)
      NFILE=NFILE+1
      read (120, fmt=*, end=100) FILENAME(IFILE)
      
900   enddo

100   continue 
      close (120)   
  
c*** read in cdr-outputfile ***

      do 1000 IFILE=1, NFILE-1
      
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',err=300) !each file is opened (and closed later, before the next one is opened)

      read (98,*) 

       do 200, IROW=1, NROW
        read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
200    continue

c*** write output_files ***
       ICOL = 2
        do IROW = 1, NROW
         write (1, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (1,*)
       ICOL = 3
        do IROW = 1, NROW
         write (2, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (2,*)
       ICOL = 4
        do IROW = 1, NROW
         write (3, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (3,*)
       ICOL = 5
        do IROW = 1, NROW
         write (4, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (4,*)
       ICOL = 6
        do IROW = 1, NROW
         write (5, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (5,*)
       ICOL = 7
        do IROW = 1, NROW
         write (6, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (6,*)
       ICOL = 8
        do IROW = 1, NROW
         write (7, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (7,*)
       ICOL = 9
        do IROW = 1, NROW
         write (8, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (8,*)
       ICOL = 10
        do IROW = 1, NROW
         write (9, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (9,*)
       ICOL = 11
        do IROW = 1, NROW
         write (10, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (10,*)
       ICOL = 12
        do IROW = 1, NROW
         write (11, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (11,*)
       ICOL = 13
        do IROW = 1, NROW
         write (12, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (12,*)
       ICOL = 14
        do IROW = 1, NROW
         write (13, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (13,*)
       ICOL = 15
        do IROW = 1, NROW
         write (14, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (14,*)
       ICOL = 16
        do IROW = 1, NROW
         write (15, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (15,*)
       ICOL = 17
        do IROW = 1, NROW
         write (16, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (16,*)
       ICOL = 18
        do IROW = 1, NROW
         write (17, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (17,*)
       ICOL = 19
        do IROW = 1, NROW
         write (18, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (18,*)
       ICOL = 20
        do IROW = 1, NROW
         write (19, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (19,*)
       ICOL = 21
        do IROW = 1, NROW
         write (20, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (20,*)
       ICOL = 22
        do IROW = 1, NROW
         write (21, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (21,*)
       ICOL = 23
        do IROW = 1, NROW
         write (22, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (22,*)
       ICOL = 24
        do IROW = 1, NROW
         write (23, fmt='(A$)') VALUES (ICOL,IROW)
        enddo
         write (23,*)

      close (98)   
         
      
  
1000      enddo

       close (1)
       close (2)
       close (3)
       close (4)
       close (5)
       close (6)
       close (7)
       close (8)
       close (9)
       close (10)
       close (11)
       close (12)
       close (13)
       close (14)
       close (15)
       close (16)
       close (17)
       close (18)
       close (19)
       close (20)
       close (21)
       close (22)
       close (23)
      print*,'finished normally' 
      stop      

300   Print*, 'ERROR opening ', FILENAME(IFILE)
      goto 302
301   Print*, 'ERROR reading ', FILENAME(IFILE)
      Print*, 'Aborted... '            

302   stop
      end



[/fxfortran]

(The problem could be caused by wrong coding.)

>>open(unit=6,file='output/ETP0ZON.txt')

Look for your error message in the above file.
I suggest changing that unit number (and unit 5) to something else.

Jim Dempsey

www.quickthreadprogramming.com

These days the general consensus is to avoid using unit numbers less than 10, and certainly to avoid 5 and 6.
Also, I would suggest moving the close(98) to just after the "200 continue" and maybe adding an "err=" clause (this keeps everything about unit 98 together), and consider replacing all those write loops with something like:

write (1, fmt='(1000A)') (VALUES (2,IROW), IROW=1,NROW)

(The repeat count, here MAXROW = 1000, must be at least NROW: with fmt='(A$)', format reversion would start a new record after every value. Since this format ends the record itself, the separate write (1,*) is no longer needed.)
(Also replace the unit numbers "1"..."9" with whatever you choose in accordance with my first sentence.)
I know it's a style thing, but to my mind it is much more readable than the original code.

Les

What do you mean by "Look for your error message in the above file"?

I changed unit=5 to 95 and unit=6 to 96. Unfortunately, the problem still persists.

Your program is overrunning the default maximum record length (132 bytes) for formatted I/O, since you are writing output files with records as long as NROW*13 characters. To correct this, you need to use the RECL=... specifier in the OPEN statements for the output files.

You should probably use a format of (A,1x,$) instead of (A,$) in the WRITE statements so that you can separate the fields.
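For reference, here is a standard-conforming sketch of one of those output loops with the separator added. ADVANCE='NO' is the Fortran 90 counterpart of the `$` extension, `1X` inserts the blank, and a final advancing WRITE ends the record. `write_column` is a hypothetical helper; its arguments mirror VALUES, ICOL and NROW in the posted program. Long records still need a large enough RECL=, as noted above.

```fortran
module column_output
  implicit none
contains
  ! Write VALUES(ICOL,1:NROW) as one record on UNIT, each field
  ! followed by one blank, then terminate the record.
  subroutine write_column(unit, values, icol, nrow)
    integer, intent(in) :: unit, icol, nrow
    character(len=*), intent(in) :: values(:,:)
    integer :: irow

    do irow = 1, nrow
      write (unit, '(A,1X)', advance='no') values(icol, irow)
    end do
    write (unit, '(A)') ''   ! an advancing write ends the current record
  end subroutine write_column
end module column_output
```

In the posted program, `call write_column(1, VALUES, 2, NROW)` would replace the block that starts with ICOL = 2.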

In general, using unit numbers 10 and above for external files has a better chance of avoiding clashes with special files (standard input, output, error, punch, printer, etc.).

So, instead of using units 1 to 23, you could use units 11 to 33 and see if the error persists.

Re Jim Dempsey's reply: what he pointed out was that since you had used the standard output unit (6) to open an external file, I/O and other error messages generated during the run would have been directed to the file rather than being displayed at the console.
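Rather than picking unit numbers by hand, a program can ask the runtime for a free one. A sketch using INQUIRE(OPENED=) and the Fortran 2003 ISO_FORTRAN_ENV constants INPUT_UNIT, OUTPUT_UNIT and ERROR_UNIT; `find_free_unit` and the search limit of 9999 are assumptions. Compilers that support Fortran 2008 also provide OPEN(NEWUNIT=...), which does the same job.

```fortran
module unit_helpers
  use, intrinsic :: iso_fortran_env, only: input_unit, output_unit, error_unit
  implicit none
contains
  ! True for the preconnected standard units (typically 5, 6 and 0).
  logical function is_standard_unit(u)
    integer, intent(in) :: u
    is_standard_unit = (u == input_unit .or. u == output_unit &
                        .or. u == error_unit)
  end function is_standard_unit

  ! Return the first unit >= FIRST that is not a standard unit and is
  ! not currently connected; return -1 if none is found up to 9999.
  integer function find_free_unit(first)
    integer, intent(in) :: first
    integer :: u
    logical :: opened

    find_free_unit = -1
    do u = first, 9999
      if (is_standard_unit(u)) cycle
      inquire (unit=u, opened=opened)
      if (.not. opened) then
        find_free_unit = u
        return
      end if
    end do
  end function find_free_unit
end module unit_helpers
```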

Thank you for your hints. I am a "free time" programmer, so some of this is new to me.
I adopted the code (write (1, fmt='(A$)') (VALUES (2,IROW), irow=1,nrow) and the unit numbers). I also set RECL to 1024, but I have no idea whether this is right. I didn't change the output formats, since the spacing I get is fine for me. The executable still crashes.

I have the same problem with another program I wrote to rewrite some ASCII grids. Maybe you can see similarities between the two programs that I don't see. The similarity I see is that here, too, many files are opened (~100000). Here is the code:

      program rewrite_inca

c***** declaration *****

      integer MAXCOL, MAXROW, MAXFILE
      parameter (MAXCOL=1000)
      parameter (MAXROW=1000)
      parameter (MAXFILE=40)
      
      character FILENAME*(MAXFILE), FOLDERNAME*(8)
      integer NCOL, NROW, ICOL, IROW, IFILE, I, I1, FLIP, ZEILE
      integer stitch, flag
      real VAL(MAXCOL,MAXROW), NODATA_VALUES
      character XLLCORNER*(MAXFILE), YLLCORNER*(MAXFILE)
      character CELLSIZE*(MAXFILE)
    
       flag = 0
       open (36, file='RR_RWERROR_Log.txt')

      do 500 IFILE=1, 150000


c  ** read FILENAME ** 

       open (20, file='list_folder.txt', err=150)
       read (20, fmt=*, err=150, end=120) FOLDERNAME

       goto 121
120   continue
       flag = 1
121   continue

       print*, 'DATE: ', FOLDERNAME

c  ** read INCA-file
      do 501 I1=1, 96 

       open (21, file='list_files_RR.txt', err=151)
       read (21, fmt=*, err=151) FILENAME
       print*, 'File: ', FILENAME
      open (unit=30,file='RR/'//FOLDERNAME//'/'//FILENAME,

      NCOL = 601
	NROW = 351

      FLIP = 0
      do IROW=1, NROW
         ZEILE=NROW-FLIP

	  STITCH = 0
	  do I=1, 60
           read (30,*,err=600) (VAL((ICOL+STITCH),ZEILE), ICOL=1, 10)
          STITCH = STITCH + 10
	  enddo
	read (30,*,err=600) VAL(601,ZEILE)
      FLIP = FLIP + 1
      enddo


c***** create ascii grids *****

c **  INCA domain
      XLLCORNER = '99500'
	YLLCORNER = '249500'
	CELLSIZE = '1000.'
      NODATA_VALUES = -9999.

      open (unit=89,file='d:tempdest
     +/'//FOLDERNAME//FILENAME)

      write (89,fmt=*) 'ncols', NCOL
      write (89,fmt=*) 'nrows', NROW
      write (89,fmt='(A,A)') ' xllcorner ', XLLCORNER
      write (89,fmt='(A,A)') ' yllcorner ', YLLCORNER
      write (89,fmt='(A,A)') ' cellsize ', CELLSIZE
      write (89,fmt=*) 'NODATA_value', NODATA_VALUES

      do IROW = 1, NROW
        do ICOL = 1, NCOL
        write (89, fmt='(F7.2$)') VAL (ICOL,IROW)
        enddo
       write (89, fmt=*)
      enddo
	   
	close (30)
	close (89)


501   continue

      goto 602
600   continue
      write (36,fmt=*) FOLDERNAME, ' - ', FILENAME
602   continue

      close (21)
      if (flag.eq.1) goto 100
      
500   continue

      goto 100
150   continue
      print*, 'ERROR opening/reading "list_folder.txt"...'
      goto 100
151   continue
      print*, 'ERROR opening/reading "list_files.txt"...'
      goto 100
152   continue
      print*, 'ERROR opening/reading Orig. INCA-file...'
100   continue 
      close(36)
      print*,'finished normally'

      end
      

i also changed the RECL to 1024, but i have no clue, if this is right.

This is probably not enough unless NROW is less than 79 (1024/13 is about 78.8). You need to increase RECL to the length of the longest formatted line you write. If NROW were 2000, for example, RECL would need to be over 26000 (plus an allowance for field separators and end-of-line).
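The arithmetic can live in the program so that RECL= follows NROW automatically. A sketch, assuming fixed-width fields of WIDTH characters, each followed by SEP separator characters; `required_recl` is a hypothetical helper and does not count the record terminator, so add whatever margin you prefer on top.

```fortran
module recl_helper
  implicit none
contains
  ! Minimum record length (in characters) for one output line holding
  ! NROW fields of WIDTH characters, each followed by SEP separator
  ! characters. The line terminator is not included.
  integer function required_recl(nrow, width, sep)
    integer, intent(in) :: nrow, width, sep
    required_recl = nrow * (width + sep)
  end function required_recl
end module recl_helper
```

With the CHARACTER*13 fields used above, `required_recl(791, 13, 0)` is 10283 and `required_recl(791, 13, 1)` is 11074, while `required_recl(2000, 13, 0)` gives the 26000 mentioned above. The result can go straight into the OPEN, e.g. `open (unit=11, file='output/BFZON.txt', recl=required_recl(NROW, 13, 1))`.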

Use the old standby practice of inserting trace code into your program:

do 1000 IFILE=1, NFILE-1   
      write(*,*) 'IFILE=', IFILE
      open (unit=98,file='input/'//FILENAME(IFILE),status='old',err=300) !each file is opened (and closed later, before the next one is opened)   
  
      read (98,*)    
      write(*,*) 'read rows'
  
       do 200, IROW=1, NROW   
        read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)   
200    continue
      write(*,*) 'read complete'
...
      ICOL = 22  
      write(*,*) 'ICOL = ', ICOL 
      do IROW = 1, NROW   
       write (21, fmt='(A$)') VALUES (ICOL,IROW)   
      enddo
      write(*,*) 'done'
      write (21,*)
...

Jim Dempsey


NROW is 791, and I set RECL to 10400; this didn't help.
Could it be a problem with OS file handles? I think I read something like that some time ago.

If your program crashes, then clearly it starts and runs for a time, so it is not a compilation problem at present.

If your program 'crashes', then there will be one or more informative error messages output to the console (assuming it is a console program).
If you want to diagnose the problem, showing us the error message you get when the program 'crashes', in your words, is the minimum information we need. So could you please oblige? Otherwise it's blind guessing, which is a waste of everyone's time.

If you insert diagnostic code into your program, as highly recommended by the posters who are trying to help, then at least you should discover where in your programmed loop the program fails. Then you can start delving into the code, with the help of the run-time error messages, to pin down the exact cause of the failure.

Assuming you also have

read (98,*)
write(*,*) 'read rows'

and you didn't see the 'read rows' message, then
I suggest you change the read to

read(98,*,iostat=ier,err=399)

Then at label 399 print out the IOSTAT error number:
-2 means an end-of-record condition for a nonadvancing read
-1 means an end-of-file condition
a positive integer means an error occurred; see the list of Run-Time Error Messages in the help

(I thought there was a subroutine we could call to get the text for an IOSTAT code, but a quick skim of the help didn't show it. Maybe I didn't look hard enough.)
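Fortran 2003 covers both points: the IOMSG= specifier returns the run-time library's message text for the same condition, and the intrinsic module ISO_FORTRAN_ENV provides IOSTAT_END and IOSTAT_EOR so code need not hard-wire -1 and -2. A sketch; `iostat_kind` and `read_header` are hypothetical names, and whether a given older compiler release accepts IOMSG= should be checked.

```fortran
module iostat_report
  use, intrinsic :: iso_fortran_env, only: iostat_end, iostat_eor
  implicit none
contains
  ! Classify an IOSTAT value along the lines described above.
  function iostat_kind(ios) result(label)
    integer, intent(in) :: ios
    character(len=16) :: label
    if (ios == 0) then
      label = 'ok'
    else if (ios == iostat_end) then
      label = 'end-of-file'
    else if (ios == iostat_eor) then
      label = 'end-of-record'
    else if (ios > 0) then
      label = 'error'
    else
      label = 'other'
    end if
  end function iostat_kind

  ! Skip one header record on UNIT; print IOSTAT and the IOMSG text on failure.
  subroutine read_header(unit, ios)
    integer, intent(in) :: unit
    integer, intent(out) :: ios
    character(len=256) :: msg
    read (unit, *, iostat=ios, iomsg=msg)
    if (ios /= 0) then
      print '(A,I0,3A)', 'IOSTAT = ', ios, ' (', trim(iostat_kind(ios)), ')'
      print '(2A)', 'IOMSG  = ', trim(msg)
    end if
  end subroutine read_header
end module iostat_report
```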

Anyway, now that you know it occurs when IFILE is 15944, you can add more debug statements along the lines of

if (IFILE==15944) then
print *, "Filename = ", FILENAME(IFILE), "#" ! prove that you have the correct file name and
! check that it doesn't contain special characters, for example
endif

etc.

Les

I also implemented the code Les advised:

[fxfortran]      do 1000 IFILE=1, NFILE-1
      
      write(*,*) 'IFILE ', IFILE    
      if (IFILE.gt.15000) write(*,*) 'FILE before open ',FILENAME(IFILE)
             
      open (unit=98,file='input/'//FILENAME(IFILE),status='old', 
     + IOSTAT=IER,err=300)
      
      if (IFILE.gt.15000) write(*,*) 'FILE after open ', FILENAME(IFILE)
      if (IFILE.gt.15000) write(*,*) 'IOSTAT - Open ', IER
        
       read(98,*,iostat=ier,err=399)
[/fxfortran]

The exe doesn't necessarily stop at IFILE = 15944; it has also happened at 15942 and 15901. So it cannot be a problem with the input files, as they were read before (except 15944). IOSTAT is 0 after opening; unfortunately, the program does not jump to error label 399 as it should.
My command line looks like this:

Can I suggest that you compile, build and run a version of the exe with all of the check options on?
i.e. /warn:all /check:all
The first may catch compile-time problems (if any), and the second catches run-time problems such as array/string bounds being exceeded, uninitialized variables, etc.
There is definitely something strange going on.

Les

I was already using /warn:all /check:all. No errors or warnings are shown.

Opening the W7 Task Manager and looking at the "hard disk" view (I have a German version and don't know what it's called in English), where one can see the files in use, shows that many files are listed. Closing the files seems to be rather slow, so I tried idling the CPU for some time after 15000 files so that the files could be closed. The closing worked (or at least the Task Manager showed that it did), but the exe still stopped.

Just a stab in the dark...

Are any of your paths on network-mapped drives (e.g. D:)?

The reason I ask is that I have an old (legacy) Win32 app (non-Fortran) that performs a very large number of file directory search/open/copy/close operations. It works fine from XP to XP but has problems from XP to Vista (writing to Vista): several thousand files into the run, a network resource limit is reached. Apparently the OS tries to throttle the activity with an error that ought to be retried by the application. Resuming (several times) completes the run. I did not add code to test for this error, pause for a while, and then resume.

Did you use Les's suggestion of collecting the I/O status code in addition to taking the error branch on READ (and WRITE)?

Jim


With respect to my note above, try this:

In your main loop add something like

if(mod(ifile,10000) == 0) then
write(*,*) 'Sleeping 30 seconds...'
call sleepqq(30000) ! 30 second wait; SLEEPQQ takes milliseconds and needs USE IFPORT
endif

Jim


The drive I am running the program on is a local one.
I had already tried delaying the program, but with sleep(60). I have now tried the sleepqq(30000) you suggested, but unfortunately it doesn't help.
I implemented the IOSTAT code. The status of the last opened file is 0. At the first read statement the program crashes without jumping to the error label (see reply #15).

Are you on a workstation or a server?
If a server, does the system have a policy to detect and kill runaway programs?

Users on Linux get a nasty surprise if the OOM killer is running on the system and decides to kill their application.

Other than trying to duplicate your situation, we are running out of options here.

Last thing to try: create a batch file (CMD script)

REM foo.bat
yourProgramHere yourArgsHere
IF %ERRORLEVEL% NEQ 0 ECHO Error level %ERRORLEVEL%

Then run the batch file.

If you see the error message, something is causing your program to exit abnormally. (IF ERRORLEVEL 1 would miss negative exit codes, hence the NEQ 0 test.)
The error level value may or may not print out depending on the CMD option /E:ON.

Jim


I am working on a workstation with W7. I can't imagine that some application is killing my Fortran exe, because the exe compiled with CVF works.

I created the batch file: the executable still crashes, and I get the error level -1073741819.

I have attached 3 files:
cdr_output_to_timeseries_modMathew.f
list.txt
output_200701010100.txt

list.txt should be in the root folder,
output_200701010100.txt should be in \input,
and a folder \output is also needed.

Maybe you can reproduce my situation.

Thank you!

Running the test program now. Using IVF 11.0.066 on WinXP Pro x64, 64-bit Debug build. With full debug checking this will take a while to reach the problem point, ~5 ifiles per second. At 1700 now.

Jim


>>created the batch file: the executable still crashes and i get the error level -1073741819

This is 0xC0000005, STATUS_ACCESS_VIOLATION.

In this situation it would mean a file access violation.
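The conversion from the printed error level is two's-complement arithmetic: CMD shows the 32-bit exit code as a signed number, so adding 2**32 to a negative value recovers the unsigned status code. A small sketch; `errorlevel_hex` is a hypothetical helper and assumes default INTEGER is 32 bits.

```fortran
module exit_code
  implicit none
  integer, parameter :: i8 = selected_int_kind(18)   ! 64-bit integer kind
contains
  ! Reinterpret a signed 32-bit exit code (as shown by %ERRORLEVEL%)
  ! as an unsigned value and return it as 8 hexadecimal digits.
  function errorlevel_hex(code) result(hex)
    integer, intent(in) :: code
    character(len=8) :: hex
    integer(i8) :: u
    u = int(code, i8)
    if (u < 0_i8) u = u + 4294967296_i8   ! add 2**32
    write (hex, '(Z8.8)') u
  end function errorlevel_hex
end module exit_code
```

`errorlevel_hex(-1073741819)` returns 'C0000005', the STATUS_ACCESS_VIOLATION code.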

Let's see if it occurs on my system too.

Jim
At 9000 now.


I run the program on XP 32-bit and W7 64-bit.

My About dialog says I am using this compiler:

Intel Visual Fortran Compiler Integration Package ID: w_cprof_p_11.1.054
Intel Visual Fortran Compiler Integration for Microsoft Visual Studio 2008, 11.1.3469.2008

Mathew,

I can reproduce the problem here. Crashes at file 16,150 with 0xC0000005

RSP = 0x30FF0

Which is just below the top of an unmapped page, IOW in the stack local variables of the current stack frame. Then a call fails due to a write to nonexistent memory for the return address. As to how it got into this situation??

a) something modified the stack frame pointer on the stack
b) something caused "infinite" recursion (an error in an error recovery/reporting routine)

So the error occurs in both 32-bit and 64-bit applications on Windows XP x64 and Windows 7 x64, using IVF 11.0.066 and IVF 11.1.054.

Premier Support should be able to reproduce this problem given your test program and test files.

Jim Dempsey


I'll take a look.

Retired 12/31/2016

Here is a shorter program based on the posted program; it shows the same buggy behavior. The shorter program also runs at about 100 files/second on a 2 GHz Athlon X2, using the Intel 11.1.067 32- or 64-bit compilers. It uses the same input/output files as given by MR-KATSE. The errors do not occur on Linux using Intel compiler 11.1.073.

[fxfortran]      program cdr_zoneoutput_converter

      integer MAXCOL, MAXROW, MAXFILE, MAXDAY
      parameter (MAXCOL=30)
      parameter (MAXROW=1000)
      parameter (MAXFILE=50)
      parameter (MAXDAY = 30000)
      parameter (IRL=10400)

      character FILENAME*(MAXFILE)(MAXDAY)
      character*12 onames(23)
      data onames/'BFZON','BWOZON','BW3ZON','DELTAZON','ETATZON',
     1 'ETAP0ZON','ETPEZON','ETPRZON','MELTZON',
     2 'PRAINSOILZON','PSNOWZON','PZON','QAB1ZON',
     3 'QAB2ZON','QAB3ZON','QABZON','QEX2ZON','QVS0ZON',
     4 'SCOVZON','SMELTZON','SWWZON','TOTALSZON','TZON'/

      integer NCOL, NROW, ICOL, IROW, IFILE, NFILE, IER
      character VALUES(MAXCOL,MAXROW)*13
      integer tval(8)

      do iunit=11,33
         open(unit=iunit,
     1        file='output/' // trim(onames(iunit-10)) // '.txt',
     2        ACCESS='sequential', RECL=IRL)
      end do

      NROW = 791
      Print*, 'NROW = ', NROW

      NFILE=0
      NCOL=24

      do 900 IFILE=1, MAXDAY

c ** read FILENAMES **
         open (120, file='list.txt', status='old')
         NFILE=NFILE+1
         read (120, fmt=*, end=100) FILENAME(IFILE)

  900 enddo

  100 continue
      close (120)
      nfile=nfile-1
      write(*,*)' nFile = ',nfile

c*** read in cdr-outputfile ***
      do 1000 IFILE=1, NFILE
         if(mod(ifile,100).eq.0)then
            write(*,*) 'IFILE ', IFILE
         endif
         open (unit=98,file='input/'//FILENAME(IFILE),
     +         status='old', IOSTAT=IER,err=300)
         if(ier.ne.0) then
            write(*,*)' Ifile, IOSTAT ',ifile,ier
         endif
         read(98,*,iostat=ier,err=399)

         do 200, IROW=1, NROW
            read (98,*, err=301) (VALUES(ICOL,IROW), ICOL=1, NCOL)
  200    continue

         close (98)
c*** write output_files ***
         DO ICOL=2,24
            write (icol+9, fmt='(791A)')
     + (VALUES (ICOL,IROW), IROW=1,NROW) !Format repeat count=NROW value
         end do
 1000 enddo

      do ICOL=2,24
         close (icol+9)
      end do

      print*,'finished normally'
      stop

  300 Print*, 'ERROR opening ', FILENAME(IFILE)
      goto 302
  301 Print*, 'ERROR reading ', FILENAME(IFILE)
      Print*, 'Aborted... '
  399 Print*, 'IOSTAT - read =', ier
  302 stop
      end
[/fxfortran]

The problem is caused by this line:

[fxfortran]      open (unit=98,file='input/'//FILENAME(IFILE),status='old', 
     + IOSTAT=IER,err=300)[/fxfortran]

The compiler creates a stack temporary for the file= expression but does not remove it from the stack. After a long while, the stack is exhausted but apparently this is not detected with a normal stack overflow message.

A workaround is to assign the value 'input/'//FILENAME(IFILE) to a character variable and then pass the variable as the file= value. The program seems to work on Linux because the default stacksize is larger there.
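A sketch of that workaround in free-form Fortran: the path is assembled by an ordinary assignment into a fixed-length local variable, and that variable is what gets passed to FILE=. Per the explanation later in this thread, temporaries from assignments are normally popped, so the stack should not keep growing. `open_input`, `max_path = 512` and the error code -999 are assumptions; any length longer than the longest possible path will do.

```fortran
module safe_open
  implicit none
  integer, parameter :: max_path = 512   ! assumed upper bound on path length
contains
  ! Open DIR//NAME on UNIT for reading. The path is built in a local
  ! variable first, so FILE= receives a plain variable rather than a
  ! concatenation expression.
  subroutine open_input(unit, dir, name, ios)
    integer, intent(in) :: unit
    character(len=*), intent(in) :: dir, name
    integer, intent(out) :: ios
    character(len=max_path) :: path

    if (len_trim(dir) + len_trim(name) > max_path) then
      ios = -999            ! hypothetical code: the path would be truncated
      return
    end if
    path = trim(dir) // trim(name)
    open (unit=unit, file=path, status='old', action='read', iostat=ios)
  end subroutine open_input
end module safe_open
```

In the posted loop, the OPEN would become `call open_input(98, 'input/', FILENAME(IFILE), IER)`.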

I will report this to the developers. Issue ID is DPD200161714.

Here is a simple (and quicker) reproducer.

character(1000) padding
do i=1,2000
write (padding,'(I5.5,A)') i,'.txt'
open (unit=1,file='input'//padding,disp='delete')
close (1)
end do

end

Steve: wow, a 7-line reproducer? That really captures the essence!

Good diagnosis, Steve.
I guess the same bug would be present with any statement that generates a temp for a dummy argument, such as a function or subroutine call with a concatenated argument.

Would you know whether

MyFile = 'input'//something

generates a temp or concatenates directly into MyFile?

Jim


You just have to know what you're looking at. I've been doing this a long time...

Jim, the compiler pops temps off the stack in many cases, including assignment. Most of the time. The way the compiler works is that it looks for specific cases to do this, as usually it's not worth the bother - the stack will get popped when the routine exits. But I've seen a fair number of cases like this one where there's a large loop that builds up stack and eventually blows.


Many thanks for the help and the workaround!

I tested it, and it works fine (at least as long as the file names all have the same length?)

As long as the variable you pick is longer than the longest possible filename, it will be fine. You can probably make it shorter than I had it.


Quoting Steve Lionel (Intel): "...the stack will get popped when the routine exits. But I've seen a fair number of cases like this one where there's a large loop that builds up stack and eventually blows."

So this is an old problem that resurfaces from time to time. Time to get it fixed.

If you are looking to reduce the number of stack cleanups of these temps, then consider having the compiler determine whether a loop contains the creation of such temps. If so, create a hidden local variable for that loop to save the stack pointer, copy the stack pointer to this variable immediately before the loop starts, then at the front of the loop body restore the stack pointer before the first statement in the loop's scope. Loops without such temporaries will not incur this additional overhead. Also, if the number of iterations of the loop is known to be small, you could bypass the save/restore code (provided the temp is also known to be small).

What you would be doing is trading off

Code that has a known potential to cause stack overflow given enough iterations

against

Code that, if it passes the first iteration, is known not to (directly) cause stack overflow

at the expense of

mov esp,[ebp+offsetToHiddenSaveStackPointer]

A fair trade-off IMHO.

For anyone else reading this, I suggest we take a straw poll: reply to this thread with your vote/comment.

Jim Dempsey


Hello,

I am happy that you were able to find the reason for a problem I have had for many years.
(http://software.intel.com/en-us/forums/showthread.php?t=42116&o=a&s=lr)

Please find a good solution.

Thanks in advance
Frank

Jim's proposal for fixing the problem of stack temporaries overrunning stack limits is reasonable. However, I am inclined to regard the means of fixing the problem, more specifically how much stack growth is permitted before cleaning up, as implementation issues related to optimization.

In fact, if debug/check options have been specified, or optimization level zero has been requested, the compiler ought not to permit this stack overrun to occur or, if that cannot be avoided, the runtime should provide a clear message and a traceback.

With my shorter example code, I tried to get a traceback, but after stack overflow the program simply quit with no hint that anything went wrong. Rerunning the program with Cygwin/GDB made me note the stack overflow.

A user should not have to stoop to assembler level and monitor the ESP register if stack overflow is suspected.

Such behavior is what had Frank "tropfen" stumped for four years, and needs to be rectified. His thread ends with a reference to "Issue 356587". It is not clear if anything was done to resolve Issue 356587 between 2006 and now.

Hello mecej4,

Four years ago Intel was not able to reproduce the problem, and the issue was closed.

Frank

mecej4,

Other than for optimization-related differences, if the problem is systemic in a Release build it should be systemic in a Debug build; otherwise you will have less chance of finding the problem.

RE: Debug and Stack Overflow.

Debug mode should have a stack guard page at the bottom of the stack. If the stack ever encroaches into this guard page, a debug exception should be raised. The compiler team can decide how to implement this. Had this feature been available to the original poster, this problem would have been identified quickly, either by the original poster or by any of the rest of us monitoring this thread.

Jim Dempsey


I tried a slightly modified version of Steve Lionel's 7-line reproducer on Win-7. The Fortran run-time on this OS detects the stack overflow and prints a message, but does not give a traceback.

program blowstack
  character(len=1000) padding
  iesp0=iesp()                  ! initial stack pointer
  do i=1,2000
    write (padding,'(I5.5,A)') i,'.txt'
    open (unit=1,file='input'//padding)
    close (1,status='delete')
    write(*,'(1x,I4,2x,Z08)')i,iesp0-iesp()   ! stack consumed
  end do
end program blowstack

The code for the utility function iesp() is, for 32-bit Windows:

.686P
.model flat
PUBLIC _IESP
_TEXT SEGMENT
_IESP PROC
      lea eax, dword ptr [esp + 4]
      ret
_IESP ENDP
_TEXT ENDS
END

and, for 64-bit Windows:

PUBLIC	IESP
_TEXT	SEGMENT
IESP	PROC
	lea rax, qword ptr [rsp + 8]
	ret
IESP	ENDP
_TEXT	ENDS
END

For the default stack of 0x100000, the program crashes with the last few lines of output (this is for the 32-bit version; the numbers are slightly different for the 64-bit version):

 1018     FA860
 1019     FAC50
 1020     FB040
forrtl: severe (170): Program Exception - stack overflow

Stack trace terminated abnormally.

The compiler does generate stack checking code in most cases, which will give a reasonable error. I'm not sure what happened here. It may be that the stack check is done only for "automatic" variables, allocated at the beginning of the routine. I will ask.


Hello,

Looking at your suggestions, I would prefer an automatic cleanup of the stack. While programming, I do not know how many files will be opened during execution. Increasing the stack size just to be safe (which is what I currently do) is not a good solution for me.

Frank

mecej4,

You can stay in Fortran and get the stack pointer:

use, intrinsic :: iso_c_binding, only: C_INTPTR_T
integer(C_INTPTR_T) :: StackLoc
...
StackLoc = LOC(StackLoc)

Place that at the start of the PROGRAM, then copy StackLoc to a global InitialStackLoc variable.
Then in subroutines, insert the above code. You can then check how much stack has been consumed, but this will not tell you whether you are getting close to stack overflow.

There is a C runtime library call that you can call from Fortran to obtain the remaining stack. Look in MSDN for the Windows version, or in ??? for Linux.

On Windows, single-threaded, there will be a fixed floor such as 0x10000 (verify this). For multi-threaded programs this will not be the case, as each thread has a separate stack and each stack has a lowest mapped location with a guard page below it. On Linux (*ux) your stack grows until it reaches some other limiting value.

Jim


Jim, thanks for the pointers.

"..but this will not tell you if you are getting close to stack overflow." I thought of reading the .EXE header to obtain the stack limit, but decided that doing so was not worthwhile for a one-off job.

This issue was fixed in 12.0 Update 2.

