truncated strings

I was just surprised by a bug in my code about which I did not get a warning:

character(LEN=4) :: name

name = "DAVID"

There is no compiler warning that "DAVID" gets truncated when stored in name.

Is there any compiler setting to force this warning?  I could not see one.

Thanks,

David

Hello

It's intrinsic behavior of Fortran.

It is allowed to store a value in a variable whose actual size is smaller than the value:
integer*4 to an integer*2 variable
character*100 to a character*2 variable
and so on

So there is no warning, and the result may not be what you expected.

Be careful

As GVautier points out, assigning a 5-character value to a 4-character variable is not a bug, but a feature of the language. It is a bug only in the sense that the code does something different from what you intended.

A compiler, however, may be able to help you catch such code. There are also many utilities and Lint-like tools that you can run your code through. Ftnchek says:

      4 name = "DAVID"
             ^
Warning near line 4 col 6 file dav.f90: char*5 const "DAVID" truncated to
 char*4 NAME

GFortran with -Wall says something very similar to this. Silverfrost FTN95 says

0004) name = "DAVID"
COMMENT - Only the first 4 character(s) of this constant will be used
    NO ERRORS, 1 COMMENT  [<XCHAR> FTN95/Win32 v8.00.0]

Note the level of the message: COMMENT, not WARNING or ERROR.

You could maybe make it allocatable, so that it is allocated on assignment, or define it as a parameter, which I think would give an error on a length mismatch.
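A minimal sketch of the two alternatives mentioned above, assuming a compiler with Fortran 2003 deferred-length strings and reallocation on assignment (some older compiler versions may need an option to enable this). The names `name` and `fixed_name` are only for illustration.

```fortran
program deferred_length_demo
  implicit none
  ! Deferred-length string: (re)allocated to the length of the right-hand side.
  character(len=:), allocatable :: name
  ! Assumed-length named constant: its length is taken from the literal.
  character(len=*), parameter :: fixed_name = "DAVID"

  name = "DAVID"
  print '(A,I0)', "len(name)       = ", len(name)
  print '(A,I0)', "len(fixed_name) = ", len(fixed_name)

  ! A longer value simply reallocates the variable; nothing is truncated.
  name = "DAVID_WITH_A_LONGER_NAME"
  print '(A,I0)', "len(name)       = ", len(name)
end program deferred_length_demo
```

With this approach len(name) follows whatever was last assigned, so any code that relies on name being exactly 4 characters long would need to be reviewed.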

Volunteer programmers must be proficient in C, the language in which ftnchek is written.

Interesting statement on the FTNCHEK website -- thanks for letting me know about the program.

John

Andrew's suggestion will make the error message go away...
However, it may also introduce a hidden error in your program should the remainder of the program require name be 4 characters in length.

Be careful of making changes to code when you do not fully comprehend the full implications of such change.

Jim Dempsey

https://www.youtube.com/watch?v=eWQIryll8y8

Shows an excellent example of what Jim is talking about

Quote:

jimdempseyatthecove wrote:
Andrew's suggestion will make the error message go away...

However, it may also introduce a hidden error in your program should the remainder of the program require name be 4 characters in length.

Be careful of making changes to code when you do not fully comprehend the full implications of such change.

Jim Dempsey

Indeed, the need to analyse the consequences of any change goes without saying. Matching character declaration lengths to an actual hard-coded string is rather error prone IMO. "How many characters are there in this sentence?" Dynamic allocation, with later use of len(variable_name) if required, makes life simpler; otherwise you are left to guess a length that is much longer than you really need.

I used to be plagued by this a lot, and eventually learned something that others have not really pointed out. The real danger of this behavior (which I agree is a "feature" of the language) is not just that the name variable will be truncated. In fact that result should be readily apparent and not difficult to debug. The REAL problem is that, by assigning something longer than the allocated storage space, you are inadvertently overwriting a storage cell that may belong to something else. That "something else" is often some unused memory and is harmless, but it can easily belong to something important, which is then corrupted. That corrupted something can cause bizarre, unexpected, and unpredictable behavior that is extremely difficult to debug. All part of the joy of programming in Fortran.

Quote:

dboggs wrote:

I used to be plagued by this a lot, and eventually learned something that others have not really pointed out. The real danger of this behavior (which I agree is a "feature" of the language) is not just that the name variable will be truncated. .. The REAL problem is that, by assigning something longer than the allocated storage space, you are inadvertently overwriting a storage cell that may belong to something else. That "something else" is often some unused memory and is harmless, but it can easily belong to something important, which is then corrupted. That corrupted something can cause bizarre, unexpected, and unpredictable behavior that is extremely difficult to debug. All part of the joy of programming in Fortran.

All such problems for a simple assignment involving a truncated CHARACTER expression, as shown in the original post!?  I find that very hard to believe, it must be for some other, more complex situations or non-standard code or deprecated coding practices.

The problems that DBoggs describes may happen in general (I suppose the infamous buffer overflow of C covers such things), but for the specific case of #1, wherein a character variable of length 4 is assigned a string of length 5, there can be no clobbering of adjacent memory. The Fortran standard specifies that when a character variable is set equal to a character expression, the latter is truncated or padded with blanks to match the length of the variable. Please see 7.2.1.3, numbered item 10.

10 For an intrinsic assignment statement where the variable is of type character, the expr may have a different character length parameter in which case the conversion of expr to the length of the variable is as follows. (1) If the length of the variable is less than that of expr, the value of expr is truncated from the right until it is the same length as the variable. (2) If the length of the variable is greater than that of expr, the value of expr is extended on the right with blanks until it is the same length as the variable.
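The two rules quoted above can be seen in a small sketch. The variable names are made up for the illustration; the brackets in the output just make the trailing blanks visible.

```fortran
program char_assign_demo
  implicit none
  character(len=4) :: short_name
  character(len=8) :: long_name

  short_name = "DAVID"   ! rule (1): expr is truncated from the right
  long_name  = "DAVID"   ! rule (2): expr is extended on the right with blanks

  print '(3A)', "[", short_name, "]"   ! prints [DAVI]
  print '(3A)', "[", long_name,  "]"   ! prints [DAVID   ]
end program char_assign_demo
```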

If the variable is allocatable, etc., rather than a simple character variable of known length, I suppose bad things can happen. 

>>The REAL problem is that, by assigning something longer than the allocated storage space, you are inadvertently overwriting a storage cell that may belong to something else.

Won't happen. The string gets truncated (or space padded in the event the input string is shorter than the output string).

Jim Dempsey

If GFortran and Silverfrost can issue a warning when attempting to assign an oversized string, I see no reason why IVF cannot do the same.

In my code that triggered this, I use a code template for testing a large number of subroutine calls.  The maximum length I gave to the string variable was quite reasonable, what I did not catch was that the code I am linking with has subroutine names of increasing length, and so I got caught out.

It would be great if I could be sure that such errors would be caught at compile time in future.
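Until a compile-time warning is available, one possible stopgap is a run-time guard. This is only a sketch: `safe_assign_mod` and `assign_checked` are hypothetical names, not part of any library, and the check only runs when the code executes, so it is not a substitute for the compiler diagnostic being asked for.

```fortran
module safe_assign_mod
  implicit none
contains
  ! Hypothetical helper: copies src into dest and reports whether
  ! any non-blank characters were lost in the process.
  subroutine assign_checked(dest, src, truncated)
    character(len=*), intent(out) :: dest
    character(len=*), intent(in)  :: src
    logical,          intent(out) :: truncated
    truncated = len_trim(src) > len(dest)
    dest = src   ! ordinary intrinsic assignment: truncates or blank-pads
  end subroutine assign_checked
end module safe_assign_mod

program guard_demo
  use safe_assign_mod
  implicit none
  character(len=4) :: name
  logical :: lost

  call assign_checked(name, "DAVID", lost)
  if (lost) print '(A)', "warning: value truncated to '" // name // "'"
end program guard_demo
```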

David

We have run into bugs where a longer string was assigned in another procedure.  This will be caught by turning on interface checks.  In the actual case it caused no problem for months until inlining caused it to overwrite another variable.

I will add this to the list of "usage warnings" that have been suggested.

Retired 12/31/2016

 

>>The REAL problem is that, by assigning something longer than the allocated storage space, you are inadvertently overwriting a storage cell that may belong to something else.

Won't happen. The string gets truncated (or space padded in the event the input string is shorter than the output string).

I made that statement a little too hastily, and it wasn't quite correct. Yes, it won't happen exactly like I described. What I was referring to (along with some memory struggle!) was one or more related activities, such as EQUIVALENCE or COMMON trickery, or (more likely) an internal write. I think that severe and hard-to-debug trouble can occur if a long character string is accidentally written (via internal write) to a character variable of shorter declared length.

It would be good to run a simple test to determine if this is in fact true, but someone here probably already knows?

CHARACTER(3) :: cthree
CHARACTER(5) :: cfive
INTEGER :: a, b, c
COMMON cthree, a, b, c ! Just to ensure that a is stored immediately after cthree in memory
WRITE (cthree, '(A)') cfive
! Variable a will now be corrupt?

Hello

Nothing wrong will happen except an IO error because the write result is too long for the character variable. That's all.
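A minimal sketch for testing this, assuming the compiler reports the overflow as an I/O error as described above. The `iostat=` specifier keeps the program running so the status can be inspected; the actual value returned is processor-dependent, so none is shown here.

```fortran
program internal_write_demo
  implicit none
  character(len=3) :: cthree
  character(len=5) :: cfive
  integer :: ios

  cfive  = "ABCDE"
  cthree = "xyz"

  ! Internal write of 5 characters into a 3-character record.
  write (cthree, '(A)', iostat=ios) cfive

  if (ios /= 0) then
    print '(A,I0)', "internal write failed, iostat = ", ios
  else
    print '(3A)', "internal write succeeded: [", cthree, "]"
  end if
end program internal_write_demo
```

Without `iostat=` (or `err=`), the same condition would normally terminate the program with a run-time error message instead.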

The only problematic thing about strings, though it's not really related to the topic, is declaring a character dummy argument with a fixed length in a subroutine and then passing an actual argument of shorter length.

Ex :

subroutine test(string)
character*50 string
string=""
end subroutine

character*20 string20
call test(string20)

The only solution I found is to replace the fixed-length character argument declaration with a character*(*) declaration.
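A sketch of that fix in the newer `character(len=*)` spelling. Putting the routine in a module is an addition made for this example: it gives the caller an explicit interface. The names `string_utils` and `clear_string` are made up.

```fortran
module string_utils
  implicit none
contains
  subroutine clear_string(string)
    ! Assumed length: len(string) is taken from the actual argument,
    ! so the routine can never write past the caller's variable.
    character(len=*), intent(out) :: string
    string = ""
  end subroutine clear_string
end module string_utils

program assumed_length_demo
  use string_utils
  implicit none
  character(len=20) :: string20

  call clear_string(string20)
  print '(A,I0)', "len(string20) = ", len(string20)   ! prints len(string20) = 20
end program assumed_length_demo
```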

The example in #17  would give compiler error >> "error #7938: Character length argument mismatch."

Quote:

andrew_4619 wrote:

The example in #17  would give compiler error >> "error #7938: Character length argument mismatch."

Only if both subprograms are in the same file, or a compiler option is specified that checks for such mismatches, or if extra code is generated to do similar checks at run time.

In a release mode compilation, the .OBJ file for the subroutine does not use the second argument on the stack, which is the hidden (in Fortran source code) string length argument. Instead, it uses the incorrectly declared length, 50. If the length of the string argument is re-declared as (*), the length is taken from the hidden argument on the stack.

One can check the disastrous effects of such errors with the code of #17. I put the subroutine and calling program into separate files. The resulting EXE hangs when run, and I had to abort it with Ctrl+C. Worse things can happen with bigger programs, which is why it is important to ensure that code does not contain such interface mismatches.

Quote:

mecej4 wrote:
One can check the disastrous effects of such errors with the code of #17. I put the subroutine and calling program into separate files. The resulting EXE hangs when run.......

All you say is correct, however....

It wouldn't hang for me as it would not build. I would never compile without interface checking** and for that matter external routines have no place in my world. I guess if someone adopts less than ideal coding practices and also chooses not to use the options that find errors at compile time then good luck to them!

** yes I do realise if you have inherited some ancient/non-standard Fortran that might not be an option in the first instance (been there got the T shirt) ......

 

This discussion seems a bit absurd.
If I have a char*80 string VTEXT and want to analyze the first 8 chars, why should I not write:
character*8 code
code = vtext


Luigi,

That is valid.

gvautier was illustrating how, by declaring a dummy argument of a subroutine (without interface checking), you can declare a character string (or any other array) to be larger than the actual argument. He is cautioning you about this characteristic of Fortran. To avoid this, use interface checking (or don't use it if you really need to).

Jim Dempsey

Quote:

Luigi R. wrote:

This discussion seems a bit absurd.
If I have a char*80 string VTEXT and want to analyze the first 8 chars, why should I not write:
character*8 code
code = vtext

Luigi,

I would prefer to have at least an option so that the compiler issues a warning for your example.

Why not code this explicitly

code = vtext(1:len(code))

If you only want the first 8 characters.

It seems to me that explicitly coding what you want to happen is better than hoping that it will.

Your example could be a typo, in that you meant to declare code as 80 characters long to match vtext, but declared it incorrectly.

We have many other warnings like this. And as other posters have indicated, gfortran and Silverfrost do detect this and issue a warning.

David

There is the usual trade-off for a language (computer or natural) between the inconvenience of the verbosity (and some forms of verbosity result in a loss of clarity and are somewhat associated with the chance of making an error) of having to be explicit about something, and the convenience but possibility of  misinterpretation or mis-intention of an implicit action.  In a standard Fortran context the implicit conversions between integer and real, or the implicit conversions between different kinds of a particular intrinsic type are similar trade-offs.

A little care is needed with something like `code = vtext(1:len(code))` - it assumes that the length of code is always less than or equal to the length of vtext. That's the sort of assumption that could break as code evolves - make sure your explicit medicine is not worse than the disease. So to be robust and explicit what you want is `code = vtext(:min(len(code),len(vtext)))`, which is perhaps an example of where verbosity negatively impacts clarity. That still relies on the implicit rules of character length parameter conversion, when len(vtext) is less than len(code).
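A small sketch of the robust form, covering both directions. The variables `tiny` and `code2` are made up to show the case where the source is shorter than the target.

```fortran
program substring_demo
  implicit none
  character(len=80) :: vtext
  character(len=3)  :: tiny
  character(len=8)  :: code, code2

  vtext = "ABCDEFGHIJKLMNOP"
  tiny  = "XYZ"

  ! Source longer than target: the substring bound is len(code) = 8.
  code  = vtext(:min(len(code), len(vtext)))
  ! Source shorter than target: the bound is len(tiny) = 3, and the
  ! assignment then blank-pads the result to 8 characters.
  code2 = tiny(:min(len(code2), len(tiny)))

  print '(3A)', "[", code,  "]"   ! prints [ABCDEFGH]
  print '(3A)', "[", code2, "]"   ! prints [XYZ     ]
end program substring_demo
```

Writing `tiny(1:len(code2))` instead would ask for characters 1 to 8 of a 3-character string, which is an out-of-bounds substring reference.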

From the point of view of a warning I think there is a difference between assignment of a literal constant to a shorter fixed length string and assignment of a more general expression, though `name = name // other` is also questionable.

To be mischievous, use of any fixed length character variable (apart from those with a length of one) should probably cop a warning these days...
