Does Fortran have undefined behavior?

In C and C++, there are many operations that cause undefined behavior, i.e. situations that allow the compiler to do anything it wants. Examples include using a variable after deallocating it, deallocating a variable twice and dereferencing null pointers.

Does Fortran also have undefined behavior? I had a look at a specification draft, but failed to find anything in there. For instance, is using a variable after its deallocation guaranteed to crash the program, or may it silently do the wrong thing?

Forgather answered 19/8, 2019 at 14:22

Yes, it has; it is just called differently. There are many things you could do that make your code not standard conforming, and for which there is no requirement for the processor (compiler) to diagnose the non-conformance (of course, many deviations must be diagnosed). Often the situations are similar to C undefined-behaviour ones (like accessing an array out of bounds, signed integer overflow, ...). We just say that the code is not standard conforming, which means the standard does not prescribe the outcome of such code. Such code is not covered by the standard, so anything can result if some compiler (processor) does compile it and you do run it.

That is different from processor-dependent behaviour: such code is standard conforming, and the standard merely leaves the outcome to the implementation.

Searching here on Stack Overflow should give you plenty of examples, such as "Is passing the same entity to arguments with different intent undefined behavior?" or "How do Fortran and MPI_Reduce deal with integer overflow?"

This answer only addresses the question asked; it does not attempt to list all possible kinds of UB that can occur in Fortran.
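Since this answer draws the parallel with C, here is a minimal C sketch of what avoiding UB looks like for one of the cases listed above, signed integer overflow. It is an illustration under assumptions, not a Fortran technique: the function name checked_add is hypothetical, and the pattern simply tests the operands against INT_MAX/INT_MIN so that the overflowing addition is never executed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from any library): stores a + b in *out and
   returns true only if the mathematical sum fits in an int. The range
   test is done before adding, so the overflowing addition, whose
   behaviour C leaves undefined, is never evaluated. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;   /* sum would not be representable in an int */
    *out = a + b;
    return true;
}
```

A caller would write something like `if (!checked_add(x, y, &sum)) { /* handle overflow */ }`. Testing after the fact (adding first and then inspecting the result) would not work, because by then the undefined operation has already happened.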

Hemihedral answered 19/8, 2019 at 14:34
I tried to describe the situation in a concise and informal way. For a formal, more detailed and accurate description, see the answer of francescalus. – Roseboro

The Fortran standard does admit similar concepts to the idea of "undefined behaviour" of C and C++. It's often phrased by people supporting broken Fortran code as "the compiler may now start World War III", or similar.

The Fortran language specification has two ideas of conformance (see Fortran 2018, 4.2). The main one is what a program must look like to be considered a Fortran program. The second is what a processor must do in response to a submitted program unit to be considered a Fortran processor.

If a conforming Fortran processor is asked to process something that is not a Fortran program, the standard rarely says what must happen. There are some diagnostics that must be offered, but often there are none.

In the case of "using a variable after its deallocation", an attempt to do this is a violation of that part of the language standard that defines the Fortran program. A compiler may then "start World War III" without violating the Fortran standard, because the Fortran standard doesn't say that it must not (or must do something else).

Now, how do we look at the Fortran standard document and decide whether the not-quite-Fortran program has a particular required effect? The text from 4.2 mentions a number of situations where a compiler must have "the capability to detect and report the use within a submitted program unit". If your proposed program doesn't hit any of those you're in "undefined" territory.

The main case in which a program error must be reportable is

the use within a submitted program unit of a form or relationship that is not permitted by the numbered syntax rules or constraints

Let's consider, arbitrarily, Fortran 2018, 15.5.1 (Syntax of a procedure reference). We see things like "R1520":

R1520 function-reference is procedure-designator ( [ actual-arg-spec-list ] )

and "C1523":

C1523 (R1520) The procedure-designator shall designate a function.

before we have a list of things like:

The data-ref in a procedure-designator shall not be an unallocated allocatable variable or a pointer that is not associated.

In this case, the rule R1520, numbered constraint C1523 (which applies to this rule) and following text give constraints on the Fortran program. If your submitted program doesn't meet those, it's not a conforming Fortran program.

A compiler asked to process such a non-conforming program, where that program violates R1520 or C1523, must be able to detect that (based on the above). A compiler doesn't have to complain about or detect violations of the un-numbered text. It's allowed to assume that any program it's presented with doesn't break such an un-numbered restriction.

The one I quote here is (coincidentally) an example of a prohibition on a program incorrectly using a previously deallocated variable.

If a processor operates in a "compile-then-run" way, the numbered rules/constraints are typically ones that can be assessed "at compile time".

Another specific and significant example of undefined behaviour is using a variable without first giving it a value ("defining" it). The Fortran standard simply says (Fortran 2018 9.2 p2):

A reference is permitted only if the variable is defined. A reference to a data pointer is permitted only if the pointer is associated with a target object that is defined. A variable becomes defined with a value when events described in 19.6.5 occur.

This isn't a numbered syntax rule or constraint and is a (significant) burden on the program that the compiler is allowed to assume has been met.

Tithable answered 19/8, 2019 at 14:35
A small correction to an otherwise excellent explanation: a compiler isn't required to complain about violations of syntax rules or constraints; it merely must "contain the capability to detect and report" such things. It doesn't have to do it by default, which allows the compiler to support non-standard syntax and relationships as extensions. In this case, you may need to add a switch to ask for standards checking if you want such things diagnosed. – Fehr
As someone who has had to care about these things for a very long time, you are of course correct. Thanks. Hopefully I corrected that sloppiness, but if you think I've missed one please let me know again. – Tithable
Thanks both @Tithable and @vladimir-f! I wish I could have marked both of your excellent answers as correct. – Chondriosome
How does the FORTRAN Standard handle non-portable actions which most implementations should process consistently on a quality-of-implementation basis, but which implementations need not handle in a fashion consistent with sequential program execution in cases where doing so would be impractical? The C and C++ Standards deliberately characterize such actions as Undefined Behavior, on the basis that quality implementations should need no "permission" to process them usefully when practical, and non-portable-but-Conforming C programs need no "permission" to exploit this. – Special
@supercat, if your question is about allowing compilers to optimize and/or auto-parallelize, then the Fortran standard has this within the "defined behaviour" framework. Undefined behaviour in Fortran (contrasting with implementation specific) is often a way of ensuring that the programmer cannot assume things which lead to several possible interpretations between which a compiler must choose. For example, there are times when a programmer can't rely on a function being evaluated; for QoI most compilers may skip the reference. – Tithable
@francescalus: There are many circumstances in which a non-optimized program might perform one way, but a useful optimization might cause a program to perform in a different way which may also meet application requirements. The C Standard's approach to allowing such optimizations is to ensure that if an optimization could cause the behavior of some sequence of steps to be inconsistent with sequential execution, at least one step within that sequence is characterized as invoking Undefined Behavior, thus requiring that programmers ensure no such sequence of operations can ever occur. – Special
In that sense, @supercat, yes, Fortran takes much the same approach as the C family. Fortran doesn't say "this behaviour is undefined" so much as "if you do this you don't have a Fortran program and it's not the Fortran standard's job to tell a Fortran compiler how to compile something which isn't a Fortran program". Many restrictions are in place precisely to allow a compiler to undertake optimization. In particular "code motion optimizations" are explicitly allowed and the programmer is responsible for marking ordering constraints. – Tithable
@francescalus: The $50,000 question, IMHO, would be whether programmers would be required merely to ensure that all behaviors that could stem naturally from a particular optimization would meet requirements, or whether compilers would be allowed to behave in completely arbitrary fashion in such cases. For example, if a compiler determines both (1) no individual action within a loop has side effects sequenced before some other piece of code that tests whether x is less than 120, and (2) the loop will never terminate if x exceeds 100, should a compiler be allowed to... – Special
... (1) process the loop as written, but omit the comparison that follows; (2) omit the code for the loop, and perform the comparison that follows; (3) omit both the code for the loop and the comparison that follows. IMHO, the best optimization opportunities could be achieved by allowing #1 or #2, but forbidding #3. Requiring that a programmer check whether x is less than 101 before entering a loop, even if hanging the program would be an acceptable consequence if it isn't, would yield performance inferior to what could have been produced with #1 or #2 without that check. – Special
@supercat, Fortran doesn't have an explicit "as though" rule, but instead the compiler merely must "fulfill the interpretations", which is much the same thing. Fortran (unlike C, I believe) does not allow compilers to assume that all loops terminate, but a Fortran compiler can, say, replace the test if(LEXP) ... with if(.true.) ... (and then process ... trivially) if it can prove that LEXP will always evaluate true at that point. (Of course, something like x<120 may itself have side effects, which must be preserved.) – Tithable
@francescalus: the C Standard allows compilers to "assume" most loops will terminate, without specifying what they may do based upon that assumption. IMHO, a better way of allowing the intended optimizations would be to say that if a block of code has a single statically-reachable exit point, it need only be treated as observably sequenced before some later action if some individual action within that block would be likewise sequenced. – Special

A major difference between Fortran and C is that the latter language, as designed by Dennis Ritchie, specified the behavior of many actions which would have been impossible or erroneous in FORTRAN, in ways that could be used to eliminate the need for some of FORTRAN's more sophisticated constructs.

For example, in Dennis Ritchie's language, code which knew the total number of elements in a multi-dimensional array could iterate through the entire array using just the last subscript. A piece of code like:

double a[5][5], b[5][5];
void add_arrays()
{
  int i;
  /* i runs past the end of row 0, walking all 25 elements in memory order */
  for (i=0; i<25; i++)
    a[0][i] += b[0][i];
}

would not have been regarded as erroneous code that happened to work, but rather as code that exploited a common C idiom. [Note that the idiomatic way of doing the above in FORTRAN or Fortran would be to simply use a matrix addition].
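For contrast, here is a minimal sketch of the same element-by-element addition written so that every subscript stays inside its declared bound, which is how it would typically be written for a modern standard C compiler. The name add_arrays_portable and the N macro are hypothetical; the arrays are the same 5x5 globals as above.

```c
#define N 5   /* hypothetical name for the array extent */

double a[N][N], b[N][N];

/* Hypothetical portable counterpart of add_arrays(): each subscript stays
   within 0..N-1, so no access depends on row 0 "running into" row 1. */
void add_arrays_portable(void)
{
    for (int r = 0; r < N; r++)
        for (int c = 0; c < N; c++)
            a[r][c] += b[r][c];
}
```

The price is an extra loop index; in exchange, the code no longer depends on how a particular implementation treats a[0][i] for i of 5 or more.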

By the time the C Standard was written, there would be at least three ways that implementations might process a function like:

double test(int i)
{
  a[1][0] = 1.0;
  a[0][i] = 2.0;   /* can this store modify a[1][0]? */
  return a[1][0];
}

  1. In the idiomatic C fashion, with the second assignment computing an address and writing to it, allowing for the possibility that the write might affect a[1][0].

  2. As above, but with the compiler assuming that a[1][0] will not be affected by the second assignment because it is in a different row of the array.

  3. With a definite bounds check, which would trap if i is not in the range 0 to 4.

Each of the above interpretations would make a compiler more suitable for some tasks and less suitable for others. Rather than suggest that any of them was inferior to the others, the Standard instead characterizes an access to a[0][i] for values of i in the range 5 to 24 as "non-portable or erroneous": on some implementations such an access would be erroneous, but on others it would be correct, despite being non-portable.
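Interpretation #3 can also be imitated by hand on any C implementation. A minimal sketch, assuming the same global array a; the name test_checked is hypothetical, and abort() stands in for whatever trap a bounds-checking implementation would raise.

```c
#include <stdio.h>
#include <stdlib.h>

double a[5][5];

/* Hypothetical hand-written version of interpretation #3: the subscript
   into row 0 is checked explicitly and the program traps (aborts) when
   i is outside 0..4, so the store can never reach a[1][0]. */
double test_checked(int i)
{
    a[1][0] = 1.0;
    if (i < 0 || i > 4) {
        fprintf(stderr, "test_checked: index %d outside 0..4\n", i);
        abort();
    }
    a[0][i] = 2.0;
    return a[1][0];   /* always 1.0 when control gets here */
}
```

Once the check has passed, the assumption behind interpretation #2 (that the store cannot affect a[1][0]) is genuinely true, so returning 1.0 is correct under every reading.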

Although situations may arise where a Fortran compiler would process code in a manner analogous to #2 (yielding semantics inconsistent with sequential program execution but without yielding a diagnostic), I don't think the language has many situations where such actions could be viewed as erroneous by some implementations, and non-portable but correct by others.

Special answered 30/9, 2021 at 15:51
