The article has a little problem of logic.
It (correctly) identifies that a certain usage of C functions has behavior which is not defined by ISO C. But then, to avoid this undefined behavior, the article proposes a solution: replace that usage with platform-specific functions. Unfortunately, the use of platform-specific functions is also undefined according to ISO C. Therefore, the advice does not solve the problem of undefined behavior.
My copy of the 1999 standard confirms that the behavior in question is indeed undefined:
A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END. [ISO 9899:1999 7.19.9.2 paragraph 3]
But undefined behavior does not mean "bad behavior"; it is simply behavior for which the ISO C standard gives no definition. Not all undefined behaviors are the same.
Some undefined behaviors are areas in the language where meaningful extensions can be provided. The platform fills the gap by defining a behavior.
Providing a working fseek which can seek from SEEK_END is an example of an extension in place of undefined behavior. It is possible to confirm whether or not a given platform supports fseek from SEEK_END, and if this is provisioned, then it is fine to use it.
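As a rough illustration of what relying on that extension looks like, here is a minimal sketch that uses only ISO C library calls. The helper name stream_size_from_end is hypothetical, not something from the article. Note that a successful return only shows that the calls did not fail at run time; whether the result is meaningful is still something the platform's documentation has to promise.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: report the size in bytes of an open binary stream
 * by seeking to its end. Returns -1L if any positioning call fails.
 * The caller's file position is restored before returning. */
long stream_size_from_end(FILE *fp)
{
    long saved, size;

    assert(fp != NULL);

    saved = ftell(fp);                  /* remember where the caller was */
    if (saved == -1L)
        return -1L;

    /* This is the call ISO C does not require a binary stream to
     * support meaningfully; the platform has to define it. */
    if (fseek(fp, 0L, SEEK_END) != 0)
        return -1L;

    size = ftell(fp);                   /* offset of end-of-file, or -1L */

    if (fseek(fp, saved, SEEK_SET) != 0)
        return -1L;

    return size;
}
```

A caller would open the file with fopen(name, "rb") and treat a result of -1L as "this approach is not available here".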
Providing a separate function like lseek is also an extension in place of undefined behavior (the undefined behavior of calling a function which is not in ISO C and not defined in the C program). It is fine to use that, if available.
Note that those platforms which have functions like the POSIX lseek will also likely have an ISO C fseek which works from SEEK_END. Also note that on platforms where fseek on a binary file cannot seek from SEEK_END, the likely reason is that this is impossible to do (no API can be provided to do it and that is why the C library function fseek is not able to support it).
So, if fseek does provide the desired behavior on the given platform, then nothing has to be done to the program; it is a waste of effort to change it to use that platform's special function. On the other hand, if fseek does not provide the behavior, then likely nothing does, anyway.
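To make the comparison concrete, the following sketch, which assumes a POSIX system, asks both interfaces for the end-of-file offset of the same file and checks that they agree. The function name seek_end_agrees is made up for illustration. On a platform where the two answers match, rewriting working fseek code to use lseek buys nothing.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, POSIX assumed: returns 1 if fseek/ftell and lseek
 * report the same end-of-file offset for the file at path, 0 if they
 * differ, and -1 if either method fails. */
int seek_end_agrees(const char *path)
{
    FILE *fp;
    int fd;
    long via_fseek = -1L;
    off_t via_lseek = (off_t)-1;

    assert(path != NULL);

    /* ISO C route: a binary stream positioned relative to SEEK_END. */
    fp = fopen(path, "rb");
    if (fp != NULL) {
        if (fseek(fp, 0L, SEEK_END) == 0)
            via_fseek = ftell(fp);
        fclose(fp);
    }

    /* POSIX route: a file descriptor and lseek. */
    fd = open(path, O_RDONLY);
    if (fd != -1) {
        via_lseek = lseek(fd, 0, SEEK_END);
        close(fd);
    }

    if (via_fseek == -1L || via_lseek == (off_t)-1)
        return -1;

    return via_lseek == (off_t)via_fseek ? 1 : 0;
}
```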
Note that even including a nonstandard header which is not part of the program is undefined behavior (by omission of any definition of the behavior). For instance, if the following appears in a C program:
#include <unistd.h>
the behavior is not defined after that. [See References below.] The behavior of the preprocessing directive #include is defined, of course. But this creates two possibilities. Either the header <unistd.h> does not exist, in which case a diagnostic is required, or the header does exist. In the latter case, the contents are not known (as far as ISO C is concerned; no such header is documented for the Library). The include directive then brings in an unknown chunk of code, incorporating it into the translation unit. It is impossible to define the behavior of an unknown chunk of code.
#include <platform-specific-header.h> is one of the escape hatches in the language for doing anything whatsoever on a given platform.
In point form:
- Undefined behavior is not inherently "bad" and not inherently a security flaw (though of course it can be! E.g. buffer overruns linked to the undefined behaviors in the area of pointer arithmetic and dereferencing.)
- Replacing one undefined behavior with another, only for the purpose of avoiding undefined behavior, is pointless.
- Undefined behavior is just a special term used in ISO C to denote things that are outside of the scope of ISO C's definition. It does not mean "not defined by anyone in the world" and doesn't imply something is defective.
- Relying on some undefined behaviors is necessary for making most real-world, useful programs, because many extensions are provided through undefined behavior, including platform-specific headers and functions.
- Undefined behavior can be supplanted by definitions of behavior from outside of ISO C. For instance the POSIX.1 (IEEE 1003.1) series of standards defines the behavior of including <unistd.h>. An undefined ISO C program can be a well defined POSIX C program.
- Some problems cannot be solved in C without relying on some kind of undefined behavior. An example of this is a program that wants to seek so many bytes backwards from the end of a file.
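
The last point can be sketched as follows. This is only an illustration; the helper name read_tail is hypothetical, and the code assumes a platform that does give fseek from SEEK_END a meaning on binary streams. ISO C itself offers no other way to express "n bytes before the end".

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: read the last n bytes of a binary stream into buf.
 * Returns the number of bytes actually read, or 0 if the seek fails
 * (for example, if the platform rejects the request). */
size_t read_tail(FILE *fp, char *buf, size_t n)
{
    assert(fp != NULL && buf != NULL);

    /* fseek takes a long offset, so refuse counts that do not fit. */
    if (n == 0 || n > (size_t)LONG_MAX)
        return 0;

    /* Negative offset relative to SEEK_END: the platform-defined part. */
    if (fseek(fp, -(long)n, SEEK_END) != 0)
        return 0;

    return fread(buf, 1, n, fp);
}
```

Whether this goes through fseek or a platform function such as lseek, the program depends on behavior that something other than ISO C defines.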
References:
- Dan Pop in comp.std.c, Dec. 2002: http://groups.google.com/group/comp.std.c/msg/534ab15a7bc4e27e?dmode=source
- Chris Torek, comp.std.c, on the subject of nonstandard functions being undefined behavior, Feb. 2002: http://groups.google.com/group/comp.lang.c/msg/2fddb081336543f1?dmode=source
- Chris Engebretson, comp.lang.c, April 1997: http://groups.google.com/group/comp.lang.c/msg/3a3812dbcf31de24?dmode=source
- Ben Pfaff, comp.lang.c, Dec 1998 [Jesting answer citing the undefinedness of including nonstandard headers]: http://groups.google.com/group/comp.lang.c/msg/73b26e6892a1ba4f?dmode=source
- Lawrence Kirby, comp.lang.c, Sep 1998 [Explains effects of nonstandard headers]: http://groups.google.com/group/comp.lang.c/msg/c85a519fc63bd388?dmode=source
- Christian Bau, comp.lang.c, Sep 1997 [Explains how the undefined behavior of #include <pascal.h> can bring in a pascal keyword for linkage]: http://groups.google.com/group/comp.lang.c/msg/e2762cfa9888d5c6?dmode=source
fseek/ftell (actually fseeko/ftello, if you have POSIX, so you can deal with large files) is the preferred way to determine file size. The stat-based alternative will fail to determine sizes of some non-regular files that do have well-defined sizes, such as block devices (disk partitions, etc.). – Battik