Where is the hex code of the "EOF" character?

6

51

As far as I know, at the end of every file, especially text files, there is a hex code for an EOF or NULL character. And when we write a program that reads the contents of a text file, we keep calling the read function until we receive that EOF hex code.

My question: I downloaded some tools to see a hex view of a text file, but I can't see any hex code for EOF (End Of File/NULL) or EOT (End Of Text).


ASCII/Hex code tables :

[image: ASCII/hex code table]

This is output of Hex viewer tools:

[image: hex viewer output for the text file]


Note: My input file is a text file whose content is "Where is hex code of "EOF"?"

Appreciate your time and consideration.

Glottic answered 28/7, 2014 at 9:6 Comment(6)
Your assumption in the first sentence is wrong: in the vast majority of cases there is no such character physically present in the file. EOF is a symbolic value provided by the library to notify you, the programmer, that the end of the file has been reached. The operating system doesn't need to know where the file ends (or rather, it doesn't store this information in the file itself). - Dorettadorette
@Dorettadorette I wrote a program that searches a text file for the character "A". If there is no "A" in the text, it moves the file to a special directory. I want to know: is there any way to cheat my program, for example by adding a NULL/EOF/EOT hex code in the middle of my input text? Thank you. - Glottic
Unlikely. In cmd.exe, ^Z is treated as the end of input, so if you do something like type whatever.txt it will stop when it hits ^Z if the file happens to contain one, but this only applies to the Windows command line. I/O libraries for programming should happily parse it as just another character. - Dorettadorette
^Z was common in MS-DOS text files, and still is for many transfer protocols. I expect most SO users cannot remember MS-Kermit, xmodem, ymodem, etc. It is still produced by ind$file and is a chore to remove. It throws nasty messages in gedit, so yes, it does exist. - Haws
@Dorettadorette In some cases the OS may not be reading from a file system, so it would otherwise need to know the file size in advance to know where the end occurs. This applies to stream or raw input. - Haws
Ctrl+Z (U+001A or ␚) is a character used by convention by some text-based file tools. The POSIX defiintion of a text file says it must end with a newline (U+000A or ␊ or \n) just like every other line in the file. These are merely conventions for some tools and systems, not requirements, so expecting a specific character isn't going to be reliable.Niggerhead
57

There is no such thing as an EOF character. The operating system knows exactly how many bytes a file contains (this is stored alongside other metadata such as permissions, creation date, and the name), and hence can tell a program that tries to read the eleventh byte of a ten-byte file: you've reached the end of the file, there are no more bytes to read.
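
To make the point about metadata concrete, here is a minimal sketch, assuming a POSIX system where stat is available. The helper names file_size_from_metadata and count_bytes_by_reading are made up for this illustration. The first asks the file system for the recorded size without reading any content; the second reads until the library reports that nothing is left. Both should agree, and neither looks for a marker byte.

```c
#include <stdio.h>
#include <sys/stat.h>  /* POSIX: stat(), struct stat */

/* Hypothetical helper: returns the size recorded in the file's metadata,
 * or -1 if the file cannot be examined. No file content is read. */
long long file_size_from_metadata(const char *path)
{
    struct stat info;
    if (stat(path, &info) != 0) {
        return -1;
    }
    return (long long)info.st_size;
}

/* Hypothetical helper: counts bytes by reading in binary mode until
 * fread returns 0, or -1 if the file cannot be opened. */
long long count_bytes_by_reading(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    unsigned char buf[4096];
    long long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, f)) > 0) {
        total += (long long)n;
    }
    fclose(f);
    return total;
}
```

For a ten-byte file both functions should report 10; the end is found because the read comes back empty, not because an eleventh "EOF byte" was seen.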

In fact, the EOF value returned by C functions such as getchar is deliberately an int value outside the range of a byte, so it cannot possibly be stored in a file!
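
A small sketch of why the int return type matters. The function names are invented for this example, and the second function is wrong on purpose. It assumes a typical platform where EOF is -1 and where converting 255 to signed char yields -1.

```c
#include <stdio.h>

/* Correct: the result of fgetc is kept in an int, so EOF (a negative int)
 * can never be confused with a data byte, which arrives as 0..255. */
long count_bytes_int(FILE *f)
{
    long count = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        count++;
    }
    return count;
}

/* Buggy on purpose: squeezing the result into a signed char turns the data
 * byte 0xFF into -1, which compares equal to EOF on typical platforms,
 * so the loop stops early. */
long count_bytes_char(FILE *f)
{
    long count = 0;
    signed char c;
    while ((c = (signed char)fgetc(f)) != EOF) {
        count++;
    }
    return count;
}
```

On a file holding the three bytes 61 FF 62, the first function should count all three, while the second should stop at the FF byte under the assumptions above. The byte FF is real data; EOF is not a byte at all.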

Sometimes, certain file formats insist on adding NUL terminators (probably because that's how strings are usually stored in C), though usually these delimit multiple records in a single file, not the file as a whole. And such decoration usually disqualifies a file from being considered a "text file".

ASCII codes like ETX and NUL date back to the days of teletypewriters and friends. NUL is used in C for in-memory strings, but this has no bearing on file systems.
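
As a hedged illustration of where NUL does matter, the sketch below contrasts a search that treats file contents as a C string with one that is told the real length. The function names are hypothetical; strchr and memchr are the standard library calls.

```c
#include <stddef.h>
#include <string.h>

/* Fragile: strchr treats buf as a C string and stops at the first NUL byte,
 * so an 'A' that follows an embedded 0x00 is never seen.
 * buf must be NUL-terminated for this call to be valid. */
int contains_A_as_string(const char *buf)
{
    return strchr(buf, 'A') != NULL;
}

/* Robust: memchr is given the real length (for example the count returned
 * by fread) and examines every byte, including any after a 0x00. */
int contains_A_in_bytes(const void *buf, size_t len)
{
    return memchr(buf, 'A', len) != NULL;
}
```

With the five bytes 'x' 'y' 'z' 0x00 'A', the string version reports no match and the byte version finds the 'A'. The difference comes from the in-memory string convention, not from anything the file system does.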

Beberg answered 28/7, 2014 at 9:16 Comment(13)
I wrote a program that searches a text file for the character "A". If there is no "A" in the text, it moves the file to a special directory. I want to know: is there any way to cheat my program, for example by adding a NULL/EOF/EOT hex code in the middle of my input text? Thank you. - Glottic
@User1-St Depends on how you read the file and do the search (as I said, many C functions consider NUL to signify the end of a string in memory), but there are no insurmountable difficulties. - Beberg
How can I cheat my program? Let's assume my program considers NULL to signify the end of the file. In that case, if I add a 0x00 in the middle of the file using a hex editor, will the program be cheated? - Glottic
@User1-St Yes, almost by definition. That's why you should write your program to not do something that silly ;-) - Beberg
:D So let's write a program that doesn't do something that silly :)) Thank you. - Glottic
If your runtime makes a distinction between text and binary mode, and you are expecting control characters (< 20h), make sure you open in binary mode, just to be sure. You can convert to text afterwards. - Kleper
@delnan Where does the operating system save a file's metadata? Can I find it on the hard disk? - Glottic
@User1-St The metadata is stored somewhere on the hard disk (where and how depends a lot on the filesystem), but it's not a file itself! The metadata can usually be accessed through other APIs (for example stat on Unix-y systems). - Beberg
@owlstead, would you please explain more clearly? I don't understand your comment! Thank you. - Glottic
@delnan Is it possible to change it, or is it protected? Do you know how to access it on Windows? Which APIs? Thank you again, very much!! :) - Glottic
@User1-St I fear explaining it all goes beyond the scope of these comments. Sit down, read around a bit (stat, the organization of a simple file system like FAT), think hard and try to come up with one or a couple of questions that you can ask separately on Stack Overflow. - Beberg
@delnan If text mode responds to control characters, you may never get them back; it could well be that it stops reading after a 00h character. You should either know how the runtime behaves or open in binary mode. - Kleper
@owlstead Open which file in binary mode? There is no 00h at the end of a text file. - Glottic
24

There was - a long long time ago - an End Of File marker but it hasn't been used in files for many years.

You can demonstrate a distant echo of it on Windows using:

C:\>copy con junk.txt
Hello
Hello again
- Press <Ctrl> and <z>
C:\>dump junk.txt
junk.txt:
00000000  4865 6c6c 6f0d 0a48 656c 6c6f 2061 6761 Hello..Hello aga
00000010  696e 0d0a                               in..
C:\>

Note the use of Ctrl-Z as an EOT marker.

However, notice also that the Ctrl-Z does not appear in the file any more - it used to appear as a 0x1a but only on some operating systems and even then not consistently.

Use of ETX (0x03) stopped even before those dim and distant times.

Tannic answered 28/7, 2014 at 9:42 Comment(0)
10

There is no such thing as an EOF character stored in the file. EOF is just a value returned by file-reading functions to tell you that the file pointer has reached the end of the file.

Assess answered 28/7, 2014 at 9:13 Comment(8)
I wrote a program that searches a text file for the character "A". If there is no "A" in the text, it moves the file to a special directory. I want to know: is there any way to cheat my program, for example by adding a NULL/EOF/EOT hex code in the middle of my input text? Thank you. - Glottic
As long as your program is running on someone else's machine, they can always "cheat" it. - Assess
How? Did you mean they can give my program a text file that has "A" in its content, and my program won't notice? - Glottic
If your program is running on someone else's machine and they REALLY want to cheat it, they can, even with a debugger like OllyDbg or by hooking API functions, etc. There are lots of ways to cheat programs. - Assess
I want to know: is there any way to cheat the program by changing only the text file? Assume that they can't install or edit anything on the host where my program is installed. - Glottic
If you wrote your program correctly, then no, they can't "cheat" it. - Assess
Sorry, is this right or not: "the program keeps reading the text file until it receives a special hex code", where the special hex code depends on the programming language that I use? - Glottic
No! When the read function returns FEOF or 1. How does the program know that a given point is the end of a file? - Glottic
6

The EOT byte (0x04) is used to this day by Unix terminals (ttys) to indicate end of input. You type it with Ctrl + D (i.e. ^D) to end input to shells or any other program reading from stdin.

However, as others have pointed out, this is distinct from EOF, which is a condition rather than a piece of data per se.
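
To illustrate the difference between the condition and the data, here is a minimal sketch with a hypothetical helper, count_byte. It assumes the stream was opened in binary mode ("rb"), because some Windows C runtimes treat 0x1A as end of input in text mode. Bytes such as 0x04 (EOT) or 0x1A (Ctrl+Z) inside a file are counted like any other byte, and the loop ends only when fgetc reports EOF.

```c
#include <stdio.h>

/* Hypothetical helper: returns how often `target` occurs in the stream and
 * stores the total number of bytes read in *total. Control bytes are plain
 * data here; only the EOF return value ends the loop. */
long count_byte(FILE *f, unsigned char target, long *total)
{
    long hits = 0;
    long seen = 0;
    int c;  /* int, so EOF stays distinguishable from every byte value */
    while ((c = fgetc(f)) != EOF) {
        seen++;
        if (c == target) {
            hits++;
        }
    }
    *total = seen;
    return hits;
}
```

For a stream holding 'a' 'b' 0x04 'c' 'd' 0x1A 'e' 'f', a call with target 0x04 should return 1 and report a total of 8 bytes, so the EOT byte in the middle does not cut the read short.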

Straddle answered 7/5, 2018 at 22:45 Comment(0)
3

There once were even different EOF characters for different operating systems; I no longer see them. (Typically, files were stored in blocks of 128 bytes.) They were a pain to code for, much like BOMs are nowadays.

Instead, there is still an int read() that normally returns a byte value, but returns -1 at EOF.

The NUL character is a string terminator in C. In Java you can have a NUL character in the middle of a string. To cooperate with C, the UTF-8 bytes generated use a multi-byte encoding both for Unicode characters > 127 and for NUL.

(Some of this is probably known already.)

Stylobate answered 28/7, 2014 at 9:22 Comment(5)
UTF-8 does not generate multiple bytes for NUL. ASCII code 0 is not special; UTF-8 is fully ASCII compatible. More relevant for C is the fact that no UTF-8 multi-byte sequence contains a 0 byte (or any byte < 128, for that matter), so NUL termination can store all Unicode code points except U+0000. - Beberg
@delnan: The so-called Modified UTF-8 uses a multi-byte encoding for NUL too, giving 0xC0, 0x80. In this way a NUL char in a C UTF-8 string may be handled. - Stylobate
But modified UTF-8 is not UTF-8. It's also quite obscure. - Beberg
en.wikipedia.org/wiki/UTF-8#Modified_UTF-8 mentions object serialisation. Also, DataOutputStream uses this in writeUTF (docs.oracle.com/javase/7/docs/api/java/io/…). You are right: official UTF-8 requires the shortest encoding, which for NUL is the single byte 0x00. - Stylobate
@User1-St: okay, this is the fourth answer I read and the fourth time you added that question. Don't do that, it's annoying and against the policy of SO. "Follow-up" questions are not meant to be asked in comments; they should be edited into your post (if relevant to the original question - this is not) or asked separately. But mostly, it is plain annoying. - Nasalize
1

In the 7-bit Wintel world it is 0x1A or chr(26).

It is still commonly found in older text files and archives and is still produced by some file transmission protocols. In particular text files downloaded from BBS systems were commonly terminated with this character.

There are other such sentinel values on older systems and, like EOL conventions (CR, LF, CR+LF), they need to be anticipated from time to time.
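
One hedged way to anticipate a trailing 0x1A is to ignore it after reading the whole file in binary mode. The helper below is a made-up sketch, not a standard function. It assumes the data is already in memory and that any run of 0x1A bytes at the very end is padding rather than content.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the data once trailing 0x1A
 * (Ctrl+Z) bytes are ignored. Bytes with that value elsewhere in the
 * buffer are left alone, since they may be legitimate data. */
size_t length_without_trailing_ctrl_z(const unsigned char *data, size_t len)
{
    while (len > 0 && data[len - 1] == 0x1A) {
        len--;
    }
    return len;
}
```

Whether stripping is appropriate depends on where the file came from; for a modern text file with no such terminator the function simply returns the original length.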

It can be a source of annoyance to see it still being used, on the same level as return(0) for instance.

Haws answered 22/2, 2019 at 2:47 Comment(0)
