What is the difference between plaintext and binary data?
F

5

20

Many languages have functions which only process "plaintext", not binary. Does this mean that only characters within the ASCII range will be allowed?

Binary is just a series of bytes. Isn't that similar to plaintext, which is just a series of bytes interpreted as characters? If so, can plaintext store the same data formats / protocols as binary?

Flashcube answered 16/9, 2009 at 19:3 Comment(3)
FYI, there ain't no such thing as plain text. joelonsoftware.com/articles/Unicode.html - Epitome
FYI, plaintext in this context is not UTF-8 and it cannot represent Unicode, because as I said before, it is a series of bytes, nothing fancier. - Flashcube
I suppose I was being a bit snide. After all, you did include the air quotes around "plaintext". - Epitome
F
5

One thing it often means is that the language might feel free to interpret certain control characters, such as the values 10 (line feed) or 13 (carriage return), as logical line terminators. In other words, an output operation might automagically append these characters at the end, and an input operation might strip them from the input (and/or stop reading there).

In contrast, language I/O operations that advertise working on "binary" data will usually include an input parameter for the length of data to operate on, since there is no other way (short of reading past end of file) to know when it is done.
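
A minimal Python sketch of that contrast, assuming a made-up four-byte payload and a temporary file (neither comes from the answer): a line-oriented read stops at the byte it takes for a line end, while a length-based read returns exactly as many bytes as the caller asks for.

```python
import os
import tempfile

# Hypothetical payload: four bytes that happen to include 10 (LF) and 0 (NUL).
payload = bytes([65, 10, 0, 66])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(payload)

    # Line-oriented read: stops right after the first byte with value 10.
    with open(path, "rb") as f:
        first_line = f.readline()

    # Length-based read: the caller states how many bytes to fetch.
    with open(path, "rb") as f:
        first_four = f.read(4)

print(first_line)
print(first_four)
```

Only the length-based read hands back the whole payload, including the bytes after the value 10.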

Fulvia answered 16/9, 2009 at 19:12 Comment(3)
Suppose the function I'm supplying plaintext to takes it as a string. Can it not measure the length before transmission, instead of relying on control chars? - Flashcube
That depends on the language. In Ada, certainly. In C, the only way to do that is to look for the string terminator (ASCII 0, NUL). That means you are unable to output that value into a file using "ASCII" I/O routines, but you can using the length-based "binary" routines. - Fulvia
Sure, and it might add control characters (such as \r\n), or even do character set conversions on that string. If the data is treated as binary, nothing would be added or altered. - Pubis
I
13

Plain text is human readable; a binary file is usually unreadable by a human, since it's composed of both printable and non-printable characters.

Try to open a jpeg file with a text editor (e.g. notepad or vim) and you'll understand what I mean.

A binary file is usually constructed in a way that optimizes speed, since little or no text parsing is needed. A plain text file is editable by hand; a binary file usually is not.
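
The JPEG experiment can be approximated without a real image file. The sketch below assumes only the four bytes that commonly open a JPEG/JFIF file and tries to interpret them as text; the variable names are illustrative.

```python
# First four bytes of a typical JPEG/JFIF file (SOI marker, then APP0 marker).
jpeg_header = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# A strict text decoder rejects the data: 0xFF never occurs in valid UTF-8.
try:
    jpeg_header.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Latin-1 maps every byte to some character, so decoding "succeeds",
# but the result is gibberish rather than readable text.
as_latin1 = jpeg_header.decode("latin-1")

print(decoded_ok)
print(repr(as_latin1))
```

A text editor faces the same choice: either refuse the bytes or show whatever characters its encoding happens to assign to them.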

Incarnation answered 16/9, 2009 at 19:6 Comment(4)
I hope this is the dawn of Chuck Norris-style Jon Skeet jokes. - Expostulate
Good answer, but I was referring to the programming context: can functions that only accept plaintext store the same data formats / protocols as functions that accept binary? - Flashcube
The problem is that binary files don't have newlines, so it's just difficult, but not impossible. - Incarnation
@presleyster and fbrereto: see meta.stackexchange.com/questions/9134/jon-skeet-facts - Fulvia
L
7

"Plaintext" can have several meanings.

The one most useful in this context is that it is merely a binary file organized in byte sequences that a particular computer system can translate into a finite set of what it considers "text" characters.

A second meaning, somewhat connected, is a restriction that said system should display these "text characters" as symbols readable by a human as members of a recognizable alphabet. Often, the unwritten implication is that the translation mechanism is ASCII.

A third, even more restrictive meaning is that this system must be a "simple" text editor/viewer, usually implying ASCII encoding. But, really, there is VERY little difference between you, the human, reading text encoded in some funky format and displayed by a proprietary program, vs. the vi text editor reading an ASCII-encoded file.

Within a programming context, your programming environment (comprising the OS, the system APIs, and your language's capabilities) defines both a set of "text" characters and a set of encodings it is able to read and convert to these "text" characters. Please note that this may not necessarily imply ASCII, English, or 8 bits. As an example, Perl can natively read and use the full Unicode set of "characters".
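
As a rough illustration in Python (the answer itself mentions Perl; the sample word and the three encodings are arbitrary choices), the same "text" characters turn into different byte sequences depending on the encoding the environment applies:

```python
text = "café"

utf8_bytes = text.encode("utf-8")        # 'é' takes two bytes
latin1_bytes = text.encode("latin-1")    # 'é' takes one byte
utf16_bytes = text.encode("utf-16-le")   # each of these characters takes two bytes

print(len(utf8_bytes), len(latin1_bytes), len(utf16_bytes))

# Decoding with the matching encoding recovers the same characters every time.
round_trips = [
    utf8_bytes.decode("utf-8"),
    latin1_bytes.decode("latin-1"),
    utf16_bytes.decode("utf-16-le"),
]
```

The characters are the same in all three cases; only the underlying bytes differ.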

To answer your specific question, you can definitely use "character" strings to transmit arbitrary byte sequences, with the caveat that string termination conventions must apply. The problem is that the functions that already exist to "process character data" would probably not have any useful functionality to deal with your binary data.
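
One common workaround, which the answer does not name and is offered here only as an example, is to encode the bytes into a restricted text alphabet such as Base64 before passing them through text-only functions. The sample bytes are hypothetical:

```python
import base64

# Hypothetical binary payload containing NUL, LF, CR, and a non-ASCII byte.
raw = bytes([0, 10, 13, 255])

# Base64 output uses only letters, digits, '+', '/' and '=' padding,
# so it contains no terminators or line-ending characters.
encoded = base64.b64encode(raw).decode("ascii")

# The receiving side reverses the step to get the original bytes back.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)
```

The cost is size (roughly four output characters for every three input bytes), and the text-processing functions still see an opaque string rather than anything they can usefully interpret.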

Legault answered 16/9, 2009 at 19:11 Comment(0)
J
5

Generally, it depends on the language/environment/functionality.

Binary data is always that: binary. It is transferred without modification.

"Plain text" mode may mean one or more of the following things:

  • the stream of bytes is split into lines. The line delimiter may be \r, \n, \r\n, or \n\r, and is sometimes OS-dependent (*nix uses \n, while Windows uses \r\n). The line endings may be adjusted for the reading application
  • character encoding may be adjusted. The environment might detect and/or convert the source encoding into the encoding the application expects
  • probably some other conversions should be added to this list, but I can't think of any more at this moment
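
The first two points can be seen in Python's file modes. This is a sketch using a temporary file and made-up content, not the behavior of every language: binary mode returns the bytes untouched, while text mode decodes them and, with the default `newline=None`, translates `\r\n` to `\n` on input.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # Write Windows-style line endings and one non-ASCII character, byte for byte.
    with open(path, "wb") as f:
        f.write("naïve\r\nline two\r\n".encode("utf-8"))

    # Binary mode: the bytes come back exactly as written.
    with open(path, "rb") as f:
        as_binary = f.read()

    # Text mode: bytes are decoded as UTF-8 and "\r\n" becomes "\n".
    with open(path, "r", encoding="utf-8") as f:
        as_text = f.read()

print(as_binary)
print(repr(as_text))
```
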
Jernigan answered 16/9, 2009 at 19:12 Comment(0)
V
3

Technically, nothing: plain text is a form of binary data. However, a major difference is how values are stored. Think of how an integer might be stored. In binary data it would typically use a two's complement format, probably taking 32 bits of space. In text format the number would instead be stored as a series of character digits. So the number 50 would be stored as 0x32 (padded to take up 32 bits) in binary, but as the characters '5' '0' (bytes 0x35 0x30 in ASCII) in plain text.
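
A small Python sketch of the two representations of 50, assuming a 32-bit little-endian layout for the binary form (the answer does not specify a byte order):

```python
import struct

n = 50

# Binary form: 4-byte little-endian two's complement integer.
as_binary = struct.pack("<i", n)

# Text form: the ASCII characters '5' and '0'.
as_text = str(n).encode("ascii")

print(as_binary.hex())
print(as_text.hex())

# Each form needs its own way back to the number.
from_binary = struct.unpack("<i", as_binary)[0]
from_text = int(as_text.decode("ascii"))
```

Reading the binary form back is a fixed-size copy, while the text form has to be parsed digit by digit, which is one reason binary formats are often faster to process.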

Vasilek answered 16/9, 2009 at 19:59 Comment(0)
