A file is just a series of bytes, and without further information, you cannot tell whether these bytes are supposed to be code points in some string encoding (say, ASCII or UTF-8 or ANSI-something) or something else. You will have to resort to heuristics, such as:
- Try to parse the file in a number of known encodings and see if the parsing succeeds. If it does, chances are you have a text file.
- If you expect text files in Western languages only, you can assume that the majority of characters lies in the ASCII range (0..127), more specifically, (33..127) plus whitespace (tab, newline, carriage return, space). Count occurrences of each distinct byte value, and if the overwhelming part of your document is in the 'typical western characters' set, it's usually safe to assume it's a text file.
- Extending the previous approach; sample a sufficiently large quantity of text in the languages you expect, and build a character frequency profile. To check your file, compare the file's character frequency profile against your test data and see if it's close enough.
But here's another solution: Just treat everything you receive as text, applying the necessary transformations where needed (e.g. HTML-encode when sending to a web browser). As long as you prevent the file from being interpreted as binary data (such as a user double-clicking the file), the worst you'll produce is gibberish data.