using java.util.Scanner to read a file byte by byte
V

5

6

I'm trying to read a one-line file character by character using java.util.Scanner. However, I'm getting this exception:

Exception in thread "main" java.util.InputMismatchException: For input string: "contents of my file"
    at java.util.Scanner.nextByte(Scanner.java:1861)
    at java.util.Scanner.nextByte(Scanner.java:1814)
    at p008.main(p008.java:18) <-- line where I do scanner.nextByte()

Here's my code:

public static void main(String[] args) throws FileNotFoundException {
    File source = new File("file.txt");
    Scanner scanner = new Scanner(source);
    while(scanner.hasNext()) {
        System.out.println((char)scanner.nextByte());
    }
    scanner.close();
}

Does anyone have any ideas as to what I might be doing wrong?

Edit: I realized I wrote hasNext() instead of hasNextByte(). However, if I make that change, it doesn't print anything.

Violaviolable answered 11/1, 2010 at 0:12 Comment(1)
A Scanner is for parsing character input. I suspect you need an InputStream. - Hemlock
U
11

Why on earth would you want to use a scanner to read a file byte by byte? That's like using a wheelbarrow to transport your pocket change. (If you really need a wheelbarrow for your pocket change, let me know so I can become your friend).

But seriously: an InputStream (for a file, a FileInputStream) reads bytes, simply and reliably, and does nothing else.

Class Scanner was introduced into the Java API (in Java 5) so textbook examples could pull data out of a file with less pain than is usually involved with the cascade of new BufferedReader(new InputStreamReader(...)). Its specialty is reading numbers and strings from free-form input files. The nextByte() method actually reads a token of one or a few decimal digits from the input (if they're there) and converts the number thus scanned into a single byte value.

And if you're reading bytes, why do you want to output them as chars? Bytes are not chars, and brute-force interconverting will fail in some places. If you want to see the values of those bytes, print them out as they are and you'll see small integers: 0 to 255 as returned by InputStream.read(), or -128 to 127 if you store them in Java's signed byte type.
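As a minimal sketch of that point, the following reads a file one byte at a time with FileInputStream and collects the raw values. The class name ByteDump, the helper readUnsignedBytes, and the temporary sample file are made up for illustration; they are not from the question.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ByteDump {
    // Hypothetical helper: returns every byte of the file as an int in 0..255.
    static List<Integer> readUnsignedBytes(Path file) throws IOException {
        List<Integer> values = new ArrayList<>();
        try (InputStream in = new FileInputStream(file.toFile())) {
            int b;
            // read() returns the next byte as an int in 0..255, or -1 at end of stream
            while ((b = in.read()) != -1) {
                values.add(b);
            }
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Sample file created only so the sketch runs on its own
        Path file = Files.createTempFile("bytes", ".txt");
        Files.write(file, "A\u00e9".getBytes(StandardCharsets.UTF_8));
        // 'A' is one byte (65); the accented e is two bytes in UTF-8 (195, 169)
        System.out.println(readUnsignedBytes(file)); // prints [65, 195, 169]
        Files.delete(file);
    }
}
```

Note that the two-character string yields three bytes, which is exactly why casting each byte to a char goes wrong for anything outside ASCII.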

If you want to read chars from a file, FileReader is the class for you.

Unbidden answered 11/1, 2010 at 0:22 Comment(2)
I have a text file starting with the word "Abstract" (what a surprise...). Anyway, when I try reading with: Scanner scanner = new Scanner(file); byte b = scanner.nextByte(); I am getting java.util.InputMismatchException. Why am I not seeing any values between 0 and 255, can you please help? The file is UTF-8. - Knowledgeable
My answer explained this, but perhaps not very well. Scanner reads and interprets text-form input, not low-level bytes! Try creating a file whose first line reads 1 10 100 1000 hello and reading that with Scanner.nextByte(). You will successfully read and return as bytes the numbers 1, 10 and 100 but suffer an exception on 1000 and (if you read past that) on "hello" because those aren't values that can be represented in a byte. - Unbidden
W
4

Scanner is for parsing text data - its nextByte() method expects the input to consist of digits (possibly preceded by a sign).
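This also suggests why switching the loop condition to hasNextByte() prints nothing: the first token of the file is a word, not a number. A small sketch, assuming a Scanner over a String instead of a file to keep it self-contained (the class and method names are illustrative only):

```java
import java.util.Scanner;

public class HasNextByteDemo {
    // Hypothetical helper: counts how many leading tokens parse as a byte.
    static int countByteTokens(String text) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            // hasNextByte() is true only if the next token is a number that fits in a byte
            while (scanner.hasNextByte()) {
                scanner.nextByte();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // First token is the word "contents", so the loop body never runs
        System.out.println(countByteTokens("contents of my file")); // prints 0
        // "7" and "42" parse as bytes; the loop stops at "hello"
        System.out.println(countByteTokens("7 42 hello"));          // prints 2
    }
}
```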

You probably want to use a FileReader if you're actually reading text data, or a FileInputStream if it's binary data. If you're reading text with a specific character encoding, use a FileInputStream wrapped in an InputStreamReader. (Unfortunately, before Java 11, FileReader did not allow you to specify the encoding and used the platform default encoding implicitly, which is often not what you want.)
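A minimal sketch of the InputStreamReader approach, assuming the file is UTF-8 (as one commenter's file is). The class name CharDump, the helper readChars, and the temporary sample file are invented for the example.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharDump {
    // Hypothetical helper: decodes the file as UTF-8 and reads it char by char.
    static String readChars(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            int c;
            // Reader.read() returns one decoded char as an int, or -1 at end of stream
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample file created only so the sketch runs on its own
        Path file = Files.createTempFile("chars", ".txt");
        Files.write(file, "h\u00e9llo".getBytes(StandardCharsets.UTF_8));
        // The file holds 6 bytes, but they decode to 5 chars
        System.out.println(readChars(file).length()); // prints 5
        Files.delete(file);
    }
}
```

Here the cast to char is safe because the Reader has already done the decoding; the values are characters, not raw bytes.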

Wendelina answered 11/1, 2010 at 0:23 Comment(4)
Sorry, what do you mean by "parsing text data" and "reading text data"? - Knowledgeable
@KorayTugay: reading means just taking whatever comes, one byte (or character) after another. Parsing means that you expect the data to have a specific structure or format, such as a string consisting of digits preceded by an optional minus sign, so that you can interpret it as a number. - Wendelina
Thanks for the comment. So the nextByte method in the Scanner class is for "reading digits" only? - Knowledgeable
Sorry to necro but your FileReader link directs to FilterReader, not FileReader :) - Poliomyelitis
H
2

When troubleshooting Scanner, check for underlying I/O errors:

if(scanner.ioException() != null) {
  throw scanner.ioException();
}

Though I'm with the others - this probably isn't the right class for the job. If you want byte input, use an InputStream (in this case, FileInputStream). If you want char input, use a Reader (e.g. InputStreamReader).

Hydrolytic answered 11/1, 2010 at 10:53 Comment(0)
S
1

Scanner is all about reading delimited text (see the docs).

nextByte will keep reading until it gets to whichever delimiter you specified (whitespace by default) and then try to convert that string into a byte.

So if you have 123 456 in a file, one call to nextByte will return 123, not 49 (the decimal value for the 1 character).
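To make that concrete, here is a small sketch using a Scanner over the string "123 456" instead of a file. The class name and the describe helper are illustrative only. The second token also shows the exception from the question: 456 is a number, but it does not fit in a byte.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

public class NextByteTokens {
    // Hypothetical helper: tries nextByte() on each token and records the outcome.
    static String describe(String text) {
        StringBuilder out = new StringBuilder();
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNext()) {
                try {
                    // Parses the whole whitespace-delimited token as one number
                    out.append(scanner.nextByte()).append(' ');
                } catch (InputMismatchException e) {
                    // The rejected token is not consumed, so fetch it as a string
                    out.append("bad:").append(scanner.next()).append(' ');
                }
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(describe("123 456")); // prints 123 bad:456
    }
}
```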


If you want to read byte-by-byte, you could use FileInputStream.

Springlet answered 15/8, 2015 at 20:22 Comment(0)
A
0

Let me just address the question about why you would want to force-fetch one byte. Suppose I am trying to parse this line: " (literalize this that another) ". The open and close parentheses are not really delimiters, and there may or may not be a whitespace delimiter between '(' and "lit...". If you fetch the next token with next(), you get "(literalize". I think we need to force-fetch "(" and then fetch "literalize", but I don't know how to do that.

Alkalinize answered 16/8 at 14:11 Comment(0)
