Reading an ASCII file with FileChannel and ByteArrays

6

5

I have the following code:

        String inputFile = "somefile.txt";
        FileInputStream in = new FileInputStream(inputFile);
        FileChannel ch = in.getChannel();
        ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);  // BUFSIZE = 256

        /* read the file into a buffer, 256 bytes at a time */
        int rd;
        while ( (rd = ch.read( buf )) != -1 ) {
            buf.rewind();
            for ( int i = 0; i < rd/2; i++ ) {
                /* print each character */
                System.out.print(buf.getChar());
            }
            buf.clear();
        }

But the characters get displayed as ?'s. Does this have something to do with Java using Unicode characters? How do I correct this?

Toulouselautrec answered 18/9, 2008 at 15:9 Comment(0)
D
7

You have to know what the encoding of the file is, and then decode the ByteBuffer into a CharBuffer using that encoding. Assuming the file is ASCII:

import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;

public class Buffer
{
    private static final int BUFSIZE = 256;

    public static void main(String args[]) throws Exception
    {
        String inputFile = "somefile";
        FileInputStream in = new FileInputStream(inputFile);
        FileChannel ch = in.getChannel();
        ByteBuffer buf = ByteBuffer.allocateDirect(BUFSIZE);

        // Or whatever encoding you want; decoding chunk by chunk like this
        // is only safe for single-byte encodings such as ASCII
        Charset cs = Charset.forName("US-ASCII");

        /* read the file into a buffer, 256 bytes at a time */
        while ( ch.read( buf ) != -1 ) {
            buf.flip(); // flip(), not rewind(): limits decoding to the bytes just read
            CharBuffer chbuf = cs.decode(buf);
            while ( chbuf.hasRemaining() ) {
                /* print each character */
                System.out.print(chbuf.get());
            }
            buf.clear();
        }
        in.close();
    }
}
Demiurge answered 18/9, 2008 at 15:39 Comment(1)
If you want to avoid printing each character out separately, you could just use buf.flip() instead of buf.rewind(), and pass the whole chbuf to System.out.print(). - Outlook
L
3

buf.getChar() reads 2 bytes per character, but an ASCII file stores only 1 byte per character. Use:

 System.out.print((char) buf.get());
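
To make the difference concrete, here is a minimal sketch that runs both reads over the same ASCII bytes. The class and method names (GetCharDemo, viaGetChar, viaGet) are hypothetical, and it uses a heap buffer rather than the direct buffer from the question. Note that with get() the loop has to run over every byte read, not rd/2.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original question or answer.
class GetCharDemo {
    /** Reads two bytes at a time, the way the question's loop does. */
    static String viaGetChar(byte[] ascii) {
        ByteBuffer buf = ByteBuffer.wrap(ascii);
        StringBuilder sb = new StringBuilder();
        while (buf.remaining() >= 2) {
            // getChar() combines two bytes into one UTF-16 code unit
            sb.append(buf.getChar());
        }
        return sb.toString();
    }

    /** Reads one byte at a time and widens it to a char (valid for ASCII only). */
    static String viaGet(byte[] ascii) {
        ByteBuffer buf = ByteBuffer.wrap(ascii);
        StringBuilder sb = new StringBuilder();
        while (buf.hasRemaining()) {
            sb.append((char) buf.get());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = "Hi!!".getBytes(StandardCharsets.US_ASCII);
        System.out.println(viaGetChar(data).length()); // 2 characters instead of 4
        System.out.println(viaGet(data));              // Hi!!
    }
}
```

The two-byte read turns 'H' and 'i' into the single code unit 0x4869, which most consoles cannot display, hence the ?'s.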
Lavernelaverock answered 18/9, 2008 at 15:27 Comment(0)
C
2

Changing your print statement to:

System.out.print((char)buf.get());

Seems to help.

Chiasmus answered 18/9, 2008 at 15:27 Comment(0)
L
2

Depending on the encoding of somefile.txt, a character may not actually be composed of two bytes. This page gives more information about how to read streams with the proper encoding.

The bummer is, the file system doesn't tell you the encoding of the file, because it doesn't know. As far as it's concerned, it's just a bunch of bytes. You must either find some way to communicate the encoding to the program, detect it somehow, or (if possible) always ensure that the encoding is the same (such as UTF-8).
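
If the encoding is known and may be multi-byte (UTF-8, say), a character can straddle two buffer reads, so decoding each chunk on its own is not enough. The following is a rough sketch of one way to handle that with a CharsetDecoder that carries leftover bytes over to the next read. The class name ChannelDecoder, the method readAll and its parameters are made up for illustration, and it assumes bufSize is at least 4 so that one complete UTF-8 sequence always fits.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class; assumes bufSize >= 4.
class ChannelDecoder {
    static String readAll(Path file, Charset cs, int bufSize) throws IOException {
        CharsetDecoder decoder = cs.newDecoder(); // reports malformed input by default
        ByteBuffer bytes = ByteBuffer.allocate(bufSize);
        CharBuffer chars = CharBuffer.allocate(bufSize);
        StringBuilder out = new StringBuilder();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            boolean eof = false;
            while (!eof) {
                eof = ch.read(bytes) == -1;
                bytes.flip(); // expose only the bytes available so far
                CoderResult result;
                do {
                    result = decoder.decode(bytes, chars, eof);
                    if (result.isError()) {
                        result.throwException(); // malformed or unmappable input
                    }
                    drain(chars, out);
                } while (result.isOverflow());
                bytes.compact(); // keep the bytes of a character split across reads
            }
            CoderResult result;
            do {
                result = decoder.flush(chars); // emit anything the decoder still holds
                drain(chars, out);
            } while (result.isOverflow());
        }
        return out.toString();
    }

    /** Moves the decoded characters into the output and empties the buffer. */
    private static void drain(CharBuffer chars, StringBuilder out) {
        chars.flip();
        out.append(chars);
        chars.clear();
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "somefile.txt");
        System.out.print(readAll(file, StandardCharsets.UTF_8, 256));
    }
}
```

Collecting everything into one String is only for the sake of the example; for a very large file you would process each decoded chunk instead of appending it.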

Louise answered 18/9, 2008 at 15:34 Comment(0)
C
1

Is there a particular reason why you are reading the file in the way that you do?

If you're reading in an ASCII file you should really be using a Reader.

I would do it something like:

File inputFile = new File("somefile.txt");
BufferedReader reader = new BufferedReader(new FileReader(inputFile));

And then use either readLine or similar to actually read in the data!
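
One caveat: FileReader decodes with the platform default charset. A slightly fuller sketch that names the charset explicitly might look like the following. It assumes Java 7 or later for Files.newBufferedReader, and the class and method names (ReaderExample, readLines) are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; assumes Java 7+.
class ReaderExample {
    /** Reads every line of the file, decoding with the given charset. */
    static List<String> readLines(Path file, Charset cs) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "somefile.txt");
        for (String line : readLines(file, StandardCharsets.US_ASCII)) {
            System.out.println(line);
        }
    }
}
```

Unlike FileReader, a reader obtained this way throws an exception on bytes that are not valid in the chosen charset rather than silently substituting them, which may or may not be what you want.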

Canaveral answered 18/9, 2008 at 15:21 Comment(2)
I have an enormous amount of data, and I am trying to optimize reading time. Reference: nadeausoftware.com/articles/2008/02/… - Toulouselautrec
@Jake, in your example you read bytes and then decode to chars. Why do you assume that's faster than using a BufferedReader? The interesting benchmarks you point to do not read characters. - Ardeen
S
0

Yes, it is Unicode.

If you have 14 chars in your file, you only get 7 '?', because each getChar() call consumes two bytes.

Solution pending. Still thinking.

Shewmaker answered 18/9, 2008 at 15:20 Comment(0)