Getting meaningful text from java.io.Reader

I'm writing a program that uses another company's library to download some reports from their website. I want to parse these reports before I write them to a file, because if they match certain criteria, I want to disregard them.

Problem is, their method, download(), returns a java.io.Reader. The only method available to me is

int read(char[] cbuf);

Printing this returned array out gives me meaningless characters. I want to be able to identify what character set I'm working with or convert it to a byte array but I can't figure out how to do it. I've tried

//retrievedFile is my Reader object
char[] cbuf = new char[2048];
int numChars = retrievedFile.read(cbuf);
//I've tried other character sets, too
new String(cbuf).getBytes("UTF-8");

and I'm afraid to downcast to a more useful reader because I can't know for sure if it will work or not. Any suggestions?

EDIT

When I say it prints out "meaningless characters", I don't mean that it looks like the example given by Jon Skeet. It's really hard to describe because I'm not at my machine right now, but I think it's an encoding issue. The characters seem to have indentations and structure similar to the look of the reports. I'll try these suggestions as soon as I get back on Tuesday (I'm only an intern, so I haven't bothered with setting up a remote account or anything).

Sextillion asked 30/12, 2011 at 20:47
Have you tried BufferedReader? There is no reason why it should not work. - Gillies
If System.out.print(cbuf[i]) gives you garbage for i = 0, 1, 2, ..., then either the other company's library has a problem or you did not configure it correctly. - Bilge

Try this:

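// BufferedReader accepts any Reader subclass, so no cast is needed
BufferedReader in = new BufferedReader(retrievedFile);
String line;
StringBuilder rslt = new StringBuilder();
// readLine() returns null at the end of the stream; the lines it returns have no line terminators
while ((line = in.readLine()) != null) {
    rslt.append(line);
}
System.out.println(rslt.toString());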

Don't cast the Reader to any class, because you don't know its real type. Instead, create a BufferedReader and pass the Reader into it. BufferedReader takes any subclass of java.io.Reader as its argument, so it is safe to use.
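Wrapping is always safe. If you also want to avoid stacking a second buffer on a Reader that may already be a BufferedReader, an instanceof check before the cast guarantees the cast cannot fail. A minimal sketch; the class name SafeWrap is made up for illustration, and a StringReader stands in for whatever download() actually returns:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class SafeWrap {
    // Returns the reader unchanged if it is already a BufferedReader, otherwise wraps it.
    // The instanceof check runs first, so the cast can never throw ClassCastException.
    static BufferedReader buffered(Reader reader) {
        if (reader instanceof BufferedReader) {
            return (BufferedReader) reader;
        }
        return new BufferedReader(reader);
    }

    public static void main(String[] args) throws IOException {
        // StringReader is only a stand-in for the Reader returned by download().
        Reader retrievedFile = new StringReader("first line\nsecond line\n");
        try (BufferedReader in = buffered(retrievedFile)) {
            System.out.println(in.readLine()); // first line
        }
    }
}
```

Double-wrapping would also work; the check just avoids an unnecessary extra buffer.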

Boman answered 30/12, 2011 at 21:14
Worked perfectly in my scenario. - Arabel
This implementation is wrong! Any line terminators in the input are lost, because BufferedReader.readLine() does not include them in the returned String. - Romp
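If you want to keep the BufferedReader approach but retain the line structure, one option is to append a line break yourself after each readLine() call. A minimal sketch (class and method names are illustrative); note that it normalizes \r\n and \r to \n and closes the reader, so if the exact original characters matter, a plain read(char[]) loop is the better fit:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadLinesKeepBreaks {
    // readLine() strips "\n", "\r" and "\r\n", so '\n' is re-appended after every line.
    // Assumption: normalizing all line breaks to '\n' is acceptable. Input without a
    // final line break gains one, and the reader is closed when done.
    static String readAll(Reader reader) throws IOException {
        StringBuilder rslt = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                rslt.append(line).append('\n');
            }
        }
        return rslt.toString();
    }

    public static void main(String[] args) throws IOException {
        // Prints a, b and c on separate lines.
        System.out.print(readAll(new StringReader("a\r\nb\nc")));
    }
}
```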

Printing out the char[] itself will probably give you something like:

[C@1c8825a5

That's just the normal output of calling toString on a char array in Java. It sounds like you want to convert it into a String, which you can do with the String(char[]) constructor. Here's some sample code:

public class Test {
    public static void main(String[] args) {
        char[] chars = "hello".toCharArray();
        System.out.println((Object) chars);

        String text = new String(chars);
        System.out.println(text);
    }
}
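Whether you actually see the [C@... form depends on how the array is printed, which is why the sample casts to Object: PrintStream.println has a dedicated char[] overload that prints the characters, while string concatenation falls back to Object.toString(). A small comparison (the class name is illustrative, and the hash part of the [C@ output varies from run to run):

```java
import java.util.Arrays;

class PrintCharArray {
    public static void main(String[] args) {
        char[] chars = "hello".toCharArray();

        // PrintStream.println(char[]) is its own overload: prints hello
        System.out.println(chars);

        // Concatenation uses Object.toString(): prints something like chars = [C@1c8825a5
        System.out.println("chars = " + chars);

        // Arrays.toString lists the elements: prints [h, e, l, l, o]
        System.out.println(Arrays.toString(chars));

        // String.valueOf(char[]) builds a String just like new String(chars): prints hello
        System.out.println(String.valueOf(chars));
    }
}
```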

On the other hand, java.io.Reader doesn't have a read method returning a char[]. It has methods which either return a single character at a time, or (more usefully) accept a char[] to fill with data and return the number of characters read. This is actually what your sample code shows. You just need to use the char array and the number of characters read to create the new String. For example:

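char[] buffer = new char[4096];
// read() returns the number of chars read, or -1 if the end of the stream was reached
int charsRead = reader.read(buffer);
String text = charsRead == -1 ? "" : new String(buffer, 0, charsRead);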

However, note that a single read() call may not return all of the data. You could read it line by line using BufferedReader, or loop until read() returns -1. Guava contains useful code in its CharStreams class. For example:

String allText = CharStreams.toString(reader);

or

List<String> lines = CharStreams.readLines(reader);
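If you'd rather not add Guava, Java 10 and later offer Reader.transferTo(Writer), which performs the read loop for you; combined with a StringWriter it yields the whole text as a String. A sketch assuming Java 10+ (the class name is illustrative):

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

class ReadAllJdk {
    // Reader.transferTo(Writer) (Java 10+) keeps calling read() until it returns -1.
    static String readAll(Reader reader) throws IOException {
        StringWriter out = new StringWriter();
        reader.transferTo(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new StringReader("line 1\nline 2\n")) {
            System.out.print(readAll(reader));
        }
    }
}
```

Unlike the readLine-based approaches, this keeps every character, including the original line terminators.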
Paleopsychology answered 30/12, 2011 at 21:03
Jon, my mistake about the method: I forgot it returns an int and takes a char[]. Printing out the char array does look like an encoding problem. It might even be that the company messed up their implementation, or that I configured it wrong, à la @amadeus's post. - Sextillion
@Tom: If it's an encoding problem, that could only be because the implementation is messed up: if you're given a Reader, you don't need to handle the encoding at all. Are you able to give any details about the library? - Paleopsychology
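To illustrate the point about encodings: bytes are decoded into chars when the Reader is constructed, typically by an InputStreamReader with some charset. If the wrong charset is chosen at that stage, the text is already garbled before your code sees it, and repairing it afterwards is guesswork at best. The sketch below assumes the library builds its Reader this way, which the question doesn't confirm. It decodes the same UTF-8 bytes twice, once correctly and once as ISO-8859-1; going the other way, String.getBytes(StandardCharsets.UTF_8) turns text into a byte array with an explicit charset. The class name is illustrative:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Decodes the bytes with the given charset and reads every resulting char.
    static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file ASCII-only; the word is "hello" with an e-acute.
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

        String right = decode(utf8, StandardCharsets.UTF_8);      // the original word
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1); // A-tilde plus a copyright sign
        System.out.println(right.equals("h\u00e9llo"));           // true
        System.out.println(wrong.equals("h\u00c3\u00a9llo"));     // true
    }
}
```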

What meaningless chars does it give? Probably null chars, because you don't read all the chars from the reader, only at most 2048 of them, and you ignore the value returned by the read method (which tells you how many chars were actually read).
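A small demonstration of the effect, using a StringReader in place of the library's Reader and a 16-char buffer so the numbers stay readable (both are assumptions for illustration). The unused part of the buffer is left as zero chars, which typically print as nothing or as odd symbols depending on the console:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class UnusedBufferDemo {
    // Mirrors the question's code: the count returned by read() is ignored.
    static String ignoringCount(Reader reader, char[] cbuf) throws IOException {
        reader.read(cbuf);
        return new String(cbuf);
    }

    // Uses the count, so only the chars actually read end up in the String.
    static String usingCount(Reader reader, char[] cbuf) throws IOException {
        int numChars = reader.read(cbuf);
        return numChars == -1 ? "" : new String(cbuf, 0, numChars);
    }

    public static void main(String[] args) throws IOException {
        String wrong = ignoringCount(new StringReader("report"), new char[16]);
        String right = usingCount(new StringReader("report"), new char[16]);
        System.out.println(wrong.length()); // 16: "report" plus 10 zero chars
        System.out.println(right.length()); // 6
    }
}
```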

If you want to read the whole thing into a String, you'll have to loop until the returned value is negative, appending the numChars chars read at each iteration (cbuf[0] to cbuf[numChars - 1]) to a StringBuilder.

StringBuilder builder = new StringBuilder();
char[] cbuf = new char[2048];
int numChars;
while ((numChars = reader.read(cbuf)) >= 0) {
    builder.append(cbuf, 0, numChars);
}
String s = builder.toString();
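To see why the loop matters, the sketch below wraps a StringReader in a hypothetical TrickleReader that hands out at most 3 chars per call, imitating a Reader backed by a slow network connection (an assumption; nothing in the question says how the library's Reader behaves). A single read() returns only a fragment, while the loop collects everything:

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class LoopReadDemo {
    // Hypothetical Reader that returns at most 3 chars per read call.
    static class TrickleReader extends FilterReader {
        TrickleReader(Reader in) {
            super(in);
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            return super.read(cbuf, off, Math.min(len, 3));
        }
    }

    // The loop from the answer: append each chunk until read() returns -1.
    static String readAll(Reader reader) throws IOException {
        StringBuilder builder = new StringBuilder();
        char[] cbuf = new char[2048];
        int numChars;
        while ((numChars = reader.read(cbuf)) >= 0) {
            builder.append(cbuf, 0, numChars);
        }
        return builder.toString();
    }

    public static void main(String[] args) throws IOException {
        int once = new TrickleReader(new StringReader("quarterly report")).read(new char[2048]);
        System.out.println(once); // 3
        System.out.println(readAll(new TrickleReader(new StringReader("quarterly report")))); // quarterly report
    }
}
```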
Snips answered 30/12, 2011 at 20:54
Example is missing the definition of the cbuf array. This is the most efficient solution IMO. - Manage

Wrap it in something more useful, like a BufferedReader (a StringReader can't wrap another Reader, since it reads from a String):

http://docs.oracle.com/javase/6/docs/api/

Laure answered 30/12, 2011 at 20:50

Since the file is a text file, create a BufferedReader from your Reader and read it line by line; that should help you make more sense of it.
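Reading line by line also fits the goal in the question of discarding some reports before they are written. A sketch in which the criterion (a line containing NO DATA) is purely hypothetical and the class and method names are made up:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;
import java.util.function.Predicate;

class ReportFilter {
    // Reads the report line by line. Returns Optional.empty() as soon as a line
    // matches the discard criterion; otherwise returns the text with '\n' line breaks.
    static Optional<String> readUnlessDiscarded(Reader reader, Predicate<String> discardLine)
            throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (discardLine.test(line)) {
                    return Optional.empty(); // criterion met: skip this report
                }
                text.append(line).append('\n');
            }
        }
        return Optional.of(text.toString());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical criterion, for illustration only.
        Predicate<String> noData = line -> line.contains("NO DATA");
        System.out.println(readUnlessDiscarded(new StringReader("Header\nTotal: 5\n"), noData).isPresent()); // true
        System.out.println(readUnlessDiscarded(new StringReader("Header\nNO DATA\n"), noData).isPresent());  // false
    }
}
```

A kept report could then be written with an explicit charset, for example Files.writeString(path, text, StandardCharsets.UTF_8) on Java 11+.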

Crawler answered 30/12, 2011 at 20:50

As an alternative, you can read a String from a java.io.Reader with java.util.Scanner inside a try-with-resources statement, which closes the Scanner and, with it, the underlying reader.

Here is an example:

Reader in = ...
try (Scanner scanner = new Scanner(in).useDelimiter("\\Z")) {
    String text = scanner.next();
    ... // Do something with text
}

In this situation, the call to scanner.next() reads the whole text, because the delimiter \Z matches only at the end of the input (or just before a final line terminator, if there is one).

The following one-liner also reads the whole text, but does not close the reader:

String text = new Scanner(in).useDelimiter("\\Z").next();
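A self-contained variant of the Scanner approach, as a sketch. It swaps \Z for \A (start of input), a commonly used alternative, because \Z can also match just before a final line terminator, so a trailing newline may be lost. With \A, nothing after the first position counts as a delimiter, so next() returns the entire input; the hasNext() check covers empty input, where next() would throw NoSuchElementException. Class and method names are illustrative:

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

class ScannerReadAll {
    // "\\A" only matches at the very start of the input, so the whole input is one token.
    // Closing the Scanner (via try-with-resources) also closes the Reader.
    static String readAll(Reader in) {
        try (Scanner scanner = new Scanner(in).useDelimiter("\\A")) {
            // Empty input has no token; next() would throw NoSuchElementException.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        System.out.print(readAll(new StringReader("line 1\nline 2\n")));
    }
}
```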
Jotunheim answered 13/11, 2018 at 14:19

Since Java 1.8, you can use the BufferedReader.lines() method, which returns a Stream<String>.

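So the following code returns the whole content, joining the lines with a custom line separator, "\n" (the lines from lines() carry no line terminators, and no separator is added after the last line):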

String content = new BufferedReader(reader)
    .lines()
    .collect(Collectors.joining("\n"));
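A runnable version of the same idea, as a sketch (the class name is illustrative): it closes the reader with try-with-resources and makes the effect of joining visible. Line terminators of any style (\n, \r or \r\n) come out as \n, and a final line terminator is not kept:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

class LinesJoin {
    // lines() (Java 8+) yields each line without its terminator; joining("\n") puts
    // the separator between lines only, so nothing is added after the last line.
    static String readAll(Reader reader) throws IOException {
        try (BufferedReader in = new BufferedReader(reader)) {
            return in.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringReader("a\r\nb\nc\n")).equals("a\nb\nc")); // true
    }
}
```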
Swarey answered 21/5, 2022 at 13:23

© 2022 - 2024 — McMap. All rights reserved.