Reading binary file from URLConnection

2

21

I'm trying to read a binary file from a URLConnection. When I test it with a text file it seems to work fine, but for binary files it doesn't. I'm using the following MIME type on the server when the file is sent out:

application/octet-stream

But so far nothing seems to work. This is the code that I use to receive the file:

file = File.createTempFile( "tempfile", ".bin");
file.deleteOnExit();

URL url = new URL( "http://somedomain.com/image.gif" );

URLConnection connection = url.openConnection();

BufferedReader input = new BufferedReader( new InputStreamReader( connection.getInputStream() ) );

Writer writer = new OutputStreamWriter( new FileOutputStream( file ) );

int c;

while( ( c = input.read() ) != -1 ) {

   writer.write( (char)c );
}

writer.close();

input.close();
Deductive answered 11/7, 2010 at 5:33 Comment(0)
35

This is how I do it,

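// Read raw bytes directly from the connection; no Reader involved.
InputStream input = connection.getInputStream();
byte[] buffer = new byte[4096];
int n;

OutputStream output = new FileOutputStream( file );
// read() returns the number of bytes read, or -1 at end of stream.
while ((n = input.read(buffer)) != -1) 
{
    output.write(buffer, 0, n);
}
output.close();
input.close();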
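
A self-contained variant of the loop above might look like the sketch below. The class name BinaryCopy and the helper copy are hypothetical names introduced here for illustration, and an in-memory stream stands in for connection.getInputStream() so that the example runs without a network connection.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

public class BinaryCopy {

    // Hypothetical helper: copies every byte from in to out without any
    // character decoding and returns the number of bytes copied.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Sample data covering every possible byte value (assumption: any
        // binary payload, such as a GIF, behaves the same way).
        byte[] data = new byte[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }

        // try-with-resources closes both streams even if the copy fails.
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            long copied = copy(in, out);
            System.out.println(copied + " bytes copied, identical: "
                    + Arrays.equals(data, out.toByteArray()));
        }
    }
}
```

With a real download, the same helper would presumably be called with connection.getInputStream() and a FileOutputStream for the temp file, both opened in a try-with-resources statement.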
Catadromous answered 11/7, 2010 at 15:12 Comment(2)
The n > 0 test is unnecessary. According to the javadocs, the only case where zero can be returned is when buffer.length is zero. - Hoffmann
... and in any case a zero length write is harmless. - Geordie
15

If you are trying to read a binary stream, you should NOT wrap the InputStream in a Reader of any kind. Read the data into a byte array buffer using the InputStream.read(byte[], int, int) method. Then write from the buffer to a FileOutputStream.

The way you are currently reading/writing the file will convert it into "characters" and back to bytes using your platform's default character encoding. This is liable to mangle binary data.

(There is a charset, ISO-8859-1 (Latin-1), that provides a 1-to-1 lossless mapping between bytes and a subset of the char value-space. However, using it is a bad idea even though the mapping works. You will be translating / copying the binary data from byte[] to char[] and back again ... which achieves nothing in this context.)
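
To make the mangling concrete, here is a minimal sketch that pushes a few bytes through a bytes-to-chars-to-bytes round trip. The class name CharsetRoundTrip, the helper roundTrip, and the sample bytes are all made up for illustration, and UTF-8 is assumed as a stand-in for the platform default encoding (the actual default depends on the JVM and operating system).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Hypothetical helper: decodes the bytes to characters with the given
    // charset and encodes them straight back, which is roughly what an
    // InputStreamReader feeding an OutputStreamWriter ends up doing.
    public static byte[] roundTrip(byte[] data, Charset charset) {
        return new String(data, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Arbitrary sample: three ASCII bytes followed by bytes that are
        // not valid UTF-8 on their own (0x89, 0xFF) and a zero byte.
        byte[] data = { 'G', 'I', 'F', (byte) 0x89, (byte) 0xFF, 0x00 };

        byte[] viaUtf8 = roundTrip(data, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(data, StandardCharsets.ISO_8859_1);

        // Invalid sequences are replaced while decoding, so the UTF-8
        // result no longer matches the input.
        System.out.println("UTF-8 preserved the bytes:      "
                + Arrays.equals(data, viaUtf8));
        // ISO-8859-1 maps every byte to exactly one char and back.
        System.out.println("ISO-8859-1 preserved the bytes: "
                + Arrays.equals(data, viaLatin1));
    }
}
```

The ISO-8859-1 case survives only because of the 1-to-1 mapping described above; copying the bytes directly avoids the question entirely.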

Hoffmann answered 11/7, 2010 at 6:18 Comment(5)
Or you can try wrapping up your InputStream into BufferedInputStream. - Housen
@Housen - that is true, but it will only help if you are going to do lots of small reads. If you exclusively do large block reads, a BufferedInputStream will actually reduce throughput a bit. - Hoffmann
This is correct; InputStreamReader will transform byte data to UTF-16 character data (in this case, using the default platform encoding, which is a bad idea even for text/plain). A Java char is not an octet as it is in some other languages. - Rateable
@StephenC, regarding your last (+1 useful) comment - What buffer-size would still be considered as causing "lots of small reads" (by your definition)? In other words, how small "should" the byte[] read-buffer be, to justify usage of BufferedInputStream? - Schram
I can't give you an exact number. It depends on the relative costs of a syscall, the sizes of the buffer and the byte[], and so on. But my real point is to not assume that using a buffered stream always makes things faster. - Hoffmann
