Most efficient way to create InputStream from OutputStream

This page: http://blog.ostermiller.org/convert-java-outputstream-inputstream describes how to create an InputStream from OutputStream:

new ByteArrayInputStream(out.toByteArray())

Another alternative is to use piped streams and a new thread, which is more cumbersome.

I do not like the idea of copying many megabytes into a new in-memory byte array. Is there a library that does this more efficiently?
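For reference, the copying approach in a minimal, runnable form (illustrative only, since the point of the question is to avoid this copy):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CopyDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write("hello world".getBytes(StandardCharsets.UTF_8));

        // toByteArray() copies the internal buffer, so the data briefly
        // exists twice in memory -- the cost the question wants to avoid.
        InputStream in = new ByteArrayInputStream(out.toByteArray());

        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```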

EDIT:

On the advice of Laurence Gonsalves, I tried PipedStreams, and it turned out they are not that hard to deal with. Here's the sample code in Clojure:

(defn #^PipedInputStream create-pdf-stream [pdf-info]
  (let [in-stream (new PipedInputStream)
        out-stream (PipedOutputStream. in-stream)]
    (.start (Thread. (fn []
                       ;; write into out-stream here, then close it
                       )))
    in-stream))
Siouan answered 4/8, 2009 at 6:13 Comment(1)
For an alternative and more complete Clojure implementation, see ring-core: github.com/ring-clojure/ring/blob/…Myceto

If you don't want to copy all of the data into an in-memory buffer at once, then the code that uses the OutputStream (the producer) and the code that uses the InputStream (the consumer) must either alternate in the same thread or operate concurrently in two separate threads. Having them operate in the same thread is probably much more complicated than using two separate threads: it is much more error-prone (you'll need to make sure the consumer never blocks waiting for input, or you'll effectively deadlock), and it forces the producer and consumer to run in the same loop, which seems far too tightly coupled.

So use a second thread. It really isn't that complicated. The page you linked to has a reasonable example. Here's a somewhat modernized version, which also closes the streams:

try (PipedInputStream in = new PipedInputStream()) {
    new Thread(() -> {
        try (PipedOutputStream out = new PipedOutputStream(in)) {
            writeDataToOutputStream(out);
        } catch (IOException iox) {
            // handle IOExceptions
        }
    }).start();
    processDataFromInputStream(in);
}
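The same pattern in a self-contained, runnable form, with a trivial producer standing in for `writeDataToOutputStream` (the producer's payload here is purely illustrative):

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class PipeDemo {
    public static void main(String[] args) throws IOException {
        try (PipedInputStream in = new PipedInputStream()) {
            PipedOutputStream out = new PipedOutputStream(in);
            new Thread(() -> {
                // Producer thread: closing the stream signals end-of-data
                // to the reader, which is why the close matters.
                try (PipedOutputStream o = out) {
                    o.write("produced in another thread".getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }).start();
            // Consumer: blocks until the producer writes and closes.
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```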
Ozzie answered 4/8, 2009 at 7:6 Comment(13)
I think you also need to create a new PipedInputStream for each consumer thread. If you read from the pipe from another thread, it will give you an error.Kono
@Laurence: I don't understand your rationale for using 2 threads ... UNLESS it is a requirement that all characters read from the InputStream are written to the OutputStream in a timely fashion.Laflamme
Thx. I overlooked PipedStreams at first because I thought it would be too cumbersome to deal with them. Turned out to be no big deal at all, especially from Clojure.Siouan
Stephen: you can't read something until it's been written. So with only one thread you either need to write everything first (creating a big in-memory array that Vagif wanted to avoid) or you need to have them alternate being very careful to have the reader never block waiting for input (because if he does, the writer will never get to execute either).Ozzie
@Laurence - +1 even though looking at your rep it doesn't look like you need it -:) Just wanted to say that this is very nicely explained! (both, answer and comment)Immoderacy
@amol Thanks. I probably should have added that this all follows from the fact that Input/OutputStream use blocking IO. With non-blocking IO it's possible to do the alternating approach: just make sure that when the reader doesn't have enough input it "yields" to the writer.Ozzie
is this suggestion safe to use in a JEE environment where the container probably is running a lot of his own threads?Organism
@Organism if new Thread isn't appropriate in your container for whatever reason, then see if there's a thread pool you can use.Ozzie
@LaurenceGonsalves, doesn't the writing thread also have to close out? Otherwise the reader will think that the pipe is broken once the writing thread has died.Heronry
@Heronry I don't think closing is strictly required in the case of PipedInputStream/PipedOutputStream, these classes don't use any OS resources that need cleaning up by close, and in particular they don't use an actual OS-level "pipe", so even when the writing thread dies, the reader will not know. There is no signal sent-- there are just no more bytes ever written to the stream. That said, I agree with you that closing them is good form if nothing else.Ozzie
@LaurenceGonsalves, if you do not close the PipedOutputStream the reader will either block infinitely or, if the writer thread died, will throw an exception, see stackoverflow.com/a/29725367. The PipedInputStream should also be closed to make sure that if (for some reason) the writer writes to the PipedOutputStream again, it will not block infinitely waiting for space in the buffer if it is full.Heronry
Another reason that a separate thread is required is due to these implementations in particular. From the docs: Typically, data is written to a PipedOutputStream object by one thread and data is read from the connected PipedInputStream by some other thread. Attempting to use both objects from a single thread is not recommended as it may deadlock the thread.Mimeograph
@Heronry I know this is a late response, but you're absolutely right. Even though closing doesn't clean up "resources", it signals to the other side that it's done. I've updated the code snippet (and also modernized the code, somewhat).Ozzie

There is another open-source library called EasyStream that deals with pipes and threads in a transparent way. That isn't really complicated if everything goes well. Problems arise when (looking at Laurence Gonsalves's example)

class1.putDataOnOutputStream(out);

throws an exception. In that example the thread simply completes, the exception is lost, and the outer InputStream might be truncated.

EasyStream deals with exception propagation and other nasty problems I've been debugging for about a year. (I'm the maintainer of the library: obviously my solution is the best one ;) ) Here is an example of how to use it:

final InputStreamFromOutputStream<String> isos = new InputStreamFromOutputStream<String>(){
 @Override
 public String produce(final OutputStream dataSink) throws Exception {
   /*
    * call your application function who produces the data here
    * WARNING: we're in another thread here, so this method shouldn't 
    * write any class field or make assumptions on the state of the outer class. 
    */
   return produceMydata(dataSink);
 }
};

There is also a nice introduction where all the other ways to convert an OutputStream into an InputStream are explained. Worth a look.

Hyperaesthesia answered 2/6, 2011 at 11:30 Comment(1)
The tutorial for using their class is available at code.google.com/p/io-tools/wiki/Tutorial_EasyStreamTicklish

A simple solution that avoids copying the buffer is to create a special-purpose ByteArrayOutputStream:

public class CopyStream extends ByteArrayOutputStream {
    public CopyStream(int size) { super(size); }

    /**
     * Get an input stream based on the contents of this output stream.
     * Do not use the output stream after calling this method.
     * @return an {@link InputStream}
     */
    public InputStream toInputStream() {
        return new ByteArrayInputStream(this.buf, 0, this.count);
    }
}

Write to the above output stream as needed, then call toInputStream to obtain an input stream over the underlying buffer. Consider the output stream as closed after that point.
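A quick usage sketch of this approach (the class is repeated here, slightly compacted, so the snippet compiles standalone; the written payload is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class CopyStreamDemo {
    // Same idea as the class above: expose the internal buffer without copying.
    static class CopyStream extends ByteArrayOutputStream {
        CopyStream(int size) { super(size); }

        InputStream toInputStream() {
            // Wraps this.buf directly -- no defensive copy, unlike toByteArray().
            return new ByteArrayInputStream(this.buf, 0, this.count);
        }
    }

    public static void main(String[] args) throws IOException {
        CopyStream out = new CopyStream(64);
        out.write("no copy needed".getBytes(StandardCharsets.UTF_8));
        // From here on, treat the output stream as closed.
        try (InputStream in = out.toInputStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```

This works because `buf` and `count` are protected fields of `ByteArrayOutputStream`, so a subclass can hand them to a `ByteArrayInputStream` without the copy that `toByteArray()` performs.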

Recurve answered 1/5, 2016 at 0:25 Comment(1)
note: this is more efficient if the data is produced without any blocking operations (avoids threading overhead). Otherwise however PipedStreams will be waaay more efficient.Scruffy

I think the best way to connect an InputStream to an OutputStream is through piped streams, available in the java.io package, as follows:

// 1 - Define the pipe buffer size
private static final int PIPE_BUFFER = 2048;

// 2 - Create a PipedInputStream with that buffer size
public PipedInputStream inPipe = new PipedInputStream(PIPE_BUFFER);

// 3 - Create a PipedOutputStream and bind it to the PipedInputStream
public PipedOutputStream outPipe = new PipedOutputStream(inPipe);

// 4 - PipedOutputStream is an OutputStream, so you can write data to it
// in any way suitable to your data, for example:
while (condition) {
    outPipe.write(mByte);
}

/*Congratulations :D. Step 4 writes data to the PipedOutputStream,
which is bound to the PipedInputStream, so the data becomes available
in the inPipe object. Read from inPipe to drain the buffer so it can
be filled again by the PipedOutputStream.*/

In my opinion there are two main advantages to this approach:

1 - There is no additional memory consumption beyond the pipe's buffer.

2 - You don't need to handle data queuing manually.
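A runnable sketch of these steps, with the writer moved to its own thread (the javadocs warn that reading and writing the same pipe from one thread may deadlock). The write loop here is a placeholder payload; note how the 2048-byte pipe buffer applies backpressure when the reader falls behind:

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeBufferDemo {
    private static final int PIPE_BUFFER = 2048;

    public static void main(String[] args) throws Exception {
        PipedInputStream inPipe = new PipedInputStream(PIPE_BUFFER);
        PipedOutputStream outPipe = new PipedOutputStream(inPipe);

        Thread writer = new Thread(() -> {
            try (PipedOutputStream out = outPipe) {
                // Write more than PIPE_BUFFER bytes: the writer blocks
                // whenever the buffer is full, until the reader drains it.
                for (int i = 0; i < 10000; i++) {
                    out.write(i & 0xFF); // stand-in for real data
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        writer.start();

        long total = 0;
        while (inPipe.read() != -1) {
            total++;
        }
        inPipe.close();
        writer.join();
        System.out.println("read " + total + " bytes");
    }
}
```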

Returnee answered 28/1, 2015 at 12:11 Comment(1)
This would be awesome, but the javadocs say that if you read and write to these in the same thread you could get deadlock. I wish they had updated this with NIO!Theron

I usually try to avoid creating a separate thread because of the increased chance of deadlock, the increased difficulty of understanding the code, and the problems of dealing with exceptions.

Here's my proposed solution: a ProducerInputStream that creates content in chunks by repeated calls to produceChunk():

public abstract class ProducerInputStream extends InputStream {

    private ByteArrayInputStream bin = new ByteArrayInputStream(new byte[0]);
    private ByteArrayOutputStream bout = new ByteArrayOutputStream();

    @Override
    public int read() throws IOException {
        int result = bin.read();
        while ((result == -1) && newChunk()) {
            result = bin.read();
        }
        return result;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int result = bin.read(b, off, len);
        while ((result == -1) && newChunk()) {
            result = bin.read(b, off, len);
        }
        return result;
    }

    private boolean newChunk() {
        bout.reset();
        produceChunk(bout);
        bin = new ByteArrayInputStream(bout.toByteArray());
        return (bout.size() > 0);
    }

    public abstract void produceChunk(OutputStream out);

}
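A usage sketch of this class (the abstract class is repeated so the snippet compiles standalone; the three-chunk producer is a hypothetical stand-in for real content generation):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class ProducerDemo {
    // The class from the answer, repeated here so this compiles standalone.
    static abstract class ProducerInputStream extends InputStream {
        private ByteArrayInputStream bin = new ByteArrayInputStream(new byte[0]);
        private ByteArrayOutputStream bout = new ByteArrayOutputStream();

        @Override public int read() throws IOException {
            int result = bin.read();
            while (result == -1 && newChunk()) result = bin.read();
            return result;
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            int result = bin.read(b, off, len);
            while (result == -1 && newChunk()) result = bin.read(b, off, len);
            return result;
        }

        private boolean newChunk() {
            bout.reset();
            produceChunk(bout);
            bin = new ByteArrayInputStream(bout.toByteArray());
            return bout.size() > 0;
        }

        public abstract void produceChunk(OutputStream out);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical producer: emits three chunks, then signals
        // end-of-stream by writing nothing on the fourth call.
        InputStream in = new ProducerInputStream() {
            private int chunk = 0;

            @Override public void produceChunk(OutputStream out) {
                if (chunk++ < 3) {
                    try {
                        out.write(("chunk" + chunk).getBytes(StandardCharsets.UTF_8));
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            }
        };
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Only one chunk is buffered at a time, so peak memory use is bounded by the largest chunk rather than the whole stream.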
Deltoro answered 28/11, 2016 at 13:19 Comment(1)
Interesting idea, but sadly this will only work if you are in control of the code that produces the data. If a third-party library writes GBs of data to the OutputStream without returning control, then you might as well copy everything into memory, which defeats the point of this class.Card

© 2022 - 2024 — McMap. All rights reserved.