How do I convert an OutputStream to an InputStream?
Asked Answered
D

14

425

I am at a stage of development where I have two modules: from one I get output as an OutputStream, and the second one accepts only an InputStream.

Do you know how to convert an OutputStream to an InputStream (not vice versa, I mean really this way) so that I am able to connect these two parts?

Dangerfield answered 25/4, 2011 at 13:11 Comment(4)
See #1226409Philip
@c0mrade, the op wants something like IOUtils.copy, only in the other direction. When someone writes into an OutputStream, it becomes available for someone else to use in an InputStream. This is basically what PipedOutputStream/PipedInputStream do. Unfortunately the Piped streams can't be built from other streams.Hanoi
so the PipedOutputStream/PipedInputStream is the solution?Dangerfield
Basically in order for PipedStreams to work in your case, your OutputStream would need to be constructed like new YourOutputStream(thePipedOutputStream) and new YourInputStream(thePipedInputStream) which probably is not the way your stream works. So I don't think this is the solution.Hanoi
P
123

An OutputStream is one you write data to. If some module exposes an OutputStream, the expectation is that something is reading at the other end.

Something that exposes an InputStream, on the other hand, is indicating that you will need to listen to this stream, and there will be data that you can read.

So it is possible to connect an InputStream to an OutputStream

InputStream----read---> intermediateBytes[n] ----write----> OutputStream

As someone mentioned, this is what the copy() method from IOUtils lets you do. It does not make sense to go the other way... hopefully this makes some sense
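For illustration, a hand-rolled sketch of such a copy loop (the buffer size and the CopyDemo name are arbitrary choices; Commons IO's IOUtils.copy() does the same job):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyDemo {
    // Pump bytes from the InputStream to the OutputStream through a small
    // intermediate buffer, as in the diagram above. Returns the byte count.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello".getBytes("UTF-8"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(copy(in, out) + " bytes copied: " + out.toString("UTF-8"));
    }
}
```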

UPDATE:

Of course, the more I think about this, the more I can see how this actually would be a requirement. I know some of the comments mentioned Piped input/output streams, but there is another possibility.

If the output stream that is exposed is a ByteArrayOutputStream, then you can always get the full contents by calling the toByteArray() method. Then you can create an input stream wrapper by using the ByteArrayInputStream sub-class. These two are pseudo-streams; they both basically just wrap an array of bytes. Using the streams this way, therefore, is technically possible, but to me it is still very strange...
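A minimal sketch of that route (the class and method names here are made up for illustration; readAllBytes() needs Java 9+):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ByteArrayBridgeDemo {
    // toByteArray() returns a copy of the internal buffer, so the data is
    // briefly held twice in memory; fine for small payloads only.
    public static InputStream toInputStream(ByteArrayOutputStream out) {
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write("round trip".getBytes(StandardCharsets.UTF_8));
        InputStream in = toInputStream(out);
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // Java 9+
    }
}
```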

Proficiency answered 25/4, 2011 at 13:36 Comment(12)
copy() does this IS to OS according to the API; I need it to go backwardsDangerfield
See my edit on the top, it is necessary for me to make some conversionDangerfield
Make your output a ByteArrayOutputStream, and then make your input in the second case a BAIS. Does this solution not apply? Perhaps I am misunderstanding.Proficiency
The usecase is very simple: Imagine you have a serialization library (for example, serializing to JSON) and a transport layer (say, Tomcat) which takes an InputStream. So you need to pipe the OutputStream from JSON over an HTTP connection which wants to read from an InputStream.Aletaaletha
This is useful when unit testing and you are super pedantic about avoiding touching the file system.Candleberry
I decided to use an XML serializer to copy a tricky directed acyclic graph that would've been difficult to write a proper explicit copy constructor for. In this case what I want to do is give the serializer an outputStream that fills a buffer, and then give the serializer an input stream that reads from the same buffer. Does this seem like a fair place to use this 'drive to output stream -> pull from input stream' scheme or am I missing something blindingly obvious?Kila
@Aletaaletha 's comment is spot on. Another use case is using PDFBox to build PDFs during an HTTP request. PDFBox uses an OutputStream to save a PDF object, and the REST API accepts an InputStream to reply to the client. Hence, an OutputStream -> InputStream is a very real-world use case.Pubes
"you can always get the full contents by calling the toByteArray() method" the point of using streams is to not load the whole content into memory!!Avulsion
"It does not make sense to go the other way..." Unless you're trying to write objects to a GCP or AWS bucket. Both of their APIs require input streams for the WRITE operation, presumably so they have complete control over the data transfer. This makes plugging them into code that was written for the file system difficult.Sweetie
@L.Blanc, yes, the s3 putObject API expects an InputStream: putObject(String var1, String var2, InputStream var3, ObjectMetadata var4)Pulpiteer
This is totally wrong. The OP asked the OS->IS direction and that makes perfect sense when you have a data processing pipeline where a step yields output for the next. Most Unix commands work like that. To do this in Java, use easystream, which is based on stream pipes. This approach implies that one of the processes (usually the OS producer) is in the background (i.e., in a thread). Even when you want to pipe an IS to an OS, you want to do it lazily (using Apache copy() or Java 9 transferTo()); you don't want to waste memory by first reading the IS into a buffer. Stop programming so badly.Soldier
Why is this an accepted answer? The OP asked the other way around, which happened to be a use case that I am running into as well. As other comments mentioned, simply converting to byte array would have memory impact.Syllabize
G
288

There seem to be many links and other such stuff, but no actual code using pipes. The advantage of using java.io.PipedInputStream and java.io.PipedOutputStream is that there is no additional consumption of memory. ByteArrayOutputStream.toByteArray() returns a copy of the original buffer, so that means that whatever you have in memory, you now have two copies of it. Then writing to an InputStream means you now have three copies of the data.

The code using lambdas (hat-tip to @John Manko from the comments):

PipedInputStream in = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(in);
// in a background thread, write the given output stream to the
// PipedOutputStream for consumption; the try/catch is needed because
// writeTo() throws the checked IOException, which a Runnable cannot
new Thread(() -> {
    try {
        originalOutputStream.writeTo(out);
    } catch (IOException e) {
        // logging and exception handling should go here
    }
}).start();

One thing that @John Manko noted is that in certain cases, when you don't have control of the creation of the OutputStream, you may end up in a situation where the creator may clean up the OutputStream object prematurely. If you are getting the ClosedPipeException, then you should try inverting the constructors:

final PipedOutputStream out = new PipedOutputStream();
PipedInputStream in = new PipedInputStream(out);
new Thread(() -> {
    try {
        originalOutputStream.writeTo(out);
    } catch (IOException e) {
        // logging and exception handling should go here
    }
}).start();

Note you can invert the constructors for the examples below too.

Thanks also to @AlexK for correcting me with starting a Thread instead of just kicking off a Runnable.


The code using try-with-resources:

// take the copy of the stream and re-write it to an InputStream
PipedInputStream in = new PipedInputStream();
    new Thread(new Runnable() {
        public void run () {
            // try-with-resources here
            // putting the try block outside the Thread will cause the
            // PipedOutputStream resource to close before the Runnable finishes
            try (final PipedOutputStream out = new PipedOutputStream(in)) {
                // write the original OutputStream to the PipedOutputStream
                // note that in order for the below method to work, you need
                // to ensure that the data has finished writing to the
                // ByteArrayOutputStream
                originalByteArrayOutputStream.writeTo(out);
            }
            catch (IOException e) {
                // logging and exception handling should go here
            }
        }
    }).start();

The original code I wrote:

// take the copy of the stream and re-write it to an InputStream
PipedInputStream in = new PipedInputStream();
final PipedOutputStream out = new PipedOutputStream(in);
new Thread(new Runnable() {
    public void run () {
        try {
            // write the original OutputStream to the PipedOutputStream
            // note that in order for the below method to work, you need
            // to ensure that the data has finished writing to the
            // ByteArrayOutputStream
            originalByteArrayOutputStream.writeTo(out);
        }
        catch (IOException e) {
            // logging and exception handling should go here
        }
        finally {
            // close the PipedOutputStream here because we're done writing data
            // once this thread has completed its run
            try {
                // close the PipedOutputStream cleanly
                out.close();
            }
            catch (IOException e) {
                // the close failed; log it if needed
            }
        }
    }
}).start();

This code assumes that the originalByteArrayOutputStream is a ByteArrayOutputStream as it is usually the only usable output stream, unless you're writing to a file. The great thing about this is that since it's in a separate thread, it also is working in parallel, so whatever is consuming your input stream will be streaming out of your old output stream too. That is beneficial because the buffer can remain smaller and you'll have less latency and less memory usage.

If you don't have a ByteArrayOutputStream, then instead of using writeTo(), you will have to use one of the write() methods in the java.io.OutputStream class or one of the other methods available in a subclass.
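As a hedged sketch of that general case, the producer can be handed the PipedOutputStream directly and run on a background thread. The IOConsumer interface below is a made-up helper, not a JDK type, and try (out) plus readAllBytes() need Java 9+:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.nio.charset.StandardCharsets;

public class PipeDemo {
    /** Like Consumer<OutputStream>, but allowed to throw IOException. */
    public interface IOConsumer {
        void accept(OutputStream out) throws IOException;
    }

    /** Runs the producer on a background thread; the returned InputStream
        sees the bytes as they are written. */
    public static InputStream pipe(IOConsumer producer) throws IOException {
        PipedInputStream in = new PipedInputStream();
        PipedOutputStream out = new PipedOutputStream(in);
        new Thread(() -> {
            try (out) { // close the pipe so the reader sees end-of-stream
                producer.accept(out);
            } catch (IOException e) {
                e.printStackTrace(); // real code should surface this properly
            }
        }).start();
        return in;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = pipe(out -> out.write("streamed".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```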

Gaulish answered 26/5, 2014 at 16:20 Comment(30)
I voted this up, but it's better to pass out to in's constructor, otherwise you might get a closed pipe exception on in due to race condition (which I experienced). Using Java 8 Lambdas: PipedInputStream in = new PipedInputStream(out); ((Runnable)() -> {originalOutputStream.writeTo(out);}).run(); return in;Pubes
@JohnManko hmm... I've never had that problem. Did you experience this because another thread or the main thread is calling out.close() ? It's true that this code assumes that your PipedOutputStream is longer-lived than your originalOutputStream which should be true, but it doesn't assume how you control your streams. That is left up to the developer. There's nothing in this code that would cause a closed or broken pipe exception.Gaulish
No, my case stems from when I store PDFs in Mongo GridFS, and then stream to the client using Jax-RS. MongoDB supplies an OutputStream, but Jax-RS requires an InputStream. My path method would return to the container with an InputStream before the OutputStream was fully established, it seems (perhaps the buffer hadn't been cached yet). Anyway, Jax-RS would throw of pipe closed exception on the InputStream. Odd, but that's what happened half the time. Changing to the code above prevents that.Pubes
@JohnManko I was looking into this more and I saw from the PipedInputStream Javadocs: A pipe is said to be broken if a thread that was providing data bytes to the connected piped output stream is no longer alive. So what I'm suspecting is that if you're using the example above, the thread is completing before Jax-RS is consuming the input stream. At the same time, I looked at the MongoDB Javadocs. GridFSDBFile has an input stream, so why not just pass that to Jax-RS?Gaulish
@JohnManko At the same time, the order of the initialization shouldn't really matter, so I will update the answer to clarify that point. My other comment was intended to find the root cause of your problem, because the initialization order really shouldn't matter. (I know that it works for you. That suggest to me that there's a timing problem between the input stream and the output stream. As your system takes on more load or other factors, those timings will change and you may run into the problem again. Flipping the initialization shouldn't be a real fix — from what the Javadocs tell me.)Gaulish
There is additional consumption of memory, because of the extra heap for new ThreadAccentual
@DennisCheung yeah, of course. Nothing's free, but it's certainly going to be smaller than a 15MB copy. Optimizations would include using a thread pool instead to reduce the GC churn with constant thread/object creation.Gaulish
Why not just InputStream is = new ByteArrayInputStream(myOut.toByteArray()) ? This does not create a copy, rather a pointer I believe. Your method spawns a thread to do the copy, even if you "hide" it with lambdas.Outdo
@MitjaGustin It does use a new thread, but threads are relatively expensive, compared to memory. Most programs are memory-constrained, not CPU-constrained, especially if you are dealing with large amounts of data. If you look at the source code (developer.classpath.org/doc/java/io/…), you'll see that toByteArray() does indeed make a copy of the output stream's buffer. Also, toByteArray() only exists for ByteArrayOutputStream, not for other types of OutputStream (developer.classpath.org/doc/java/io/OutputStream.html). I just used BAOS as an example.Gaulish
@MitjaGustin The thread also gives more flexibility. My example runs instantaneously, so the data is immediately available to the InputStream, but you can execute the thread whenever you want, in case you need the OutputStream data for something else. If you're using an OutputStream other than BAOS, you would need to modify your run loop to use the write() methods of the OutputStream parent class.Gaulish
@Gaulish doesn't the writing thread have to close out once it has finished? Otherwise the pipe will be considered broken when trying to read after the writing thread died.Superjacent
@Superjacent I didn't close it in the example because you want to close the OutputStream as close as possible to its original creation. You don't know how else it will be used, so you cannot assume that it may not be used elsewhere. Also, separating the creation and the closing of a stream in different places is a multi-threaded nightmare for debugging, so you don't want to do that. For completeness, you could have a ThreadExecutor and serialize this copy and the close into two different executions, but it's outside the scope of this example.Gaulish
I just reread my comment from 12/5. I meant that "threads are relatively inexpensive", not expensive!Gaulish
@mikeho, but you cannot close out where it was created. You have in that thread no information about when writing finished (unless you wait for the reading thread or use executors and futures). Could you please at least mention that closing out is really important? Failing to do this results in a broken pipe exception on the reader side (or infinite blocking).Superjacent
@Superjacent yeah, I can do that. Executors and futures are exactly what I'm referring to in my comment but I'll edit itGaulish
@mikeho, oh I think you misunderstood me. Closing the ByteArrayOutputStream is not what I meant (it also has no effect). I meant closing out (the PipedOutputStream), see for example stackoverflow.com/a/29725367Superjacent
@Superjacent Okay, I fixed it. Sorry, I haven't looked at this code in 5 years.Gaulish
Keep in mind that PipedInputStream and PipedOutputStream need to be both in a separate thread, otherwise a deadlock can occur after a certain size (see Java doc: docs.oracle.com/javase/7/docs/api/java/io/PipedInputStream.html)Mon
@Mon right! Good to call that out. The example I gave simply prepares the InputStream for consumption and does so in its own thread as to not block the main thread.Gaulish
why do you create a thread to call originalByteArrayOutputStream.writeTo ?Emetic
@olivier boissé, because you may need to read and write the data simultaneously. The consumer of the input stream shouldn't have to care about where the data comes from, so you don't want the consumer to manage the outputstream. This also guarantees that the current thread won't deadlock waiting for input because the same thread cannot write to the output—something that will happen if you have both input and output on the same thread.Gaulish
thanks for the answer. I think you can remove the finally block in your code and use a try-with-resources try (out) {Emetic
Yes, I should update the example sometime. When I first wrote up this answer 7 (!!) years ago, try-with-resources didn’t exist.Gaulish
yes, this is what the OP actually wants, not naive reading-in-memory-then-writing, and there is easystreams already doing that (https://mcmap.net/q/80862/-how-do-i-convert-an-outputstream-to-an-inputstream).Soldier
@Soldier true, but why add more libraries if you can do it natively? The solution I propose doesn't read into memory until it requires output, so you're not preloading anything and wasting space.Gaulish
@mikeho, 1) because "natively" here means you are reinventing yet another wheel, and the fact you even ask it, makes me wish to retire from what IT has become nowadays 2) yes, the solution you propose, at least doesn't waste memory as many others have proposed to do, I was highlighting that.Soldier
@Soldier I appreciate your comments. I've also been in the industry for 25+ years and what I've seen is the opposite of you: that many libraries are abandoned and thus aren't updated to use new APIs or fix security holes. The Java SDK, on the other hand, is continually updated, so relying on a reputable source is better. Easystream was last updated in 2016. That's 6 years ago. In the meantime, Java SDK/JDK 9-18 released between 2017 and 2022. SDK 19 is coming out in September 2022.Gaulish
Apart from easystream (yes, it might be abandoned, but it still works fine and its open code is quite readable or modificabile), are you really saying you want to stick to JDK only, because other libs could be abandoned one day? Would you rewrite elephants like Guava, Spring, or, why not, your OS? Seriously? And how can you be sure JDK itself will never be abandoned? They've already ditched many packages after version 8, causing me to waste endless hours to fix legacy code.Soldier
Am I missing something here? ((Runnable)() -> {originalOutputStream.writeTo(out);}).run(); is running in the same thread as it's called from. It has the same effect as just running originalOutputStream.writeTo(out); in the host thread. What does the job, is new Thread(() -> {originalOutputStream.writeTo(out);}).start();Fadein
@AlexK. you're right! sorry, I'll update the answer.Gaulish
R
46

As input and output streams are just start and end points, the solution is to temporarily store the data in a byte array. So you must create an intermediate ByteArrayOutputStream, from which you create a byte[] that is used as input for a new ByteArrayInputStream.

public void doTwoThingsWithStream(InputStream inStream, OutputStream outStream){ 
  //create temporary byte array output stream
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  doFirstThing(inStream, baos);
  //create input stream from baos
  InputStream isFromFirstData = new ByteArrayInputStream(baos.toByteArray()); 
  doSecondThing(isFromFirstData, outStream);
}

Hope it helps.

Rate answered 8/5, 2014 at 8:35 Comment(1)
baos.toByteArray() creates a copy with System.arraycopy. Thanks to @Gaulish for pointing out developer.classpath.org/doc/java/io/…Outdo
M
28
ByteArrayOutputStream buffer = (ByteArrayOutputStream) aOutputStream;
byte[] bytes = buffer.toByteArray();
InputStream inputStream = new ByteArrayInputStream(bytes);
Maidenly answered 27/1, 2017 at 7:13 Comment(3)
You should not use this, since the toByteArray() method body is like this: return Arrays.copyOf(buf, count); which returns a new array.Skippy
java.io.FileOutputStream cannot be cast to java.io.ByteArrayOutputStreamIntertype
This can fail terribly as not every OutputStream (superclass) is a ByteArrayOutputStream (child class)Bigford
L
20

The easystream open source library has direct support to convert an OutputStream to an InputStream: http://io-tools.sourceforge.net/easystream/tutorial/tutorial.html

// create conversion
final OutputStreamToInputStream<Void> out = new OutputStreamToInputStream<Void>() {
    @Override
    protected Void doRead(final InputStream in) throws Exception {
           LibraryClass2.processDataFromInputStream(in);
           return null;
        }
    };
try {   
     LibraryClass1.writeDataToTheOutputStream(out);
} finally {
     // don't miss the close (or a thread would not terminate correctly).
     out.close();
}

They also list other options: http://io-tools.sourceforge.net/easystream/outputstream_to_inputstream/implementations.html

  • Write the data into a memory buffer (ByteArrayOutputStream), get the byte array, and read it again with a ByteArrayInputStream. This is the best approach if you're sure your data fits into memory.
  • Copy your data to a temporary file and read it back.
  • Use pipes: this is the best approach both for memory usage and speed (you can take full advantage of the multi-core processors) and also the standard solution offered by Sun.
  • Use InputStreamFromOutputStream and OutputStreamToInputStream from the easystream library.
Lysol answered 6/6, 2013 at 23:19 Comment(1)
Yes!, use easystream!Cassis
W
19

You will need an intermediate class which will buffer between them. Each time InputStream.read(byte[]...) is called, the buffering class will fill the passed-in byte array with the next chunk passed in from OutputStream.write(byte[]...). Since the sizes of the chunks may not be the same, the adapter class will need to store a certain amount until it has enough to fill the read buffer and/or be able to store up any buffer overflow.

This article has a nice breakdown of a few different approaches to this problem:

http://blog.ostermiller.org/convert-java-outputstream-inputstream
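A heavily simplified, single-threaded sketch of such a buffering adapter (illustrative only: no blocking and no thread-safety, unlike the circular-buffer implementations the article describes; all names are made up):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayDeque;
import java.util.Arrays;

public class BufferAdapter {
    private final ArrayDeque<byte[]> chunks = new ArrayDeque<>();
    private byte[] current; // chunk currently being drained
    private int pos;        // read position within 'current'

    /** OutputStream side: each write stores a copy of the written chunk. */
    public final OutputStream sink = new OutputStream() {
        @Override public void write(int b) { write(new byte[]{(byte) b}, 0, 1); }
        @Override public void write(byte[] b, int off, int len) {
            chunks.add(Arrays.copyOfRange(b, off, off + len));
        }
    };

    /** InputStream side: fills the caller's buffer from stored chunks,
        stitching across chunk boundaries since sizes may not line up. */
    public final InputStream source = new InputStream() {
        @Override public int read() {
            byte[] one = new byte[1];
            int n = read(one, 0, 1);
            return n == -1 ? -1 : one[0] & 0xFF;
        }
        @Override public int read(byte[] b, int off, int len) {
            int total = 0;
            while (total < len) {
                if (current == null || pos == current.length) {
                    current = chunks.poll();
                    pos = 0;
                    if (current == null) break; // nothing buffered right now
                }
                int n = Math.min(len - total, current.length - pos);
                System.arraycopy(current, pos, b, off + total, n);
                pos += n;
                total += n;
            }
            return total == 0 ? -1 : total;
        }
    };
}
```

Here read() returning -1 simply means "nothing buffered right now"; a real adapter would block the reader or signal end-of-stream explicitly, which is exactly the hard part the article's implementations deal with.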

Westerfield answered 26/5, 2012 at 18:53 Comment(1)
thanks @mckamey, the method based on Circular Buffers is exactly what i need !Schnabel
M
15

I encountered the same problem with converting a ByteArrayOutputStream to a ByteArrayInputStream and solved it by using a derived class from ByteArrayOutputStream which is able to return a ByteArrayInputStream that is initialized with the internal buffer of the ByteArrayOutputStream. This way no additional memory is used and the 'conversion' is very fast:

package info.whitebyte.utils;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

/**
 * This class extends the ByteArrayOutputStream by 
 * providing a method that returns a new ByteArrayInputStream
 * which uses the internal byte array buffer. This buffer
 * is not copied, so no additional memory is used. After
 * creating the ByteArrayInputStream the instance of the
 * ByteArrayInOutStream can not be used anymore.
 * <p>
 * The ByteArrayInputStream can be retrieved using <code>getInputStream()</code>.
 * @author Nick Russler
 */
public class ByteArrayInOutStream extends ByteArrayOutputStream {
    /**
     * Creates a new ByteArrayInOutStream. The buffer capacity is
     * initially 32 bytes, though its size increases if necessary.
     */
    public ByteArrayInOutStream() {
        super();
    }

    /**
     * Creates a new ByteArrayInOutStream, with a buffer capacity of
     * the specified size, in bytes.
     *
     * @param   size   the initial size.
     * @exception  IllegalArgumentException if size is negative.
     */
    public ByteArrayInOutStream(int size) {
        super(size);
    }

    /**
     * Creates a new ByteArrayInputStream that uses the internal byte array buffer 
     * of this ByteArrayInOutStream instance as its buffer array. The initial value 
     * of pos is set to zero and the initial value of count is the number of bytes 
     * that can be read from the byte array. The buffer array is not copied. This 
     * instance of ByteArrayInOutStream can not be used anymore after calling this
     * method.
     * @return the ByteArrayInputStream instance
     */
    public ByteArrayInputStream getInputStream() {
        // create new ByteArrayInputStream that respects the current count
        ByteArrayInputStream in = new ByteArrayInputStream(this.buf, 0, this.count);

        // set the buffer of the ByteArrayOutputStream 
        // to null so it can't be altered anymore
        this.buf = null;

        return in;
    }
}

I put the stuff on github: https://github.com/nickrussler/ByteArrayInOutStream

Myrna answered 4/9, 2014 at 19:57 Comment(10)
what if the content does not fit into the buffer?Gadroon
Then you should not use a ByteArrayInputStream in the first place.Myrna
This solution will have all bytes in memory. For small files this will be ok but then you can also use getBytes() on ByteArrayOutput StreamGadroon
If you mean toByteArray this would cause the internal buffer to be copied, which would take twice as much memory as my approach. Edit: Ah i understand, for small files this works of course..Myrna
Waste of time. ByteArrayOutputStream has a writeTo method to transfer content to another output streamPolygon
@TonyBenBrahim but this copies the buffer, which doubles the memory usage and costs time.Myrna
There is an error in getInputStream(). The count value must be passed to the constructor. The ByteArrayInputStream must be created with new ByteArrayInputStream(this.buf, 0, this.count).Tolan
@Christiand'Heureuse Yes, thanks. I fixed that on github a while ago, but forgot to edit it here.Myrna
This method suits me in the case that data buffered in memory but not want to double itTeddman
this answer, while not inherently bad, fails to mention important performance gotchas and provides paramless constructor in ByteArrayInOutStream, that should never be used. See my answer for details.Bifocals
B
4

In cases when your data is not very big, is produced without any blocking operations, and its overall size can be reasonably accurately estimated, creating an additional thread for PipedStreams may be an unnecessary overhead. Using ByteArrayStreams the "normal" way results in an unnecessary buffer copy in ByteArrayOutputStream.toByteArray(), but there's a trick to avoid it: the underlying buffer of ByteArrayOutputStream is a protected field (meaning it's part of Java's documented API and will not change), so you can create a simple subclass that gives you access to it:

public class NoCopyByteArrayOutputStream extends ByteArrayOutputStream {
  public byte[] getBuffer() { return buf; }
  public NoCopyByteArrayOutputStream(int initialSize) { super(initialSize); }
}

After that you can proceed in a similar way like with standard ByteArrayStreams:

final var outputBytes = new NoCopyByteArrayOutputStream(estimatedSize);
// produce data and output it to outputBytes here...

final var inputBytes = new ByteArrayInputStream(
        outputBytes.getBuffer(), 0, outputBytes.size());
// consume data from inputBytes here...

Note that the buffer may be bigger than the number of bytes that were actually written to the outputBytes output stream, hence the 3-param constructor ByteArrayInputStream(bytes, offset, length) must be used, with the length value obtained from outputBytes's size().

Remember that if your data production process involves any blocking operations (like network I/O or even reading from a local file), you should definitely use PipedStreams (as described in the other answer) to not delay unnecessarily partial data consumption from inputBytes.
Secondly, if you heavily underestimate the amount of data and pass significantly too low initialSize to the constructor of NoCopyByteArrayOutputStream, it will also have significant performance consequences due to repeated buffer extending, which involves copying of the whole data that was written up till the given point (hence the paramless constructor was not provided, as it doesn't make sense to use this class if the data size cannot be estimated with a reasonable accuracy).
Finally, if the data passed between streams is big, then PipedStreams will usually be much more efficient in terms of memory, provided that the reading thread does not get blocked too much and generally reads the data "on the fly" as soon as it appears.

Bifocals answered 8/6, 2023 at 22:15 Comment(0)
C
1

The library io-extras may be useful. For example if you want to gzip an InputStream using GZIPOutputStream and you want it to happen synchronously (using the default buffer size of 8192):

InputStream is = ...
InputStream gz = IOUtil.pipe(is, o -> new GZIPOutputStream(o));

Note that the library has 100% unit test coverage (for what that's worth of course!) and is on Maven Central. The Maven dependency is:

<dependency>
  <groupId>com.github.davidmoten</groupId>
  <artifactId>io-extras</artifactId>
  <version>0.1</version>
</dependency>

Be sure to check for a later version.

Chiller answered 26/4, 2018 at 0:43 Comment(0)
I
0

From my point of view, java.io.PipedInputStream/java.io.PipedOutputStream is the best option to consider. In some situations you may want to use ByteArrayInputStream/ByteArrayOutputStream. The problem is that you need to duplicate the buffer to convert a ByteArrayOutputStream to a ByteArrayInputStream. Also, ByteArrayOutputStream/ByteArrayInputStream are limited to 2 GB. Here is an OutputStream/InputStream implementation I wrote to bypass those limitations (Scala code, but easily understandable for Java developers):

import java.io.{IOException, InputStream, OutputStream}

import scala.annotation.tailrec

/** Acts as a replacement for ByteArrayOutputStream
  *
  */
class HugeMemoryOutputStream(capacity: Long) extends OutputStream {
  private val PAGE_SIZE: Int = 1024000
  private val ALLOC_STEP: Int = 1024

  /** Pages array
    *
    */
  private var streamBuffers: Array[Array[Byte]] = Array.empty[Array[Byte]]

  /** Allocated pages count
    *
    */
  private var pageCount: Int = 0

  /** Allocated bytes count
    *
    */
  private var allocatedBytes: Long = 0

  /** Current position in stream
    *
    */
  private var position: Long = 0

  /** Stream length
    *
    */
  private var length: Long = 0

  allocSpaceIfNeeded(capacity)

  /** Gets page count based on given length
    *
    * @param length   Buffer length
    * @return         Page count to hold the specified amount of data
    */
  private def getPageCount(length: Long) = {
    var pageCount = (length / PAGE_SIZE).toInt + 1

    if ((length % PAGE_SIZE) == 0) {
      pageCount -= 1
    }

    pageCount
  }

  /** Extends pages array
    *
    */
  private def extendPages(): Unit = {
    if (streamBuffers.isEmpty) {
      streamBuffers = new Array[Array[Byte]](ALLOC_STEP)
    }
    else {
      val newStreamBuffers = new Array[Array[Byte]](streamBuffers.length + ALLOC_STEP)
      Array.copy(streamBuffers, 0, newStreamBuffers, 0, streamBuffers.length)
      streamBuffers = newStreamBuffers
    }

    pageCount = streamBuffers.length
  }

  /** Ensures buffers are big enough to hold the specified amount of data
    *
    * @param value  Amount of data
    */
  private def allocSpaceIfNeeded(value: Long): Unit = {
    @tailrec
    def allocSpaceIfNeededIter(value: Long): Unit = {
      val currentPageCount = getPageCount(allocatedBytes)
      val neededPageCount = getPageCount(value)

      if (currentPageCount < neededPageCount) {
        if (currentPageCount == pageCount) extendPages()

        streamBuffers(currentPageCount) = new Array[Byte](PAGE_SIZE)
        allocatedBytes = (currentPageCount + 1).toLong * PAGE_SIZE

        allocSpaceIfNeededIter(value)
      }
    }

    if (value < 0) throw new Error("AllocSpaceIfNeeded < 0")
    if (value > 0) {
      allocSpaceIfNeededIter(value)

      length = Math.max(value, length)
      if (position > length) position = length
    }
  }

  /**
    * Writes the specified byte to this output stream. The general
    * contract for <code>write</code> is that one byte is written
    * to the output stream. The byte to be written is the eight
    * low-order bits of the argument <code>b</code>. The 24
    * high-order bits of <code>b</code> are ignored.
    * <p>
    * Subclasses of <code>OutputStream</code> must provide an
    * implementation for this method.
    *
    * @param      b the <code>byte</code>.
    */
  @throws[IOException]
  override def write(b: Int): Unit = {
    val buffer: Array[Byte] = new Array[Byte](1)

    buffer(0) = b.toByte

    write(buffer)
  }

  /**
    * Writes <code>len</code> bytes from the specified byte array
    * starting at offset <code>off</code> to this output stream.
    * The general contract for <code>write(b, off, len)</code> is that
    * some of the bytes in the array <code>b</code> are written to the
    * output stream in order; element <code>b[off]</code> is the first
    * byte written and <code>b[off+len-1]</code> is the last byte written
    * by this operation.
    * <p>
    * The <code>write</code> method of <code>OutputStream</code> calls
    * the write method of one argument on each of the bytes to be
    * written out. Subclasses are encouraged to override this method and
    * provide a more efficient implementation.
    * <p>
    * If <code>b</code> is <code>null</code>, a
    * <code>NullPointerException</code> is thrown.
    * <p>
    * If <code>off</code> is negative, or <code>len</code> is negative, or
    * <code>off+len</code> is greater than the length of the array
    * <code>b</code>, then an <tt>IndexOutOfBoundsException</tt> is thrown.
    *
    * @param      b   the data.
    * @param      off the start offset in the data.
    * @param      len the number of bytes to write.
    */
  @throws[IOException]
  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    @tailrec
    def writeIter(b: Array[Byte], off: Int, len: Int): Unit = {
      val currentPage: Int = (position / PAGE_SIZE).toInt
      val currentOffset: Int = (position % PAGE_SIZE).toInt

      if (len != 0) {
        val currentLength: Int = Math.min(PAGE_SIZE - currentOffset, len)
        Array.copy(b, off, streamBuffers(currentPage), currentOffset, currentLength)

        position += currentLength

        writeIter(b, off + currentLength, len - currentLength)
      }
    }

    allocSpaceIfNeeded(position + len)
    writeIter(b, off, len)
  }

  /** Gets an InputStream that points to HugeMemoryOutputStream buffer
    *
    * @return InputStream
    */
  def asInputStream(): InputStream = {
    new HugeMemoryInputStream(streamBuffers, length)
  }

  private class HugeMemoryInputStream(streamBuffers: Array[Array[Byte]], val length: Long) extends InputStream {
    /** Current position in stream
      *
      */
    private var position: Long = 0

    /**
      * Reads the next byte of data from the input stream. The value byte is
      * returned as an <code>int</code> in the range <code>0</code> to
      * <code>255</code>. If no byte is available because the end of the stream
      * has been reached, the value <code>-1</code> is returned. This method
      * blocks until input data is available, the end of the stream is detected,
      * or an exception is thrown.
      *
      * <p> A subclass must provide an implementation of this method.
      *
      * @return the next byte of data, or <code>-1</code> if the end of the
      *         stream is reached.
      */
    @throws[IOException]
    override def read(): Int = {
      val buffer: Array[Byte] = new Array[Byte](1)

      // Per the InputStream contract, return -1 at end of stream
      // and an int in the range 0 to 255 otherwise.
      if (read(buffer) <= 0) -1
      else buffer(0) & 0xFF
    }

    /**
      * Reads up to <code>len</code> bytes of data from the input stream into
      * an array of bytes.  An attempt is made to read as many as
      * <code>len</code> bytes, but a smaller number may be read.
      * The number of bytes actually read is returned as an integer.
      *
      * <p> This method blocks until input data is available, end of file is
      * detected, or an exception is thrown.
      *
      * <p> If <code>len</code> is zero, then no bytes are read and
      * <code>0</code> is returned; otherwise, there is an attempt to read at
      * least one byte. If no byte is available because the stream is at end of
      * file, the value <code>-1</code> is returned; otherwise, at least one
      * byte is read and stored into <code>b</code>.
      *
      * <p> The first byte read is stored into element <code>b[off]</code>, the
      * next one into <code>b[off+1]</code>, and so on. The number of bytes read
      * is, at most, equal to <code>len</code>. Let <i>k</i> be the number of
      * bytes actually read; these bytes will be stored in elements
      * <code>b[off]</code> through <code>b[off+</code><i>k</i><code>-1]</code>,
      * leaving elements <code>b[off+</code><i>k</i><code>]</code> through
      * <code>b[off+len-1]</code> unaffected.
      *
      * <p> In every case, elements <code>b[0]</code> through
      * <code>b[off]</code> and elements <code>b[off+len]</code> through
      * <code>b[b.length-1]</code> are unaffected.
      *
      * <p> The <code>read(b,</code> <code>off,</code> <code>len)</code> method
      * for class <code>InputStream</code> simply calls the method
      * <code>read()</code> repeatedly. If the first such call results in an
      * <code>IOException</code>, that exception is returned from the call to
      * the <code>read(b,</code> <code>off,</code> <code>len)</code> method.  If
      * any subsequent call to <code>read()</code> results in a
      * <code>IOException</code>, the exception is caught and treated as if it
      * were end of file; the bytes read up to that point are stored into
      * <code>b</code> and the number of bytes read before the exception
      * occurred is returned. The default implementation of this method blocks
      * until the requested amount of input data <code>len</code> has been read,
      * end of file is detected, or an exception is thrown. Subclasses are encouraged
      * to provide a more efficient implementation of this method.
      *
      * @param      b   the buffer into which the data is read.
      * @param      off the start offset in array <code>b</code>
      *                 at which the data is written.
      * @param      len the maximum number of bytes to read.
      * @return the total number of bytes read into the buffer, or
      *         <code>-1</code> if there is no more data because the end of
      *         the stream has been reached.
      * @see java.io.InputStream#read()
      */
    @throws[IOException]
    override def read(b: Array[Byte], off: Int, len: Int): Int = {
      @tailrec
      def readIter(acc: Int, b: Array[Byte], off: Int, len: Int): Int = {
        val currentPage: Int = (position / PAGE_SIZE).toInt
        val currentOffset: Int = (position % PAGE_SIZE).toInt

        val count: Int = Math.min(len, length - position).toInt

        if (count == 0 || position >= length) acc
        else {
          val currentLength = Math.min(PAGE_SIZE - currentOffset, count)
          Array.copy(streamBuffers(currentPage), currentOffset, b, off, currentLength)

          position += currentLength

          readIter(acc + currentLength, b, off + currentLength, len - currentLength)
        }
      }

      val bytesRead = readIter(0, b, off, len)
      // Per the InputStream contract, return -1 (not 0) at end of stream.
      if (bytesRead == 0 && len > 0) -1 else bytesRead
    }

    /**
      * Skips over and discards <code>n</code> bytes of data from this input
      * stream. The <code>skip</code> method may, for a variety of reasons, end
      * up skipping over some smaller number of bytes, possibly <code>0</code>.
      * This may result from any of a number of conditions; reaching end of file
      * before <code>n</code> bytes have been skipped is only one possibility.
      * The actual number of bytes skipped is returned. If <code>n</code> is
      * negative, the <code>skip</code> method for class <code>InputStream</code> always
      * returns 0, and no bytes are skipped. Subclasses may handle the negative
      * value differently.
      *
      * The <code>skip</code> method of this class creates a
      * byte array and then repeatedly reads into it until <code>n</code> bytes
      * have been read or the end of the stream has been reached. Subclasses are
      * encouraged to provide a more efficient implementation of this method.
      * For instance, the implementation may depend on the ability to seek.
      *
      * @param      n the number of bytes to be skipped.
      * @return the actual number of bytes skipped.
      */
    @throws[IOException]
    override def skip(n: Long): Long = {
      if (n < 0) 0
      else {
        // Per the contract, return the number of bytes actually skipped.
        val skipped = Math.min(n, length - position)
        position += skipped
        skipped
      }
    }
  }
}

Easy to use, with no buffer duplication and no 2 GB memory limit:

val out: HugeMemoryOutputStream = new HugeMemoryOutputStream(initialCapacity /*may be 0*/)

out.write(...)
...

val in1: InputStream = out.asInputStream()

in1.read(...)
...

val in2: InputStream = out.asInputStream()

in2.read(...)
...
Ipomoea answered 10/4, 2018 at 17:16 Comment(0)
B
0

As some here have answered already, there is no efficient way to just 'convert' an OutputStream to an InputStream. The trick to solve a problem like yours is to execute all the code that requires the OutputStream in its own thread. Using piped streams, we can then transfer the data out of that thread into an InputStream.

Example usage:

public static InputStream downloadFileAsStream(final String uriString) throws IOException {
        final InputStream inputStream = runInOwnThreadWithPipedStreams((outputStream) -> {
            try {
                downloadUriToStream(uriString, outputStream);
            } catch (final Exception e) {
                LOGGER.error("Download of uri '{}' has failed", uriString, e);
            }
        });
        return inputStream;
    }

Helper function:

public static InputStream runInOwnThreadWithPipedStreams(
            final Consumer<OutputStream> outputStreamConsumer) throws IOException {
        final PipedInputStream inputStream = new PipedInputStream();
        final PipedOutputStream outputStream = new PipedOutputStream(inputStream);
        new Thread(new Runnable() {
            public void run() {
                try {
                    outputStreamConsumer.accept(outputStream);
                } finally {
                    try {
                        outputStream.close();
                    } catch (final IOException e) {
                        LOGGER.error("Closing outputStream has failed. ", e);
                    }
                }
            }
        }).start();
        return inputStream;
    }

Unit Test:

@Test
void testRunInOwnThreadWithPipedStreams() throws IOException {

    final InputStream inputStream = LoadFileUtil.runInOwnThreadWithPipedStreams((OutputStream outputStream) -> {
        try {
            IOUtils.copy(IOUtils.toInputStream("Hello World", StandardCharsets.UTF_8), outputStream);
        } catch (final IOException e) {
            LoggerFactory.getLogger(LoadFileUtilTest.class).error(e.getMessage(), e);
        }
    });

    final String actualResult = IOUtils.toString(inputStream, StandardCharsets.UTF_8);
    Assertions.assertEquals("Hello World", actualResult);
}
Blackwood answered 8/12, 2022 at 12:51 Comment(0)
M
-1

If you want to make an InputStream from an OutputStream, there is one basic problem. A method writing to an OutputStream blocks until it is done, so the result is available only when the writing method has finished. This has 2 consequences:

  1. If you use only one thread, you need to wait until everything is written (so you need to store the stream's data in memory or on disk).
  2. If you want to access the data before writing is finished, you need a second thread.

Variant 1 can be implemented using byte arrays or files. Variant 2 can be implemented using pipes (either directly or with an extra abstraction, e.g. a RingBuffer or the Google lib from the other comment).

Indeed, with standard Java there is no other way to solve the problem: each solution is an implementation of one of these.
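Variant 1 can be sketched minimally like this (single thread, everything buffered in memory before reading; the sample payload is made up):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws Exception {
        // Variant 1: finish all writing first...
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write("buffered in memory".getBytes(StandardCharsets.UTF_8));

        // ...then expose the finished buffer as an InputStream.
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```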

There is one concept called "continuation" (see Wikipedia for details). In this case it basically means:

  • there is a special output stream that expects a certain amount of data
  • if that amount is reached, the stream gives control to its counterpart, which is a special input stream
  • the input stream makes the amount of data available until it is read; after that, it passes control back to the output stream

While some languages have this concept built in, Java needs some "magic" for it. For example, "commons-javaflow" from Apache implements it for Java. The disadvantage is that this requires special bytecode modifications at build time, so it would make sense to put all the stuff in an extra library with custom build scripts.

Minna answered 11/10, 2013 at 20:48 Comment(0)
T
-2

Though you cannot convert an OutputStream to an InputStream, Java provides a way, using PipedOutputStream and PipedInputStream, to have data written to a PipedOutputStream become available through an associated PipedInputStream.
Some time back I faced a similar situation when dealing with third-party libraries that required an InputStream instance to be passed to them instead of an OutputStream instance.
The way I fixed this issue was to use PipedInputStream and PipedOutputStream.
They are tricky to use, though, and you must use multithreading to achieve what you want. I recently published an implementation on GitHub which you can use.
Here is the link. You can go through the wiki to understand how to use it.

Townie answered 25/4, 2017 at 8:30 Comment(0)
K
-3

Updated post:

Essentially, these are the three basic steps we have to follow:

    // Sample data
    String data = "Foo! Bar!!";

    // 1. Write the data to a ByteArrayOutputStream
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (OutputStream os = baos) {
        os.write(data.getBytes());
    }

    // 2. Convert the ByteArrayOutputStream to a byte array
    byte[] bytes = baos.toByteArray();

    // 3. Use the byte array to create a ByteArrayInputStream
    try (InputStream is = new ByteArrayInputStream(bytes)) {
        int byteRead;
        while ((byteRead = is.read()) != -1) {
            System.out.print((char) byteRead);
        }
    }

NOTE: This approach is simple and effective for in-memory operations. However, be cautious about memory consumption if you're dealing with large amounts of data, since the entire data will be kept in memory twice (once in the ByteArrayOutputStream and once in the byte array). If you're handling large streams, you might want to consider using a temporary file or another mechanism that doesn't rely on keeping everything in memory.
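For the large-data case in the note above, a hedged sketch of the temporary-file alternative (the file-name prefix and payload are arbitrary choices for illustration):

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Main {
    public static void main(String[] args) throws Exception {
        // Spill the data to a temporary file instead of keeping it in memory.
        Path tmp = Files.createTempFile("stream-buffer", ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                out.write("spilled to disk".getBytes(StandardCharsets.UTF_8));
            }
            // Read it back as an InputStream once writing has finished.
            try (InputStream in = Files.newInputStream(tmp)) {
                System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
            }
        } finally {
            Files.deleteIfExists(tmp); // clean up the temporary file
        }
    }
}
```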

Krigsman answered 15/10, 2013 at 9:1 Comment(2)
to String --> size problemQuaint
Besides, calling toString().getBytes() on a stream *will not return the contents of the stream.Skiba

© 2022 - 2024 — McMap. All rights reserved.