Robust skipping of data in a java.io.InputStream and its subtypes

I'm processing a binary stream and need to skip efficiently past a range of data that I'm not interested in, to some data that will be processed.

InputStream.skip(long) doesn't make much in the way of guarantees:

Skips over and discards n bytes of data from this input stream. The skip method may, for a variety of reasons, end up skipping over some smaller number of bytes, possibly 0. This may result from any of a number of conditions; reaching end of file before n bytes have been skipped is only one possibility. The actual number of bytes skipped is returned.

I need to know that one of two things has happened:

  1. The stream ended
  2. The bytes were skipped

Simple enough. However, the leniency afforded in this description means that, for example, BufferedInputStream can just skip a few bytes and return. Sure, it tells me that it's skipped just those few, but it's not clear why.
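
To make the BufferedInputStream case concrete, here is a minimal demonstration. It assumes OpenJDK's BufferedInputStream, whose skip() discards only the bytes already sitting in its buffer when that buffer is non-empty. The 4-byte buffer, the in-memory ByteArrayInputStream source, and the class and method names are hypothetical choices made to force a short skip.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Demonstration only: the 4-byte buffer and 10-byte in-memory source are
// arbitrary choices made to force a short skip.
class BufferedSkipDemo {

    /** Reads one byte (which fills the 4-byte buffer), then asks to skip n bytes. */
    static long skipAfterOneRead(long n) throws IOException {
        byte[] data = new byte[10];
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(data), 4)) {
            in.read();         // buffer now holds bytes 0..3, position is 1
            return in.skip(n); // discards only the 3 bytes still in the buffer
        }
    }

    public static void main(String[] args) throws IOException {
        // Asks for 6 bytes but gets 3, although 9 bytes were still unread.
        System.out.println("skipped " + skipAfterOneRead(6) + " of 6");
    }
}
```

The short count here has nothing to do with end of stream: a second skip() call would continue into the underlying stream.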

So my question is: can you use InputStream.skip(long) in such a way that you know when either the stream ends or the skip completes successfully?

Wingard answered 27/12, 2012 at 16:18

I don't think we can get a really robust implementation, because the skip() contract is rather bizarre. For one thing, the behaviour at EOF is not well defined. If I want to skip 8 bytes and is.skip(8) returns 0, it's not trivial to decide whether I should try again: there is a danger of an infinite loop if some implementation chooses to return 0 at EOF. And available() is not to be trusted, either.

Hence, I propose the following:

import java.io.IOException;
import java.io.InputStream;

/**
 * Skips n bytes. Best effort.
 */
public static void myskip(InputStream is, long n) throws IOException {
    while (n > 0) {
        long n1 = is.skip(n);
        if (n1 > 0) {
            n -= n1;
        } else if (n1 == 0) { // should we retry? let's read one byte
            if (is.read() == -1) { // EOF
                break;
            }
            n--;
        } else { // negative? this should never happen, but...
            throw new IOException("skip() returned a negative value. This should never happen");
        }
    }
}

Shouldn't we return a value reporting the number of bytes "really skipped", or a boolean reporting that EOF was reached? We cannot do that in a robust way. For example, if we call skip(8) on a FileInputStream, it will return 8 even if we are at EOF, or if the file has only 2 bytes. But the method is robust in the sense that it does what we want: skip n bytes (if possible) and let me continue processing the stream (if my next read returns -1, I'll know that EOF was reached).
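
A minimal sketch of the FileInputStream behaviour described above, assuming an OpenJDK runtime on a filesystem that allows seeking past the end of a file (the FileInputStream Javadoc does warn that skip() "may skip more bytes than what are remaining in the backing file"). The temporary 2-byte file and the names FileSkipDemo and skipPastEnd are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: skip() on a FileInputStream can report bytes beyond EOF, so only a
// subsequent read() returning -1 reveals that the stream has really ended.
class FileSkipDemo {

    /** Returns {value reported by skip(8), value of the next read()} for a 2-byte file. */
    static long[] skipPastEnd() throws IOException {
        Path tmp = Files.createTempFile("skip-demo", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2}); // the file holds only 2 bytes
            try (FileInputStream in = new FileInputStream(tmp.toFile())) {
                long skipped = in.skip(8); // typically reports 8 (a seek past EOF)
                int next = in.read();      // -1: the stream really is at EOF
                return new long[] {skipped, next};
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] r = skipPastEnd();
        System.out.println("skip(8) returned " + r[0] + ", next read() returned " + r[1]);
    }
}
```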

Pencil answered 18/1, 2013 at 14:26
Your answer concretely details what I have been concerned about. The code I posted seems to work in practice, but I'm not confident that it'd work for all implementations of InputStream. Your extension looks interesting and I'll try it out shortly in the class where I need it. Currently my API tries to report whether the skip succeeded, so I may need to modify client code if no guarantee is possible. Thanks very much. - Wingard
You can fix the FileInputStream.skip() issue: use your while loop for n-1 bytes; then, after the loop, call in.read() once. If it returns -1, your skip hit EOF; otherwise, your skip was successful. Also, don't forget an n == 0 check at the top. - Hyalo
@KannanGoundan Interesting suggestion. A drawback, of course, is that it requires at least two calls on the stream (one skip plus one read), which in some scenarios might affect performance. - Pencil
This looks more or less the same as Guava's ByteStreams.skipFully method, so it's probably right. - Masterpiece
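
The suggestion from the comments (skip n-1 bytes best-effort, then read() the last byte, with an n == 0 check up front) can be sketched as below. This is a hypothetical helper, not Guava's skipFully and not the answer's exact code; it assumes that read() returning -1 is the only reliable EOF signal.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch of "skip n - 1, then read one byte". The final read() detects EOF even
// for streams (such as FileInputStream) whose skip() may overshoot the end.
// The name skipOrEof is hypothetical.
class SkipOrEof {

    /** Returns true if all n bytes were skipped, false if the stream ended first. */
    static boolean skipOrEof(InputStream in, long n) throws IOException {
        if (n <= 0) {
            return true; // nothing to skip (the n == 0 check from the comment)
        }
        long remaining = n - 1;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) { // skip() made no progress: probe with read()
                return false;             // EOF before n bytes
            } else {
                remaining--;              // the probe byte counts as skipped
            }
        }
        // The last byte is consumed with read(), which returns -1 only at EOF.
        return in.read() != -1;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[5]);
        System.out.println(skipOrEof(in, 3)); // true: 3 of 5 bytes skipped
        System.out.println(skipOrEof(in, 3)); // false: only 2 bytes were left
    }
}
```

As noted in the comments, the price is an extra read() call per skip.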

This seems to be working for skipping n bytes:

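long skippedTotal = 0;
while (skippedTotal != n) {
    long skipped = _stream.skip(n - skippedTotal);
    assert skipped >= 0; // skip() should never report a negative count
    skippedTotal += skipped;
    if (skipped == 0) // no progress: assume the stream has ended
        break;
}
boolean skippedEnough = skippedTotal == n; // false if the loop gave up early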

However, it's not clear that it will work for all implementations of InputStream that could be passed to my library. I'm wondering whether implementing my own buffered skip method is the way to go.
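
One possible shape for such a "buffered skip" is a sketch that ignores skip() entirely and reads into a scratch buffer. It relies on the read(byte[], int, int) contract: for len > 0 it blocks until at least one byte is available and returns -1 only at end of stream. The buffer size and the name skipByReading are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: skip by reading and discarding, so a short result can only mean EOF.
class ReadSkip {

    private static final int SCRATCH_SIZE = 8192; // arbitrary chunk size

    /** Discards up to n bytes; returns the number discarded, which is < n only at EOF. */
    static long skipByReading(InputStream in, long n) throws IOException {
        if (n <= 0) {
            return 0;
        }
        byte[] scratch = new byte[(int) Math.min(SCRATCH_SIZE, n)];
        long remaining = n;
        while (remaining > 0) {
            int len = (int) Math.min(scratch.length, remaining);
            // A contract-abiding read() blocks until at least one byte arrives
            // and returns -1 only at end of stream, so it never returns 0 here.
            int r = in.read(scratch, 0, len);
            if (r == -1) {
                break; // genuine end of stream
            }
            remaining -= r;
        }
        return n - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10]);
        System.out.println(skipByReading(in, 4));   // 4
        System.out.println(skipByReading(in, 100)); // 6: the stream ended early
    }
}
```

The trade-off is that every skipped byte is copied, so for files this is slower than a true seek.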

Wingard answered 27/12, 2012 at 16:21
I don't see how any InputStream implementation can depart from the contract that says they return how many bytes were really skipped. - Ellynellynn
@EJP, I agree. I'm concerned with knowing whether fewer bytes were skipped due to some kind of IO artefact (buffering or the like) or because the stream ended. If the stream hasn't ended, skip could still return zero. At what point do you know that skipping stopped because there are no more bytes, rather than because it's waiting for bytes over a network? - Wingard
The problem I see with this is that we cannot be sure that we should not retry when skipped == 0. Furthermore, the boolean skippedEnough is not to be trusted. See my answer. - Pencil
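
To illustrate the point about skippedEnough, here is a sketch with a hypothetical stream whose skip() always returns 0, which the InputStream contract permits ("possibly 0"). The loop from the answer then reports failure even though data remains. ZeroSkipStream and loopSkip are illustrative names, not real library classes.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: a legal-but-unhelpful skip() makes the loop give up while data remains.
class ZeroSkipDemo {

    /** Hypothetical wrapper: delegates everything except skip(), which never skips. */
    static final class ZeroSkipStream extends FilterInputStream {
        ZeroSkipStream(InputStream in) {
            super(in);
        }

        @Override
        public long skip(long n) {
            return 0; // permitted by the skip() contract
        }
    }

    /** The loop from the answer, returning skippedEnough. */
    static boolean loopSkip(InputStream stream, long n) throws IOException {
        long skippedTotal = 0;
        while (skippedTotal != n) {
            long skipped = stream.skip(n - skippedTotal);
            skippedTotal += skipped;
            if (skipped == 0) {
                break;
            }
        }
        return skippedTotal == n;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ZeroSkipStream(new ByteArrayInputStream(new byte[10]));
        boolean skippedEnough = loopSkip(in, 4);
        int next = in.read(); // a real byte, not -1: the stream has not ended
        System.out.println("skippedEnough=" + skippedEnough + ", next read()=" + next);
    }
}
```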

I am 6 years late to this question.

In principle, there is no difference between skipping n bytes and reading n bytes fully (as readFully does); in the skip case, you are simply not interested in the bytes.

For a live stream, i.e. a TCP socket or a file that is being appended to, skip(n) could block (wait) when it skips 0 bytes, depending on whether the user prefers to wait.

Getting back EOF (-1) indicates the end of the stream, and that should be reported to the end user, since nothing else will happen past that point.
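
On Java 12 or later, InputStream.skipNBytes(long) surfaces end of stream this way: it either skips exactly n bytes or throws EOFException. A minimal sketch, assuming Java 12+; the wrapper name trySkip is hypothetical. One caveat, stated as an assumption to verify on your JDK: the default implementation starts from skip(), so a FileInputStream that seeks past EOF may return normally instead of throwing.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Sketch (Java 12+): convert skipNBytes' EOFException into a boolean result.
class SkipNBytesDemo {

    /** Returns true if exactly n bytes were skipped, false if the stream ended first. */
    static boolean trySkip(InputStream in, long n) throws IOException {
        try {
            in.skipNBytes(n);
            return true;
        } catch (EOFException e) {
            return false; // end of stream reached before n bytes
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[5]);
        System.out.println(trySkip(in, 3)); // true
        System.out.println(trySkip(in, 3)); // false: only 2 bytes were left
    }
}
```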

To skip bytes in a file efficiently, I'd explore random access I/O (RandomAccessFile) or channels (FileChannel), but that optimization can't be made generic across all input streams.
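
A sketch of the channel-based approach for files: with a FileChannel, a "skip" is just a position change, and size() tells you directly whether the target lies past EOF. The helper name skipInFile and the clamp-to-EOF policy are illustrative choices. If the file is being appended to, size() is only a snapshot.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: file-only skip via FileChannel positioning; not usable for arbitrary InputStreams.
class ChannelSkip {

    /** Advances the position by n bytes; returns false (left at EOF) if the file is too short. */
    static boolean skipInFile(FileChannel ch, long n) throws IOException {
        if (n <= 0) {
            return true; // nothing to skip
        }
        long target = ch.position() + n;
        long size = ch.size();
        if (target > size) {
            ch.position(size); // clamp to EOF instead of seeking beyond it
            return false;
        }
        ch.position(target);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel-skip", ".bin");
        try {
            Files.write(tmp, new byte[10]);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                System.out.println(skipInFile(ch, 4) + " at " + ch.position());  // true at 4
                System.out.println(skipInFile(ch, 10) + " at " + ch.position()); // false at 10
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

If you already hold a FileInputStream, its getChannel() returns a channel whose position is shared with the stream, so the same check can be applied there.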

Celebes answered 19/9, 2018 at 1:2