Byte InputRange from file

How can I easily construct a raw byte-by-byte InputRange/ForwardRange/RandomAccessRange from a file?

Stockbroker answered 16/5, 2015 at 10:30 Comment(0)
file.byChunk(4096).joiner

This reads a file in 4096-byte chunks and lazily joins the chunks together into a single ubyte input range.

joiner is from std.algorithm, so you'll have to import it first.
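
To make the one-liner above concrete, here is a minimal sketch of a complete program. The helper name sumBytes and the scratch file bytes_demo.bin are made up for illustration; only File.byChunk and joiner come from the answer.

```d
import std.algorithm : joiner;
import std.file : remove, write;
import std.stdio : File, writeln;

// Hypothetical helper: sums every byte of a file without loading it whole.
ulong sumBytes(string path, size_t chunkSize = 4096)
{
    auto file = File(path, "rb");
    ulong total;
    // byChunk yields ubyte[] buffers; joiner flattens them into single ubytes.
    foreach (ubyte b; file.byChunk(chunkSize).joiner)
        total += b;
    return total;
}

void main()
{
    // Scratch file (assumed name) so the sketch runs on its own.
    enum name = "bytes_demo.bin";
    ubyte[] payload = [1, 2, 3, 4, 5];
    write(name, payload);
    scope (exit) remove(name);

    // A chunk size of 2 is used only to show that chunk boundaries disappear.
    writeln(sumBytes(name, 2));
}
```

The chunk size only affects how much is read per system call; the consumer still sees one byte at a time.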

Dukey answered 16/5, 2015 at 17:53 Comment(3)
I wish I could upvote this five times, that's super useful and I did not know that! - Downes
@AdamD.Ruppe sounds like a good topic for this week's This Week in D! - Baedeker
Imagine this wrapped into a future, and you just get an event when the data is loaded... Good stuff. - Febrifuge

The easiest way to make a raw byte range from a file is to just read it all right into memory:

import std.file;
auto data = cast(ubyte[]) read("filename");
// data is a full-featured random access range of the contents

If the file is too large for that to be reasonable, you could try a memory-mapped file (http://dlang.org/phobos/std_mmfile.html) and use its opSlice to get an array from it. Since the result is an array, you get full range features, but because the operating system maps the file into memory, the contents are read lazily as you touch them.
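
A rough sketch of the memory-mapped route, assuming the single-argument MmFile constructor (which opens the file read-only) and the argument-less opSlice. The helper byteAt and the file name mmap_demo.bin are hypothetical.

```d
import std.file : remove, write;
import std.mmfile : MmFile;
import std.stdio : writeln;

// Hypothetical helper: reads one byte at an offset through a read-only mapping.
ubyte byteAt(string path, size_t offset)
{
    auto mmf = new MmFile(path);   // maps the whole file for reading
    scope (exit) destroy(mmf);     // unmap and close deterministically
    // opSlice with no arguments returns void[]; cast to view it as bytes.
    // const because the mapping is read-only.
    auto data = cast(const(ubyte)[]) mmf[];
    return data[offset];           // plain array indexing, so random access
}

void main()
{
    // Scratch file (assumed name) so the sketch runs on its own.
    enum name = "mmap_demo.bin";
    ubyte[] payload = [10, 20, 30, 40];
    write(name, payload);
    scope (exit) remove(name);

    writeln(byteAt(name, 2));
}
```

The slice is only valid while the MmFile object is alive, so in real code the mapping would normally outlive any range built on top of it.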

For a simple InputRange, there's LockingTextReader (undocumented) in Phobos, or you could construct one yourself over byChunk or even fgetc, the C function. fgetc would be the easiest to write:

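import core.stdc.stdio : EOF, FILE, fgetc;

struct FileByByte {
    FILE* fp;
    int current; // last fgetc result; EOF at end of file or on a read error
    this(FILE* fp) { this.fp = fp; popFront(); /* prime it */ }
    @property ubyte front() { return cast(ubyte) current; }
    @property bool empty() { return current == EOF; } // feof alone misses read errors
    void popFront() { current = fgetc(fp); }
}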

I haven't actually tested that, but I'm pretty sure it'd work. (BTW, opening and closing the file is kept separate from this because ranges are supposed to be just views into data, not managed containers. You wouldn't want the file closed just because you passed this range into a function.)
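
Here is a sketch of how the caller could keep ownership of the FILE* while the range only views it. The struct is restated so the block compiles on its own, in a variant that checks fgetc's return value against EOF (an assumption of this sketch, not necessarily the exact code above). The helper countBytes and the file name are hypothetical.

```d
import core.stdc.stdio : EOF, FILE, fclose, fgetc, fopen;
import std.exception : enforce;
import std.file : remove, write;
import std.range.primitives : isInputRange;
import std.stdio : writeln;
import std.string : toStringz;

struct FileByByte
{
    FILE* fp;
    int current; // last fgetc result; EOF at end of file or on error
    this(FILE* fp) { this.fp = fp; popFront(); } // prime the first byte
    @property ubyte front() { return cast(ubyte) current; }
    @property bool empty() { return current == EOF; }
    void popFront() { current = fgetc(fp); }
}

// Checked at compile time: the struct satisfies the input range interface.
static assert(isInputRange!FileByByte);

// Hypothetical helper: the caller opens and closes; the range only reads.
size_t countBytes(string path)
{
    FILE* fp = fopen(path.toStringz, "rb");
    enforce(fp !is null, "cannot open " ~ path);
    scope (exit) fclose(fp); // the handle is closed here, not by the range

    size_t n;
    foreach (ubyte b; FileByByte(fp))
        ++n;
    return n;
}

void main()
{
    // Scratch file (assumed name) so the sketch runs on its own.
    enum name = "fgetc_demo.bin";
    ubyte[] payload = [0, 255, 7];
    write(name, payload);
    scope (exit) remove(name);

    writeln(countBytes(name));
}
```

Storing the int from fgetc means a 0xFF byte in the file is not mistaken for end of file.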

This is neither a forward range nor a random access range, though. Those are trickier to do on streams without a lot of buffering code, and I think it would be a mistake to try to write that: generally, ranges should be cheap and should not emulate features the underlying container doesn't natively support.
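
To check which range category each approach gives you, the traits in std.range.primitives can be tested at compile time. This is a small sketch; the alias name Joined is made up, and the last assertion reflects my understanding that byChunk has no save, so the joined stream is single-pass.

```d
import std.algorithm : joiner;
import std.range.primitives : ElementType, isForwardRange, isInputRange,
    isRandomAccessRange;
import std.stdio : File;

// An in-memory byte array (from std.file.read or a memory-mapped slice)
// supports every range category.
static assert(isRandomAccessRange!(ubyte[]));

// The lazily joined chunk stream is an input range of ubyte...
alias Joined = typeof(File.init.byChunk(4096).joiner);
static assert(isInputRange!Joined);
static assert(is(ElementType!Joined == ubyte));

// ...but not a forward range, since it cannot be saved and replayed.
static assert(!isForwardRange!Joined);

void main() {}
```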

EDIT: The other answer has a non-buffering way! https://mcmap.net/q/1597333/-byte-inputrange-from-file That's awesome.

Downes answered 16/5, 2015 at 12:57 Comment(0)
