In Haskell, will calling length on a Lazy ByteString force the entire string into memory?
Asked Answered
P

4

9

I am reading a large data stream using lazy bytestrings, and want to know if at least X more bytes is available while parsing it. That is, I want to know if the bytestring is at least X bytes long.

Will calling length on it result in the entire stream getting loaded, hence defeating the purpose of using the lazy bytestring?

If yes, then the followup would be: How to tell if it has at least X bytes without loading the entire stream?

EDIT: Originally I asked in the context of reading files but understand that there are better ways to determine filesize. Te ultimate solution I need however should not depend on the lazy bytestring source.

Prompter answered 25/3, 2010 at 15:54 Comment(0)
D
11

Yes.

length . take x.

Decasyllable answered 25/3, 2010 at 16:9 Comment(5)
Thanks. So the answer is yes, using length will load the entire string into memory?Prompter
Yes, length will force the whole list and "load: it. Thus, the countermeasure in form of "take"Parra
So, to be clear we all understand, yes length will force the string into memory, so if you call take x first, then at least you are only forcing the first x bytes into memory.Contingent
@MtnViewMark: it will probably force more than x bytes, since strictness of lazy bytestrings happens on a per-chunk basis. Forcing the first character of a chunk also forces the rest of that chunk.Fluoroscopy
Sorry for the short answer, I wrote it quickly and didn't have time to come back and expand it. Yes, length will force the entire string, while length . take x will force only the chunks containing the first x bytes, which might be more than x bytes total. The default chunk size is somewhere in the vicinity of 31kB, I think.Decasyllable
C
1

Is there a reason you're not using hFileSize :: Handle -> IO Integer for getting the length of the file?

Cartouche answered 25/3, 2010 at 16:3 Comment(2)
Yes, because the input may not actually be a file, but a network stream. I'm not interested in the actual length of the data, but in whether there is sufficient data to make the contents valid (or truncated in which case the data is corrupt).Prompter
In which case there is no program on Earth that can find the length of the stream without reading it to the end.Reannareap
F
0

EDIT: sorry. I guess I was thinking bytestrings were lists. There is no genericLength for bytestrings.

length is strict because the type it returns Int is strict. You can use genericLength from Data.List and import a library that defines lazy Peano numbers and gives you a Num instance for them, e.g the numbers library:

That would let you express your function in the way you would like, but ephemient's answer is functionally the same, and doesn't require importing a new library.

I just did a blog post on the subject here, if that sounds like an approach you might be interested in:

http://coder.bsimmons.name/blog/2010/03/lazy-arithmetic-in-haskell/

Farnesol answered 25/3, 2010 at 17:35 Comment(2)
ByteString does not come with its genericLength. You need to write one yourself.Commission
Somewhere (probably on Hackage, though I can't find it now) I came across a "chunked" natural number type, along the lines of data Nat = Zero | Sum Integer Nat, which would be even more efficient than the take-based solution, probably.Debt
G
0

It will have to iterate the entire string, but if you don't keep a reference to the entire lazy byte-string anywhere else, I believe it should be able to free the head of the string as it progresses towards its tail.

Genoese answered 29/3, 2010 at 10:48 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.