Convert a Lazy ByteString to a strict ByteString

I have a function that takes a lazy ByteString, and I want it to return a list of strict ByteStrings (the laziness should be transferred to the list type of the output).

import qualified Data.ByteString as B
import qualified Data.ByteString.Lazy as L
csVals :: L.ByteString -> [B.ByteString]

I want to do this for various reasons: several lexing functions require strict ByteStrings, and I can guarantee that the strict ByteStrings returned by csVals above are very small.

How do I go about "strictifying" ByteStrings without chunking them?

Update0

I want to take a Lazy ByteString, and make one strict ByteString containing all its data.
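
For illustration, here is a minimal sketch of what csVals could look like. The question only gives the type signature, so the comma-splitting body is an assumption, and it relies on toStrict from Data.ByteString.Lazy (available in bytestring 0.10 and later). The list is still produced lazily; each field is copied into its own strict ByteString only when it is demanded.

```haskell
import qualified Data.ByteString            as B
import qualified Data.ByteString.Char8      as BC
import qualified Data.ByteString.Lazy       as L
import qualified Data.ByteString.Lazy.Char8 as LC

-- Hypothetical body: split the lazy input on commas (an assumption, the
-- question does not say how fields are delimited), then make each field strict.
csVals :: L.ByteString -> [B.ByteString]
csVals = map L.toStrict . LC.split ','

main :: IO ()
main = mapM_ BC.putStrLn (csVals (LC.pack "1.5,2.25,3"))
```

Because each field is small, the copy made by toStrict for every element should be cheap.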

Marniemaro answered 19/10, 2011 at 0:50 Comment(8)
What is your problem with toChunks? From the initial glimpse it looks like it preserves laziness.Sentinel
@Matt Joiner: Maybe you should write the lexer yourself, or force evaluation of the results using DeepSeq.Hornsby
@Matt Joiner: there is a Lazy version: 'Data.ByteString.Lex.Lazy.Double' in the same package.Hornsby
@Matt Joiner: so you want chunks of a specified size? Possibly repeated calls to splitAt? Note that toChunks generates strict ByteStrings that are of maximum size (except for possibly the last one).Gneiss
@MikhailGlushenkov: toChunks returns a list of strict ByteStrings. I want them all in one.Marniemaro
@WuXingbo: I have switched to the Lazy readDouble for now, thanks. My question still stands however.Marniemaro
There's a misunderstanding here -- a lazy bytestring is just a list of chunks (i.e. strict bytestrings), essentially. toChunks exposes that structure. To put the list all in one strict bytestring, there's no other way than concat . toChunks (or the equiv). In many typical cases, the list will have a single element -- in those cases concat . toChunks will be relatively efficient as well.Benignity
@sclv: What you describe is what I'm after.Marniemaro

As @sclv said in the comments above, a lazy ByteString is essentially just a list of strict ByteStrings. There are two approaches to converting a lazy ByteString to a strict one (source: Haskell mailing list discussion about adding a toStrict function). The relevant code from the email thread is below:

First, relevant libraries:

import qualified Data.ByteString               as B
import qualified Data.ByteString.Internal      as BI
import qualified Data.ByteString.Lazy          as BL
import qualified Data.ByteString.Lazy.Internal as BLI
import           Foreign.ForeignPtr
import           Foreign.Ptr

Approach 1 (same as @sclv):

toStrict1 :: BL.ByteString -> B.ByteString
toStrict1 = B.concat . BL.toChunks

Approach 2:

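toStrict2 :: BL.ByteString -> B.ByteString
toStrict2 BLI.Empty = B.empty
-- A single chunk is already a strict ByteString, so no copy is needed.
toStrict2 (BLI.Chunk c BLI.Empty) = c
-- Otherwise allocate one buffer of the total length and copy every chunk into it.
toStrict2 lb = BI.unsafeCreate len $ go lb
  where
    -- Total number of bytes across all chunks.
    len = BLI.foldlChunks (\l sb -> l + B.length sb) 0 lb

    go  BLI.Empty                   _   = return ()
    go (BLI.Chunk (BI.PS fp s l) r) ptr =
        withForeignPtr fp $ \p -> do
            -- Copy l bytes starting at offset s, then advance the destination.
            BI.memcpy ptr (p `plusPtr` s) (fromIntegral l)
            go r (ptr `plusPtr` l)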

If performance is a concern, I recommend checking out the email thread above, which also includes criterion benchmarks. toStrict2 is faster than toStrict1 in those benchmarks.

Dilettante answered 18/12, 2011 at 15:27 Comment(0)

The bytestring package now exports a toStrict function:

http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Lazy.html#v:toStrict

This might not be exactly what you want, but it certainly answers the question in the title of this post :)
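
A minimal usage sketch, assuming a bytestring version that exports toStrict from Data.ByteString.Lazy (0.10 or later); the sample data is made up for illustration:

```haskell
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy  as BL

main :: IO ()
main = do
  -- Build a lazy ByteString out of two strict chunks.
  let lazy   = BL.fromChunks [BC.pack "abc", BC.pack "def"]
      strict = BL.toStrict lazy :: B.ByteString
  -- All chunks end up in one contiguous strict ByteString.
  print (B.length strict)
  print (strict == BC.pack "abcdef")
```

Note that toStrict has to copy the data whenever there is more than one chunk, so it forces the whole lazy ByteString into memory.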

Enugu answered 29/11, 2012 at 18:16 Comment(1)
Any clue in which version this was added, though? It does not seem to be in the Haskell Platform 2012.4 (which includes GHC 7.4).Walden

If the lazy ByteString in question is <= the maximum size of a strict ByteString:

toStrict = fromMaybe SB.empty . listToMaybe . toChunks

toChunks makes each chunk be as large as possible (except for possibly the last one).

If the size of your lazy ByteString is larger than what a strict ByteString can be, then this isn't possible: that's exactly what lazy ByteStrings are for.
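
One caveat worth checking before relying on the one-liner above: it keeps only the first chunk, and toChunks returns whatever chunks the value already has rather than merging them. The sketch below (firstChunk is a hypothetical name for the one-liner, and the two-chunk input is constructed by hand) shows the difference from concatenating all chunks:

```haskell
import           Data.Maybe (fromMaybe, listToMaybe)
import qualified Data.ByteString       as SB
import qualified Data.ByteString.Char8 as SBC
import qualified Data.ByteString.Lazy  as LB

-- The one-liner from the answer, under a hypothetical name.
firstChunk :: LB.ByteString -> SB.ByteString
firstChunk = fromMaybe SB.empty . listToMaybe . LB.toChunks

main :: IO ()
main = do
  let twoChunks = LB.fromChunks [SBC.pack "ab", SBC.pack "cd"]
  print (firstChunk twoChunks)               -- only the first chunk: "ab"
  print (SB.concat (LB.toChunks twoChunks))  -- every chunk: "abcd"
```

So the shortcut is only safe when the lazy ByteString is known to consist of a single chunk.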

Gneiss answered 19/10, 2011 at 6:30 Comment(0)

Data.ByteString.Lazy.Char8 now has toStrict and fromStrict functions.

Niigata answered 9/10, 2017 at 22:22 Comment(1)
This appears to be essentially a duplicate of ocharles's answer.Brigidbrigida

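You can also use blaze-builder to build a strict ByteString from a lazy one:

import           Blaze.ByteString.Builder (fromLazyByteString, toByteString)
import qualified Data.ByteString          as BS
import qualified Data.ByteString.Lazy     as BL

toStrict :: BL.ByteString -> BS.ByteString
toStrict = toByteString . fromLazyByteString

It should be efficient.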

Chirlin answered 6/4, 2013 at 16:54 Comment(0)
