Vector creation safety

This question is actually a small lattice of very closely related questions; I don't think it makes much sense to break it up as yet.

One of the fundamental ways to create a Vector is using unsafeFreeze. As its name suggests, unsafeFreeze is truly unsafe. In particular, nothing prevents the MVector passed to unsafeFreeze from being modified after it is frozen. This leads to two different problems:

It can make "immutable" vectors change in value. This is just the sort of spooky action at a distance Haskell generally shuns.
Modifying a frozen vector can (at least potentially) confuse the garbage collector. There is no documented guarantee that the garbage collector will scan frozen arrays to ensure their contents are evacuated. More generally, mutating vectors while they are frozen is absolutely forbidden, and the result of doing so is utterly unspecified.

The vector package[1] offers two efficient, seemingly safe, primitives for creating immutable vectors: create and createT:

create :: (forall s. ST s (MVector s a)) -> Vector a
createT :: Traversable t => (forall s. ST s (t (MVector s a))) -> t (Vector a)

Ignoring vector fusion business, the basic implementations look like

create m = runST $ m >>= unsafeFreeze
createT m = runST $ m >>= traverse unsafeFreeze

create is pretty clearly safe. It runs the given ST s action, which must create a fresh MVector s (the type of runST ensures it can't use an existing one, and also ensures that fixST can't play any funny tricks), freezes it, and returns the frozen vector.

createT is pretty clearly safe when the Traversable instance is lawful. With lists, for example, createT m runs an action producing a list of MVectors, then freezes them all. Parametricity in s then seems sufficient, as for create, to ensure nothing bad happens. Note that the action may create a list with multiple copies of the same MVector. These will be frozen twice, but there shouldn't be any harm in that. Lawful Traversable instances all look pretty much like decorated lists, so they should behave similarly. Now I finally reach my first question:

Is createT safe when used with an unlawful Traversable instance?

Unlawfully dropping, duplicating or rearranging some elements or changing decorations (violating the identity law) doesn't pose any obvious difficulty. Parametricity prevents any interesting violations of the naturality law, so that's out. I haven't been able to find a way to cause trouble by violating the composition law or totality, but that is no guarantee that there isn't one.

One obvious way to generalize createT is to allow the user to pass their own traversal function:

createTOf
  :: (forall f x y. Applicative f => (x -> f y) -> t x -> f (u y))
  -> (forall s. ST s (t (MVector s a))) -> u (Vector a)
createTOf trav m = runST $ m >>= trav unsafeFreeze

Note that I've allowed the traversal to change the container type from t to u. This allows the user to, for example, produce a Vector (MVector s a) but get back a [Vector a]. When t ~ u, this is obviously just as safe as createT with an unlawful Traversable instance. Does the extra flexibility of changing the "container" type reduce safety? Edit: I just realized I can answer this one: no, it makes no difference. See [2] below for an explanation.

When we use createT, we might not actually want a container of vectors; perhaps we want to traverse that container to get something else. We can write something like

traverse f <$> createT m

The extra type flexibility of createTOf means we don't necessarily have a Traversable on our hands, and can't necessarily do this. But using the composition law for Traversable, we can integrate this traversal into the creation function:

createTOfThen
  :: Applicative g
  => (forall f x y. Applicative f => (x -> f y) -> t x -> f (u y))
  -> (Vector a -> g b)
  -> (forall s. ST s (t (MVector s a)))
  -> g (u b)
createTOfThen trav f m =
  runST $ m >>= getCompose . trav (Compose . fmap f . unsafeFreeze)

Is createTOfThen safe if trav is not a lawful traversal?

I did say I'd be talking about a lattice, right? The next question is how much (if at all) we can weaken the polymorphism of the traversal without causing trouble. Everything will typecheck even if the traversal is only required to be polymorphic in s, but that's obviously unsafe, as it can interleave freezing with modification however it likes. Revealing to the that the end result holds Vector values seems likely to be harmless enough, but we surely can't let the traversal know both that it is operating in ST s and that it is handling MVector s a values. But could we let it know one of those facts? Fixing the Applicative would certainly be useful:

createTOf'
  :: (forall s x y. (x -> ST s y) -> t x -> ST s (u y))
  -> (forall s. ST s (t (MVector s a))) -> u (Vector a)

createTOfThen'
  :: Applicative g
  => (forall s x y. (x -> Compose (ST s) g y) -> t x -> Compose (ST s) g (u y))
  -> (Vector a -> g b)
  -> (forall s. ST s (t (MVector s a)))
  -> g (u b)

This would offer more efficient creation of things like vectors of vectors, as vectors can be traversed more efficiently in ST than in arbitrary Applicative functors. It would also reduce dependence on inlining, as we'd avoid dealing with the Applicative dictionary.

On the flip side, I suspect we can let the traversal know it's handling MVectors ... as long as we don't let it know what state thread they're associated with. This is sufficient to unbox them, and also (perhaps unfortunately) to get their sizes.

Edit! If the traversal is allowed to know that it's producing Vectors (which seems like the very least likely thing to be problematic), then createTOfThen can be implemented in terms of createTOf:

createTOfThen trav post m = getConst $
  createTOf (\f ->
    fmap Const . getCompose . (trav (Compose . fmap post . f))) m

Taking the lattice in a third direction, let's move on to rank-2 traversals. The rank2classes package offers its own Traversable class, which I'll call R2.Traversable:

class (R2.Functor g, R2.Foldable g) => R2.Traversable g where
  R2.traverse :: Applicative m
              => (forall a. p a -> m (q a))
              -> g p -> m (g q)

We can play exactly the same games with this to produce heterogeneous containers of Vectors:

createTHet
  :: R2.Traversable t
  => (forall s. ST s (t (MVector s)))
  -> t Vector
createTHet m = runST $ m >>= R2.traverse unsafeFreeze

createTHetOf
  :: (forall h f g.
       (Applicative h => (forall x. f x -> h (g x)) -> t f -> h (u g)))
  -> (forall s. ST s (t (MVector s)))
  -> u Vector
createTHetOf trav m = runST $ m >>= trav unsafeFreeze

createTHetOfThen
  :: Applicative q
  => (forall h f g.
       (Applicative h => (forall x. f x -> h (g x)) -> t f -> h (u g)))
  -> (forall x. Vector x -> q (r x))
  -> (forall s. ST s (t (MVector s)))
  -> q (u r)
createTHetOfThen trav post m =
    runST $ m >>= getCompose . trav (Compose . fmap post . unsafeFreeze)

along with similar versions where the traversal is allowed to know it's working in ST s. I would imagine that the safety properties of the rank-2 versions are identical to those of the corresponding rank-1 versions, but I don't have a clue how one might go about proving such.

Just for fun, I think the top of my lattice is the following monstrosity. If any of these ideas are unsafe, this one probably is:

createTHetOfThen'
  :: (forall s1 s2.
       ((forall x. MVector s2 x -> Compose (ST s1) q (r x)) -> t (MVector s2) -> Compose (ST s1) q (u r)))
  -> (forall x. Vector x -> q (r x))
  -> (forall s. ST s (t (MVector s)))
  -> q (u r)
createTHetOfThen' trav post m =
    runST $ m >>= getCompose . trav (Compose . fmap post . unsafeFreeze)

[1] I've linked to Stackage because Hackage is down today. If I remember and have time, I'll fix up the links later.

[2] The proof comes from Data.Functor.Sum. Given a non-type-changing createTOfSame, we can write

createTOf
  :: (forall f a b. Applicative f => (a -> f b) -> t a -> f (u b))
  -> (forall s. ST s (t (MVector s a)))
  -> u (Vector a)
createTOf trav m =
  case createTOfSame (\f (InL xs) -> InR <$> trav f xs)
                     (InL <$> m) of
    InR u -> u

This will actually be total, although the "traversal" is partial: we always case match on what we will surely find.

Recommended topics

Hot tags