Why `Vector.length (Vector.replicate n 0)" is not fused?
Asked Answered
M

2

9

The following code unexpectedly (at least for me) produces an intermediate vector:

import qualified Data.Vector as Vector

main :: IO ()
main =
  print (test n)

n :: Int
n = 1000000

test :: Int -> Int
test n = Vector.length (Vector.replicate n (0 :: Int))

The relevant part of Core is here (note the newArray# 1000000 call):

Main.main4
  :: forall s_a38t.
     GHC.Prim.State# s_a38t
     -> (# GHC.Prim.State# s_a38t, Vector.Vector Int #)
[GblId,
 Arity=1,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True,
         WorkFree=True, Expandable=True, Guidance=IF_ARGS [0] 399 30}]
Main.main4 =
  \ (@ s_a38t) (s1_a38u [OS=OneShot] :: GHC.Prim.State# s_a38t) ->
    case GHC.Prim.newArray#
           @ Int
           @ (Control.Monad.Primitive.PrimState (GHC.ST.ST s_a38t))
           1000000
           (Data.Vector.Mutable.uninitialised @ Int)
           (s1_a38u
            `cast` ((GHC.Prim.State#
                       (Sym (Control.Monad.Primitive.TFCo:R:PrimStateST[0] <s_a38t>_N)))_R
                    :: GHC.Prim.State# s_a38t
                       ~R# GHC.Prim.State#
                             (Control.Monad.Primitive.PrimState (GHC.ST.ST s_a38t))))
    of _ [Occ=Dead] { (# ipv_a5RG, ipv1_a5RH #) ->
    letrec {
      $wa_s609 [InlPrag=[0], Occ=LoopBreaker]
        :: GHC.Types.SPEC
           -> GHC.Prim.Int#
           -> Bool
           -> GHC.Prim.State# s_a38t
           -> (# GHC.Prim.State# s_a38t, Int #)
      [LclId, Arity=4, Str=DmdType <S,1*U><L,U><S,1*U><L,U>]
      $wa_s609 =
...

At the same time if I replace length with sum, fusion occurs correctly:

test n = Vector.sum (Vector.replicate n (0 :: Int))

Core:

Rec {
Main.main_$s$wfoldlM'_loop [Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <L,U><L,U>]
Main.main_$s$wfoldlM'_loop =
  \ (sc_s6bx :: GHC.Prim.Int#) (sc1_s6by :: GHC.Prim.Int#) ->
    case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# sc1_s6by 0)
    of _ [Occ=Dead] {
      False ->
        Main.main_$s$wfoldlM'_loop sc_s6bx (GHC.Prim.-# sc1_s6by 1);
      True -> sc_s6bx
    }
end Rec }

Main.main2 :: String
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 100 30}]
Main.main2 =
  case Main.main_$s$wfoldlM'_loop 0 1000000 of ww_s67W { __DEFAULT ->
  case GHC.Show.$wshowSignedInt 0 ww_s67W (GHC.Types.[] @ Char)
  of _ [Occ=Dead] { (# ww5_a5Vq, ww6_a5Vr #) ->
  GHC.Types.: @ Char ww5_a5Vq ww6_a5Vr
  }
  }

Also, if I rewrite the original function in terms of monadic stream combinators, the intermediate vector is not allocated also:

import qualified Data.Vector.Fusion.Stream.Monadic as Stream
import Data.Functor.Identity

test n = runIdentity $ Stream.length (Stream.replicate n (0 :: Int))

Core:

Rec {
Main.main_$s$wfoldlM'_loop [Occ=LoopBreaker]
  :: GHC.Prim.Int# -> GHC.Prim.Int# -> GHC.Prim.Int#
[GblId, Arity=2, Caf=NoCafRefs, Str=DmdType <L,U><L,U>]
Main.main_$s$wfoldlM'_loop =
  \ (sc_s5lE :: GHC.Prim.Int#) (sc1_s5lF :: GHC.Prim.Int#) ->
    case GHC.Prim.tagToEnum# @ Bool (GHC.Prim.<=# sc1_s5lF 0)
    of _ [Occ=Dead] {
      False ->
        Main.main_$s$wfoldlM'_loop
          (GHC.Prim.+# sc_s5lE 1) (GHC.Prim.-# sc1_s5lF 1);
      True -> sc_s5lE
    }
end Rec }

Main.main2 :: String
[GblId,
 Str=DmdType,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Value=False, ConLike=False,
         WorkFree=False, Expandable=False, Guidance=IF_ARGS [] 100 30}]
Main.main2 =
  case Main.main_$s$wfoldlM'_loop 0 1000000 of ww_s5ke { __DEFAULT ->
  case GHC.Show.$wshowSignedInt 0 ww_s5ke (GHC.Types.[] @ Char)
  of _ [Occ=Dead] { (# ww5_a5gi, ww6_a5gj #) ->
  GHC.Types.: @ Char ww5_a5gi ww6_a5gj
  }
  }

Why Vector.length breaks fusion?

I'm using ghc-7.10.3 and vector-0.11.0.0.

ADDED: Here is an issue: https://github.com/haskell/vector/issues/111

Mathre answered 18/3, 2016 at 23:28 Comment(6)
Perhaps, that's because in practical applications it's fairly uncommon to build a vector only to compute its length, discarding all the elements in the process, and no one coded a fusion rule for this case.Redcap
@Redcap I thought Vector.length is implemented in terms of Stream.length, so fusion should be automatically available. But I know little about stream fusion internals. Anyway "it is uncommon use case" is a valid answer too.Mathre
BTW, I know nothing about how fusion is actually implemented. I'm guessing it's possible to overlook that case. It's also possible that there are some technical limitations I can't see right now. For a fusing variant, sum . map 1 could be used to avoid allocating a vector (maybe?).Redcap
To avoid confusing future visitors: @chi's expression was supposed to be sum . (1 <$).Lambard
@Lambard Yes, thanks. Missed a const there, but (1 <$) is even better.Redcap
This looks like a bug, I'd open a ticket at ghc.haskell.org/trac/ghc (I can't find an existing one).Enteron
K
4

I used sum and length from Data.Vector.Generic rather than Data.Vector since the latter are just defined as the former.

Here's the code for length (from Data.Vector.Generic) ...

-- | /O(1)/ Yield the length of the vector.
length :: Vector v a => v a -> Int
{-# INLINE length #-}
length = Bundle.length . stream

Hmm.. so let's look at "sum"

-- | /O(n)/ Compute the sum of the elements
sum :: (Vector v a, Num a) => v a -> a
{-# INLINE sum #-}
sum = Bundle.foldl' (+) 0 . stream

But if I run ghc -ddump-inlinings -ddump-rule-firings -O2 with sum I see

Rule fired: SPEC Data.Vector.$fVectorVectora [GHC.Types.Int]
Inlining done: System.IO.print
Inlining done: System.IO.print1
Inlining done: Data.Vector.Generic.sum
Rule fired: Class op +
Rule fired: Class op fromInteger
Inlining done: GHC.Num.$fNumInt_$cfromInteger
Rule fired: integerToInt
Inlining done: Data.Vector.Fusion.Util.unId
Inlining done: Data.Vector.Fusion.Util.unId1
Inlining done: Data.Vector.replicate
Inlining done: Data.Vector.Generic.replicate

And if I run it with length I see:

Rule fired: SPEC Data.Vector.$fVectorVectora [GHC.Types.Int]
Inlining done: System.IO.print
Inlining done: System.IO.print1
Inlining done: Data.Vector.replicate
Inlining done: Data.Vector.Generic.replicate
Rule fired: SPEC Data.Vector.$fVectorVectora [GHC.Types.Int]

So sum gets inlined and length doesn't, and I don't understand why. And even turning up the unfolding threshold to ridiculous amounts doesn't change that.

That said, if I manually replace Vector.length with Bundle.length . Vector.stream, the stream/unstream rule does fire, as in the sum case, and a very tidy core is generated with no array allocations.

Kalsomine answered 19/3, 2016 at 2:29 Comment(1)
So it is inlining failure, not stream fusing itself. I always thought that relying on inliner is a bit vague. Thanks for the answer!Mathre
A
2

This is an extension of sclv's answer.

I noticed the behavior in the question occurs with vector-0.11.0.0, but not the other version I happened to have installed, vector-0.10.12.2. Inspecting the Data/Vector/Generic.hi files from those two versions with ghc --show-iface, I found that in version 0.11.0.0 only, length (but not sum) is marked as a "loop-breaker". This means length is part of a mutually recursive group of definitions and GHC has chosen this function to not inline in order to avoid the possibility of an infinite expansion.

I assume what happened is that the changes in 0.11.0.0 made length part of a cycle of definitions, probably unintentionally, where it wasn't before, but I didn't try to verify that since it would require actually reading vector source code.

Amidships answered 19/3, 2016 at 14:55 Comment(2)
Then it is a regression, I'll open an issue. BTW --show-iface trick is nice!Mathre
The fact that only master wizards with good Emacs configurations can read vector source code seems a rather unfortunate effect of the "fusion bundle" strategy.Lambard

© 2022 - 2024 — McMap. All rights reserved.