Am I abusing unsafePerformIO?
Asked Answered
P

4

26

To get acquainted with unsafePerformIO (how to use it and when to use it), I've implemented a module for generating unique values.

Here's what I have:

module Unique (newUnique) where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- Type to represent a unique thing.
-- Show is derived just for testing purposes.
newtype Unique = U Integer
  deriving Show

-- I believe this is the Haskell'98 derived instance, but
-- I want to be explicit, since its Eq instance is the most
-- important part of Unique.
instance Eq Unique where
  (U x) == (U y) = x == y

counter :: IORef Integer
counter = unsafePerformIO $ newIORef 0

updateCounter :: IO ()
updateCounter = do
  x <- readIORef counter
  writeIORef counter (x+1)

readCounter :: IO Integer
readCounter = readIORef counter

newUnique' :: IO Unique
newUnique' = do { x <- readIORef counter
                ; writeIORef counter (x+1)
                ; return $ U x }

newUnique :: () -> Unique
newUnique () = unsafePerformIO newUnique'

To my delight, the package called Data.Unique chose the same datatype as I did; on the other hand, they chose the type newUnique :: IO Unique, but I want to stay out of IO if possible.

Is this implementation dangerous? Could it possibly lead GHC to change the semantics of a program which uses it?

Perishable answered 15/10, 2013 at 0:55 Comment(3)
When in doubt, you are abusing unsafePerformIO.Skittish
By the way, your counter should have a NOINLINE pragma so it isn't evaluated more than once.Wary
In addition to the unsafePerformIO problems, this isn't thread-safe. Consider replacing the readIORef/writeIORef calls with atomicModifyIORef.Domingo
M
65

Treat unsafePerformIO as a promise to the compiler. It says "I promise that you can treat this IO action as if it were a pure value and nothing will go wrong". It's useful because there are times you can build a pure interface to a computation implemented with impure operations, but it's impossible for the compiler to verify when this is the case; instead unsafePerformIO allows you to put your hand on your heart and swear that you have verified that the impure computation is actually pure, so the compiler can simply trust that it is.

In this case that promise is false. If newUnique were a pure function then let x = newUnique () in (x, x) and (newUnique (), newUnique ()) would be equivalent expressions. But you would want these two expressions to have different results; a pair of duplicates of the same Unique value in one case, and a pair of two different Unique values in the other. With your code, there's really no way to say what either expression means. They can only be understood by considering the actual sequence of operations the program will carry out at runtime, and control over that is exactly what you're relinquishing when you use unsafePerformIO. unsafePerformIO says it doesn't matter whether either expression is compiled as one execution of newUnique or two, and any implementation of Haskell is free to choose whatever it likes each and every time it encounters such code.

Marras answered 15/10, 2013 at 6:59 Comment(2)
I like your explanation a lot. I would add that one should think of unsafe* functions as low-level hooks into the compiler, which just happen to be exposed to the user as a haskell function for convenience.Heaton
"unsafePerformIO says it doesn't matter whether either expression is compiled as one execution of newUnique or two" -- more accurately, this is assumed true of all Haskell expressions, and returning a result in the IO monad is a way of ensuring it does not apply. In this case, by explicitly escaping from IO you are removing the assurance provide by IO that calls are always evaluated individually and sequenced in the order you provided them.Reconstitute
E
24

The purpose of unsafePerformIO is when your function does some action internally, but has no side effects that an observer would notice. For example, a function that take a vector, copies it, quicksorts the copy in-place, then returns the copy. (see comments) Each of these operations has side effects, and so is in IO, but the overall result does not.

newUnique must be an IO action because it generates something different every time. This is basically the definition of IO, it means a verb, as opposed to functions which are adjectives. A function will always return the same result for the same arguments. This is called referential transparency.

For valid uses of unsafePerformIO, see this question.

Enwomb answered 15/10, 2013 at 1:1 Comment(4)
For the in-place quick sort example, we have the ST monad, which allows you to do operations with mutable variables in a safe way, without exposing the side effects that may be created to the outside world. unsafePerformIO should not be used for that sort of situation (requiring mutable variables in a local scope).Kepler
Functions aren’t usually adjectives. They are almost always verbs, and higher-order functions like to be adverbs. The question is not what part of speech you have, but of whether the verb is conjugated in the declarative (pure—this is so) or imperative (IO—do this).Adige
A valid use of unsafePerformIO would be implementing a concurrent algorithm which always returns the same results when given the same arguments but for which the actual operations performed depend on what happens in the thread scheduler.Pyrogenous
@Kepler - while I agree the ST monad should be preferred, I think it is worth noting that if you have dependencies on functions that already use mutable values in IO then it is reasonable to perform the proof yourself that ST usually provides automatically, because this is likely a better solution than duplicating code to work in both IO and ST. It is unfortunate that there is no type class that allows you to work with either without requiring unusual additional dependencies, but because there isn't this situation is unfortunately common.Reconstitute
O
21

Yes, your module is dangerous. Consider this example:

module Main where
import Unique

main = do
  print $ newUnique ()
  print $ newUnique ()

Compile and run:

$ ghc Main.hs
$ ./Main
U 0
U 1

Compile with optimization and run:

$ \rm *.{hi,o}
$ ghc -O Main.hs
$ ./Main
U 0
U 0

Uh-oh!

Adding {-# NOINLINE counter #-} and {-# NOINLINE newUnique #-} does not help, so I'm not actually sure what's happening here ...

1st UPDATE

Looking at the GHC core, I see that @LambdaFairy was correct that constant subexpression elimination (CSE) caused my newUnique () expressions to be lifted. However, preventing CSE with -fno-cse and adding {-# NOINLINE counter #-} to Unique.hs is not sufficient to make the optimized program print the same as the unoptimized program! In particular, it seems that counter is inlined even with the NOINLINE pragma in Unique.hs. Does anyone understand why?

I've uploaded the full versions of the following core files at https://gist.github.com/ntc2/6986500.

The (relevant) core for main when compiling with -O:

main3 :: Unique.Unique
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 20 0}]
main3 = Unique.newUnique ()

main2 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main2 =
  Unique.$w$cshowsPrec 0 main3 ([] @ Char)

main4 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main4 =
  Unique.$w$cshowsPrec 0 main3 ([] @ Char)

main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
[GblId,
 Arity=1,

 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 110 0}]
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    case Handle.Text.hPutStr2
           Handle.FD.stdout main4 True eta_B1
    of _ { (# new_s_atQ, _ #) ->
    Handle.Text.hPutStr2
      Handle.FD.stdout main2 True new_s_atQ
    }

Note that the newUnique () calls have been lifted and bound to main3.

And now when compiling with -O -fno-cse:

main3 :: Unique.Unique
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 20 0}]
main3 = Unique.newUnique ()

main2 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main2 =
  Unique.$w$cshowsPrec 0 main3 ([] @ Char)

main5 :: Unique.Unique
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 20 0}]
main5 = Unique.newUnique ()

main4 :: [Char]
[GblId,
 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=0, Value=False,
         ConLike=False, Cheap=False, Expandable=False,
         Guidance=IF_ARGS [] 40 0}]
main4 =
  Unique.$w$cshowsPrec 0 main5 ([] @ Char)

main1
  :: State# RealWorld
     -> (# State# RealWorld, () #)
[GblId,
 Arity=1,

 Unf=Unf{Src=<vanilla>, TopLvl=True, Arity=1, Value=True,
         ConLike=True, Cheap=True, Expandable=True,
         Guidance=IF_ARGS [0] 110 0}]
main1 =
  \ (eta_B1 :: State# RealWorld) ->
    case Handle.Text.hPutStr2
           Handle.FD.stdout main4 True eta_B1
    of _ { (# new_s_atV, _ #) ->
    Handle.Text.hPutStr2
      Handle.FD.stdout main2 True new_s_atV
    }

Note that main3 and main5 are the two separate newUnique () calls.

However:

rm *.hi *o Main
ghc -O -fno-cse Main.hs && ./Main
U 0
U 0

Looking at the core for this modified Unique.hs:

module Unique (newUnique) where

import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- Type to represent a unique thing.
-- Show is derived just for testing purposes.
newtype Unique = U Integer
  deriving Show

{-# NOINLINE counter #-}
counter :: IORef Integer
counter = unsafePerformIO $ newIORef 0

newUnique' :: IO Unique
newUnique' = do { x <- readIORef counter
                ; writeIORef counter (x+1)
                ; return $ U x }

{-# NOINLINE newUnique #-}
newUnique :: () -> Unique
newUnique () = unsafePerformIO newUnique'

it seems that counter is being inlined as counter_rag, despite the NOINLINE pragma (2nd update: wrong! counter_rag is not marked with [InlPrag=NOINLINE], but that doesn't mean it's been inlined; rather, counter_rag is just the munged name of counter); the NOINLINE for newUnique is respected though:

counter_rag :: IORef Type.Integer

counter_rag =
  unsafeDupablePerformIO
    @ (IORef Type.Integer)
    (lvl1_rvg
     `cast` (Sym
               (NTCo:IO <IORef Type.Integer>)
             :: (State# RealWorld
                 -> (# State# RealWorld,
                       IORef Type.Integer #))
                  ~#
                IO (IORef Type.Integer)))

[...]

lvl3_rvi
  :: State# RealWorld
     -> (# State# RealWorld, Unique.Unique #)
[GblId, Arity=1]
lvl3_rvi =
  \ (s_aqi :: State# RealWorld) ->
    case noDuplicate# s_aqi of s'_aqj { __DEFAULT ->
    case counter_rag
         `cast` (NTCo:IORef <Type.Integer>
                 :: IORef Type.Integer
                      ~#
                    STRef RealWorld Type.Integer)
    of _ { STRef var#_au4 ->
    case readMutVar#
           @ RealWorld @ Type.Integer var#_au4 s'_aqj
    of _ { (# new_s_atV, a_atW #) ->
    case writeMutVar#
           @ RealWorld
           @ Type.Integer
           var#_au4
           (Type.plusInteger a_atW lvl2_rvh)
           new_s_atV
    of s2#_auo { __DEFAULT ->
    (# s2#_auo,
       a_atW
       `cast` (Sym (Unique.NTCo:Unique)
               :: Type.Integer ~# Unique.Unique) #)
    }
    }
    }
    }

lvl4_rvj :: Unique.Unique

lvl4_rvj =
  unsafeDupablePerformIO
    @ Unique.Unique
    (lvl3_rvi
     `cast` (Sym (NTCo:IO <Unique.Unique>)
             :: (State# RealWorld
                 -> (# State# RealWorld, Unique.Unique #))
                  ~#
                IO Unique.Unique))

Unique.newUnique [InlPrag=NOINLINE] :: () -> Unique.Unique

Unique.newUnique =
  \ (ds_dq8 :: ()) -> case ds_dq8 of _ { () -> lvl4_rvj }

What's going on here?

2nd UPDATE

User @errge figured it out. Looking more carefully that the last core output pasted above, we see that most of the body of Unique.newUnique has been floated to the top level as lvl4_rvj. However, lvl4_rvj is a constant expression, not a function, and so it's only evaluated once, explaining the repeated U 0 output by main.

Indeed:

rm *.hi *o Main
ghc -O -fno-cse -fno-full-laziness Main.hs && ./Main
U 0
U 1

I don't understand exactly what the -ffull-laziness optimization does -- the GHC docs talk about floating let bindings, but the body of lvl4_rvj does not appear to have been a let binding -- but we can at least compare the above core with the core generated with -fno-full-laziness and see that now the body is not lifted:

Unique.newUnique [InlPrag=NOINLINE] :: () -> Unique.Unique

Unique.newUnique =
  \ (ds_drR :: ()) ->
    case ds_drR of _ { () ->
    unsafeDupablePerformIO
      @ Unique.Unique
      ((\ (s_as1 :: State# RealWorld) ->
          case noDuplicate# s_as1 of s'_as2 { __DEFAULT ->
          case counter_rfj
               `cast` (<NTCo:IORef> <Type.Integer>
                       :: IORef Type.Integer
                            ~#
                          STRef RealWorld Type.Integer)
          of _ { STRef var#_avI ->
          case readMutVar#
                 @ RealWorld @ Type.Integer var#_avI s'_as2
          of _ { (# ipv_avz, ipv1_avA #) ->
          case writeMutVar#
                 @ RealWorld
                 @ Type.Integer
                 var#_avI
                 (Type.plusInteger ipv1_avA (__integer 1))
                 ipv_avz
          of s2#_aw2 { __DEFAULT ->
          (# s2#_aw2,
             ipv1_avA
             `cast` (Sym <(Unique.NTCo:Unique)>
                     :: Type.Integer ~# Unique.Unique) #)
          }
          }
          }
          })
       `cast` (Sym <(NTCo:IO <Unique.Unique>)>
               :: (State# RealWorld
                   -> (# State# RealWorld, Unique.Unique #))
                    ~#
                  IO Unique.Unique))
    }

Here counter_rfj corresponds to counter again, and we see the difference is that the body of Unique.newUnique has not been lifted, and so the reference updating (readMutVar, writeMutVar) code will be run each time Unique.newUnique is called.

I've updated the gist to include the new -fno-full-laziness core file. The earlier core files were generated on a different computer, so some minor differences here are unrelated to -fno-full-laziness.

Orientation answered 15/10, 2013 at 2:37 Comment(6)
Constant expression elimination. Your main is optimized to let x = newUnique () in do { print x; print x }, so both calls end up using the same value.Wary
+1 for actually showing an example! I've known this was bad, and I've been able to work through the equational reasoning showing why this was bad, but I don't recall the last time I saw someone compile code abusing unsafePerformIO to get two different answers out of it.Squally
@LambdaFairy Thanks! Do you understand why the -fno-cse + NOINLINE counter is not working?Orientation
@AntalS-Z I guess you've never compared the optimized an non-optimized version of a program that uses Debug.Trace.trace then :D Seriously confusing if you're printf debugging a cabal package and don't realize cabal defaults to optimized builds :POrientation
See my answer below, if you still don't know what's happening here.Amadou
@Amadou Thanks! I've updated my answer with the core corresponding to your resolution of my confusion.Orientation
A
6

See an another example how this fails:

module Main where
import Unique

helper :: Int -> Unique
-- noinline pragma here doesn't matter
helper x = newUnique ()

main = do
  print $ helper 3
  print $ helper 4

With this code the effect is the same as in ntc2's example: correct with -O0, but incorrect with -O. But in this code there is no "common subexpression to eliminate".

What's actually happening here is that the newUnique () expression is "floated out" to the top-level, because it doesn't depend on the function's parameters. In GHC speak this is -ffull-laziness (on by default with -O, can be turned off with -O -fno-full-laziness).

So the code effectively becomes this:

helperworker = newUnique ()
helper x = helperworker

And here helperworker is a thunk that can only be evaluated once.

With the already recommended NOINLINE pragmas if you add -fno-full-laziness to the command line, then it works as expected.

Amadou answered 25/10, 2013 at 10:2 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.