Is there ever a good reason to use unsafePerformIO?
A

6

36

The question says it all. More specifically, I am writing bindings to a C library, and I'm wondering which C functions I can use unsafePerformIO with. I assume using unsafePerformIO with anything involving pointers is a big no-no.

It would be great to see other cases where it is acceptable to use unsafePerformIO too.

Afterbrain answered 10/5, 2012 at 7:27 Comment(0)
T
26

In the specific case of the FFI, unsafePerformIO is meant to be used for calling things that are mathematical functions, i.e. the output depends solely on the input parameters, and every time the function is called with the same inputs, it will return the same output. Also, the function shouldn't have side effects, such as modifying data on disk, or mutating memory.

Most functions from <math.h> could be called with unsafePerformIO, for example.
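As a minimal sketch of that case, here is how sin from <math.h> might be imported with a pure type. The names c_sin and pureSin are made up for illustration; omitting IO from the foreign import's result type is what asserts to GHC that the call is pure.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble(..))

-- Pure import: no IO in the result type, so GHC treats the call
-- like any other pure function (it may share or reorder calls).
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Hypothetical convenience wrapper over plain Doubles.
pureSin :: Double -> Double
pureSin = realToFrac . c_sin . realToFrac
```

This is only reasonable because sin reads nothing but its argument and writes nothing; a C function that consults global state or a pointer would not qualify.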

You're correct that unsafePerformIO and pointers don't usually mix. For example, suppose you have

double p_sin(double *p) { return sin(*p); }

Even though you're only reading a value through the pointer, it's not safe to use unsafePerformIO. If you wrap p_sin, multiple calls with the same pointer argument can return different results, because the value behind the pointer may change between calls. The function has to stay in IO to ensure that it's sequenced properly in relation to pointer updates.

This example should make clear one reason why this is unsafe:

# file export.c

#include <math.h>
double p_sin(double *p) { return sin(*p); }

# file main.hs
{-# LANGUAGE ForeignFunctionInterface #-}

import Foreign.Ptr
import Foreign.Marshal.Alloc
import Foreign.Storable

foreign import ccall "p_sin"
  p_sin :: Ptr Double -> Double

foreign import ccall "p_sin"
  safeSin :: Ptr Double -> IO Double

main :: IO ()
main = do
  p <- malloc
  let sin1  = p_sin p
      sin2  = safeSin p
  poke p 0
  putStrLn $ "unsafe: " ++ show sin1
  sin2 >>= \x -> putStrLn $ "safe: " ++ show x

  poke p 1
  putStrLn $ "unsafe: " ++ show sin1
  sin2 >>= \x -> putStrLn $ "safe: " ++ show x

When compiled and run, this program outputs

$ ./main 
unsafe: 0.0
safe: 0.0
unsafe: 0.0
safe: 0.8414709848078965

Even though the value referenced by the pointer has changed between the two references to "sin1", the expression isn't re-evaluated, leading to stale data being used. Since safeSin (and hence sin2) is in IO, the program is forced to re-evaluate the expression, so the updated pointer data is used instead.

Topfull answered 10/5, 2012 at 9:15 Comment(7)
Could you elaborate on why it's unsafe to use unsafePerformIO in your example with pointers? - Afterbrain
@VladtheImpala - I've edited to hopefully address this more clearly. - Topfull
unsafePerformIO isn't mentioned explicitly in your answer. What if I wrap the sequence of actions poke p z >> safeSin p in unsafePerformIO like this: mySin z = unsafePerformIO (poke p z >> safeSin p), and then use mySin as a normal function? I should be OK, right? - Umiak
@imz--IvanZakharyaschev you're correct, the unsafePerformIO is implicitly added due to the type signature of p_sin (not sure if GHC still allows that). Your approach is ok, although it isn't thread-safe if multiple threads are using the same pointer. You could stuff that action into an alloca though, like unsafePerformIO $ alloca ..., which should be completely safe. - Topfull
@JohnL Thanks for the comment! (Now I understand this implicit effect of the type signature.) Am I right in thinking that the benefit of using alloca is that I get a unique pointer no one else knows, so I need not fear that someone else writes to it concurrently? Or are there some other kinds of thread-safety guarantees given by alloca (like blocking other threads, doing things atomically, etc.)? - Umiak
@JohnL ... Anyway, in the use case I was thinking about, which caused me to dig up information on this topic, the allocation is done in C, then I peekCString and free it immediately; so I was thinking about wrapping these two IO actions in unsafePerformIO and getting rid of the IO type for values which semantically are "pure". - Umiak
@imz--IvanZakharyaschev: unfortunately there are ways to smuggle the pointer outside of alloca, for example the function could write it to an IORef. This is always a bad idea though, so I don't typically consider such nefarious practices. The main advantage of alloca is that the memory will be automatically freed, plus it's convenient to use. For your use case of creating a String, unsafePerformIO should be safe provided that the call into C is otherwise pure. - Topfull
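The unsafePerformIO $ alloca ... pattern from the comments can be sketched with a standard C function that writes through a pointer. This example uses frexp from <math.h>, which returns the mantissa and stores the exponent in an int*; that choice and the names c_frexp and frexpPure are assumptions for illustration, not part of the original answer.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.C.Types (CDouble(..), CInt(..))
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)
import System.IO.Unsafe (unsafePerformIO)

-- Kept in IO because the C function writes through its pointer argument.
foreign import ccall unsafe "math.h frexp"
  c_frexp :: CDouble -> Ptr CInt -> IO CDouble

-- The pointer is allocated, written, read and freed entirely inside
-- alloca, so no other code can observe or mutate it.
frexpPure :: Double -> (Double, Int)
frexpPure x = unsafePerformIO $
  alloca $ \expPtr -> do
    mant <- c_frexp (realToFrac x) expPtr
    e    <- peek expPtr
    return (realToFrac mant, fromIntegral e)
```

With this sketch, frexpPure 8 should give (0.5, 4), since 8 = 0.5 * 2^4. The wrapper is only as pure as the C call itself, so the same reasoning would not carry over to a function that reads shared memory.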
M
27

No need to involve C here. The unsafePerformIO function can be used in any situation where:

  1. You know that its use is safe, and

  2. You are unable to prove its safety using the Haskell type system.

For instance, you can make a memoize function using unsafePerformIO:

memoize :: Ord a => (a -> b) -> a -> b
memoize f = unsafePerformIO $ do
    memo <- newMVar $ Map.empty
    return $ \x -> unsafePerformIO $ modifyMVar memo $ \memov ->
        return $ case Map.lookup x memov of
            Just y -> (memov, y)
            Nothing -> let y = f x
                       in (Map.insert x y memov, y)

(This is off the top of my head, so I have no idea if there are flagrant errors in the code.)

The memoize function uses and modifies a memoization dictionary, but since the function as a whole is safe, you can give it a pure type (with no use of the IO monad). However, you have to use unsafePerformIO to do that.

Footnote: When it comes to the FFI, you are responsible for providing the types of the C functions to the Haskell system. You can achieve the effect of unsafePerformIO by simply omitting IO from the type. The FFI system is inherently unsafe, so using unsafePerformIO doesn't make much of a difference.

Footnote 2: There are often really subtle bugs in code that uses unsafePerformIO; the example is just a sketch of a possible use. In particular, unsafePerformIO can interact poorly with the optimizer.
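For anyone who wants to try the memoize sketch, here is a self-contained variant with imports and one possible way to use it. The NOINLINE pragmas and the names slowSquare and cachedSquare are additions for illustration. Each evaluation of memoize f allocates its own table, so the cache is only reused if the memoized function is bound once and shared, as cachedSquare is here.

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar)
import qualified Data.Map as Map
import System.IO.Unsafe (unsafePerformIO)

-- Same idea as the answer's memoize; NOINLINE is a precaution so the
-- optimizer does not duplicate the unsafePerformIO call.
memoize :: Ord a => (a -> b) -> a -> b
memoize f = unsafePerformIO $ do
  memo <- newMVar Map.empty
  return $ \x -> unsafePerformIO $ modifyMVar memo $ \table ->
    return $ case Map.lookup x table of
      Just y  -> (table, y)                       -- cache hit
      Nothing -> let y = f x                      -- cache miss: store result
                 in (Map.insert x y table, y)
{-# NOINLINE memoize #-}

-- A deliberately slow function (hypothetical example).
slowSquare :: Int -> Int
slowSquare n = sum (replicate n n)

-- Bound once at the top level, so every caller shares one table.
cachedSquare :: Int -> Int
cachedSquare = memoize slowSquare
{-# NOINLINE cachedSquare #-}
```

Writing memoize slowSquare x at each call site instead would typically build a fresh table per call and cache nothing.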

Mikkimiko answered 10/5, 2012 at 7:49 Comment(5)
See also: Data.MemoUgly - Loriannlorianna
I think this answer would be a lot stronger if you were to specify exactly how the programmer knows that unsafePerformIO is safe. - Topfull
@JohnL: If we had an exact answer to that question, we'd be able to make a perfect type system. The problem in general is intractable. - Mikkimiko
@DietrichEpp: Sure, but a programmer can know something about a piece of code that cannot be proved in general about all pieces of code; for example, I can know whether a specific program halts, even if a given system cannot prove that I am right. Your answer doesn't give any definition of "safe", or any information about what it means to be "safe", that a programmer might use in making this determination. - Gatekeeper
@ruakh: That is exactly my point: if a programmer knows that a specific program is correct, even though its correctness cannot be proven by a given system (the Haskell type system), then you might use unsafePerformIO. A treatment of what "safe" means is beyond the scope of this answer; if you would like to know more about the foundations of type safety, feel free to ask a question. - Mikkimiko
E
16

Obviously if it should never be used, it wouldn't be in the standard libraries. ;-)

There are a number of reasons why you might use it. Examples include:

  • Initialising global mutable state. (Whether you should ever have such a thing in the first place is a whole other discussion...)

  • Lazy I/O is implemented using this trick. (Again, whether lazy I/O is a good idea in the first place is debatable.)

  • The trace function uses it. (Yet again, it turns out trace is rather less useful than you might imagine.)

  • Perhaps most significantly, you can use it to implement data structures which are referentially transparent, but internally implemented using impure code. Often the ST monad will let you do that, but sometimes you need a little unsafePerformIO.

Lazy I/O can be seen as a special case of the last point. So can memoisation.
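To make the trace bullet concrete, here is a rough sketch of how a trace-like function can be written with unsafePerformIO. The names traceSketch and checkedDouble are hypothetical, and this is a simplification rather than the library's actual source.

```haskell
import System.IO (hPutStrLn, stderr)
import System.IO.Unsafe (unsafePerformIO)

-- Prints the message to stderr when the result is first forced,
-- then returns the value unchanged.
traceSketch :: String -> a -> a
traceSketch msg x = unsafePerformIO $ do
  hPutStrLn stderr msg
  return x
{-# NOINLINE traceSketch #-}

-- Example use inside otherwise pure code.
checkedDouble :: Int -> Int
checkedDouble n = traceSketch ("doubling " ++ show n) (n * 2)
```

When, and how many times, the message appears depends on evaluation order and sharing, which is a large part of why trace is less useful than it first looks.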

Consider, for example, an "immutable", growable array. Internally you could implement that as a pure "handle" that points to a mutable array. The handle holds the user-visible size of the array, but the actual underlying mutable array is larger than that. When the user "appends" to the array, a new handle is returned, with a new, larger size, but the append is performed by mutating the underlying mutable array.

You can't do this with the ST monad. (Or rather, you can, but it still requires unsafePerformIO.)

Note that it's damned tricky to get this sort of thing right. And the type checker won't catch it if you're wrong. (That's what unsafePerformIO does; it makes the type checker not check that you're doing it correctly!) For example, if you append to an "old" handle, the correct thing to do would be to copy the underlying mutable array. Forget this, and your code will behave very strangely.

Now, to answer your real question: There's no particular reason why "anything involving pointers" should be a no-no for unsafePerformIO. When asking whether to use this function or not, the only question of significance is this: Can the end-user observe any side-effects from doing this?

If the only thing it does is create some buffer somewhere that the user can't "see" from pure code, that's fine. If it writes to a file on disk... not so fine.
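As a small illustration of invisible internal mutation, here is the kind of case where the ST monad is already enough and no unsafePerformIO is needed. The function sumST is a made-up example: it mutates an accumulator internally, but runST guarantees the mutation cannot be observed from outside.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, modifySTRef', readSTRef)

-- Pure interface, impure implementation: the STRef never escapes runST.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc
```

Here the type checker proves the buffer is private. unsafePerformIO is for the cases where you are making that same argument yourself, without the compiler's help.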

HTH.

Edraedrea answered 10/5, 2012 at 8:48 Comment(7)
Lazy I/O isn't implemented with unsafePerformIO, it uses unsafeInterleaveIO, a quite different (and less unsafe) function. - Topfull
@JohnL The documentation for unsafeInterleaveIO is 1 line of text. It would be fantastic if you could provide a reference which explains what this function actually does... - Edraedrea
See the "IO Inside" page on the wiki, and a blog post by Magnus Therning. Those are probably the best resources (although still quite meager, I admit): haskell.org/haskellwiki/IO_inside and therning.org/magnus/archives/249 - Topfull
@JohnL So unsafePerformIO performs I/O eagerly, while unsafeInterleaveIO performs it lazily? Would that be an accurate description? - Edraedrea
Both unsafePerformIO and unsafeInterleaveIO are non-strict in their performance of IO. The difference is that, with unsafeInterleaveIO, the result remains in IO (according to the type system). With regular strict IO, the types make clear that IO is being performed, as well as when it happens. With unsafeInterleaveIO the type still indicates that IO is happening, but not when it happens. With unsafePerformIO not only do you not know when IO will happen, you don't even know that it's going on. - Topfull
@JohnL: So unsafeInterleaveIO guarantees that the I/O cannot happen any earlier than a specific point in the execution of the enclosing IO action? - Edraedrea
As I dimly recall, Hugs at one time defined one using the other: unsafeInterleaveIO a = return (unsafePerformIO a). It quite nicely shows why unsafeInterleaveIO must be used with care... - Blood
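The difference discussed in this thread can be sketched with an IORef. The names eagerRead and lazyRead are made up for illustration. Both actions stay in IO, but the read wrapped in unsafeInterleaveIO is deferred until its result is demanded, so it sees the later write.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Ordinary IO: the read happens before the write.
eagerRead :: IO Int
eagerRead = do
  ref <- newIORef 0
  v   <- readIORef ref
  writeIORef ref 1
  return v

-- Deferred IO: the read only runs when the caller forces v,
-- which is after the write has already happened.
lazyRead :: IO Int
lazyRead = do
  ref <- newIORef 0
  v   <- unsafeInterleaveIO (readIORef ref)
  writeIORef ref 1
  return v
```

Under this sketch, eagerRead yields 0 while lazyRead yields 1 once its result is inspected: the type still says IO, but not when the read takes place.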
F
7

The standard trick to instantiate global mutable variables in Haskell:

{-# NOINLINE bla #-}
bla :: IORef Int
bla = unsafePerformIO (newIORef 10)
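A minimal, compilable sketch of how such a global might be used, with the necessary imports. The names counter and nextId are hypothetical. The NOINLINE pragma matters: without it GHC could inline the definition and create more than one IORef.

```haskell
import Data.IORef (IORef, newIORef, atomicModifyIORef')
import System.IO.Unsafe (unsafePerformIO)

-- One process-wide counter, created the first time it is used.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Hands out consecutive identifiers; all access stays in IO.
nextId :: IO Int
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))
```

Note that only the creation of the variable is hidden behind unsafePerformIO; reads and writes still go through ordinary IO actions.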

I also use it to close over the global variable if I want to prevent access to it outside of functions I provide:

{-# NOINLINE printJob #-}
printJob :: String -> Bool -> IO ()
printJob = unsafePerformIO $ do
  p <- newEmptyMVar
  return $ \a b -> do
              -- here's the function code doing something
              -- with variable p, which no one else can access.
              return ()  -- placeholder so the sketch compiles
Forebear answered 10/5, 2012 at 23:47 Comment(2)
I searched for what unsafePerformIO is doing in some 'real' code (dbignore OSX Tool / github.com/tkonolige/dbignore/blob/master/ignore.hs) and think this is what it is actually used for. - Eveevection
Some of us consider this idiom evil and rude. - Wilds
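For completeness, here is one way the closed-over variant could be filled in. This is a guess at the intent, not the author's actual code: runJob and jobCount are made-up names, and the MVar doubles as a lock and a job counter that nothing outside runJob can reach.

```haskell
import Control.Concurrent.MVar (newMVar, modifyMVar)
import System.IO.Unsafe (unsafePerformIO)

-- The MVar is created once and captured by the returned function;
-- it is not exported or nameable anywhere else.
{-# NOINLINE runJob #-}
runJob :: String -> IO Int
runJob = unsafePerformIO $ do
  jobCount <- newMVar (0 :: Int)
  return $ \name -> modifyMVar jobCount $ \n -> do
    -- Holding the MVar serialises concurrent callers.
    putStrLn ("job " ++ show (n + 1) ++ ": " ++ name)
    return (n + 1, n + 1)
```

Calling runJob twice in sequence should print two lines and return 1 and then 2, since both calls share the same hidden counter.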
P
4

The way I see it, the various unsafe* nonfunctions really should only be used in cases where you want to do something that respects referential transparency but whose implementation would otherwise require augmenting the compiler or runtime system to add a new primitive capability. It's easier, more modular, readable, maintainable and agile to use the unsafe stuff than to have to modify the language implementation for things like that.

FFI work often intrinsically requires you to do this sort of thing.

Predicable answered 10/5, 2012 at 17:20 Comment(0)
S
2

Sure. You can have a look at a real example here, but in general, unsafePerformIO is usable on any pure function that happens to be side-effecting. The IO monad may still be needed to track effects (e.g. freeing memory after the value is computed) even when the function is pure (e.g. computing a factorial).

I'm wondering which C functions I can use unsafePerformIO with. I assume using unsafePerformIO with anything involving pointers is a big no-no.

Depends! unsafePerformIO will fully perform actions and force out all the laziness, but that doesn't mean it will break your program. In general, Haskellers prefer unsafePerformIO to appear only in pure functions, so you can use it on results of e.g. scientific computations but maybe not file reads.

Sakhuja answered 13/4, 2018 at 23:39 Comment(0)
