Load pure global variable from file
Asked Answered
D

5

14

I have a file with some data in it. This data never changes and I want to make it available outside of the IO monad. How can I do that?

Example (note that this is just an example, my data is not computable):

primes.txt:

2 3 5 7 13

code.hs:

primes :: [Int]
primes = map read . words . unsafePerformIO . readFile $ "primes.txt"

Is this a "legal" use of unsafePerformIO? Are there alternatives?

Diagnostic answered 3/10, 2012 at 20:19 Comment(1)
From the "Tackling the awkward squad" paper, this seems to be the raison d'etre of unsafePerformIO. You can find the paper at research.microsoft.com/en-us/um/people/simonpj/papers/…Zaporozhye
S
21

You could use TemplateHaskell to read in the file at compile time. The data of the file would then be stored as an actual string in the program.

In one module (Text/Literal/TH.hs in this example), define this:

module Text.Literal.TH where

import Language.Haskell.TH
import Language.Haskell.TH.Quote

literally :: String -> Q Exp
literally = return . LitE . StringL

lit :: QuasiQuoter
lit = QuasiQuoter { quoteExp = literally }

litFile :: QuasiQuoter
litFile = quoteFile lit

In your module, you can then do:

{-# LANGUAGE QuasiQuotes #-}
module MyModule where

import Text.Literal.TH (litFile)

primes :: [Int]
primes = map read . words $ [litFile|primes.txt|]

When you compile your program, GHC will open the primes.txt file and insert its contents where the [litFile|primes.txt|] part is.

Subtlety answered 3/10, 2012 at 21:26 Comment(5)
If the data in question should never change, then I would suggest that this is the best answer.Phonography
@Phonography And if the data in question could possibly change, then you should use readFile and bind the result in the IO monad to reflect the fact that it's not a referentially transparent operation.Datum
Another possibility is to turn on the preprocessor and do #include "primes.txt" (I haven't actually tried this).Outlandish
what if I store the filename in a variable? I cannot do, for example, [litFile|jsonFilename|]. What would you recommend in this case?Chic
I tried this trick on a large list literal in the html-entity package and the ghc object code size was reduced by 95%. The ghcjs object code size was reduced by 99.9%! It actually works as well or better if you inline the file contents into the splice, using lit instead of litFile.Recessive
I
6

Using unsafePerformIO in that way isn't great.

The declaration primes :: [Int] says that primes is a list of numbers. One particular list of numbers, that doesn't depend on anything.

In fact, however, it depends on the state of file "primes.txt" when the definition happens to be evaluated. Someone could alter this file to alter the value that primes appears to have, which shouldn't be possible according to its type.

In the presence of a hypothetical optimisation which decides that primes should be recomputed on demand rather than stored in memory in full (after all, its type says we'll get the same thing every time we recompute it), primes could even appear to have two different values during a single run of the program. This is the sort of problem that can come with using unsafePerformIO to lie to the compiler.

In practice, all of the above are probably unlikely to be a problem.

But the theoretically correct thing to do is to not make primes a global constant (because it's not a constant). Instead, you make the computation that needs it parameterised on it (i.e. take primes as an argument), and in the outer IO program you read the file and then call the pure computation by passing the pure value the IO program extracted from the file. You get the best of both worlds; you don't have to lie to the compiler, and you don't have to put your entire program in IO. You can use constructs such as the Reader monad to avoid having to manually pass primes around everywhere, if that helps.

So you can use unsafePerformIO if you want to just get on with it. It's theoretically wrong, but unlikely to cause issues in practice.

Or you can refactor your program to reflect what's really going on.

Or, if primes really is a global constant and you just don't want to literally include a huge chunk of data in your program source, you can use TemplateHaskell as demonstrated by dflemstr.

Iowa answered 4/10, 2012 at 6:25 Comment(0)
S
4

Yes, it should be fine. You could add a {-# NOINLINE primes #-} pragma to be safe — not sure whether GHC would ever inline a CAF.

The only alternative I can think of is to do the same thing during compile time (using Template Haskell), essentially embedding the primes into the binary. However, I prefer your version — note that the primes list will be actually read & created lazily!

Snubnosed answered 3/10, 2012 at 20:28 Comment(0)
P
4

Your program does not define exactly when this file gets loaded. If the file does not exist, this will throw an exception, and there's no telling exactly where that will happen. (I.e., potentially after your program already did some observable real-world stuff.) Similar remarks apply if somebody decides to change the contents of the file; you don't know exactly when it gets read, and so which contents you'll get. (Unlikely to be a problem if the file isn't supposed to change.)

As for alternatives: One possibility is to create a global mutable variable [which in itself is somewhat evil], and insert the contents of the file into this variable from the main I/O thread. This way, the file gets read in at a well-defined moment. [I notice you're using lazy I/O as well, so you'd only be defining when the file gets opened.]

Really, the "correct" thing is to manually thread the data to every function that needs it. I can understand why you might want to not to that; it is a pain. You would perhaps use some kind of state monad to avoid doing this manually though...

Phonography answered 3/10, 2012 at 21:15 Comment(0)
T
2

This is based on the answer by dflemstr. Given that you want to load a list of integers you may wish to perform the read at compile time as well. I'm just writing it out because it would have been useful for me to see the example, and I hope it helps someone else.

import Language.Haskell.TH
import Language.Haskell.TH.Quote

intArray' :: String -> Q Exp
intArray' s = return $ ListE e
    where
        e = map (LitE . IntegerL . read) $ words s

intArray :: QuasiQuoter
intArray = QuasiQuoter { quoteExp = intArray' }


intArrayFile :: QuasiQuoter
intArrayFile = quoteFile intArray

To use it...

{-# LANGUAGE QuasiQuotes #-}
import TT

primes :: [Int]
primes = [intArrayFile|primes.txt|]

main = print primes

The benefits are

  • Syntax checking your primes.txt file at compile time
  • No conversions at runtime to slow you down or throw exceptions.
  • Potentially code size improvements as you don't need to store the whole file raw.
Theocracy answered 14/4, 2014 at 6:11 Comment(0)

© 2022 - 2024 — McMap. All rights reserved.