Using persistent from within a Conduit

First up, a simplified version of the task I want to accomplish: I have several large files (amounting to 30GB) that I want to prune for duplicate entries. To this end, I establish a database of hashes of the data, and open the files one-by-one, hashing each item, and recording it in the database and the output file iff its hash wasn't already in the database.
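
Stripped of conduit and persistent, the filtering logic I have in mind looks roughly like the sketch below. It is only an illustration: `toyHash` is a made-up placeholder for a real hash function, and an in-memory `Set` stands in for the database of hashes.

```haskell
module Main where

import Data.Char (ord)
import Data.List (foldl')
import qualified Data.Set as Set

-- Hypothetical stand-in for a real hash function; the real code would
-- use something collision-resistant.
toyHash :: String -> Int
toyHash = foldl' (\h c -> (h * 31 + ord c) `mod` 1000003) 7

-- Emit an item iff its hash has not been recorded yet, recording it as we go.
dedup :: [String] -> [String]
dedup = go Set.empty
  where
    go _ [] = []
    go seen (x:xs)
      | h `Set.member` seen = go seen xs
      | otherwise           = x : go (Set.insert h seen) xs
      where h = toyHash x

main :: IO ()
main = mapM_ putStrLn (dedup ["a", "b", "a", "c", "b"])
```

Running `main` prints `a`, `b` and `c` on separate lines. The real task replaces the `Set` with database lookups and inserts, which is where the trouble starts.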

I know how to do this with iteratees and enumerators, and I wanted to try conduits. I also know how to do it with plain conduits, but now I want to use conduits together with persistent. I'm having problems with the types, and possibly with the entire concept of ResourceT.

Here's some pseudo code to illustrate the problem:

withSqlConn "foo.db" $ runSqlConn $ runResourceT $ 
     sourceFile "in" $= parseBytes $= dbAction $= serialize $$ sinkFile "out"

The problem lies in the dbAction function. I would like to access the database here, naturally. Since the action it performs is basically just a filter, I first thought to write it like this:

dbAction = CL.mapMaybeM p
     where p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => DataType -> m (Maybe DataType)
           p _ = do lift $ putStrLn "foo" -- fine
                    insert $ undefined -- type error!
                    return undefined

The specific error I get is:

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                           DataType -> m (Maybe DataType)
  at tools/clean-wac.hs:(33,1)-(34,34)
  `m' is a rigid type variable bound by
      the type signature for
        p :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) =>
                      DataType -> m (Maybe (DataType))
      at tools/clean-wac.hs:33:1
Expected type: m (Key b0 val0)
  Actual type: b0 m0 (Key b0 val0)

Note that this might be due to wrong assumptions I made in designing the type signature. If I comment out the type signature and also remove the lift statement, the error message turns into:

No instance for (PersistStore ResourceT (SqlPersist IO))
  arising from a use of `p'
Possible fix:
  add an instance declaration for
  (PersistStore ResourceT (SqlPersist IO))
In the first argument of `CL.mapMaybeM', namely `p'

So this means that we can't access the PersistStore at all via ResourceT?

Writing my own Conduit by hand, without CL.mapMaybeM, doesn't work either:

dbAction = filterP
filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType
filterP = loop
    where loop = awaitE >>= either return go
          go s = do lift $ insert $ undefined -- again, type error
                    loop

This resulted in yet another type error I don't fully understand.

Could not deduce (m ~ b0 m0)
from the context (MonadIO m, MonadBaseControl IO (SqlPersist m))
  bound by the type signature for
             filterP :: (MonadIO m,
                                 MonadBaseControl IO (SqlPersist m)) =>
                                Conduit DataType m DataType
     `m' is a rigid type variable bound by
      the type signature for
        filterP :: (MonadIO m,
                            MonadBaseControl IO (SqlPersist m)) =>
                           Conduit DataType m DataType
Expected type: Conduit DataType m DataType
  Actual type: Pipe
                 DataType DataType DataType () (b0 m0) ()
In the expression: loop
In an equation for `filterP'

So, my question is: is it possible to use persistent inside a conduit the way I intended at all? And if so, how? I am aware that since I can use liftIO inside the conduit, I could just go and use, say, HDBC, but I wanted to use persistent explicitly in order to understand how it works, and because I like its db-backend agnosticism.

Kathrinekathryn answered 11/11, 2012 at 14:23 Comment(7)
Have you tried using lift instead of liftIO? - Doc
Ah, yes, sure liftIO imposes a constraint on the entire do block. But that only explains why the first error message differs from the second. I'll update the post in a sec, to reflect what'll happen if you remove the liftIO statement. - Kathrinekathryn
BTW, even lift already imposes IO restrictions on the monad type. I noted you have to remove the lift statement altogether to reach that error message. If you don't (but keep lift $ print "" in) you instead get Couldn't match expected type 'SqlPersist m0 a0' with actual type 'IO ()'. - Kathrinekathryn
Well, one issue above is filterP :: (MonadIO m, MonadBaseControl IO (SqlPersist m)) => Conduit DataType m DataType. What you probably want is Conduit DataType (SqlPersist m) DataType. I think that might clear up a fair amount of the problems. - Doc
But that can't possibly work, can it? The Conduit is run by runResourceT which requires its argument to be instantiated to at least ResourceT m, not SqlPersist m. It also imposes on m the constraint MonadBaseControl IO m, so that has to be in the conduit's type signature. - Kathrinekathryn
@AleksandarDimitrov The MonadBaseControl type class is in transformers-base-0.3 and seems to have disappeared in version 0.4.1 which is the current. I'm currently working on a variation of this same problem. - Tribadism
@ErikdeCastroLopo, if you find a way to solve the issue, I'd be very grateful for an answer. I might also ask haskell-cafe, soon; but I'm up to my ears in work, so I went back to Iteratees (it's what I know best.) I'll play around with this on the weekend again. - Kathrinekathryn

The code below compiles fine for me. Is it possible that the frameworks have moved on in the meantime and things now just work?

Note, however, the following changes I had to make, either because the libraries have changed a bit or because I didn't have all your code. I used conduit-1.0.9.3 and persistent-1.3.0 with GHC 7.6.3.

  • Omitted parseBytes and serialize as I don't have your definitions and defined DataType = ByteString instead.

  • Introduced a Proxy parameter and an explicit type signature for the undefined value to avoid problems with type family injectivity. These likely don't arise in your real code because it will have a concrete or externally determined type for val.

  • Used await rather than awaitE and just used () as the type to substitute for the Left case, as awaitE has been retired.

  • Passed a dummy Connection creation function to withSqlConn. Perhaps I should have used some Sqlite-specific function instead?

Here's the code:

{-# LANGUAGE FlexibleContexts, NoMonomorphismRestriction,
             TypeFamilies, ScopedTypeVariables #-}

module So133331988 where

import Control.Monad.Trans
import Database.Persist.Sql
import Data.ByteString
import Data.Conduit
import Data.Conduit.Binary
import Data.Proxy

test proxy =
    withSqlConn (return (undefined "foo.db")) $ runSqlConn $ runResourceT $ 
         sourceFile "in" $= dbAction proxy $$ sinkFile "out"

dbAction = filterP

type DataType = ByteString

filterP
    :: forall m val
     . ( MonadIO m, MonadBaseControl IO (SqlPersist m)
       , PersistStore m, PersistEntity val
       , PersistEntityBackend val ~ PersistMonadBackend m)
    => Proxy val
    -> Conduit DataType m DataType
filterP Proxy = loop
    where loop = await >>= maybe (return ()) go
          go s = do lift $ insert (undefined :: val)
                    loop
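
To see the same shape without installing persistent or conduit, here is a minimal sketch that uses `StateT` over `IO` as a stand-in for the database monad. The names `Db`, `insertIfNew` and `p` are made up for illustration and are not part of persistent's API. The point is only that the signature has to name the monad in which the database action actually runs.

```haskell
module Main where

import Control.Monad.Trans.Class (lift)
import Control.Monad.Trans.State.Strict (StateT, evalStateT, get, modify')
import Data.Maybe (catMaybes)
import qualified Data.Set as Set

-- Stand-in for the database layer: the "table" is a Set carried by StateT.
type Db = StateT (Set.Set String) IO

-- Hypothetical analogue of an insert-if-absent query. It lives in Db,
-- not in bare IO, so callers must also run in Db.
insertIfNew :: String -> Db Bool
insertIfNew k = do
  seen <- get
  if k `Set.member` seen
    then return False
    else modify' (Set.insert k) >> return True

-- Analogue of the function handed to CL.mapMaybeM in the question.
p :: String -> Db (Maybe String)
p x = do
  lift (putStrLn ("checking " ++ x))  -- plain IO must be lifted into Db
  isNew <- insertIfNew x              -- the Db action itself needs no lift
  return (if isNew then Just x else Nothing)

main :: IO ()
main = do
  kept <- evalStateT (mapM p ["a", "b", "a"]) Set.empty
  print (catMaybes kept)
```

Running `main` prints three `checking` lines followed by `["a","b"]`. Declaring `p` as returning a bare `IO (Maybe String)` would fail to type-check for the same kind of reason as in the question: the action and the signature would disagree about which monad is in play.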
Forwardlooking answered 2/1, 2014 at 7:37 Comment(2)
I asked this so long ago that I barely remembered what it was about. But I think this should clear it up. Yes, I think the APIs in question have just changed quite a bit since I asked that question. Thanks! - Kathrinekathryn
I was actually a bit disappointed when it just worked as I was hoping for a juicy type system problem to think about :-) - Forwardlooking
